WorldWideScience

Sample records for region sequence analysis

  1. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    The aims of this research were to study mitochondrial DNA hypervariable region III and establish the degree of variation characteristic of a fragment. The mitochondrial DNA (mtDNA) is a small circular genome located within the mitochondria in the cytoplasm of the cell and a smaller 1.2 kb pair fragment, called the control ...

  2. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    Science.gov (United States)

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified and sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows a high conservation among species of 99 and 97% identity when comparing pig with cow and human, respectively, and of 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be a T --> C transition recognized by the MspI restriction enzyme.
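
    The restriction-based SNP detection mentioned in the abstract above can be illustrated with a minimal sketch. MspI recognizes the site CCGG; everything else here is an assumption for illustration: the two allele fragments are made-up toy sequences (not the porcine MYF6 promoter), and the sketch assumes that the C allele is the one carrying the recognition site.

```python
MSPI_SITE = "CCGG"  # recognition sequence of MspI


def site_positions(seq: str, site: str = MSPI_SITE) -> list[int]:
    """Return the 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


# Hypothetical toy alleles that differ by a single T -> C transition.
allele_t = "AAGTCTGGATTC"  # T allele: no CCGG, so not cut in this toy case
allele_c = "AAGTCCGGATTC"  # C allele: the transition creates a CCGG site

t_sites = site_positions(allele_t)
c_sites = site_positions(allele_c)
```

    In a PCR-RFLP assay, the presence or absence of the site would show up as different fragment patterns after digestion.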

  3. Region segmentation along image sequence

    International Nuclear Information System (INIS)

    Monchal, L.; Aubry, P.

    1995-01-01

    A method to extract regions in a sequence of images is proposed. Regions are not matched from one image to the following one. The result of a region segmentation is used as an initialization to segment the following image and to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  4. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of amplified sequences was studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matk+rbcl) showed good amplification in three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.

  5. Sequence analysis of the canine mitochondrial DNA control region from shed hair samples in criminal investigations.

    Science.gov (United States)

    Berger, C; Berger, B; Parson, W

    2012-01-01

    In recent years, evidence from domestic dogs has increasingly been analyzed by forensic DNA testing. Especially, canine hairs have proved most suitable and practical due to the high rate of hair transfer occurring between dogs and humans. Starting with the description of a contamination-free sample handling procedure, we give a detailed workflow for sequencing hypervariable segments (HVS) of the mtDNA control region from canine evidence. After the hair material is lysed and the DNA extracted by Phenol/Chloroform, the amplification and sequencing strategy comprises the HVS I and II of the canine control region and is optimized for DNA of medium-to-low quality and quantity. The sequencing procedure is based on the Sanger Big-dye deoxy-terminator method and the separation of the sequencing reaction products is performed on a conventional multicolor fluorescence detection capillary electrophoresis platform. Finally, software-aided base calling and sequence interpretation are addressed exemplarily.

  6. Sequencing analysis reveals a unique gene organization in the gyrB region of Mycoplasma hominis

    DEFF Research Database (Denmark)

    Ladefoged, Søren; Christiansen, Gunna

    1994-01-01

    The homolog of the gyrB gene, which has been reported to be present in the vicinity of the initiation site of replication in bacteria, was mapped on the Mycoplasma hominis genome, and the region was subsequently sequenced. Five open reading frames were identified flanking the gyrB gene, one of which showed similarity to that which encodes the LicA protein of Haemophilus influenzae. The organization of the genes in the region showed no resemblance to that in the corresponding regions of other bacteria sequenced so far. The gyrA gene was mapped 35 kb downstream from the gyrB gene.

  7. DNA sequence analysis of the photosynthesis region of Rhodobacter sphaeroides 2.4.1T

    OpenAIRE

    Choudhary, M.; Kaplan, Samuel

    2000-01-01

    This paper describes the DNA sequence of the photosynthesis region of Rhodobacter sphaeroides 2.4.1T. The photosynthesis gene cluster is located within a ~73 kb AseI genomic DNA fragment containing the puf, puhA, cycA and puc operons. A total of 65 open reading frames (ORFs) have been identified, of which 61 showed significant similarity to genes/proteins of other organisms while only four did not reveal any significant sequence similarity to any gene/protein sequences in the database. The da...

  8. Multifractal Detrended Fluctuation Analysis of Regional Precipitation Sequences Based on the CEEMDAN-WPT

    Science.gov (United States)

    Liu, Dong; Cheng, Chen; Fu, Qiang; Liu, Chunlei; Li, Mo; Faiz, Muhammad Abrar; Li, Tianxiao; Khan, Muhammad Imran; Cui, Song

    2018-03-01

    In this paper, the complete ensemble empirical mode decomposition with the adaptive noise (CEEMDAN) algorithm is introduced into the complexity research of precipitation systems to improve the traditional complexity measure method specific to the mode mixing of the Empirical Mode Decomposition (EMD) and incomplete decomposition of the ensemble empirical mode decomposition (EEMD). We combined the CEEMDAN with the wavelet packet transform (WPT) and multifractal detrended fluctuation analysis (MF-DFA) to create the CEEMDAN-WPT-MFDFA, and used it to measure the complexity of the monthly precipitation sequence of 12 sub-regions in Harbin, Heilongjiang Province, China. The results show that there are significant differences in the monthly precipitation complexity of each sub-region in Harbin. The complexity of the northwest area of Harbin is the lowest and its predictability is the best. The complexity and predictability of the middle and Midwest areas of Harbin are about average. The complexity of the southeast area of Harbin is higher than that of the northwest, middle, and Midwest areas of Harbin and its predictability is worse. The complexity of Shuangcheng is the highest and its predictability is the worst of all the studied sub-regions. We used terrain and human activity as factors to analyze the causes of the complexity of the local precipitation. The results showed that the correlations between the precipitation complexity and terrain are obvious, and the correlations between the precipitation complexity and human influence factors vary. The distribution of the precipitation complexity in this area may be generated by the superposition effect of human activities and natural factors such as terrain, general atmospheric circulation, land and sea location, and ocean currents. To evaluate the stability of the algorithm, the CEEMDAN-WPT-MFDFA was compared with the equal probability coarse graining LZC algorithm, fuzzy entropy, and wavelet entropy. The results show

  9. Sequence analysis and typing of Saprolegnia strains isolated from freshwater fish from Southern Chinese regions

    Directory of Open Access Journals (Sweden)

    Siya Liu

    2017-09-01

    Full Text Available Saprolegniasis, caused by Saprolegnia infection, is one of the most common diseases in freshwater fish. Our study aimed to determine the epidemiological characteristics of saprolegniasis in Chinese regions of high incidence. Saprolegnia were isolated and identified by morphological and molecular methods targeting the internal transcribed spacer (ITS) ribosomal DNA (rDNA) and building neighbor-joining (NJ) and maximum parsimony (MP) phylogenetic trees. The ITS sequences of eight isolated strains were compared with GenBank sequences and all strains fell into three clades: CLADE1 (02, LP, 04 and 14), CLADE2 (S1), and CLADE3 (CP, S2, L5 and the reference ATCC200013). Isolates 02 and LP shared 80% sequence similarity with S. diclina, S. longicaulis, S. ferax, S. mixta, and S. anomalies. Further, isolates 04 and 14 shared 80% similarity with S. bulbosa and S. oliviae. Finally, extremely high ITS sequence similarities were identified between isolates S1 and S. australis (100%); CP and S. hypogyna (96%); and S2, L5, ATCC200013 and S. salmonis (98%). This research provides insights into the identification, prevention and control of saprolegniasis pathogens and the potential development of effective drugs.

  10. Genetic Diversity of Toxoplasma gondii Strains from Different Hosts and Geographical Regions by Sequence Analysis of GRA20 Gene.

    Science.gov (United States)

    Ning, Hong-Rui; Huang, Si-Yang; Wang, Jin-Lei; Xu, Qian-Ming; Zhu, Xing-Quan

    2015-06-01

    Toxoplasma gondii is a eukaryotic parasite of the phylum Apicomplexa, which infects all warm-blooded animals, including humans. In the present study, we examined sequence variation in dense granule 20 (GRA20) genes among T. gondii isolates collected from different hosts and geographical regions worldwide. The complete GRA20 genes were amplified from 16 T. gondii isolates using PCR, the sequences were analyzed, and phylogenetic reconstruction was performed by maximum parsimony (MP) and maximum likelihood (ML) methods. The results showed that the complete GRA20 gene sequence was 1,586 bp in length among all the isolates used in this study, and the sequence variations in nucleotides were 0-7.9% among all strains. However, removing the type III strains (CTG, VEG), the sequence variations became very low, only 0-0.7%. These results indicated that the GRA20 sequence in type III was more divergent. Phylogenetic analysis of GRA20 sequences using MP and ML methods can differentiate 2 major clonal lineage types (type I and type III) into their respective clusters, indicating the GRA20 gene may represent a novel genetic marker for intraspecific phylogenetic analyses of T. gondii.
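
    A range of nucleotide sequence variation such as the one reported above can be obtained from pairwise comparisons of aligned sequences. The following is a minimal sketch of that idea, not the authors' pipeline: the function names are hypothetical, the toy fragments are made up (they are not GRA20 data), and the sketch assumes that the sequences are already aligned and that gapped columns are skipped.

```python
from itertools import combinations


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of differing positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are not compared.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(1 for a, b in pairs if a != b)
    return 100.0 * mismatches / len(pairs)


def variation_range(aligned: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise percent difference over all isolates."""
    values = [percent_difference(aligned[x], aligned[y])
              for x, y in combinations(aligned, 2)]
    return min(values), max(values)


# Made-up aligned fragments, for illustration only.
toy = {
    "isolate_A": "ATGCCGTTAG",
    "isolate_B": "ATGCCGTTAG",
    "isolate_C": "ATGACGTTAA",
}
low, high = variation_range(toy)
```

    Dropping a divergent isolate from the dictionary and recomputing the range mirrors the comparison made in the abstract when the type III strains are removed.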

  11. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Full Text Available Abstract Background Understanding protein function from its structure is a challenging problem. Sequence based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information of protein domains and their interactions provide a complementary view to structure function relationships to sequence information. We have developed a web site http://www.sblest.org/ and an API of web services that enables users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view to prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect both the hits found with PSI-BLAST, HMMer and with S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP IDs, Gene Ontology (GO) IDs and EC Numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion With structural genomics initiatives determining structures with little, if any, functional characterization

  12. Sequence analysis of the breakpoint regions of an X;5 translocation in a female with Duchenne muscular dystrophy

    Energy Technology Data Exchange (ETDEWEB)

    Bakel, I. van; Holt, S.; Craig, I. [Univ. of Oxford (United Kingdom)] [and others

    1995-08-01

    X;autosome translocations in females with Duchenne muscular dystrophy (DMD) provide an opportunity to study the mechanisms responsible for chromosomal rearrangements that occur in the germ line. We describe here a detailed molecular analysis of the translocation breakpoints of an X;autosome reciprocal translocation, t(X;5) (p21;q31.1), in a female with DMD. Cosmid clones that contained the X-chromosome breakpoint region were identified, and subclones that hybridized to the translocation junction fragment in restriction digests of the patient's DNA were isolated and sequenced. Primers designed from the X-chromosomal sequence were used to obtain the junction fragments on the der(X) and the der(5) by inverse PCR. The resultant clones were also cloned and sequenced, and this information used to isolate the chromosome 5 breakpoint region. Comparison of the DNA sequences of the junction fragments with those of the breakpoint regions on chromosomes X and 5 revealed that the translocation arose by nonhomologous recombination with an imprecise reciprocal exchange. Four and six base pairs of unknown origin are inserted at the exchange points of the der(X) and der(5), respectively, and three nucleotides are deleted from the X-chromosome sequence. Two features were found that may have played a role in the generation of the translocation. These were (1) a repeat motif with an internal homopyrimidine stretch 10 bp upstream from the X-chromosome breakpoint and (2) a 9-bp sequence of 78% homology located near the breakpoints on chromosomes 5 and X. 32 refs., 4 figs., 2 tabs.

  13. Sequence Analysis of How Disability Influenced Life Trajectories in a Past Population from the Nineteenth-Century Sundsvall Region, Sweden

    Directory of Open Access Journals (Sweden)

    Lotta Vikström

    2017-05-01

    Full Text Available Historically, little is known about whether and to what extent disabled people found work and formed families. To fill this gap, this study analyses the life course trajectories of both disabled and non-disabled individuals, between the ages of 15 and 33, from the Sundsvall region in Sweden during the nineteenth century. Having access to micro-data that report disabilities in a population of 8,874 individuals from the parish registers digitised by the Demographic Data Base, Umeå University, we employ sequence analysis on a series of events that are expected to occur in the life of young adults: getting a job, marrying and becoming a parent, while also taking into account out-migration and death. Through this method we obtain a holistic picture of the life course of disabled people. Main findings show that their trajectories did not include work or family to the same extent as those of non-disabled people. Secondary findings concerning migration and mortality indicate that the disabled rarely out-migrated from the region, and they suffered from premature deaths. To our knowledge this is the first study to employ sequence analysis on a substantially large number of cases to provide demographic evidence of how disability shaped human trajectories in the past during an extended period of life. Accordingly, we detail our motivation for this method, describe our analytical approach, and discuss the advantages and disadvantages associated with sequence analysis for our case study.

  14. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Directory of Open Access Journals (Sweden)

    Debnath Bhattacharyya

    2013-01-01

    Full Text Available We design an algorithm for a bioengine. The program enables searching for optimal alignments between two sequences: the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences, here in 4 × 4 combination with different subsequence (word size). The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. The main aim is to detect overlapping reading frames. The program also makes it possible to find the highly infected colones, selecting the highest matching region with minimum gap or mismatch zones, and unique virus colones matches. This is a small, portable, interactive, front-end program intended to be used to find the regions of matching between the host sequence and query subsequences. All operations are carried out in a fraction of a second, depending on the required task and on the sequence length.

  15. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Science.gov (United States)

    Mandal, Bijoy Kumar; Kim, Tai-hoon

    2013-01-01

    We design an algorithm for a bioengine. The program enables searching for optimal alignments between two sequences: the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences, here in 4 × 4 combination with different subsequence (word size). The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. The main aim is to detect overlapping reading frames. The program also makes it possible to find the highly infected colones, selecting the highest matching region with minimum gap or mismatch zones, and unique virus colones matches. This is a small, portable, interactive, front-end program intended to be used to find the regions of matching between the host sequence and query subsequences. All operations are carried out in a fraction of a second, depending on the required task and on the sequence length. PMID:24000321
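
    Word-based search for matching regions between a host and a query sequence, as described above, can be sketched roughly as follows. This is not the authors' program: it is a generic seed-and-extend illustration in which the function names, the default word size of 4, and the toy sequences are all assumptions, and only exact (gap-free, mismatch-free) matches are reported.

```python
from collections import defaultdict


def index_words(host: str, word_size: int) -> dict[str, list[int]]:
    """Map every word of length word_size in the host to its start positions."""
    index = defaultdict(list)
    for i in range(len(host) - word_size + 1):
        index[host[i:i + word_size]].append(i)
    return index


def matching_regions(host: str, query: str, word_size: int = 4):
    """Return (host_start, query_start, length) for maximal exact matches.

    Each shared word is used as a seed and extended in both directions
    for as long as the two sequences stay identical.
    """
    index = index_words(host, word_size)
    regions = set()
    for q in range(len(query) - word_size + 1):
        for h in index.get(query[q:q + word_size], []):
            hs, qs = h, q
            # Extend the seed to the left.
            while hs > 0 and qs > 0 and host[hs - 1] == query[qs - 1]:
                hs -= 1
                qs -= 1
            he, qe = h + word_size, q + word_size
            # Extend the seed to the right.
            while he < len(host) and qe < len(query) and host[he] == query[qe]:
                he += 1
                qe += 1
            regions.add((hs, qs, he - hs))
    # Longest regions first; ties are ordered by position.
    return sorted(regions, key=lambda r: (-r[2], r[0], r[1]))


# Made-up toy sequences, for illustration only.
host = "TTACGGATCCGTA"
query = "GGACGGATCAA"
regions = matching_regions(host, query)
```

    Smaller word sizes find more seeds at the cost of more spurious hits, which is presumably why the abstract mentions varying the word size.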

  16. Sequence analysis of the its-2 region: a tool to identify strains of Scenedesmus (Chlorophyceae)

    NARCIS (Netherlands)

    Van Hannen, E.J.; Lürling, M.; Van Donk, E.

    2000-01-01

    The genetic distances between several strains of Scenedesmus obliquus (Turp.) Kutz., S. acutus Hortobagyi, and S. naegelii Chod. calculated from ITS-2 sequences were found to be smaller than the genetic distances within other strains of Scenedesmus, that is, in S. acuminatus (Lagerh.) Chod. and S. ...

  17. Nucleotide sequence analysis of regions of adenovirus 5 DNA containing the origins of DNA replication

    International Nuclear Information System (INIS)

    Steenbergh, P.H.

    1979-01-01

    The purpose of the investigations described is the determination of nucleotide sequences at the molecular ends of the linear adenovirus type 5 DNA. Knowledge of the primary structure at the termini of this DNA molecule is of particular interest in the study of the mechanism of replication of adenovirus DNA. The initiation- and termination sites of adenovirus DNA replication are located at the ends of the DNA molecule. (Auth.)

  18. Sequencing and association analysis of the type 1 diabetes – linked region on chromosome 10p12-q11

    Directory of Open Access Journals (Sweden)

    Barratt Bryan J

    2007-05-01

    Full Text Available Abstract Background In an effort to locate susceptibility genes for type 1 diabetes (T1D), several genome-wide linkage scans have been undertaken. A chromosomal region designated IDDM10 retained genome-wide significance in a combined analysis of the main linkage scans. Here, we studied sequence polymorphisms in 23 Mb on chromosome 10p12-q11, including the putative IDDM10 region, to identify genes associated with T1D. Results Initially, we resequenced the functional candidate genes, CREM and SDF1, located in this region, genotyped 13 tag single nucleotide polymorphisms (SNPs) and found no association with T1D. We then undertook analysis of the whole 23 Mb region. We constructed and sequenced a contig tile path from two bacterial artificial clone libraries. By comparison with a clone library from an unrelated person used in the Human Genome Project, we identified 12,058 SNPs. We genotyped 303 SNPs and 25 polymorphic microsatellite markers in 765 multiplex T1D families and followed up 22 associated polymorphisms in up to 2,857 families. We found nominal evidence of association in six loci (P = 0.05 – 0.0026), located near the PAPD1 gene. Therefore, we resequenced 38.8 kb in this region, found 147 SNPs and genotyped 84 of them in the T1D families. We also tested 13 polymorphisms in the PAPD1 gene and in five other loci in 1,612 T1D patients and 1,828 controls from the UK. Overall, only the D10S193 microsatellite marker located 28 kb downstream of PAPD1 showed nominal evidence of association in both T1D families and in the case-control sample (P = 0.037 and 0.03, respectively). Conclusion We conclude that polymorphisms in the CREM and SDF1 genes have no major effect on T1D. The weak T1D association that we detected in the association scan near the PAPD1 gene may be either false or due to a small genuine effect, and cannot explain linkage at the IDDM10 region.

  19. Sequence requirements of the HIV-1 protease flap region determined by saturation mutagenesis and kinetic analysis of flap mutants

    Science.gov (United States)

    Shao, Wei; Everitt, Lorraine; Manchester, Marianne; Loeb, Daniel D.; Hutchison, Clyde A.; Swanstrom, Ronald

    1997-01-01

    The retroviral proteases (PRs) have a structural feature called the flap, which consists of a short antiparallel β-sheet with a turn. The flap extends over the substrate binding cleft and must be flexible to allow entry and exit of the polypeptide substrates and products. We analyzed the sequence requirements of the amino acids within the flap region (positions 46–56) of the HIV-1 PR. The phenotypes of 131 substitution mutants were determined using a bacterial expression system. Four of the mutant PRs with mutations in different regions of the flap were selected for kinetic analysis. Our phenotypic analysis, considered in the context of published structures of the HIV-1 PR with bound substrate analogs, shows that: (i) Met-46 and Phe-53 participate in hydrophobic interactions on the solvent-exposed face of the flap; (ii) Ile-47, Ile-54, and Val-56 participate in hydrophobic interactions on the inner face of the flap; (iii) Ile-50 has hydrophobic interactions at the distance of both the δ and γ carbons; (iv) the three glycine residues in the β-turn of the flap are virtually intolerant of substitutions. Among these mutant PRs, we have identified changes in both kcat and Km. These results establish the nature of the side chain requirements at each position in the flap and document a role for the flap in both substrate binding and catalysis. PMID:9122179

  20. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    Science.gov (United States)

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant, they were pathogenic with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  1. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions.

  2. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  3. Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs

    DEFF Research Database (Denmark)

    Friis, Susanne L; Buchard, Anders; Rockenbauer, Eszter

    2016-01-01

    This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, it analyses the STR sequence and the flanking regions, and generates a report with the...
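
    The general idea of locating an STR by its flanking sequences can be sketched in a few lines. This is emphatically not the STRinNGS implementation: the function names, the flank sequences, and the toy read are hypothetical, and the sketch assumes exact flank matches, whereas real reads contain sequencing errors that a production tool has to tolerate.

```python
import re
from typing import Optional


def extract_str_region(read: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the sequence between the two flanks, or None if either is missing."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def longest_repeat_run(region: str, motif: str) -> int:
    """Largest number of consecutive copies of the motif found in the region."""
    runs = re.findall(f"(?:{re.escape(motif)})+", region)
    return max((len(run) // len(motif) for run in runs), default=0)


# Made-up read: hypothetical flanks around five copies of an AGAT motif.
read = "TTGGTCA" + "AGAT" * 5 + "CCTGAAC"
region = extract_str_region(read, left_flank="GGTCA", right_flank="CCTGA")
repeats = longest_repeat_run(region, "AGAT") if region is not None else 0
```

    Reporting the full region rather than only the repeat count keeps sequence variation inside the repeat visible, which is the point of sequence-based STR typing.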

  4. Species composition of the genus Saprolegnia in fin fish aquaculture environments, as determined by nucleotide sequence analysis of the nuclear rDNA ITS regions.

    Science.gov (United States)

    de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E

    2015-01-01

    The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  5. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

    Directory of Open Access Journals (Sweden)

    Patrick D Schloss

    Full Text Available Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. 
    Taken together, these results
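
    To make the point about gap treatment concrete, the sketch below computes an uncorrected pairwise distance under three gap conventions. The function name, the mode names, and the toy alignment are assumptions made for illustration; they are not taken from the study, whose exact distance calculators are not given in this record.

```python
def pairwise_distance(a, b, gap_mode="ignore"):
    """Uncorrected distance between two aligned sequences of equal length.

    gap_mode (illustrative conventions):
      "ignore" - columns containing a gap are skipped entirely
      "each"   - every gapped column counts as one difference
      "one"    - a run of consecutive gapped columns counts as one difference
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in {"ignore", "each", "one"}:
        raise ValueError(f"unknown gap_mode: {gap_mode}")
    diffs = 0
    compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        if x == "-" or y == "-":
            if gap_mode == "ignore":
                continue
            if gap_mode == "one" and in_gap_run:
                continue  # already counted this gap run once
            in_gap_run = True
            diffs += 1
            compared += 1
        else:
            in_gap_run = False
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


# Toy alignment: one 2-column gap and one substitution.
seq_a = "ACGT--ACGT"
seq_b = "ACGTTTACGA"
for mode in ("ignore", "each", "one"):
    print(mode, round(pairwise_distance(seq_a, seq_b, mode), 3))
```

    On this toy pair the three conventions give 0.125, 0.3, and about 0.222, which shows how strongly length variation can drive the distance when gaps are penalized per column.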

  6. Sequence analysis of the internal transcribed spacer (ITS) region reveals a novel clade of Ichthyophonus sp. from rainbow trout.

    Science.gov (United States)

    Rasmussen, C; Purcell, M K; Gregg, J L; LaPatra, S E; Winton, J R; Hershberger, P K

    2010-03-09

    The mesomycetozoean parasite Ichthyophonus hoferi is most commonly associated with marine fish hosts but also occurs in some components of the freshwater rainbow trout Oncorhynchus mykiss aquaculture industry in Idaho, USA. It is not certain how the parasite was introduced into rainbow trout culture, but it might have been associated with the historical practice of feeding raw, ground common carp Cyprinus carpio that were caught by commercial fishermen. Here, we report a major genetic division between west coast freshwater and marine isolates of Ichthyophonus hoferi. Sequence differences were not detected in 2 regions of the highly conserved small subunit (18S) rDNA gene; however, nucleotide variation was seen in internal transcribed spacer loci (ITS1 and ITS2), both within and among the isolates. Intra-isolate variation ranged from 2.4 to 7.6 nucleotides over a region consisting of approximately 740 bp. Majority consensus sequences from marine/anadromous hosts differed in only 0 to 3 nucleotides (99.6 to 100% nucleotide identity), while those derived from freshwater rainbow trout had no nucleotide substitutions relative to each other. However, the consensus sequences between isolates from freshwater rainbow trout and those from marine/anadromous hosts differed in 13 to 16 nucleotides (97.8 to 98.2% nucleotide identity).
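
    The identity percentages quoted above follow from simple arithmetic if the differences are counted over the roughly 740 bp region. The abstract does not state the exact denominator, so the region length used below is an assumption.

```python
def percent_identity(differences, length):
    """Percent of positions that are identical, given a count of differences."""
    return 100.0 * (1 - differences / length)


REGION_BP = 740  # approximate region length from the abstract (assumed denominator)

for diffs in (0, 3, 13, 16):
    print(diffs, round(percent_identity(diffs, REGION_BP), 1))
# 0 100.0
# 3 99.6
# 13 98.2
# 16 97.8
```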

  7. Diversity analysis of Bemisia tabaci biotypes: RAPD, PCR-RFLP and sequencing of the ITS1 rDNA region

    OpenAIRE

    Rabello, Aline R.; Queiroz, Paulo R.; Simões, Kenya C.C.; Hiragi, Cássia O.; Lima, Luzia H.C.; Oliveira, Maria Regina V.; Mehta, Angela

    2008-01-01

    The Bemisia tabaci complex is formed by approximately 41 biotypes, two of which (B and BR) occur in Brazil. In this work we aimed at obtaining genetic markers to assess the genetic diversity of the different biotypes. In order to do that we analyzed Bemisia tabaci biotypes B, BR, Q and Cassava using molecular techniques including RAPD, PCR-RFLP and sequencing of the ITS1 rDNA region. The analyses revealed a high similarity between the individuals of the B and Q biotypes, which could be distin...

  8. Sequence analysis of the 3’-untranslated region of HSP70 (type I) genes in the genus Leishmania: its usefulness as a molecular marker for species identification

    Directory of Open Access Journals (Sweden)

    Requena Jose M

    2012-04-01

    Full Text Available Abstract Background The Leishmaniases are a group of clinically diverse diseases caused by parasites of the genus Leishmania. To distinguish between species is crucial for correct diagnosis and prognosis as well as for treatment decisions. Recently, sequencing of the HSP70 coding region has been applied in phylogenetic studies and for identifying Leishmania species with excellent results. Methods In the present study, we analyzed the 3’-untranslated region (UTR) of the Leishmania HSP70-type I gene from 24 strains representing eleven Leishmania species in the belief that this non-coding region would have a better discriminatory capacity for species typing than coding regions. Results It was observed that there was a remarkable degree of sequence conservation in this region, even between species of the subgenera Leishmania and Viannia. In addition, the presence of many microsatellites was a common feature of the 3’-UTR of HSP70-I genes in the Leishmania genus. Finally, we constructed dendrograms based on global sequence alignments of the analyzed Leishmania species and strains; the results indicated that this particular region of HSP70 genes might be useful for species (or species complex) typing, improving for particular species the discrimination capacity of phylogenetic trees based on HSP70 coding sequences. Given the large size variation of the analyzed region between the Leishmania and Viannia subgenera, direct visualization of the PCR amplification product would allow discrimination between subgenera, and a HaeIII-PCR-RFLP analysis might be used for differentiating some species within each subgenus. Conclusions Sequence and phylogenetic analyses indicated that this region, which is readily amplified using a single pair of primers from both Old and New World Leishmania species, might be useful as a molecular marker for species discrimination.
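
    The PCR-RFLP idea mentioned in this record can be illustrated with an in silico digest. HaeIII recognizes GGCC and cuts between GG and CC; everything else in the sketch (the function name `digest` and both amplicon sequences) is made up for illustration and does not come from the study.

```python
def digest(seq, site="GGCC", cut_offset=2):
    """Cut `seq` at every occurrence of `site` and return fragment lengths.

    `cut_offset` is the cut position inside the recognition site
    (2 for HaeIII, which cuts GG^CC and leaves blunt ends).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Two made-up amplicons that differ by one base inside the first HaeIII site.
amplicon_a = "ATATATGGCCTTTTTTTTGGCCAA"
amplicon_b = "ATATATGGACTTTTTTTTGGCCAA"
print(digest(amplicon_a))  # [8, 12, 4]
print(digest(amplicon_b))  # [20, 4]
```

    A single substitution that destroys a recognition site changes the fragment pattern, which is what makes the banding on a gel informative for typing.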

  9. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  10. Population genetic structure of skipjack tuna Katsuwonus pelamis from the Indian coast using sequence analysis of the mitochondrial DNA D-loop region

    Digital Repository Service at National Institute of Oceanography (India)

    Menezes, M.R.; Kumar, G.; Kunal, S.P.

    Biology (2012) 80, 2198–2212 doi:10.1111/j.1095-8649.2012.03270.x, available online at wileyonlinelibrary.com Population genetic structure of skipjack tuna Katsuwonus pelamis from the Indian coast using sequence analysis of the mitochondrial DNA D...-loop region M. R. Menezes*, G. Kumar and S. P. Kunal Biological Oceanography Division, National Institute of Oceanography (CSIR), Dona Paula, Goa 403 004, India (Received 26 May 2011, Accepted 14 February 2012) Genetic structure of skipjack tuna Katsuwonus...

  11. [Cloning and sequence analysis of the DHBV genome of the brown ducks in Guilin region and establishment of the quantitative method for detecting DHBV].

    Science.gov (United States)

    Su, He-Ling; Huang, Ri-Dong; He, Song-Qing; Xu, Qing; Zhu, Hua; Mo, Zhi-Jing; Liu, Qing-Bo; Liu, Yong-Ming

    2013-03-01

    Brown ducks carrying DHBV were widely used as a hepatitis B animal model in research on the activity and toxicity of anti-HBV drugs. Studies showed that the proportion of DHBV carriers among the brown ducks in Guilin region was relatively high. Nevertheless, the characteristics of the DHBV genome of the Guilin brown duck remain unknown. Here we report the cloning of the genome of Guilin brown duck DHBV and the sequence analysis of the genome. The full length of the DHBV genome of the Guilin brown duck was 3,027 bp. Analysis using ORF finder found that there was an ORF for an unknown peptide in addition to the S-ORF, P-ORF and C-ORF in the genome of the DHBV. Vector NTI 8.0 analysis revealed that the unknown peptide contained a motif that binds HLA*0201. Alignment with DHBV sequences from different countries and regions indicated that there were no obvious regional differences among the sequences. A fluorescence quantitative PCR assay for detecting DHBV was established based on the constructed recombinant plasmid pGEM-DHBV-S. This study laid the groundwork for using the Guilin brown duck as a hepatitis B animal model.

  12. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. They are also very mutable because of slipping of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, more than in other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, existing sequence alignment algorithms are designed to fit the vast majority of the genomic sequence and do not treat microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, smaller distances between different lineages were observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic
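
    The definition of an SSR given above (tandem copies of a unit no longer than six bases) can be turned into a small detector. This is only an illustrative sketch in Python, not the PERL program described in the record; the function name, the `min_repeats` threshold, and the example sequence are assumptions.

```python
import re


def find_ssrs(seq, min_repeats=3):
    """Return (start, unit, copies) for each tandem repeat of a 1-6 base unit.

    The unit is matched lazily so the shortest repeating unit is reported
    (for example "CA" rather than "CACA").
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Made-up sequence containing (CA)5 and (AGC)3.
example = "GGT" + "CA" * 5 + "TGA" + "AGC" * 3 + "T"
print(find_ssrs(example))  # [(3, 'CA', 5), (16, 'AGC', 3)]
```

    Knowing the repeat unit and the boundaries of each SSR is the kind of information an SSR-aware aligner would need before shifting sequences by whole repeat units.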

  13. Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

    Science.gov (United States)

    Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

    2016-09-21

    With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our

  14. Genomic relationships of Actinobacillus pleuropneumoniae serotype 2 strains evaluated by ribotyping, sequence analysis of ribosomal intergenic regions, and pulsed-field gel electrophoresis

    DEFF Research Database (Denmark)

    Fussing, V.

    1998-01-01

    The aim of the present study was to examine the genomic relationship among 112 Actinobacillus pleuropneumoniae serotype 2 strains obtained throughout Europe and North America. HindIII ribotyping of the strains resulted in five ribotypes of high similarity (87-98%). Sequence analysis of the ribosomal intergenic region of strains representing each ribotype and each country showed no differences. A common ribotype was further characterized by PFGE of 12 strains representing all countries. The resultant five PFGE patterns of European strains showed a similarity of more than 91%, to which the two...

  15. Phylogenetic relationships within the cyst-forming nematodes (Nematoda, Heteroderidae) based on analysis of sequences from the ITS regions of ribosomal DNA.

    Science.gov (United States)

    Subbotin, S A; Vierstraete, A; De Ley, P; Rowe, J; Waeyenberge, L; Moens, M; Vanfleteren, J R

    2001-10-01

    The ITS1, ITS2, and 5.8S gene sequences of nuclear ribosomal DNA from 40 taxa of the family Heteroderidae (including the genera Afenestrata, Cactodera, Heterodera, Globodera, Punctodera, Meloidodera, Cryphodera, and Thecavermiculatus) were sequenced and analyzed. The ITS regions displayed high levels of sequence divergence within Heteroderinae and compared to outgroup taxa. Unlike recent findings in root knot nematodes, ITS sequence polymorphism does not appear to complicate phylogenetic analysis of cyst nematodes. Phylogenetic analyses with maximum-parsimony, minimum-evolution, and maximum-likelihood methods were performed with a range of computer alignments, including elision and culled alignments. All multiple alignments and phylogenetic methods yielded similar basic structure for phylogenetic relationships of Heteroderidae. The cyst-forming nematodes are represented by six main clades corresponding to morphological characters and host specialization, with certain clades assuming different positions depending on alignment procedure and/or method of phylogenetic inference. Hypotheses of monophyly of Punctoderinae and Heteroderinae are, respectively, strongly and moderately supported by the ITS data across most alignments. Close relationships were revealed between the Avenae and the Sacchari groups and between the Humuli group and the species H. salixophila within Heteroderinae. The Goettingiana group occupies a basal position within this subfamily. The validity of the genera Afenestrata and Bidera was tested and is discussed based on molecular data. We conclude that ITS sequence data are appropriate for studies of relationships within the different species groups and less so for recovery of more ancient speciations within Heteroderidae. Copyright 2001 Academic Press.

  16. Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly.

    Science.gov (United States)

    Bai, Yongsheng; Kinne, Jeff; Ding, Lizhong; Rath, Ethan C; Cox, Aaron; Naidu, Siva Dharman

    2017-10-03

    It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, the question of whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years next-generation sequencing technologies have revolutionized the field. The "Read-Split-Walk" (RSW) and "Read-Split-Run" (RSR) methods were developed to identify genome-wide non-canonical spliced regions including special events occurring in cytoplasm. As a significant amount of genome/transcriptome data, such as that of the Encyclopedia of DNA Elements (ENCODE) project, has been generated, we have advanced a newer, more memory-efficient version of the algorithm, "Read-Split-Fly" (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for further downstream biological function analysis. We used open access ENCODE project RNA-Seq data to search spliced intron sequences against the U12-type spliced intron sequence database to examine whether some events could occur as potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5'ss and 3'ss sequences from the well-known orthologous U12-type spliceosomal intron database U12DB. Preliminary results of searching 70 ENCODE samples indicated that the presence of 5'ss with U12-type signature is more frequent than U2-type and prevalent in non-canonical junctions reported by RSF. The selected spliced sequences have also been further studied using miRBase to elucidate their functionality. Preliminary results from 70 samples of ENCODE datasets show that several miRNAs are prevalent in studied ENCODE samples. Two of these are associated with many diseases as suggested in the literature. Specifically, hsa-miR-1273 and hsa-miR-548 are associated with many diseases and cancers. Our RSF pipeline is able to detect many possible junctions

  17. Comparison of PCR-RFLP pattern with sequencing analysis of the ITS region of Hyrcanian's Tilia

    Directory of Open Access Journals (Sweden)

    Hamed Yousefzadeh

    2014-01-01

    T. hyrcana and T. rubra from Hyrcanian's origin, but it could not separate T. begonifolia from the other Hyrcanian species. In this respect, the results were similar to those from sequencing. In conclusion, because the PCR-RFLP technique is less expensive and less time consuming, and its results are highly similar to those of sequencing, we recommend it as a simple and economical method with relatively high efficiency for studying plant phylogeny.

  18. Sequence analysis of the Epstein-Barr virus (EBV) latent membrane protein-1 gene and promoter region

    DEFF Research Database (Denmark)

    Sandvej, Kristian; Gratama, J W; Munch, M

    1997-01-01

    Sequence variations in the Epstein-Barr virus (EBV) encoded latent membrane protein-1 (LMP-1) gene have been described in a Chinese nasopharyngeal carcinoma-derived isolate (CAO), and in viral isolates from various EBV-associated tumors. It has been suggested that these genetic changes, which include loss of a Xho I restriction site (position 169425) and a C-terminal 30-base pair (bp) deletion (position 168287-168256), define EBV genotypes associated with increased tumorigenicity or with disease among particular geographic populations. To determine the frequency of LMP-1 variations in European wild-type virus isolates, we sequenced the LMP-1 promoter and gene in EBV from lymphoblastoid cell lines from healthy carriers and patients without EBV-associated disease. Sequence changes were often present, and defined at least four main groups of viral isolates, which we designate Groups A through D...

  19. Sequencing analysis of ghrelin gene 5' flanking region: relations between the sequence variants, fasting plasma total ghrelin concentrations, and body mass index.

    Science.gov (United States)

    Vartiainen, Johanna; Kesäniemi, Y Antero; Ukkola, Olavi

    2006-10-01

    Ghrelin is a 28-amino-acid peptide with several functions linked to energy metabolism. Low ghrelin plasma concentrations are associated with obesity, hypertension, and type 2 diabetes mellitus, whereas high concentrations reflect states of negative energy balance. Several studies addressing the hormonal and neural regulation of ghrelin gene expression have been carried out, but the role of genetic factors in the regulation of ghrelin plasma levels remains unclear. To elucidate the role of genetic factors in the regulation of ghrelin expression, we screened 1657 nucleotides of the ghrelin gene 5' flanking region (promoter and possible regulatory sites) for new sequence variations from patient samples with low (n = 50) and high (n = 50) fasting plasma total ghrelin concentrations (low- and high-ghrelin groups). Eleven single nucleotide polymorphisms (SNPs), 3 of which were rare variants (allelic frequency less than 1%), were found in our population. The genotype distribution patterns of the SNPs did not differ between the study groups, except for SNP-501A>C (P = .039). In addition, the SNP-501A>C was associated with body mass index (BMI) (P = .018). This variant was studied further in our large and well-defined Oulu Project Elucidating Risk for Atherosclerosis (OPERA) cohort (n = 1045) by the restriction fragment length polymorphism (RFLP) technique. No significant association of SNP-501A>C genotypes with fasting ghrelin plasma concentrations was found in the whole OPERA population. However, the association of this SNP with BMI and with waist circumference reached statistical significance in OPERA (P = .047 and .049, respectively), remaining of borderline significance for BMI after adjustments (P = .055). The results indicate that factors other than the 11 SNPs found in this study in the 5' flanking region of ghrelin gene are the main determinants of ghrelin plasma levels. However, SNP-501A>C genotype distribution seems to be different in subjects having the highest

  20. 'Mitominis': multiplex PCR analysis of reduced size amplicons for compound sequence analysis of the entire mtDNA control region in highly degraded samples.

    Science.gov (United States)

    Eichmann, Cordula; Parson, Walther

    2008-09-01

    The traditional protocol for forensic mitochondrial DNA (mtDNA) analyses involves the amplification and sequencing of the two hypervariable segments HVS-I and HVS-II of the mtDNA control region. The primers usually span fragment sizes of 300-400 bp each region, which may result in weak or failed amplification in highly degraded samples. Here we introduce an improved and more stable approach using shortened amplicons in the fragment range between 144 and 237 bp. Ten such amplicons were required to produce overlapping fragments that cover the entire human mtDNA control region. These were co-amplified in two multiplex polymerase chain reactions and sequenced with the individual amplification primers. The primers were carefully selected to minimize binding on homoplasic and haplogroup-specific sites that would otherwise result in loss of amplification due to mis-priming. The multiplexes have successfully been applied to ancient and forensic samples such as bones and teeth that showed a high degree of degradation.
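
    The design constraint described above (short amplicons of 144 to 237 bp that overlap and together cover the whole target) can be checked mechanically. The sketch below is only an illustration: the function name is hypothetical, the amplicon coordinates are invented, and the target is treated as a simple linear interval of 1,122 positions, which is an assumption rather than the coordinate system used in the study.

```python
def check_tiling(amplicons, target_start, target_end, min_len=144, max_len=237):
    """Return a list of problems with a set of (start, end) amplicons.

    Coordinates are 1-based and inclusive. An empty list means every
    amplicon is within the size range, neighbours overlap, and the
    target interval is fully covered.
    """
    if not amplicons:
        return ["no amplicons given"]
    problems = []
    ordered = sorted(amplicons)
    for start, end in ordered:
        length = end - start + 1
        if not min_len <= length <= max_len:
            problems.append(f"amplicon {start}-{end} has length {length}")
    if ordered[0][0] > target_start:
        problems.append("start of target not covered")
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_to:
            problems.append(f"no overlap between {covered_to} and {start}")
        covered_to = max(covered_to, end)
    if covered_to < target_end:
        problems.append("end of target not covered")
    return problems


# Ten invented 150 bp amplicons, each shifted by 108 bp (42 bp overlap).
tiles = [(1 + 108 * i, 150 + 108 * i) for i in range(10)]
print(check_tiling(tiles, 1, 1122))       # []
print(check_tiling(tiles[:-1], 1, 1122))  # ['end of target not covered']
```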

  1. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions

    OpenAIRE

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal th...

  2. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions.

    Science.gov (United States)

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection.

  3. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions.

    Directory of Open Access Journals (Sweden)

    Chao Hu

    Full Text Available A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection.

  4. Microbial rRNA sequencing analysis of evaporative cooler indoor environments located in the Great Basin Desert region of the United States

    Science.gov (United States)

    Lemons, Angela R.; Hogan, Mary Beth; Gault, Ruth A.; Holland, Kathleen; Sobek, Edward; Olsen-Wilson, Kimberly A.; Park, Yeonmi; Park, Ju-Hyeong; Gu, Ja Kook; Kashon, Michael L.; Green, Brett J.

    2017-01-01

    Recent studies conducted in the Great Basin Desert region of the United States have shown that skin test reactivity to fungal and dust mite allergens is increased in children with asthma or allergy living in homes with evaporative coolers (EC). The objective of this study was to determine if the increased humidity previously reported in EC homes leads to varying microbial populations compared to homes with air conditioners (AC). Children with physician-diagnosed allergic rhinitis living in EC or AC environments were recruited into the study. Air samples were collected from the child's bedroom for genomic DNA extraction and metagenomic analysis of bacteria and fungi using the Illumina MiSeq sequencing platform. The analysis of bacterial populations revealed no major differences between EC and AC sampling environments. The fungal populations observed in EC homes differed from AC homes. The most prevalent species discovered in AC environments belonged to the genera Cryptococcus (20%) and Aspergillus (20%). In contrast, the most common fungi identified in EC homes belonged to the order Pleosporales and included Alternaria alternata (32%) and Phoma spp. (22%). The variations in fungal populations provide preliminary evidence of the microbial burden children may be exposed to within EC environments in this region. PMID:28091681

  5. Sequence-Stratigraphic Analysis of the Regional Observation Monitoring Program (ROMP) 29A Test Corehole and Its Relation to Carbonate Porosity and Regional Transmissivity in the Floridan Aquifer System, Highlands County, Florida

    Science.gov (United States)

    Ward, W. C.; Cunningham, K.J.; Renken, R.A.; Wacker, M.A.; Carlson, J.I.

    2003-01-01

    An analysis was made to describe and interpret the lithology of a part of the Upper Floridan aquifer penetrated by the Regional Observation Monitoring Program (ROMP) 29A test corehole in Highlands County, Florida. This information was integrated into a one-dimensional hydrostratigraphic model that delineates candidate flow zones and confining units in the context of sequence stratigraphy. Results from this test corehole will serve as a starting point to build a robust three-dimensional sequence-stratigraphic framework of the Floridan aquifer system. The ROMP 29A test corehole penetrated the Avon Park Formation, Ocala Limestone, Suwannee Limestone, and Hawthorn Group of middle Eocene to Pliocene age. The part of the Avon Park Formation penetrated in the ROMP 29A test corehole contains two composite depositional sequences. A transgressive systems tract and a highstand systems tract were interpreted for the upper composite sequence; however, only a highstand systems tract was interpreted for the lower composite sequence of the deeper Avon Park stratigraphic section. The composite depositional sequences are composed of at least five high-frequency depositional sequences. These sequences contain high-frequency cycle sets that are an amalgamation of vertically stacked high-frequency cycles. Three types of high-frequency cycles have been identified in the Avon Park Formation: peritidal, shallow subtidal, and deeper subtidal high-frequency cycles. The vertical distribution of carbonate-rock diffuse flow zones within the Avon Park Formation is heterogeneous. Porous vuggy intervals are less than 10 feet, and most are much thinner. The volumetric arrangement of the diffuse flow zones shows that most occur in the highstand systems tract of the lower composite sequence of the Avon Park Formation as compared to the upper composite sequence, which contains both a backstepping transgressive systems tract and a prograding highstand systems tract. 
Although the porous and permeable

  6. 16S-23S rDNA intergenic spacer region polymorphism of Lactococcus garvieae, Lactococcus raffinolactis and Lactococcus lactis as revealed by PCR and nucleotide sequence analysis.

    Science.gov (United States)

    Blaiotta, Giuseppe; Pepe, Olimpia; Mauriello, Gianluigi; Villani, Francesco; Andolfi, Rosamaria; Moschetti, Giancarlo

    2002-12-01

    The intergenic spacer region (ISR) between the 16S and 23S rRNA genes was tested as a tool for differentiating lactococci commonly isolated in a dairy environment. Seventeen reference strains, representing 11 different species belonging to the genera Lactococcus, Streptococcus, Lactobacillus, Enterococcus and Leuconostoc, and 127 wild streptococcal strains isolated during the whole fermentation process of "Fior di Latte" cheese were analyzed. After 16S-23S rDNA ISR amplification by PCR, species- or genus-specific patterns were obtained for most of the reference strains tested. Moreover, results obtained after nucleotide analysis show that the 16S-23S rDNA ISR sequences vary greatly, in size and sequence, among Lactococcus garvieae, Lactococcus raffinolactis, Lactococcus lactis as well as other streptococci from dairy environments. Because of the high degree of inter-specific polymorphism observed, the 16S-23S rDNA ISR can be considered a good potential target for selecting species-specific molecular assays, such as PCR primers or probes, for a rapid and extremely reliable differentiation of dairy lactococcal isolates.

  7. Characterization of the bovine pregnancy-associated glycoprotein gene family – analysis of gene sequences, regulatory regions within the promoter and expression of selected genes

    Directory of Open Access Journals (Sweden)

    Walker Angela M

    2009-04-01

    Background: The pregnancy-associated glycoproteins (PAGs) belong to a large family of aspartic peptidases expressed exclusively in the placenta of species in the Artiodactyla order. In cattle, the PAG gene family is comprised of at least 22 transcribed genes, as well as some variants. Phylogenetic analyses have shown that the PAG family segregates into 'ancient' and 'modern' groupings. Along with sequence differences between family members, there are clear distinctions in their spatio-temporal distribution and in their relative level of expression. In this report, (1) we performed an in silico analysis of the bovine genome to further characterize the PAG gene family, (2) we scrutinized proximal promoter sequences of the PAG genes to evaluate the evolutionary pressures operating on them and to identify putative regulatory regions, (3) we determined relative transcript abundance of selected PAGs during pregnancy, and (4) we performed preliminary characterization of the putative regulatory elements for one of the candidate PAGs, bovine (bo) PAG-2. Results: From our analysis of the bovine genome, we identified 18 distinct PAG genes and 14 pseudogenes. We observed that the first 500 base pairs upstream of the translational start site contained multiple regions that are conserved among all boPAGs. However, a preponderance of conserved regions that harbor recognition sites for putative transcription factors (TFs) were found to be unique to the modern boPAG grouping, but not the ancient boPAGs. We gathered evidence by means of Q-PCR and screening of EST databases to show that boPAG-2 is the most abundant of all boPAG transcripts. Finally, we provided preliminary evidence for the role of ETS- and DDVL-related TFs in the regulation of the boPAG-2 gene. Conclusion: PAGs represent a relatively large gene family in the bovine genome. The proximal promoter regions of these genes display differences in putative TF binding sites, likely contributing to observed

  8. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. After the surveys, specific event sequences were selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion of the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  9. Phylogenetic relationships of Scomberomorus commerson using sequence analysis of the mtDNA D-loop region in the Persian Gulf, Oman Sea and Arabian Sea

    Directory of Open Access Journals (Sweden)

    Ana Mansourkiaei

    2016-04-01

    The narrow-barred Spanish mackerel, Scomberomorus commerson, is an epipelagic, migratory species of the family Scombridae with a significant role in ecology and fisheries. One hundred samples were collected from the Persian Gulf, Oman Sea and Arabian Sea. Part of each dorsal fin was snipped and transferred to a micro-tube containing ethanol; DNA was then extracted, and HRM real-time PCR was performed to select representative specimens for sequencing. Phylogenetic relationships of S. commerson from the Persian Gulf, Oman Sea and Arabian Sea were investigated using sequence data from the mitochondrial DNA D-loop region. The neighbor-joining tree showed no clustering, indicating the proximity of S. commerson across the four sites. Sequence analyses of the mitochondrial DNA D-loop region showed a very high degree of genetic similarity among S. commerson from the Persian Gulf and Oman Sea, supporting a single stock structure of S. commerson across the four regions; this similarity can be explained by migration along the coasts of the Oman Sea and Persian Gulf. Accordingly, the distribution patterns of the 20 haplotypes in the phylogenetic tree constructed from mtDNA D-loop sequences showed no significant clustering by sampling site.

  10. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. There has been increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding from noncoding sequences, and how to classify organisms and infer their evolutionary relationships, are key problems in bioinformatics. Although much research has taken the long-range correlations in DNA sequences into consideration, and others have used the global fractal dimension in such work, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used the fractal dimension, the correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  11. Complete genome sequence of a Chinese isolate of pepper vein yellows virus and evolutionary analysis based on the CP, MP and RdRp coding regions.

    Science.gov (United States)

    Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun

    2016-03-01

    The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.

  12. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. After the surveys, specific event sequences were selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion of the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  13. Novel Bacteriocinogenic Lactobacillus plantarum Strains and Their Differentiation by Sequence Analysis of 16S rDNA, 16S-23S and 23S-5S Intergenic Spacer Regions and Randomly Amplified Polymorphic DNA Analysis

    Directory of Open Access Journals (Sweden)

    Morteza Shojaei Moghadam

    2010-01-01

    Six strains of bacteriocinogenic Lactobacillus plantarum (TL1, RG11, RS5, UL4, RG14 and RI11) isolated from Malaysian foods were investigated for their structural bacteriocin genes. A new combination of plantaricin EF and plantaricin W bacteriocin structural genes was successfully amplified from all studied strains, suggesting that they were novel bacteriocin-producing L. plantarum strains. A four-base pair variable region was detected in the short 16S-23S intergenic spacer regions of the studied strains by a comparative analysis with 17 L. plantarum strains deposited in GenBank, implying they were new genotypes. The studied L. plantarum strains were subsequently differentiated into four groups on the basis of the detected four-base pair variable region of the short 16S-23S intergenic spacer region. Further analysis of the DNA sequence of the 23S-5S intergenic spacer region revealed only one type of 23S-5S intergenic spacer region in the studied strains, indicating it was highly conserved among the studied L. plantarum strains. Three randomly amplified polymorphic DNA experiments using three different combinations of arbitrary primers successfully differentiated the studied L. plantarum strains from each other, confirming they were different strains. In conclusion, the studied L. plantarum strains were shown to be novel bacteriocin producers, and a high level of strain discrimination could be achieved with a combination of randomly amplified polymorphic DNA analysis and the analysis of the variable region of the short 16S-23S intergenic spacer region present in L. plantarum strains.

  14. Prevalence of Hepatitis C Virus Subgenotypes 1a and 1b in Japanese Patients: Ultra-Deep Sequencing Analysis of HCV NS5B Genotype-Specific Region

    Science.gov (United States)

    Wu, Shuang; Kanda, Tatsuo; Nakamoto, Shingo; Jiang, Xia; Miyamura, Tatsuo; Nakatani, Sueli M.; Ono, Suzane Kioko; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2013-01-01

    Background: Hepatitis C virus (HCV) subgenotypes 1a and 1b have different impacts on the treatment response to peginterferon plus ribavirin with direct-acting antivirals (DAAs) in patients infected with HCV genotype 1, as the emergence rates of resistance mutations differ between these two subgenotypes. In Japan, almost all HCV genotype 1 belongs to subgenotype 1b. Methods and Findings: To determine HCV subgenotype 1a or 1b in Japanese patients infected with HCV genotype 1, a real-time PCR-based method and the Sanger method were used for the HCV NS5B region. HCV subgenotypes were determined in 90% by the real-time PCR-based method. We also analyzed the specific probe regions for HCV subgenotypes 1a and 1b using ultra-deep sequencing, and uncovered mutations that could not be revealed using direct sequencing by the Sanger method. We estimated the prevalence of HCV subgenotype 1a as 1.2-2.5% of HCV genotype 1 patients in Japan. Conclusions: Although the real-time PCR-based HCV subgenotyping method seems fair for differentiating HCV subgenotypes 1a and 1b, it may not be sufficient for clinical practice. Ultra-deep sequencing is useful for revealing the resistant strain(s) of HCV before DAA treatment as well as mixed infection with different genotypes or subgenotypes of HCV. PMID:24069214

  15. The complete chloroplast genome sequence of Taxus chinensis var. mairei (Taxaceae): loss of an inverted repeat region and comparative analysis with related species.

    Science.gov (United States)

    Zhang, Yanzhen; Ma, Ji; Yang, Bingxian; Li, Ruyi; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Zhang, Lin

    2014-05-01

    Taxus chinensis var. mairei (Taxaceae) is a variety of yew native to China. This plant is one of the sources of paclitaxel, which has been a promising antineoplastic chemotherapy drug during the last decade. We have sequenced the complete nucleotide sequence of the chloroplast (cp) genome of T. chinensis var. mairei. The T. chinensis var. mairei cp genome is 129,513 bp in length, with 113 single copy genes and two duplicated genes (trnI-CAU, trnQ-UUG). Among the 113 single copy genes, 9 are intron-containing. Compared to other land plant cp genomes, the T. chinensis var. mairei cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas revoluta and Ginkgo biloba L. Compared to related species, the gene order of T. chinensis var. mairei has a large inversion of ~110 kb including 91 genes (from rps18 to accD) with gene contents unarranged. Repeat analysis identified 48 direct and 2 inverted repeats 30 bp long or longer with a sequence identity greater than 90%. Repeated short segments were found in genes rps18, rps19 and clpP. Analysis also revealed 22 simple sequence repeat (SSR) loci, almost all of which are composed of A or T. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Chloroplast DNA analysis of Tunisian cork oak populations (Quercus suber L.): sequence variations and molecular evolution of the trnL (UAA)-trnF (GAA) region.

    Science.gov (United States)

    Abdessamad, A; Baraket, G; Sakka, H; Ammari, Y; Ksontini, M; Hannachi, A Salhi

    2016-10-24

    Sequences of the trnL-trnF spacer and combined trnL-trnF region in chloroplast DNA of cork oak (Quercus suber L.) were analyzed to detect polymorphisms and to elucidate molecular evolution and demographic history. The aligned sequences varied in length and nucleotide composition. The overall transition/transversion (ti/tv) ratios were estimated at 0.724 for the intergenic spacer and 0.258 for the pooled sequences, indicating that transversions are more frequent than transitions. The molecular evolution and demographic history of Q. suber were investigated. Neutrality tests (Tajima's D and Fu and Li's tests) ruled out the null hypothesis of a strictly neutral model, and Fu's Fs and Ramos-Onsins and Rozas' R2 confirmed the recent expansion of cork oak trees, validating its persistence in North Africa since the last glaciation during the Quaternary. The observed unimodal mismatch distribution and Harpending's raggedness index confirmed the demographic history model for cork oak. A phylogenetic dendrogram showed that the distribution of Q. suber trees occurs independently of geographical origin, the relief of the population site, and the bioclimatic stages. The molecular history and cytoplasmic diversity suggest that in situ and ex situ conservation strategies can be recommended for preserving landscape value and facing predictable future climatic changes.
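
    As a minimal sketch of how a ti/tv ratio like the ones reported above can be obtained, the following counts transitions (A<->G, C<->T) and transversions between two aligned sequences. The function name and the choice to skip gaps and ambiguous bases are assumptions made for illustration; the abstract does not say which software or substitution model the authors used.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Hypothetical helper: positions with a gap or a non-ACGT symbol are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti, tv


ti, tv = ti_tv_counts("ACGTAC", "GCTTAA")
ratio = ti / tv if tv else float("inf")
```

    A ratio below 1, as in the abstract, simply means that more transversions than transitions were counted.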

  17. In situ optical sequencing and structure analysis of a trinucleotide repeat genome region by localization microscopy after specific COMBO-FISH nano-probing

    Science.gov (United States)

    Stuhlmüller, M.; Schwarz-Finsterle, J.; Fey, E.; Lux, J.; Bach, M.; Cremer, C.; Hinderhofer, K.; Hausmann, M.; Hildenbrand, G.

    2015-10-01

    Trinucleotide repeat expansions (like (CGG)n) of chromatin in the genome of cell nuclei can cause neurological disorders such as the Fragile-X syndrome. Until now, the mechanisms by which these expansions develop during cell proliferation have not been clearly understood. Therefore, in situ investigations of chromatin structures on the nanoscale are required to better understand supra-molecular mechanisms on the single cell level. By super-resolution localization microscopy (Spectral Position Determination Microscopy; SPDM) in combination with nano-probing using COMBO-FISH (COMBinatorial Oligonucleotide FISH), novel insights into the nano-architecture of the genome will become possible. The native spatial structure of trinucleotide repeat expansion genome regions was analysed and optical sequencing of repetitive units was performed within 3D-conserved nuclei using SPDM after COMBO-FISH. We analysed a (CGG)n-expansion region inside the 5' untranslated region of the FMR1 gene. The number of CGG repeats for a full mutation causing the Fragile-X syndrome was found and also verified by Southern blot. The FMR1 promoter region was condensed similarly to a centromeric region, whereas the arrangement of the probes labelling the expansion region seemed to indicate a loop-like nano-structure. These results demonstrate for the first time that in situ chromatin structure measurements on the nanoscale are feasible. With further methodological progress, it will become possible to estimate the state of trinucleotide repeat mutations in detail and to determine the associated chromatin strand structural changes on the single cell level. In general, the application of the described approach to any genome region will lead to new insights into genome nano-architecture and open new avenues for understanding mechanisms and their relevance in the development of hereditary diseases.

  18. Phylogenetic and Genetic Analysis of D-loop and Cyt-b Region of mtDNA Sequence in Iranian Sistani, Sarabi and Brown Swiss Cows

    Directory of Open Access Journals (Sweden)

    Reza Valizadeh

    2016-06-01

    Cattle played an important role in early human civilization, so molecular studies that identify their origin more accurately can help clarify unknown historical aspects. Cattle can be divided into two main groups, Bos taurus and Bos indicus. Both types of cattle can be found in Iran; therefore, the study of their origin is of particular importance. The aim of this study was to investigate the nucleotide sequences of the cytochrome-b (Cyt-b) and HVR1&2 loci of the D-loop region in mitochondrial DNA of the Sistani, Sarabi and Brown Swiss cattle breeds. Twenty blood samples of each breed, from unrelated individuals, were obtained from the blood bank of the Animal Science Department of the Faculty of Agriculture, Ferdowsi University of Mashhad. DNA was extracted from the samples by the guanidinium thiocyanate-silica gel method. Polymerase chain reaction with specifically designed primers was performed to amplify the Cyt-b and HVR1&2 loci, with lengths of 751 and 701 bp, respectively. The amplified Cyt-b and HVR1&2 loci were sequenced by the Sanger method on an automatic sequencer (ABI 3130). Nucleotide diversity in the Brown Swiss, Sarabi and Sistani breeds was estimated at 0.0037, 0.0024 and 0.0029, respectively. The Cyt-b and HVR1&2 sequences were registered with the National Center for Biotechnology Information because of their nucleotide differences. Phylogenetic analysis using UPGMA for both loci showed that the Sarabi and Sistani breeds belong to one group and the Brown Swiss breed to another.
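
    The nucleotide diversity values quoted above can be read against the usual definition of the statistic: the average proportion of differing sites over all pairs of aligned sequences in a sample. The sketch below implements that definition under the assumption of equal-length, gap-free sequences; the function name is hypothetical, and the abstract does not state which program or estimator the authors used.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites (illustrative only).

    Assumes at least two aligned sequences of equal, non-zero length.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must share the same non-zero length")
    # Sum the number of mismatching sites over every unordered pair.
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

    In this toy sample, two of the three pairs differ at one of four sites, so the result is 2 / (3 x 4), about 0.167.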

  19. Genetic diversity and relatedness of Fasciola spp. isolates from different hosts and geographic regions revealed by analysis of mitochondrial DNA sequences.

    Science.gov (United States)

    Ai, L; Weng, Y B; Elsheikha, H M; Zhao, G H; Alasaad, S; Chen, J X; Li, J; Li, H L; Wang, C R; Chen, M X; Lin, R Q; Zhu, X Q

    2011-09-27

    The present study examined sequence variability in a portion of the mitochondrial cytochrome c oxidase subunit 1 (pcox1) and NADH dehydrogenase subunits 4 and 5 (pnad4 and pnad5) among 39 isolates of Fasciola spp. from different hosts from China, Niger, France, the United States of America, and Spain, and reconstructed their phylogenetic relationships. Intra-species sequence variations were 0.0-1.1% for pcox1, 0.0-2.7% for pnad4, and 0.0-3.3% for pnad5 for Fasciola hepatica; 0.0-1.8% for pcox1, 0.0-2.5% for pnad4, and 0.0-4.2% for pnad5 for Fasciola gigantica; and 0.0-0.9% for pcox1, 0.0-0.2% for pnad4, and 0.0-1.1% for pnad5 for the intermediate Fasciola form. In contrast, nucleotide differences were 2.1-2.7% for pcox1, 3.1-3.3% for pnad4, and 4.2-4.8% for pnad5 between F. hepatica and F. gigantica; 1.3-1.5% for pcox1, 2.1-2.9% for pnad4, and 3.1-3.4% for pnad5 between F. hepatica and the intermediate form; and 0.9-1.1% for pcox1, 1.4-1.8% for pnad4, and 2.2-2.4% for pnad5 between F. gigantica and the intermediate form. Phylogenetic analysis based on the combined sequences of pcox1, pnad4 and pnad5 revealed distinct groupings of isolates of F. hepatica, F. gigantica, or the intermediate Fasciola form irrespective of their origin, demonstrating the usefulness of the mtDNA sequences for the delineation of Fasciola species, and reinforcing the genetic evidence for the existence of the intermediate Fasciola form. Copyright © 2011 Elsevier B.V. All rights reserved.

  20. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  1. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and sequencing constructed clones, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are described here, with the hope of offering some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.
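
    For readers unfamiliar with the GC content figure quoted above, the following minimal sketch shows the arithmetic: the share of G and C among the unambiguous bases of a sequence, expressed as a percentage. The function name and the decision to ignore non-ACGT symbols are illustrative assumptions, not details taken from the study.

```python
def gc_content(seq):
    """Percentage of G and C among the A/C/G/T bases of a sequence.

    Illustrative helper: symbols other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


example = gc_content("GGCCAT")  # 4 of 6 bases are G or C
```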

  2. Sequence analysis of the MYC oncogene involved in the t(8;14)(q24;q11) chromosome translocation in a human leukemia T-cell line indicates that putative regulatory regions are not altered

    International Nuclear Information System (INIS)

    Finver, S.N.; Nishikura, K.; Finger, L.R.; Haluska, F.G.; Finan, J.; Nowell, P.C.; Croce, C.M.

    1988-01-01

    The authors cloned the translocation-associated and homologous normal MYC alleles from SKW-3, a leukemia T-cell line with the t(8;14)(q24;q11) translocation, and determined the sequence of the MYC oncogene first exon and flanking 5' putative regulatory regions. S1 nuclease protection experiments utilizing a MYC first exon probe demonstrated transcriptional deregulation of the MYC gene associated with the T-cell receptor α locus on the 8q+ chromosome of SKW-3 cells. Nucleotide sequence analysis of the translocation-associated (8q+) MYC allele identified a single base substitution within the upstream flanking region; the homologous nontranslocated allele contained an additional substitution and a two-base deletion. None of the deletions or substitutions localized to putative 5' regulatory regions. The MYC first exon sequence was germ line in both alleles. These results demonstrate that alterations within the putative 5' MYC regulatory regions are not necessarily involved in MYC deregulation in T-cell leukemias, and they show that juxtaposition of the T-cell receptor α locus to a germ-line MYC oncogene results in MYC deregulation.

  3. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    Science.gov (United States)

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to
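
    The core idea named in this abstract, comparing segments by their p-mer frequency counts instead of aligning them, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the default p = 3 and the use of Manhattan distance between count profiles are choices made here, and the clustering stage of the published method is not reproduced.

```python
from collections import Counter


def pmer_counts(segment, p=3):
    """Count every overlapping p-mer (substring of length p) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def profile_distance(counts_a, counts_b):
    """Manhattan distance between two p-mer count profiles.

    Assumed here as a cheap stand-in for edit distance; a Counter
    returns 0 for p-mers it has not seen.
    """
    keys = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[k] - counts_b[k]) for k in keys)


a = pmer_counts("ACGTACGT")
b = pmer_counts("ACGTACGA")  # differs from the first segment in its last base
d = profile_distance(a, b)
```

    Building a profile takes time linear in the segment length, which is the property the abstract relies on for its run-time claim.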

  4. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analysing DNA in a laboratory test. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm, which was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm seeks the most similar region of a pair of sequences by assigning a negative score to unequal bases (mismatch) and a positive score to equal bases (match). The maximum positive score marks the end of the alignment, and the minimum value marks its start. This study uses sequences of leukemia and 3 sequences of non-leukemia.
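
    The scoring and traceback described in this abstract can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: the score values (match +2, mismatch -1) and the linear gap penalty of -1 are assumptions, since the abstract only says that matches score positively and mismatches negatively.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; the first row and column stay at zero.
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero, which is what makes the alignment local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the maximum score until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTTACGTTT", "GGACGGG")
```

    With these assumed scores, the two toy sequences share only the segment ACG, so the best local alignment is ACG against ACG with a score of 6.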

  5. Analysis of a new strain of Euphorbia mosaic virus with distinct replication specificity unveils a lineage of begomoviruses with short Rep sequences in the DNA-B intergenic region

    Directory of Open Access Journals (Sweden)

    Argüello-Astorga Gerardo R

    2010-10-01

    Background: Euphorbia mosaic virus (EuMV) is a member of the SLCV clade, a lineage of New World begomoviruses that display distinctive features in their replication-associated protein (Rep) and virion-strand replication origin. The first entirely characterized EuMV isolate is native to the Yucatan Peninsula, Mexico; subsequently, EuMV was detected in weeds and pepper plants from another region of Mexico, and partial DNA-A sequences revealed significant differences in their putative replication specificity determinants with respect to EuMV-YP. This study aimed to investigate the replication compatibility between two EuMV isolates from the same country. Results: A new isolate of EuMV was obtained from pepper plants collected at Jalisco, Mexico. Full-length clones of both genomic components of EuMV-Jal were biolistically inoculated into plants of three different species, which developed symptoms indistinguishable from those induced by EuMV-YP. Pseudorecombination experiments with EuMV-Jal and EuMV-YP genomic components demonstrated that these viruses do not form infectious reassortants in Nicotiana benthamiana, presumably because of Rep-iteron incompatibility. Sequence analysis of the EuMV-Jal DNA-B intergenic region (IR) led to the unexpected discovery of a 35-nt-long sequence that is identical to a segment of the rep gene in the cognate viral DNA-A. Similar short rep sequences ranging from 35 to 51 nt in length were identified in all EuMV isolates and in three distinct viruses from South America related to EuMV. These short rep sequences in the DNA-B IR are positioned downstream of a ~160-nt non-coding domain highly similar to the CP promoter of begomoviruses belonging to the SLCV clade. Conclusions: EuMV strains are not compatible in replication, indicating that this begomovirus species probably is not a replicating lineage in nature. The genomic analysis of EuMV-Jal led to the discovery of a subgroup of SLCV clade viruses that contain in

  6. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  7. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  8. Sub-grouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions

    DEFF Research Database (Denmark)

    Lavstsen, Thomas; Salanti, Ali; Jensen, Anja T R

    2003-01-01

    and organization of the 3D7 PfEMP1 repertoire was investigated on the basis of the complete genome sequence. METHODS: Using two tree-building methods we analysed the coding and non-coding sequences of 3D7 var and rif genes as well as var genes of other parasite strains. RESULTS: var genes can be sub...

  9. Ballooning mode second stability region for sequences of tokamak equilibria

    International Nuclear Information System (INIS)

    Sugiyama, L.; Mark, J.W.K.

    A numerical study of several sequences of tokamak equilibria derived from two flux conserving sequences confirms the tendency of high n ideal MHD ballooning modes to stabilize for values of the plasma beta greater than a second critical beta, for sufficiently favorable equilibria. The major stabilizing effect of increasing the inverse rotational transform profile q(Psi) for equilibria with the same flux surface geometry is shown. The unstable region shifts toward larger shear d ln q/d ln γ and the width of the region measured in terms of the poloidal beta or a pressure gradient parameter, for fixed shear, decreases. The smaller aspect ratio sequences are more sensitive to changes in q and have less stringent limits on the attainable value of the plasma beta in the high beta stable region. Finally, the disconnected mode approximation is shown to provide a reasonable description of the second high beta stability boundary

  10. Nuclear power regional analysis

    International Nuclear Information System (INIS)

    Parera, María Delia

    2011-01-01

    In this study, a regional analysis of the Argentine electricity market was carried out considering the effects of regional cooperation and of national and international interconnections; additionally, the possibilities of inserting new nuclear power plants in different regions were evaluated, indicating the most suitable areas for these facilities in order to increase the penetration of nuclear energy in the national energy matrix. The interconnection of the electricity and natural gas markets, which arises from the linkage between both energy forms, was also studied. For this purpose, the MESSAGE program (Model for Energy Supply Strategy Alternatives and their General Environmental Impacts), promoted by the International Atomic Energy Agency (IAEA), was used. This model performs a country-level economic optimization, resulting in the minimum cost for the modelled system. The regionalization used is the one executed by the Wholesale Electricity Market Management Company (CAMMESA, by its Spanish acronym), which divides the country into eight regions. The characteristics and needs of each region, their respective demands and supplies of electricity and natural gas, as well as existing and planned interconnections, consisting of power lines and pipelines, were taken into account. According to the results obtained through the model, nuclear is a competitive option. (author) [es

  11. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  12. Colorectal Cancer Genetic Heterogeneity Delineated by Multi-Region Sequencing.

    Directory of Open Access Journals (Sweden)

    You-Wang Lu

    Intratumor heterogeneity (ITH) leads to an underestimation of the mutational landscape portrayed by a single needle biopsy and consequently affects treatment precision. The extent of colorectal cancer (CRC) genetic ITH is not well understood in Chinese patients. Thus, we conducted deep sequencing by using the OncoGxOne™ Plus panel, targeting 333 cancer-specific genes in multi-region biopsies of primary and liver metastatic tumors from three Chinese CRC patients. We determined that the extent of ITH varied among the three cases. On average, 65% of all the mutations detected were common within individual tumors. KMT2C aberrations and the NCOR1 mutation were the only ubiquitous events. Subsequent phylogenetic analysis showed that the tumors evolved in a branched manner. Comparison of the primary and metastatic tumors revealed that PPP2R1A (E370X), SETD2 (I1608V), SMAD4 (G382T), and AR splicing site mutations may be specific to liver metastatic cancer. These mutations might contribute to the initiation and progression of distant metastasis. Collectively, our analysis identified a substantial level of genetic ITH in CRC, which should be considered for personalized therapeutic strategies.

  13. Regional Shelter Analysis Methodology

    Energy Technology Data Exchange (ETDEWEB)

    Dillon, Michael B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Dennison, Deborah [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Kane, Jave [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Walker, Hoyt [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Miller, Paul [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

    2015-08-01

    The fallout from a nuclear explosion has the potential to injure or kill 100,000 or more people through exposure to external gamma (fallout) radiation. Existing buildings can reduce radiation exposure by placing material between fallout particles and exposed people. Lawrence Livermore National Laboratory was tasked with developing an operationally feasible methodology that could improve fallout casualty estimates. The methodology, called a Regional Shelter Analysis, combines the fallout protection that existing buildings provide civilian populations with the distribution of people in various locations. The Regional Shelter Analysis method allows the consideration of (a) multiple building types and locations within buildings, (b) country specific estimates, (c) population posture (e.g., unwarned vs. minimally warned), and (d) the time of day (e.g., night vs. day). The protection estimates can be combined with fallout predictions (or measurements) to (a) provide a more accurate assessment of exposure and injury and (b) evaluate the effectiveness of various casualty mitigation strategies. This report describes the Regional Shelter Analysis methodology, highlights key operational aspects (including demonstrating that the methodology is compatible with current tools), illustrates how to implement the methodology, and provides suggestions for future work.
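    The combination of building protection with population distribution described in this abstract can be sketched as a population-weighted average. The sketch assumes a protection factor defined as the ratio of outdoor to indoor exposure, and the function name, the input layout, and the example numbers are all hypothetical; the report's actual formulation may differ.

```python
def expected_dose(outdoor_dose: float, population_by_shelter) -> float:
    """Population-averaged dose given an unsheltered (outdoor) dose.

    population_by_shelter: iterable of (fraction_of_population, protection_factor),
    where protection_factor = outdoor exposure / sheltered exposure (assumed definition).
    """
    shelters = list(population_by_shelter)
    total_fraction = sum(fraction for fraction, _ in shelters)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("population fractions must sum to 1")
    # Each group receives the outdoor dose reduced by its protection factor.
    return outdoor_dose * sum(fraction / pf for fraction, pf in shelters)


if __name__ == "__main__":
    # Hypothetical posture: half the population outdoors (PF 1), half in a PF 10 building.
    print(expected_dose(100.0, [(0.5, 1.0), (0.5, 10.0)]))
```

    Different postures or times of day would then be expressed as different fraction lists passed to the same function.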

  14. Using a sequence characterized amplified region (SCAR) marker for ...

    African Journals Online (AJOL)

    GREGORY

    2010-09-13

    This work used a sequence characterized amplified region (SCAR) marker to detect the Bacillus cereus strain in strawberry fields. The purpose was to develop an effective molecular method for detecting the functional target microorganisms applied in agricultural fields. A 3×10^9 CFU/ml vegetative cell.

  15. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones, which often have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
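    A minimal sketch of a recurrence-time histogram follows. It assumes "recurrence time" means the distance between consecutive occurrences of the same k-letter word (a first-return time); the paper's exact definition, and its codon index, are not given in the abstract, so the function name and the choice of k are illustrative only.

```python
from collections import Counter


def recurrence_times(seq: str, k: int = 3) -> Counter:
    """Histogram of distances between consecutive occurrences of each k-mer.

    Assumed definition (first-return times); not necessarily the paper's.
    """
    last_seen = {}      # k-mer -> position of its most recent occurrence
    histogram = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


if __name__ == "__main__":
    print(recurrence_times("ATGATGATG", k=3))
```

    For a perfectly periodic repeat such as the one in the usage line, every recurrence falls at the repeat period, so periodicities show up as peaks in the histogram. The scan is a single pass over the sequence, which is consistent with the efficiency the abstract claims.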

  16. Targeted sequencing of large genomic regions with CATCH-Seq.

    Directory of Open Access Journals (Sweden)

    Kenneth Day

    Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

  17. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plants are commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Two observations motivate this work: (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) few studies in the current literature have systematically addressed this issue. This paper therefore presents a robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies.

  18. Enrichment of colorectal cancer associations in functional regions: Insight for using epigenomics data in the analysis of whole genome sequence-imputed GWAS data.

    Directory of Open Access Journals (Sweden)

    Stephanie A Bien

    The evaluation of less frequent genetic variants and their effect on complex disease poses new challenges for genomic research. To investigate whether epigenetic data can be used to inform aggregate rare-variant association methods (RVAM), we assessed whether variants more significantly associated with colorectal cancer (CRC) were preferentially located in non-coding regulatory regions, and whether enrichment was specific to colorectal tissues. Active regulatory elements (ARE) were mapped using data from 127 tissues and cell-types from NIH Roadmap Epigenomics and Encyclopedia of DNA Elements (ENCODE) projects. We investigated whether CRC association p-values were more significant for common variants (1) inside versus outside AREs, or (2) inside colorectal (CR) AREs versus AREs of other tissues and cell-types. We employed an integrative epigenomic RVAM for variants with allele frequency <1%. Gene sets were defined as ARE variants within 200 kilobases of a transcription start site (TSS) using either CR ARE or ARE from non-digestive tissues. CRC-set association p-values were used to evaluate enrichment of less frequent variant associations in CR ARE versus non-digestive ARE. ARE from 126/127 tissues and cell-types were significantly enriched for stronger CRC-variant associations. Strongest enrichment was observed for digestive tissues and immune cell types. CR-specific ARE were also enriched for stronger CRC-variant associations compared to ARE combined across non-digestive tissues (p-value = 9.6 × 10^-4). Additionally, we found enrichment of stronger CRC association p-values for rare variant sets of CR ARE compared to non-digestive ARE (p-value = 0.029). Integrative epigenomic RVAM may enable discovery of less frequent variants associated with CRC, and ARE of digestive and immune tissues are most informative. Although distance-based aggregation of less frequent variants in CR ARE surrounding TSS showed modest enrichment, future association studies would likely

  19. Comparative sequence analysis of VRN1 alleles of Lolium perenne with the co-linear regions in barley, wheat, and rice

    DEFF Research Database (Denmark)

    Asp, Torben; Byrne, Stephen; Gundlach, Heidrun

    2011-01-01

    Vernalization, a period of low temperature to induce transition from vegetative to reproductive state, is an important environmental stimulus for many cool season grasses. A key gene in the vernalization pathway in grasses is the VRN1 gene. The objective of this study was to identify causative...... polymorphism(s) at the VRN1 locus in perennial ryegrass (Lolium perenne) for variation in vernalization requirement. Two allelic Bacterial Artificial Chromosome clones of the VRN1 locus from the two genotypes Veyo and Falster with contrasting vernalization requirements were identified, sequenced...

  20. Identification of clinically relevant nonhemolytic Streptococci on the basis of sequence analysis of 16S-23S intergenic spacer region and partial gdh gene

    DEFF Research Database (Denmark)

    Nielsen, Xiaohui Chen; Justesen, Ulrik Stenz; Dargis, Rimtas

    2009-01-01

    Nonhemolytic streptococci (NHS) cause serious infections, such as endocarditis and septicemia. Many conventional phenotypic methods are insufficient for the identification of bacteria in this group to the species level. Genetic analysis has revealed that single-gene analysis is insufficient...

  1. Probabilistic accident sequence recovery analysis

    International Nuclear Information System (INIS)

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be skeptically viewed. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  2. An ongoing earthquake sequence near Dhaka, Bangladesh, from regional recordings

    Science.gov (United States)

    Howe, M.; Mondal, D. R.; Akhter, S. H.; Kim, W.; Seeber, L.; Steckler, M. S.

    2013-12-01

    Earthquakes in and around the syntaxial region between the continent-continent collision of the Himalayan arc and oceanic subduction of the Sunda arc result primarily from the convergence of India and Eurasia-Sunda plates along two fronts. The northern front, the convergence of the Indian and Eurasian plates, has produced the Himalayas. The eastern front, the convergence of the Indian and Sunda plates, ranges from ocean-continent subduction at the Andaman Arc and Burma Arc, and transitions to continent-continent collision to the north at the Assam Syntaxis in northeast India. The India-Sunda convergence at the Burma Arc is extremely oblique. The boundary-normal convergence rate is ~17 mm/yr while the boundary-parallel rate is ~45 mm/yr including the well-known Sagaing strike-slip fault, which accommodates about half the shear component. This heterogeneous tectonic setting produces multiple earthquake sources that need to be considered when assessing seismic hazard and risk in this region. The largest earthquakes, just as in other subduction systems, are expected to be interplate events that occur on the low-angle megathrusts, such as the Mw 9.2 2004 Sumatra-Andaman earthquake and the 1762 earthquake along the Arakan margin. These earthquakes are known to produce large damage over vast areas, but since they account for large fault motions they are relatively rare. The majority of current seismicity in the study area is intraplate. Most of the seismicity associated with the Burma Arc subduction system is in the down-going slab, including the shallow-dipping part below the megathrust flooring the accretionary wedge. The strike of the wedge is ~N-S and Dhaka lies at its outer limit. One particular source relevant to seismic risk in Dhaka is illuminated by a multi-year sequence of earthquakes in Bangladesh less than 100 km southeast of Dhaka. The population in Dhaka (now at least 15 million) has been increasing dramatically due to rapid urbanization. 
The vulnerability

  3. Simple, Low-Cost Detection of Candida parapsilosis Complex Isolates and Molecular Fingerprinting of Candida orthopsilosis Strains in Kuwait by ITS Region Sequencing and Amplified Fragment Length Polymorphism Analysis.

    Science.gov (United States)

    Asadzadeh, Mohammad; Ahmad, Suhail; Hagen, Ferry; Meis, Jacques F; Al-Sweih, Noura; Khan, Ziauddin

    2015-01-01

    Candida parapsilosis has now emerged as the second or third most important cause of healthcare-associated Candida infections. Molecular studies have shown that phenotypically identified C. parapsilosis isolates represent a complex of three species, namely, C. parapsilosis, C. orthopsilosis and C. metapsilosis. Lodderomyces elongisporus is another species phenotypically closely related to the C. parapsilosis-complex. The aim of this study was to develop a simple, low cost multiplex (m) PCR assay for species-specific identification of C. parapsilosis complex isolates and to study genetic relatedness of C. orthopsilosis isolates in Kuwait. Species-specific amplicons from C. parapsilosis (171 bp), C. orthopsilosis (109 bp), C. metapsilosis (217 bp) and L. elongisporus (258 bp) were obtained in mPCR. Clinical isolates identified as C. parapsilosis (n = 380) by Vitek2 in Kuwait and an international collection of 27 C. parapsilosis complex and L. elongisporus isolates previously characterized by rDNA sequencing were analyzed to evaluate mPCR. Species-specific PCR and DNA sequencing of internal transcribed spacer (ITS) region of rDNA were performed to validate the results of mPCR. Fingerprinting of 19 clinical C. orthopsilosis isolates (including 4 isolates from a previous study) was performed by amplified fragment length polymorphism (AFLP) analysis. Phenotypically identified C. parapsilosis isolates (n = 380) were identified as C. parapsilosis sensu stricto (n = 361), C. orthopsilosis (n = 15), C. metapsilosis (n = 1) and L. elongisporus (n = 3) by mPCR. The mPCR also accurately detected all epidemiologically unrelated C. parapsilosis complex and L. elongisporus isolates. The 19 C. orthopsilosis isolates obtained from 16 patients were divided into 3 haplotypes based on ITS region sequence data. Seven distinct genotypes were identified among the 19 C. 
orthopsilosis isolates by AFLP including a dominant genotype (AFLP1) comprising 11 isolates recovered from 10 patients. A

  4. MOVES regional level sensitivity analysis

    Science.gov (United States)

    2012-01-01

    The MOVES Regional Level Sensitivity Analysis was conducted to increase understanding of the operations of the MOVES Model in regional emissions analysis and to highlight the following: the relative sensitivity of selected MOVES Model input paramet...

  5. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    depends on the threshold which helps to understand the time pattern of the studied events. Our findings detected the presence of overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered as a multifractal process with a degree of time-clustering of the events. Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process. Acknowledgements This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions". References - Kanevski M. (Editor). 2008. Advanced Mapping of Environmental Data: Geostatistics, Machine Learning and Bayesian Maximum Entropy. London / Hoboken: iSTE / Wiley. - Telesca L. and Pereira M.G. 2010. Time-clustering investigation of fire temporal fluctuations in Portugal, Nat. Hazards Earth Syst. Sci., vol. 10(4): 661-666. - Vega Orozco C., Tonini M., Conedera M., Kanevski M. (2012) Cluster recognition in spatial-temporal sequences: the case of forest fires, Geoinformatica, vol. 16(4): 653-673.
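    Of the time-clustering measures listed in the key words above, the Allan Factor is simple to sketch. The version below uses the commonly cited definition, AF(T) = <(N_{k+1} - N_k)^2> / (2 <N_k>), where N_k is the number of events in the k-th window of length T; values near 1 are expected for a Poisson process and larger values suggest clustering. Whether the authors use exactly this estimator is an assumption, and the function name and arguments are hypothetical.

```python
def allan_factor(event_times, window: float, t_start: float, t_end: float) -> float:
    """Allan Factor of a point process for one window length.

    Counts events in consecutive, non-overlapping windows covering
    [t_start, t_end) and compares neighbouring counts.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two windows")
    counts = [0] * n_windows
    for t in event_times:
        idx = int((t - t_start) // window)
        if 0 <= idx < n_windows:  # ignore events outside the covered span
            counts[idx] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation span")
    squared_diffs = [(counts[i + 1] - counts[i]) ** 2 for i in range(n_windows - 1)]
    return (sum(squared_diffs) / len(squared_diffs)) / (2 * mean_count)


if __name__ == "__main__":
    print(allan_factor([0.1, 0.2, 0.3, 0.4], window=1.0, t_start=0.0, t_end=4.0))
```

    Repeating the calculation over a range of window lengths and checking for power-law growth of AF(T) is the usual way such a measure is read as evidence of time-clustering.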

  6. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  7. Regional energy facility siting analysis

    International Nuclear Information System (INIS)

    Eberhart, R.C.; Eagles, T.W.

    1976-01-01

    Results of the energy facility siting analysis portion of a regional pilot study performed for the anticipated National Energy Siting and Facility Report are presented. The question of cell analysis versus site-specific analysis is explored, including an evaluation of the difference in depth between the two approaches. A discussion of the possible accomplishments of regional analysis is presented. It is concluded that regional siting analysis could be of use in a national siting study, if its inherent limits are recognized.

  8. Phylogenetic Analysis of a ‘Jewel Orchid’ Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions

    Science.gov (United States)

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) is presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with 27 other genera (48 species), using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsection Goodyera and a new subsection. PMID:26927946

  9. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    International Nuclear Information System (INIS)

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce, which are of high economic value. In this study, the taxonomy and phylogeny of Citrus species are evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plant materials belonging to 22 Citrus species, including wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, the ITS1 and ITS2 regions showed relatively high variation in sequence length and nucleotide composition among these Citrus species. According to the previous six-tribe discrimination theory of Swingle and Reece, the grouping in our phylogenetic tree reconstructed from ITS sequences was related not to tribe discrimination but to species discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in the true citrus fruit tree group, the ITS phylogenetic tree indicated that subgenus Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia than to Microcitrus and Eremocitrus. The abundant sequence variation of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  10. Time Separation Between Events in a Sequence: a Regional Property?

    Science.gov (United States)

    Muirwood, R.; Fitzenz, D. D.

    2013-12-01

    Earthquake sequences are loosely defined as events occurring too closely in time and space to appear unrelated. Depending on the declustering method, several, all, or no event(s) after the first large event might be recognized as independent mainshocks. It can therefore be argued that a probabilistic seismic hazard assessment (PSHA, traditionally dealing with mainshocks only) might already include the ground shaking effects of such sequences. Alternatively all but the largest event could be classified as an 'aftershock' and removed from the earthquake catalog. While in PSHA the question is only whether to keep or remove the events from the catalog, for Risk Management purposes, the community response to the earthquakes, as well as insurance risk transfer mechanisms, can be profoundly affected by the actual timing of events in such a sequence. In particular the repetition of damaging earthquakes over a period of weeks to months can lead to businesses closing and families evacuating from the region (as happened in Christchurch, New Zealand in 2011). Buildings that are damaged in the first earthquake may go on to be damaged again, even while they are being repaired. Insurance also functions around a set of critical timeframes - including the definition of a single 'event loss' for reinsurance recoveries within the 192 hour 'hours clause', the 6-18 month pace at which insurance claims are settled, and the annual renewal of insurance and reinsurance contracts. We show how temporal aspects of earthquake sequences need to be taken into account within models for Risk Management, and which time separations between events are most sensitive, both in terms of the modeled disruptions to lifelines and business activity as well as in the losses to different parties (such as insureds, insurers and reinsurers). We also explore the time separation between all events and between loss-causing events for a collection of sequences from across the world and we point to the need to

  11. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
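
    As a minimal sketch of the iterated-map idea behind the 'Chaos Game Representation' named in this abstract: each nucleotide moves the current point halfway toward the corner of the unit square assigned to that nucleotide, so the final position encodes the whole preceding sequence. The corner assignment, the starting point (0.5, 0.5), and the function name cgr_points are illustrative assumptions, not details taken from the review.

```python
# Assumed corner layout for the four nucleotides (one common convention;
# the review itself does not specify one).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the Chaos Game Representation trajectory of a DNA sequence.

    Each step moves the current point halfway toward the corner of the
    current base. Characters other than A, C, G, T raise KeyError.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        corner_x, corner_y = CORNERS[base]
        # Iterated map: new position is the midpoint of old position and corner.
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points
```

    Under these assumptions, cgr_points("AG") returns [(0.25, 0.25), (0.625, 0.625)]. Because every step halves the distance to a corner, sequences sharing a suffix of length k end within the same square of side 1/2^k, which is what makes the representation usable without alignment.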

  12. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using the sequence tree method was developed to perform Safety Related digital I&C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique; the analysis results strongly depend on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. Therefore, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I&C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I&C system, such as RPS. The second phase uses a sequence tree to recognize which I&C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including the Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I&C systems, including the digital systems and the analog back-up systems, are allocated to their specific echelons. By this system centric sequence tree based analysis, not only can preliminary hazards be identified systematically, but the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  13. Sequence Ready Characterization of the Pericentromeric Region of 19p12

    Energy Technology Data Exchange (ETDEWEB)

    Evan E. Eichler

    2006-08-31

    Current mapping and sequencing strategies have been inadequate within the proximal portion of 19p12 due, in part, to the presence of a recently expanded ZNF (zinc-finger) gene family and the presence of large (25-50 kb) inverted beta-satellite repeat structures which bracket this tandemly duplicated gene family. The virtual absence of classically defined “unique” sequence within the region has hampered efforts to identify and characterize a suitable minimal tiling path of clones which can be used as templates required for finished sequencing of the region. The goal of this proposal is to develop and implement a novel sequence-anchor strategy to generate a contiguous BAC map of the most proximal portion of chromosome 19p12 for the purpose of complete sequence characterization. The target region will be an estimated 4.5 Mb of DNA extending from STS marker D19S450 (the beginning of the ZNF gene cluster) to the centromeric (alpha-satellite) junction of 19p11. The approach will entail 1) pre-selection of 19p12 BAC and cosmid clones (NIH approved library) utilizing both 19p12-unique and 19p12-specific repeat probes (Eichler et al., 1998); 2) the generation of a BAC/cosmid end-sequence map across the region with a density of one marker every 8 kb; 3) the development of a second generation of STS (sequence tagged sites) which will be used to identify and verify clonal overlap at the level of the sequence; 4) incorporation of these sequence-anchored overlapping clones into existing cosmid/BAC restriction maps developed at Livermore National Laboratory; and 5) validation of the organization of this region utilizing high-resolution FISH techniques (extended chromatin analysis) on monochromosomal 19 somatic cell hybrids and parental cell lines of source material. The data generated will be used in the selection of the most parsimonious tiling path of BAC clones to be sequenced as part of the JGI effort on chromosome 19 and should serve as a model for the sequence

  14. Large-scale chromatin immunoprecipitation with promoter sequence microarray analysis of the interaction of the NSs protein of Rift Valley fever virus with regulatory DNA regions of the host genome.

    Science.gov (United States)

    Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard; Bouloy, Michèle; Bonnefoy, Eliette

    2012-10-01

    Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level.

  15. Application of multi-station time sequence aerosol sampling and proton induced x-ray emission analysis techniques to the St. Louis regional air pollution study for investigating sulfur-trace metal relationships

    International Nuclear Information System (INIS)

    Pilotte, J.O.; Nelson, J.W.; Winchester, J.W.

    1976-01-01

    Time sequence streaker samplers, employing Nuclepore filters for aerosol collection, have been deployed over the 25-station St. Louis regional air monitoring network and operated for the months of July and August 1975 so as to determine aerosol composition variations with 2-hour time resolution. Elemental analysis of the 84 individual time steps per station for each week of sampling is carried out by 5 MeV proton irradiation and X-ray counting by Si(Li) detector, using a Van de Graaff accelerator with a special automated step drive sample handling device. Computer resolution of the X-ray spectra for the elements S, Cl, K, Ca, Ti, V, Cr, Mn, Fe, Ni, Cu, Zn, Br, and Pb is carried out at a rate equal to the proton irradiation rate, five minutes or less for each time step analysis. The aerosol particle sampling equipment and conditions have been designed to take advantage of the high sensitivity of PIXE analysis, in the nanogram range for the elements determined.

  16. Tracking TCRβ sequence clonotype expansions during antiviral therapy using high-throughput sequencing of the hypervariable region

    Directory of Open Access Journals (Sweden)

    Mark W Robinson

    2016-04-01

    To maintain a persistent infection, viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilise either antigen stimulation (e.g. ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. This limits the ability to profile T cell responses during chronic infection due to suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR3) to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing anti-viral therapy we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high joining region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395HSKKKCDEL1403 and HLA-A*0101–restricted 1435ATDALMTGY1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of

  17. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction, by Todd R. Reed
    CONTENT-BASED IMAGE SEQUENCE REPRESENTATION, by Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej
    THE COMPUTATION OF MOTION, by Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang
    MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN, by Luca Lucchese and Guido Maria Cortelazzo
    QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS, by Gaetano Giunta
    ERROR CONCEALMENT IN DIGITAL VIDEO, by Francesco G.B. De Natale
    IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE, by Anil Kokaram
    VIDEO SUMMARIZATION, by Cuneyt M. Taskiran and Edward

  18. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences are clustered with sequences reported from Japan. This is the first phylogenetic analysis of HCV core gene from Pakistani population. Our sequences and sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and ...

  19. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of the microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment with a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline. This could serve as a good starting point for experimental design and makes the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  20. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. Using these information technologies, the actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measure the fitness and appropriateness between a process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach is unfit to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach utilizes the current control flow technique in the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  1. Characterization of race 65 of Colletotrichum lindemuthianum by sequencing ITS regions

    Directory of Open Access Journals (Sweden)

    Marcela Coelho

    2016-09-01

    The present work aimed to characterize isolates of C. lindemuthianum race 65 from different regions in Brazil by ITS sequencing. A total of 17 isolates of race 65, collected in the states of Mato Grosso, Minas Gerais, Paraná, Santa Catarina and São Paulo, were studied. Analysis of the sequences of isolates 8, 9, 12, 14 and 15 revealed the presence of two single nucleotide polymorphisms (SNPs) in the ITS1 region at the same positions. These isolates, when analyzed together with the sequence of isolate 17, revealed a SNP in the ITS2 region. The highest genetic dissimilarity, observed between isolates 11 and 3 and between isolates 11 and 10, was 0.772. In turn, isolates 7 and 2 were the most similar, with a value of 0.002 for genetic distance. The phylogenetic tree obtained based on the sequences of the ITS1 and ITS2 regions revealed the formation of two groups, one with a subgroup. The results reveal high molecular variability among isolates of race 65 of C. lindemuthianum.
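
    The two kinds of quantity reported in this abstract, SNP positions shared across aligned isolates and a pairwise genetic distance, can be illustrated with a short sketch. It assumes the sequences are already aligned to equal length and uses the simple proportion of differing sites (p-distance); the abstract does not state which distance measure the authors used, and the function names are hypothetical.

```python
def variable_sites(aligned):
    """Return 1-based alignment columns where more than one base occurs.

    `aligned` maps an isolate name to its aligned sequence; gap
    characters ('-') are ignored when deciding whether a column varies.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for position, column in enumerate(zip(*aligned.values()), start=1):
        bases = set(column) - {"-"}
        if len(bases) > 1:
            sites.append(position)
    return sites


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped. This is an
    assumed, simple measure, not necessarily the one used in the study.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)
```

    With invented toy sequences "ACGTAC", "ACGTAC" and "ATGTAA", variable_sites reports columns 2 and 6, and p_distance between the first and third sequence is 2/6.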

  2. Differentiation of Actinobacillus pleuropneumoniae strains by sequence analysis of 16S rDNA and ribosomal intergenic regions, and development of a species specific oligonucleotide for in situ detection

    DEFF Research Database (Denmark)

    Fussing, Vivian; Paster, Bruce J.; Dewhirst, Floyd E.

    1998-01-01

    . The larger RIS's were different between the 3 species tested. The sequence of the 16S ribosomal gene was determined for 8 serotypes of A. pleuropneumoniae. These sequences showed only minor base differences, indicating a close genetic relatedness of these serotypes within the species. An oligonucleotide DNA...... probe designed from the 16S rRNA gene sequence of A. pleuropneumoniae was specific for all strains of the target species and did not cross react with A. lignieresii, the closest known relative of A. pleuropneumoniae. This species-specific DNA probe labeled with fluorescein was used for in situ......The aims of this study were to characterize and determine intraspecies and interspecies relatedness of Actinobacillus pleuropneumoniae to Actinobacillus lignieresii and Actinobacillus suis by sequence analysis of the ribosomal operon and to find a species-specific area for in situ detection of A...

  3. Keragaman Spesies Ikan Tuna di Pasar Ikan Kedonganan Bali dengan Analisis Sekuen Kontrol Daerah Mitokondria DNA (SPECIES DIVERSITY OF TUNA FISH USING MITOCHONDRIAL DNA CONTROL REGION SEQUENCE ANALYSIS AT KEDONGANAN FISH MARKET

    Directory of Open Access Journals (Sweden)

    Daud Steven Triyomi Hariyanto

    2015-10-01

    Tuna is an export commodity which has very high economic value. However, some tuna species are threatened with extinction. The purpose of this study was to identify the tuna species that are sold in Kedonganan Fish Market. The research method was the polymerase chain reaction (PCR) technique using the mitochondrial DNA control region as the marker sequence. Tuna samples were obtained from the Kedonganan Fish Market, Kuta, Badung, Bali. The total number of samples was 28 specimens. The sequence of each sample was obtained through sequencing techniques. Sequences obtained were run in BLAST (Basic Local Alignment Search Tool) and subsequently analyzed with MEGA 5 for species confirmation. Three species of tuna were identified in the Kedonganan Fish Market: Thunnus albacares, T. obesus, and Katsuwonus pelamis. All three species have high genetic variation (HD = 1). This study needs to be continued with a larger number of samples to determine the species of tuna sold in Kedonganan Fish Market.

  4. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  5. The complementarity-determining region sequences in IgY antivenom hypervariable regions

    Directory of Open Access Journals (Sweden)

    David Gitirana da Rocha

    2017-08-01

    The data presented in this article are related to the research article entitled "Development of IgY antibodies against anti-snake toxins endowed with highly lethal neutralizing activity" (da Rocha et al., 2017) [1]. Complementarity-determining region (CDR) sequences are variable antibody (Ab) sequences that respond with specificity, duration and strength to identify and bind to antigen (Ag) epitopes. B lymphocytes isolated from hens immunized with Bitis arietans (Ba) and anti-Crotalus durissus terrificus (Cdt) venoms and expressing high specificity, affinity and toxicity neutralizing antibody titers were used as DNA sources. The VLF1, CDR1, CDR2, VLR1 and CDR3 sequences were validated by BLASTp, and values corresponding to IgY VL and VH anti-Ba or anti-Cdt venoms were identified, registered [Gallus gallus IgY Fv Light chain (GU815099)/Gallus gallus IgY Fv Heavy chain (GU815098)] and used for molecular modeling of IgY scFv anti-Ba. The resulting CDR1, CDR2 and CDR3 sequences were combined to construct the three-dimensional structure of the Ab paratope.

  6. Regional climate change mitigation analysis

    Energy Technology Data Exchange (ETDEWEB)

    Rowlands, Ian H [UNEP Collaborating Centre on Energy and Environment, and Univ. of Waterloo (Canada)

    1998-10-01

    The purpose of this paper is to explore some of the key methodological issues that arise from an analysis of regional climate change mitigation options. The rationale for any analysis of regional mitigation activities, emphasising both the theoretical attractiveness and the existing political encouragement and the methodology that has been developed are reviewed. The differences arising from the fact that mitigation analyses have been taken from the level of the national - where the majority of the work has been completed to date - to the level of the international - that is, the 'regional' - will be especially highlighted. (EG)

  7. Regional climate change mitigation analysis

    International Nuclear Information System (INIS)

    Rowlands, Ian H.

    1998-01-01

    The purpose of this paper is to explore some of the key methodological issues that arise from an analysis of regional climate change mitigation options. The rationale for any analysis of regional mitigation activities, emphasising both the theoretical attractiveness and the existing political encouragement and the methodology that has been developed are reviewed. The differences arising from the fact that mitigation analyses have been taken from the level of the national - where the majority of the work has been completed to date - to the level of the international - that is, the 'regional' - will be especially highlighted. (EG)

  8. FAST: FAST Analysis of Sequences Toolbox

    Directory of Open Access Journals (Sweden)

    Travis J. Lawrence

    2015-05-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
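
    To make the grep-style record selection described above concrete, the following is a pure-Python analogue, not FAST itself and not its command-line interface: it parses Multi-FastA text and keeps the records whose header line matches a regular expression. The function names parse_fasta and grep_records are hypothetical, and the sketch makes no claim about how fasgrep is implemented or which options it accepts.

```python
import re


def parse_fasta(text):
    """Parse Multi-FastA text into a list of (header, sequence) tuples.

    The header is the text after '>'; sequence lines belonging to one
    record are concatenated. Blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def grep_records(records, pattern):
    """Keep records whose header matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [(header, seq) for header, seq in records if regex.search(header)]
```

    Treating each record as a unit, rather than each line as grep would, is the design point the abstract emphasises: a sequence wrapped over several lines is selected or dropped as a whole.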

  9. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  10. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.

  11. Nucleotide sequence determination of the region in adenovirus 5 DNA involved in cell transformation

    International Nuclear Information System (INIS)

    Maat, J.

    1978-01-01

    A description is given of investigations into the primary structure of the transforming region of adenovirus type 5 DNA. The phenomenon of cell transformation is discussed in general terms and the principles of a number of fairly recent techniques, which have been in use for DNA sequence determination since 1975, are dealt with. A few of the author's own techniques are described which deal both with nucleotide sequence analysis and with the determination of DNA cleavage sites of restriction endonucleases. The results are given of the mapping of cleavage sites of HpaII, HaeIII, AluI, HinfI and TaqI in the HpaI-E fragment of adenovirus DNA and of the determination of the nucleotide sequence in the transforming region of adenovirus type 5 DNA. The results of the sequence determination of the Ad5 HindIII-G fragment are discussed in relation to the investigation of the transforming proteins isolated from in vitro and in vivo synthesizing systems. Labelling procedures of DNA are described including the exonuclease III/DNA polymerase I method and T4 polynucleotide kinase labelling of DNA fragments. (Auth.)

  12. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    Science.gov (United States)

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence

  13. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Full Text Available Abstract Background This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  14. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  15. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  16. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  17. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  18. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  19. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which maps QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  20. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  1. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS)

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI, M. REISSMANN AND H. J. SCHWARTZ

    2006-10-01

    Full Text Available Myostatin, also called growth differentiation factor-8 (GDF-8), is a member of the mammalian transforming growth factor-beta (TGF-beta) superfamily, which is expressed specifically in developing and adult skeletal muscle. The muscular hypertrophy allele (mh allele) in the double-muscled breeds involves a mutation within the myostatin gene. Genomic DNA was isolated from camel hair using the NucleoSpin Tissue kit. Two animals of each of six breeds, namely Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri, were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homologous regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed a separate cluster from the pig in spite of having high homology (98%) and showed 94% homology with cattle and sheep as reported in literature. The sequence of the PCR-amplified part of exon 1 (256 bp) of the camel myostatin was identical among the six camel breeds.
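    Homology percentages such as those reported here are commonly computed as percent identity over a pairwise alignment. The following is a minimal Python sketch under that assumption; the abstract does not state how its figures were derived, and the function name `percent_identity` and the convention of skipping gapped columns are illustrative choices, not the authors' method.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences.

    Columns containing a gap in either sequence are skipped (one common
    convention; others count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

    The result depends on the gap convention and on the alignment itself, so identity values from different studies are comparable only when both are reported.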

  2. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  3. Image sequence analysis workstation for multipoint motion analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modern graphic-oriented workstation with digital image acquisition, processing and display techniques. In addition to automation and increase in throughput of data reduction tasks, the objective of the system is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level image processing and spatial/temporal adaptation of the processing parameters are used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  4. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  5. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  6. THE ENVIRONMENT OF REGIONAL DEVELOPMENT FINANCIAL ANALYSIS

    OpenAIRE

    Bechis Liviu; MOSCVICIOV Andrei

    2012-01-01

    The paper presents the difference between the two concepts regionalism and regionalization. It also presents the three types of regionalism analysis depending on the dimension and the nature of the relations: regionalism at national level, transnational regionalism and international regionalism analysis.

  7. Characterization of Campylobacter jejuni applying flaA short variable region sequencing, multilocus sequencing and Fourier transform infrared spectroscopy

    DEFF Research Database (Denmark)

    Josefsen, Mathilde Hartmann; Bonnichsen, Lise; Larsson, Jonas

    flaA short variable region sequencing and phenetic Fourier transform infrared (FTIR) spectroscopy was applied on a collection of 102 Campylobacter jejuni isolated from continuous sampling of organic, free range geese and chickens. FTIR has been shown to serve as a valuable tool in typing...

  8. Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology.

    Science.gov (United States)

    Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y

    2015-10-22

    Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.

  9. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  10. Genetic structure of Florida green turtle rookeries as indicated by mitochondrial DNA control region sequences

    Science.gov (United States)

    Shamblin, Brian M.; Bagley, Dean A.; Ehrhart, Llewellyn M.; Desjardin, Nicole A.; Martin, R. Erik; Hart, Kristen M.; Naro-Maciel, Eugenia; Rusenko, Kirt; Stiner, John C.; Sobel, Debra; Johnson, Chris; Wilmers, Thomas; Wright, Laura J.; Nairn, Campbell J.

    2014-01-01

    Green turtle (Chelonia mydas) nesting has increased dramatically in Florida over the past two decades, ranking the Florida nesting aggregation among the largest in the Greater Caribbean region. Individual beaches that comprise several hundred kilometers of Florida’s east coast and Keys support tens to thousands of nests annually. These beaches encompass natural to highly developed habitats, and the degree of demographic partitioning among rookeries was previously unresolved. We characterized the genetic structure of ten Florida rookeries from Cape Canaveral to the Dry Tortugas through analysis of 817 base pair mitochondrial DNA (mtDNA) control region sequences from 485 nesting turtles. Two common haplotypes, CM-A1.1 and CM-A3.1, accounted for 87 % of samples, and the haplotype frequencies were strongly partitioned by latitude along Florida’s Atlantic coast. Most genetic structure occurred between rookeries on either side of an apparent genetic break in the vicinity of the St. Lucie Inlet that separates Hutchinson Island and Jupiter Island, representing the finest scale at which mtDNA structure has been documented in marine turtle rookeries. Florida and Caribbean scale analyses of population structure support recognition of at least two management units: central eastern Florida and southern Florida. More thorough sampling and deeper sequencing are necessary to better characterize connectivity among Florida green turtle rookeries as well as between the Florida nesting aggregation and others in the Greater Caribbean region.

  11. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly

  12. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  13. Exome sequencing generates high quality data in non-target regions

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-05-01

    Full Text Available Abstract Background Exome sequencing using next-generation sequencing technologies is a cost efficient approach to selectively sequencing coding regions of the human genome for detection of disease variants. A significant amount of DNA fragments from the capture process fall outside target regions, and sequence data for positions outside target regions have been mostly ignored after alignment. Result We performed whole exome sequencing on 22 subjects using Agilent SureSelect capture reagent and 6 subjects using Illumina TrueSeq capture reagent. We also downloaded sequencing data for 6 subjects from the 1000 Genomes Project Pilot 3 study. Using these data, we examined the quality of SNPs detected outside target regions by computing consistency rate with genotypes obtained from SNP chips or the Hapmap database, transition-transversion (Ti/Tv) ratio, and percentage of SNPs inside dbSNP. For all three platforms, we obtained high-quality SNPs outside target regions, and some far from target regions. In our Agilent SureSelect data, we obtained 84,049 high-quality SNPs outside target regions compared to 65,231 SNPs inside target regions (a 129% increase). For our Illumina TrueSeq data, we obtained 222,171 high-quality SNPs outside target regions compared to 95,818 SNPs inside target regions (a 232% increase). For the data from the 1000 Genomes Project, we obtained 7,139 high-quality SNPs outside target regions compared to 1,548 SNPs inside target regions (a 461% increase). Conclusions These results demonstrate that a significant amount of high quality genotypes outside target regions can be obtained from exome sequencing data. These data should not be ignored in genetic epidemiology studies.
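    The transition-transversion (Ti/Tv) ratio used above as a quality measure can be computed directly from a list of SNP calls. The sketch below is a minimal Python illustration, not the authors' pipeline; the function names and the input shape (pairs of reference and alternate bases) are assumptions made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps: list[tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) base pairs.

    Entries that are not single-base A/C/G/T substitutions are ignored.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ti / tv
```

    Because random base-calling errors produce transversions about twice as often as transitions, a call set whose ratio falls well below that of trusted variants in the same regions is a warning sign of false positives.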

  14. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599

  15. Evaluation of a target region capture sequencing platform using monogenic diabetes as a study-model

    DEFF Research Database (Denmark)

    Gao, Rui; Liu, Yanxia; Gjesing, Anette Marianne Prior

    2014-01-01

    Monogenic diabetes is a genetic disease often caused by mutations in genes involved in beta-cell function. Correct sub-categorization of the disease is a prerequisite for appropriate treatment and genetic counseling. Target-region capture sequencing is a combination of genomic region enrichment...... and next generation sequencing which might be used as an efficient way to diagnose various genetic disorders. We aimed to develop a target-region capture sequencing platform to screen 117 selected candidate genes involved in metabolism for mutations and to evaluate its performance using monogenic diabetes...

  16. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 Copia and 101 Gypsy reverse transcriptase (rt) sequences were amplified in the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity and were rich in AT, and Copia rt showed higher sequence divergence than Gypsy rt. Two reasons are likely to account for this phenomenon: (a) these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; (b) strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  17. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  18. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  19. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  20. [Sequence polymorphisms of the mitochondrial DNA HVR I and HVR II regions in the Deng populations from Tibet in China].

    Science.gov (United States)

    Kang, Longli; Zhang, Xiaofeng; Liu, Kai; Zhao, Jianmin

    2009-12-01

    To analyze the sequence polymorphisms of the mitochondrial DNA hypervariable region I (HVR I) and HVR II in the Deng population in Linzhi area of Tibet. mtDNAs obtained from 119 unrelated individuals were amplified and directly sequenced. One hundred and ten variable sites were identified, including nucleotide transitions, transversions, and insertions. In the HVR I region (nt16024-nt16365), 68 polymorphic sites and 119 haplotypes were observed, and the genetic diversity was 0.9916. In the HVR II (nt73-nt340) region, 42 polymorphic sites and 113 haplotypes were observed, and the genetic diversity was 0.9907. The random match probabilities of the HVR I and HVR II regions were 0.0084 and 0.0093, respectively. When combining the HVR I and HVR II regions, 119 different haplotypes were found. The combined match probability of two unrelated persons having the same sequence was 0.0084. There are some unique polymorphic loci in the Deng population. There are different genetic structures between Chinese and other Asian populations in the mitochondrial DNA D-loop region. Sequence polymorphism of mitochondrial DNA HVR I and HVR II can be used as a genetic marker for forensic individual identification and genetic analysis.
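
    The abstract does not state how the diversity and match probability were computed. A minimal sketch, assuming the random match probability is the sum of squared haplotype frequencies and the reported diversity is one minus that sum (no n/(n-1) correction), reproduces the HVR I figures when all 119 individuals carry distinct haplotypes. The haplotype labels are hypothetical placeholders.

```python
from collections import Counter


def match_probability(haplotypes):
    """Sum of squared haplotype frequencies (chance two random people match)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def genetic_diversity(haplotypes, unbiased=False):
    """1 - sum(p_i^2); optionally apply the n/(n-1) small-sample correction."""
    n = len(haplotypes)
    h = 1.0 - match_probability(haplotypes)
    return h * n / (n - 1) if unbiased else h


# Hypothetical sample: 119 individuals, each with a distinct haplotype.
sample = [f"hap{i}" for i in range(119)]
rmp = match_probability(sample)
div = genetic_diversity(sample)
print(round(rmp, 4), round(div, 4))
```

    The HVR II values cannot be checked this way because the abstract gives the number of haplotypes (113) but not how the 119 individuals are distributed among them.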

  1. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Directory of Open Access Journals (Sweden)

    Claros M Gonzalo

    2010-06-01

    Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, the choice of which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions: AlignMiner can be used
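
    The entropy-based scoring mentioned in this record can be illustrated with plain per-column Shannon entropy. This is a generic sketch, not AlignMiner's actual scoring function; the function names, the threshold, and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def divergent_columns(alignment, threshold=1.0):
    """Indices of columns whose entropy is at or above the threshold."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if column_entropy(column) >= threshold
    ]


# Toy alignment (hypothetical): columns 2 and 4 vary, the rest are conserved.
alignment = ["ACGTA", "ACGTC", "ACCTG", "ACCTT"]
divergent = divergent_columns(alignment)
print(divergent)
```

    A fully conserved column scores 0 bits and a column with four equally frequent nucleotides scores 2 bits, so runs of high-scoring columns mark candidate divergent regions.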

  2. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Science.gov (United States)

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  3. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  4. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    A recurrence plot technique for DNA sequences is established on the metric representation and employed to analyze the correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, whereas the correlation distance of a yeast chromosome shows a constant increase.

  5. Genomic region operation kit for flexible processing of deep sequencing data.

    Science.gov (United States)

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
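
    The set-algebra view of genomic regions can be sketched with union and intersection over half-open intervals on a single chromosome. This does not use or imitate the GROK API; the function names and example coordinates are hypothetical.

```python
def merge(regions):
    """Union: collapse overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersection of two region sets via a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical peak sets from two experiments on the same chromosome.
peaks_a = [(100, 200), (150, 300), (500, 600)]
peaks_b = [(250, 550)]
shared = intersect(peaks_a, peaks_b)
combined = merge(peaks_a + peaks_b)
print(shared, combined)
```

    Sample comparison then reduces to composing such operations, for example regions bound in one condition but not another.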

  6. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  7. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  8. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
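
    The construction described in this record can be sketched in a few lines. The abstract does not give the word length, how "nearby" words are paired, or the damping factor, so the sketch assumes consecutive non-overlapping words of k letters and the conventional PageRank damping of 0.85; the input sequence is a made-up toy string.

```python
from itertools import product


def google_matrix(seq, k=2, alpha=0.85):
    """Column-stochastic Google matrix over all k-letter DNA words."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]
    # Assumption: transitions are counted between consecutive,
    # non-overlapping words along the sequence.
    chunks = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    for src, dst in zip(chunks, chunks[1:]):
        if src in index and dst in index:
            counts[index[dst]][index[src]] += 1
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Words never seen as a source get a uniform column.
            s = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s + (1 - alpha) / n
    return words, G


def pagerank(G, iterations=100):
    """Power iteration for the leading eigenvector of G."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGTTTGACCA" * 4, k=2)
rank = pagerank(G)
top_word = words[max(range(len(words)), key=rank.__getitem__)]
print(top_word)
```

    With k letters per word the matrix has 4^k rows, so realistic word lengths call for sparse storage rather than the dense lists used here.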

  9. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  10. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not been used in viral integration site analysis of gene therapy applications. Here, we combined a targeted sequence capture array and next generation sequencing to address the whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1 derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
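
    The operator picture in this record can be made concrete with a toy library. The sketch below is an illustration under stated assumptions, not the authors' implementation: diagonal matrices are stored as lists of their diagonal entries, the sampling operator Sa is modelled as sampled copies divided by original copies after drawing a fixed number of reads with replacement, and the censorship values are invented.

```python
import random


def apply_diagonal(diag, vector):
    """Multiply a diagonal matrix (given by its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vector)]


def sampling_operator(copies, depth, rng):
    """Diagonal of Sa: reads drawn per sequence / copies of that sequence."""
    population = [i for i, c in enumerate(copies) for _ in range(c)]
    drawn = rng.choices(population, k=depth)  # sampling with replacement
    sampled = [0] * len(copies)
    for i in drawn:
        sampled[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sampled, copies)]


rng = random.Random(0)
n = [50, 30, 15, 5]            # toy frequency vector (copy numbers)
cen = [1.0, 1.0, 0.0, 1.0]     # hypothetical CEN: third sequence fully censored

censored = apply_diagonal(cen, n)
sa = sampling_operator([round(c) for c in censored], depth=40, rng=rng)
reads = apply_diagonal(sa, censored)   # Seq n = Sa CEN n
print(reads)
```

    Replacing cen with all ones recovers the unbiased case, in which the observed read counts differ from n only through random sampling.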

  12. Identification and verification of hybridoma-derived monoclonal antibody variable region sequences using recombinant DNA technology and mass spectrometry.

    Science.gov (United States)

    Babrak, Lmar; McGarvey, Jeffery A; Stanker, Larry H; Hnasko, Robert

    2017-10-01

    Antibody engineering requires the identification of antigen binding domains or variable regions (VR) unique to each antibody. It is the VR that define the unique antigen binding properties, and proper sequence identification is essential for functional evaluation and performance of recombinant antibodies (rAb). This determination can be achieved by sequence analysis of immunoglobulin (Ig) transcripts obtained from a monoclonal antibody (MAb) producing hybridoma and subsequent expression of a rAb. However, the polyploid nature of a hybridoma cell often results in the added expression of aberrant immunoglobulin-like transcripts or even production of anomalous antibodies, which can confound production of rAb. An incorrect VR sequence will result in a non-functional rAb, and de novo assembly of Ig primary structure without a sequence map is challenging. To address these problems, we have developed a methodology which combines: 1) selective PCR amplification of VR from both the heavy and light chain IgG from hybridoma, 2) molecular cloning and DNA sequence analysis and 3) tandem mass spectrometry (MS/MS) on enzyme digests obtained from the purified IgG. Peptide analysis proceeds by evaluating coverage of the predicted primary protein sequence provided by the initial DNA maps for the VR. This methodology serves to both identify and verify the primary structure of the MAb VR for production as rAb. Published by Elsevier Ltd.

  13. Cloning and sequence analysis of benzo-a-pyreneinducible ...

    African Journals Online (AJOL)

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  14. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  15. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  16. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    Science.gov (United States)

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that were reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but are not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.
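
    The proportions quoted in this record follow directly from the counts, as a quick arithmetic check shows. The dictionary keys are just labels taken from the abstract's wording.

```python
calls = {
    "highly likely false positive": 393,
    "likely false positive": 126,
    "likely true positive": 156,
}

total = sum(calls.values())
shares = {label: round(100 * n / total, 1) for label, n in calls.items()}

# False positives that local de novo assembly could have corrected.
false_positives = (
    calls["highly likely false positive"] + calls["likely false positive"]
)
corrected_share = round(100 * 201 / false_positives, 1)

print(total, shares, false_positives, corrected_share)
```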

  17. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    Directory of Open Access Journals (Sweden)

    Heewon Seo

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that were reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but are not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.

  18. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequences of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...
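
    Deriving a consensus from aligned promoter sequences amounts to taking the most frequent base in each column. The sketch below uses five invented hexamers rather than sequences from the compilation, and the function names are hypothetical.

```python
from collections import Counter


def position_counts(seqs):
    """Per-column base counts for equal-length aligned sequences."""
    return [Counter(column) for column in zip(*seqs)]


def consensus(seqs):
    """Most frequent base at each aligned position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(seqs))


# Invented toy hexamers, chosen so that no column has a tie.
hexamers = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
cons = consensus(hexamers)
conservation = [c.most_common(1)[0][1] / len(hexamers)
                for c in position_counts(hexamers)]
print(cons, conservation)
```

    The per-position conservation values show which positions carry the consensus strongly and which tolerate substitutions.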

  19. SEQUENCE ANALYSIS OF MATURASE K (MATK): A ...

    African Journals Online (AJOL)

    Global Journal

    presence of frame shift indels as well as few cases of premature stop .... scan at high sensitivity. ..... Specifically, there are two schools of thought regarding ... better used for shallow evolutionary histories while ... genomic regions in deep level phylogenetics (Yang, .... genomes possibly being a consequences of the higher.

  20. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
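
    The relationship between sum-product and maximum a posteriori inference can be seen by running the same recursion over two different pairs of operations. The sketch below is a forward pass on a two-state hidden Markov model with invented parameters; swapping addition for max turns the likelihood into the probability of the single best path. Polytope propagation itself, which replaces numbers with polytopes, is not implemented here.

```python
import operator
from functools import reduce


def forward(obs, states, start, trans, emit, add, mul):
    """Forward recursion over a generic (add, mul) pair of operations."""
    alpha = {s: mul(start[s], emit[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: mul(
                reduce(add, (mul(alpha[s], trans[s][t]) for s in states)),
                emit[t][symbol],
            )
            for t in states
        }
    return reduce(add, alpha.values())


# Invented two-state model for illustration.
states = ["A", "B"]
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emit = {"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}}

likelihood = forward("xy", states, start, trans, emit, operator.add, operator.mul)
best_path = forward("xy", states, start, trans, emit, max, operator.mul)
print(likelihood, best_path)
```

    For the observation "xy" the likelihood sums over all four state paths, while the max version keeps only the path A, A.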

  1. Reference voltage calculation method based on zero-sequence component optimisation for a regional compensation DVR

    Science.gov (United States)

    Jian, Le; Cao, Wang; Jintao, Yang; Yinge, Wang

    2018-04-01

    This paper describes the design of a dynamic voltage restorer (DVR) that can simultaneously protect several sensitive loads from voltage sags in a region of an MV distribution network. A novel reference voltage calculation method based on zero-sequence voltage optimisation is proposed for this DVR to optimise cost-effectiveness in compensation of voltage sags with different characteristics in an ungrounded neutral system. Based on a detailed analysis of the characteristics of voltage sags caused by different types of faults and the effect of the wiring mode of the transformer on these characteristics, the optimisation target of the reference voltage calculation is presented with several constraints. The reference voltages under all types of voltage sags are calculated by optimising the zero-sequence component, which can reduce the degree of swell in the phase-to-ground voltage after compensation to the maximum extent and can improve the symmetry degree of the output voltages of the DVR, thereby effectively increasing the compensation ability. The validity and effectiveness of the proposed method are verified by simulation and experimental results.
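
    The zero-sequence component that the method optimises is one of the three symmetrical components of a set of phase voltages. The sketch below shows only the standard Fortescue decomposition on per-unit phasors; the paper's optimisation and constraints are not reproduced, and the sag example is invented.

```python
import cmath
import math

# Rotation operator a = 1 at 120 degrees.
A = cmath.exp(2j * math.pi / 3)


def symmetrical_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors of three phase voltages."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A * A * vc) / 3
    v2 = (va + A * A * vb + A * vc) / 3
    return v0, v1, v2


# Balanced 1.0 per-unit system: only the positive sequence is non-zero.
balanced = symmetrical_components(1.0, A * A, A)

# Invented example: phase a sags to 0.5 per unit, phases b and c unchanged.
sag = symmetrical_components(0.5, A * A, A)
print([round(abs(v), 4) for v in balanced], [round(abs(v), 4) for v in sag])
```

    A single-phase sag introduces a zero-sequence term, here of magnitude 1/6 per unit.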

  2. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 by analyzing differences in allele distribution.

  3. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  4. THE 'MAIN SEQUENCE' OF EXPLOSIVE SOLAR ACTIVE REGIONS: DISCOVERY AND INTERPRETATION

    Energy Technology Data Exchange (ETDEWEB)

    Falconer, David A; Moore, Ronald L; Adams, Mitzi [Space Science Office, VP62, Marshall Space Flight Center, Huntsville, AL 35812 (United States); Gary, G. Allen [Center for Space Plasma and Aeronomic Research, University of Alabama in Huntsville, Huntsville, AL 35899 (United States)], E-mail: David.falconer@msfc.nasa.gov

    2009-08-01

    We examine the location and distribution of the production of coronal mass ejections (CMEs) and major flares by sunspot active regions in the phase space of two whole-active-region magnetic quantities measured from 1897 SOHO/MDI magnetograms. These magnetograms track the evolution of 44 active regions across the central disk of radius $0.5\,R_{\mathrm{Sun}}$. The two quantities are ${}^{L}WL_{SG}$, a gauge of the total free energy in an active region's magnetic field, and ${}^{L}\Phi$, a measure of the active region's total magnetic flux. From these data and each active region's history of production of CMEs, X flares, and M flares, we find (1) that CME/flare-productive active regions are concentrated in a straight-line 'main sequence' in ($\log {}^{L}WL_{SG}$, $\log {}^{L}\Phi$) space, (2) that main-sequence active regions have nearly their maximum attainable free magnetic energy, and (3) evidence that this arrangement plausibly results from equilibrium between input of free energy to an explosive active region's magnetic field in the chromosphere and corona by contortion of the field via convection in and below the photosphere and loss of free energy via CMEs, flares, and coronal heating, an equilibrium between energy gain and loss that is analogous to that of the main sequence of hydrogen-burning stars in (mass, luminosity) space.

  5. THE 'MAIN SEQUENCE' OF EXPLOSIVE SOLAR ACTIVE REGIONS: DISCOVERY AND INTERPRETATION

    International Nuclear Information System (INIS)

    Falconer, David A.; Moore, Ronald L.; Adams, Mitzi; Gary, G. Allen

    2009-01-01

    We examine the location and distribution of the production of coronal mass ejections (CMEs) and major flares by sunspot active regions in the phase space of two whole-active-region magnetic quantities measured from 1897 SOHO/MDI magnetograms. These magnetograms track the evolution of 44 active regions across the central disk of radius $0.5\,R_{\mathrm{Sun}}$. The two quantities are ${}^{L}WL_{SG}$, a gauge of the total free energy in an active region's magnetic field, and ${}^{L}\Phi$, a measure of the active region's total magnetic flux. From these data and each active region's history of production of CMEs, X flares, and M flares, we find (1) that CME/flare-productive active regions are concentrated in a straight-line 'main sequence' in ($\log {}^{L}WL_{SG}$, $\log {}^{L}\Phi$) space, (2) that main-sequence active regions have nearly their maximum attainable free magnetic energy, and (3) evidence that this arrangement plausibly results from equilibrium between input of free energy to an explosive active region's magnetic field in the chromosphere and corona by contortion of the field via convection in and below the photosphere and loss of free energy via CMEs, flares, and coronal heating, an equilibrium between energy gain and loss that is analogous to that of the main sequence of hydrogen-burning stars in (mass, luminosity) space.

  6. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  7. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Science.gov (United States)

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, which partially ruptured the Main Himalayan Thrust north of Kathmandu on 25 April 2015, was the largest and most catastrophic earthquake to strike Nepal since the great M8.4 earthquake of 1934. The mainshock was followed by multiple aftershocks, including two notable events on 12 May with magnitudes of Mw 7.3 and Mw 6.3. Following these events it became essential for the authorities and the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, including the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study with a microseismic study using seismic data from the eastern part of the Nepalese Seismological Center network together with one broadband station in Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Because of the exceptional number of events detected, we performed an automatic picking/locating procedure that can be split into 4 steps: 1) coarse picking of the onsets using a classical STA/LTA picker, 2) phase association of picked onsets to detect and declare seismic events, 3) kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy, and 4) local magnitude calculation based on waveform amplitudes. This procedure is time efficient (about 1 s/event), considerably reduces the location uncertainties (errors of about 2 to 5 km) and increases the number of events detected compared to manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparing with the USGS catalog we were able to derive a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the east of the region and a sudden decrease of seismicity 100 km east of Kathmandu which may reveal the presence of a tectonic
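    Step 1 of the procedure above, a classical STA/LTA picker, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, window lengths and trigger threshold are assumptions chosen for the example, and the trace is treated as a plain list of samples.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average signal energy.

    Both windows end at the same sample. The ratio is left at 0.0
    until the long window (n_lta samples) is completely filled.
    Assumes n_sta <= n_lta.
    """
    # Prefix sums of squared amplitudes give each window sum in O(1).
    csum = [0.0]
    for x in signal:
        csum.append(csum[-1] + x * x)

    ratio = [0.0] * len(signal)
    for end in range(n_lta, len(signal) + 1):
        sta = (csum[end] - csum[end - n_sta]) / n_sta
        lta = (csum[end] - csum[end - n_lta]) / n_lta
        if lta > 0:
            ratio[end - 1] = sta / lta
    return ratio


def pick_onsets(ratio, threshold):
    """Sample indices where the ratio crosses the threshold upward."""
    return [i for i in range(1, len(ratio))
            if ratio[i] >= threshold > ratio[i - 1]]


# Synthetic trace: low-amplitude noise followed by a strong arrival.
trace = [0.01 * (-1) ** i for i in range(200)] + [1.0 * (-1) ** i for i in range(50)]
onsets = pick_onsets(sta_lta(trace, n_sta=5, n_lta=50), threshold=5.0)
```

    On real data the coarse picks from such a detector would still need the association and refinement steps the abstract describes.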

  8. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  9. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  10. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
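    The first two analysis steps described above (cleanup and clustering of a tab-delimited tag/count file) can be sketched roughly as below. This is an illustrative approximation, not DSAP's code: the adaptor string, the minimum overlap, the minimum tag length and the 80% single-nucleotide cutoff used to discard poly-A/T/C/G/N tags are all assumptions.

```python
from collections import Counter


def trim_adaptor(read, adaptor, min_overlap=6):
    """Strip a 3' adaptor: a full copy anywhere, or a partial copy at the read end."""
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    # Try progressively shorter adaptor prefixes at the end of the read.
    for k in range(min(len(adaptor), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


def is_homopolymer_like(seq, max_fraction=0.8):
    """True if one symbol (A, T, C, G or N) makes up most of the tag."""
    if not seq:
        return True
    return Counter(seq).most_common(1)[0][1] / len(seq) >= max_fraction


def clean_and_cluster(lines, adaptor, min_len=15):
    """Read 'sequence<TAB>count' lines and merge counts of identical cleaned tags."""
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        seq = trim_adaptor(seq.upper(), adaptor)
        if len(seq) < min_len or is_homopolymer_like(seq):
            continue
        clusters[seq] += int(count)
    return clusters
```

    The resulting unique clusters, with their summed copy numbers, are what a pipeline of this kind would then compare against ncRNA and miRNA reference libraries.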

  11. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
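    Two of the measures mentioned above, the n-gram distribution and a Zipf's law coefficient, can be illustrated with plain Python. The sketch below deliberately does not use the Quantiprot API; the function names are hypothetical, and the coefficient is taken here to be the least-squares slope of log(frequency) against log(rank), which is one common convention and may differ from the package's definition.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    Needs at least two distinct n-grams; a slope near -1 corresponds
    to the classical Zipf law.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


bigrams = ngram_counts("MKVLAAGIVGLLLAQ", 2)  # made-up peptide fragment
```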

  12. Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

    Science.gov (United States)

    Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

    2017-11-23

    Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed at the single-cell level. We found that major SCNAs were early events in cancer development and were stably inherited. Single-cell sequencing revealed mutations and SCNAs that were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of the two treatment-naïve patients of the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.

  13. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high

  14. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Directory of Open Access Journals (Sweden)

    Ruan Jishou

    2007-04-01

    Full Text Available Abstract Background: Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results: The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are
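    The representation based on frequencies of k-spaced amino acid pairs can be sketched as follows. The abstract does not give the exact definition, so this is an assumption: a k-spaced pair is taken to be two residues separated by exactly k other residues, frequencies are normalized by the number of such pairs in the sequence, and the range of k values is a free parameter. Function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_freqs(seq, k):
    """Relative frequency of residue pairs separated by exactly k residues."""
    n_pairs = len(seq) - k - 1
    if n_pairs <= 0:
        return {}
    counts = Counter(seq[i] + seq[i + k + 1] for i in range(n_pairs))
    return {pair: c / n_pairs for pair, c in counts.items()}


def feature_vector(seq, max_k):
    """Concatenate the 400 pair frequencies for every k in 0..max_k."""
    vec = []
    for k in range(max_k + 1):
        freqs = k_spaced_pair_freqs(seq, k)
        vec.extend(freqs.get(a + b, 0.0) for a in AMINO_ACIDS for b in AMINO_ACIDS)
    return vec
```

    A vector of this kind is high-dimensional (400 entries per value of k), which is consistent with the abstract's use of feature selection before classification.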

  15. Phylogenetic relations of humans and African apes from DNA sequences in the Psi eta-globin region

    Energy Technology Data Exchange (ETDEWEB)

    Miyamoto, M.M.; Slightom, J.L.; Goodman, M.

    1987-10-16

    Sequences from the upstream and downstream flanking DNA regions of the Psi eta-globin locus in Pan troglodytes (common chimpanzee), Gorilla gorilla (gorilla), and Pongo pygmaeus (orangutan, the closest living relative to Homo, Pan, and Gorilla) provided further data for evaluating the phylogenetic relations of humans and African apes. These newly sequenced orthologs (an additional 4.9 kilobase pairs (kbp) for each species) were combined with published Psi eta-gene sequences and then compared to the same orthologous stretch (a continuous 7.1-kbp region) available for humans. Phylogenetic analysis of these nucleotide sequences by the parsimony method indicated (i) that human and chimpanzee are more closely related to each other than either is to gorilla and (ii) that the slowdown in the rate of sequence evolution evident in higher primates is especially pronounced in humans. These results indicate that features unique to African apes (but not to humans) are primitive and that even local molecular clocks should be applied with caution.

  16. Sequence polymorphism data of the hypervariable regions of mitochondrial DNA in the Yadav population of Haryana.

    Science.gov (United States)

    Verma, Kapil; Sharma, Sapna; Sharma, Arun; Dalal, Jyoti; Bhardwaj, Tapeshwar

    2018-06-01

    Genetic variations among humans occur both within and among populations and range from single nucleotide changes to multiple-nucleotide variants. These multiple-nucleotide variants are useful for studying the relationships among individuals or various population groups. The study of human genetic variations can help scientists understand how different population groups are biologically related to one another. Sequence analysis of hypervariable regions of human mitochondrial DNA (mtDNA) has been successfully used for the genetic characterization of different population groups for forensic purposes. It is well established that different ethnic or population groups differ significantly in their mtDNA distributions. In the last decade, very little research has been conducted on mtDNA variations in the Indian population, although such data would be useful for elucidating the history of human population expansion across the world. Moreover, forensic studies on mtDNA variations in the Indian subcontinent are also scarce, particularly in the northern part of India. In this report, variations in the hypervariable regions of mtDNA were analyzed in the Yadav population of Haryana. Different molecular diversity indices were computed. Further, the obtained haplotypes were classified into different haplogroups and the phylogenetic relationship between different haplogroups was inferred.

  17. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick method for detecting radiation-induced mutations using molecular-biological techniques, together with an effective method for analyzing the resulting changes in base sequence. This year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10, and 50 Gy, and the transgene was recovered from spleen genomic DNA by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. The mutants were sequenced using an ALFred DNA sequencer and the SequiTherm(TM) Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of the 50 independent Spi mutants obtained. The deletions were classified into 4 groups. Group 1 comprised 15 mutants characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 comprised 11 mutants with a large deletion and no homologous sequence at the junction. Group 3 comprised 11 mutants with a short deletion of less than 20 bp in the non-repetitive sequence of the gam gene, possibly caused by oxidative breakage of DNA or by recombination of DNA fragments produced by the breakage. Group 4 comprised 4 mutants with deletions of 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between codons 13 and 14 of the redB gene, leading to inactivation of the gam gene and the redBA genes. These results indicate that most Spi mutants had a deletion in the red/gam region and that the deletions in more than half of the mutants occurred at homologous sequences as short as 8 bp. (M.N.)

  18. Identifications of Captive and Wild Tilapia Species Existing in Hawaii by Mitochondrial DNA Control Region Sequence

    Science.gov (United States)

    Wu, Liang; Yang, Jinzeng

    2012-01-01

    Background: The tilapia family of the Cichlidae includes many fish species, which live in freshwater and saltwater environments. Several species, such as O. niloticus, O. aureus, and O. mossambicus, are excellent for aquaculture because these fish are easily reproduced and readily adapt to diverse environments. Historically, tilapia species, including O. mossambicus, S. melanotheron, and O. aureus, were introduced to Hawaii many decades ago, and the state of Hawaii uses the import permit policy to prevent O. niloticus from coming into the islands. However, hybrids produced from O. niloticus may already be present in the freshwater and marine environments of the islands. The purpose of this study was to identify tilapia species that exist in Hawaii using mitochondrial DNA analysis. Methodology/Principal Findings: In this study, we analyzed 382 samples collected from 13 farm (captive) and wild tilapia populations in Oahu and the Hawaii Islands. Comparison of intraspecies variation between the mitochondrial DNA control region (mtDNA CR) and cytochrome c oxidase I (COI) gene from five populations indicated that mtDNA CR had higher nucleotide diversity than COI. A phylogenetic tree of all sampled tilapia was generated using mtDNA CR sequences. The neighbor-joining tree analysis identified seven distinctive tilapia species: O. aureus, O. mossambicus, O. niloticus, S. melanotheron, O. urolepis, T. rendalli, and a hybrid of O. mossambicus and O. niloticus. Of all the populations examined, 10 populations consisting of O. aureus, O. mossambicus, O. urolepis, and O. niloticus from the farmed sites were relatively pure, whereas three wild populations showed some degree of introgression and hybridization. Conclusions/Significance: This DNA-based tilapia species identification is the first report that confirmed tilapia species identities in the wild and captive populations in Hawaii. The DNA sequence comparisons of mtDNA CR appear to be a valid method for tilapia species
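    The comparison of nucleotide diversity between mtDNA CR and COI rests on a simple quantity: the average number of differences per site between all pairs of aligned sequences. A minimal sketch, assuming equal-length, gap-free alignments and a hypothetical function name (the study's own software and any corrections it applied are not stated in the abstract):

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site (pi) for aligned sequences."""
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")

    total_diffs = 0
    for a, b in combinations(aligned, 2):
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)

    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy alignment (invented): three haplotypes over four sites.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

    Computing this value separately for each marker over the same individuals is what allows a statement such as "mtDNA CR had higher nucleotide diversity than COI".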

  19. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  20. [Sequence of the ITS region of nuclear ribosomal DNA(nrDNA) in Xinjiang wild Dianthus and its phylogenetic relationship].

    Science.gov (United States)

    Zhang, Lu; Cai, You-Ming; Zhuge, Qiang; Zou, Hui-Yu; Huang, Min-Ren

    2002-06-01

    Xinjiang is a center of distribution and differentiation of the genus Dianthus in China and is rich in species resources. The sequences of the ITS region (including ITS-1, 5.8S rDNA and ITS-2) of nuclear ribosomal DNA from 8 wild species of the genus Dianthus distributed in Xinjiang were determined by direct sequencing of PCR products. The results showed that the ITS of Dianthus ranges from 617 to 621 bp in size, a length variation of only 4 bp. Sequence similarity is very high between species (97.6%-99.8%) and about 80% between the genus Dianthus and the outgroup; the ITS sequences in the genus Dianthus are thus relatively conserved. In general, there are more conversions than transitions among the variable sites within the genus Dianthus. The conversion rates are relatively high, and the conversion/transition ratios are 1.0-3.0. On the basis of phylogenetic analysis of the nucleotide sequences, the species of Dianthus in China can be divided into three sections. Sect. Barbulatum Williams is distantly related to both sect. Dianthus and sect. Fimbriatum Williams, whereas sect. Dianthus and sect. Fimbriatum Williams are closely related. The ITS phylogenetic tree indicates that sect. Dianthus originated earlier than sect. Fimbriatum Williams and sect. Barbulatum Williams.
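    The abstract reports ratios between two classes of substitution at variable sites. As a minimal sketch, assuming the standard classification into transitions (A<->G or C<->T) and transversions (purine<->pyrimidine), the tally for a pair of aligned sequences could look like this; the function name is hypothetical and alignment gaps or ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences."""
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Ignore identical sites, gaps and ambiguous symbols.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Invented aligned fragments, for illustration only.
ts, tv = count_ts_tv("AACCGT-", "GTCTGAA")
```

    A ratio between the two counts is then only meaningful when the denominator is non-zero, which matters for sequences as similar as those reported here.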

  1. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
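    The n-gram entropy measurement in (ii) can be illustrated with a short sketch. It assumes the Shannon entropy of the observed overlapping n-gram frequencies, in bits, and a redundancy defined as one minus that entropy divided by its maximum of n·log2(4) for a four-letter alphabet. The paper's exact estimator, including any finite-size corrections, is not given in the abstract, so these definitions and names are assumptions.

```python
import math
from collections import Counter


def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping n-gram distribution."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """1 - H_n / (n * log2(alphabet_size)); 0 means maximally random."""
    return 1.0 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))
```

    Under these definitions a lower n-gram entropy directly implies a larger redundancy, which is the relationship the abstract states for noncoding regions.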

  2. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background: Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results: We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion: BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions is high, while a high level of chromosomal shuffling and rearrangement exists between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  3. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.
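    Nonlinear prediction of a scalar time series is often built on delay-coordinate embedding followed by a local, nearest-neighbour forecast. The sketch below shows that general idea only; it is not the specific method or parameter choice of the article, and the embedding dimension, delay and neighbour count are assumptions for illustration.

```python
import math


def delay_embed(series, dim, tau):
    """Delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - j * tau] for j in range(dim))
            for t in range(start, len(series))]


def predict_next(series, dim, tau, k=3):
    """Forecast the next value as the mean successor of the k nearest past states."""
    vectors = delay_embed(series, dim, tau)
    start = (dim - 1) * tau
    query = vectors[-1]  # the current state, whose successor is unknown

    candidates = []
    for idx, vec in enumerate(vectors[:-1]):
        successor = series[start + idx + 1]  # value observed one step after vec
        candidates.append((math.dist(vec, query), successor))

    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# A strictly periodic toy "flow" record stands in for real discharge data.
forecast = predict_next([0, 1, 2, 3] * 10, dim=2, tau=1, k=3)
```

    For a noisy record such as river flow, forecast skill as a function of embedding dimension is one of the usual clues used to argue for or against a low-dimensional deterministic component.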

  4. Identification of new polymorphic regions and differentiation of cultivated olives (Olea europaea L.) through plastome sequence comparison

    Science.gov (United States)

    2010-01-01

    Background: The cultivated olive (Olea europaea L.) is the most agriculturally important species of the Oleaceae family. Although many studies have been performed on plastid polymorphisms to evaluate taxonomy, phylogeny and phylogeography of Olea subspecies, only a few polymorphic regions discriminating among the agronomically and economically important olive cultivars have been identified. The objective of this study was to sequence the entire plastome of olive and analyze many potential polymorphic regions to develop new inter-cultivar genetic markers. Results: The complete plastid genome of the olive cultivar Frantoio was determined by direct sequence analysis using universal and novel PCR primers designed to amplify all overlapping regions. The chloroplast genome of the olive has an organisation and gene order that is conserved among numerous Angiosperm species and does not contain any of the inversions, gene duplications, insertions, inverted repeat expansions and gene/intron losses that have been found in the chloroplast genomes of the genera Jasminum and Menodora, from the same family as Olea. The annotated sequence was used to evaluate the content of coding genes, the extent and distribution of repeated and long dispersed sequences, and the nucleotide composition pattern. These analyses provided essential information for structural, functional and comparative genomic studies in olive plastids. Furthermore, the alignment of the olive plastome sequence to those of other varieties and species identified 30 new organellar polymorphisms within the cultivated olive. Conclusions: In addition to identifying mutations that may play a functional role in modifying the metabolism and adaptation of olive cultivars, the new chloroplast markers represent a valuable tool to assess the level of olive intercultivar plastome variation for use in population genetic analysis, phylogenesis, cultivar characterisation and DNA food tracking. PMID:20868482

  5. Accident sequence analysis of human-computer interface design

    International Nuclear Information System (INIS)

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points

  6. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed third- and fourth-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  7. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM. Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  8. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased.
PMID:25916734
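
    The k-mer representation described in this abstract can be illustrated with a minimal sketch. It only shows how a DNA sequence might be turned into a "document" of fixed-length k-mer "words" before any topic model is trained; the function names and the choice of k are illustrative assumptions, not taken from the paper, and the LDA step itself is not shown.

```python
from collections import Counter


def kmer_words(sequence, k):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence, k):
    """Count how often each k-mer occurs (a bag-of-words view)."""
    return Counter(kmer_words(sequence, k))


def to_document(sequence, k):
    """Render a sequence as a space-separated 'document' of k-mers."""
    return " ".join(kmer_words(sequence, k))


# Hypothetical toy input; real barcode sequences are far longer.
counts = kmer_counts("ACGTACGT", 3)
document = to_document("ACGT", 2)
```

    A text-mining tool that expects whitespace-separated tokens could then take the output of `to_document` as its input corpus, one document per sequence.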

  9. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering

  10. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  11. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudios Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)
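
    The two selection criteria named in this abstract can be sketched as follows. The abstract does not give the exact formulas used, so the definitions below are the common textbook forms and should be read as assumptions; the function names and the example numbers are hypothetical and are not results from the study.

```python
import math


def z_score(measured, reference, sigma_measured, sigma_reference=0.0):
    """Difference from the reference value in units of combined uncertainty.

    Assumed form: (x - x_ref) / sqrt(sigma_x^2 + sigma_ref^2).
    """
    combined = math.sqrt(sigma_measured ** 2 + sigma_reference ** 2)
    return (measured - reference) / combined


def reduced_chi_square(measured, reference, sigmas, n_fitted=0):
    """Sum of squared normalised residuals divided by degrees of freedom.

    Assumed form: chi2 / (N - n_fitted), with N the number of peaks compared.
    """
    dof = len(measured) - n_fitted
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((m - r) / s) ** 2 for m, r, s in zip(measured, reference, sigmas))
    return chi2 / dof


# Made-up peak areas, for illustration only.
z = z_score(105.0, 100.0, 3.0, 4.0)
chi2_red = reduced_chi_square([102.0, 98.0], [100.0, 100.0], [2.0, 2.0])
```

    Under these assumed definitions, an analysis sequence whose peak areas agree with the reference within their stated uncertainties would give z-scores of small magnitude and a reduced χ² close to 1.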

  12. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  13. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    DNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence......, subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha......'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu consensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human...

  14. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  15. Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions.

    Science.gov (United States)

    Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo

    2017-02-01

    Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  16. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  17. Cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of Clostridium chauvoei

    Directory of Open Access Journals (Sweden)

    Saroj K. Dangi

    2017-09-01

    Aim: Blackleg disease is caused by Clostridium chauvoei in ruminants. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, hyaluronidases of C. chauvoei are not characterized. The present study was aimed at cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 media and confirmed by polymerase chain reaction (PCR) using the primers specific for 16-23S rDNA spacer region. nagH gene of C. chauvoei was amplified and cloned into pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies followed by sequencing of nagH gene in the construct. Results: PCR amplification yielded nagH gene of 1143 bp product, which was cloned in prokaryotic expression system. Colony PCR, as well as sequencing of nagH gene, confirmed the presence of insert. Sequence was then subjected to BLAST analysis of NCBI, which confirmed that the sequence was indeed of nagH gene of C. chauvoei. Phylogenetic analysis of the sequence showed that it is closely related to Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.

  18. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
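
    The recognition sequence GRCGYC quoted in this abstract uses IUPAC ambiguity codes (R = A or G, Y = C or T). As a minimal sketch of what such a degenerate site means in practice, the snippet below expands it to a regular expression and scans a DNA string for matches. The function names and the test sequence are hypothetical; the sketch only locates candidate sites and says nothing about where the enzyme cuts.

```python
import re

# IUPAC codes needed here; other ambiguity codes are deliberately omitted.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",  # purine
    "Y": "[CT]",  # pyrimidine
}


def site_pattern(site):
    """Translate a degenerate site such as 'GRCGYC' into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(dna, site="GRCGYC"):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets re.finditer report overlapping hits.
    lookahead = re.compile("(?=" + site_pattern(site) + ")")
    return [m.start() for m in lookahead.finditer(dna.upper())]


# Made-up sequence containing GACGTC and GGCGCC.
positions = find_sites("TTGACGTCAAGGCGCCTT")
```

    Because the reverse complement of GRCGYC is again GRCGYC, scanning one strand is enough to find every occurrence of this particular site.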

  19. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    Science.gov (United States)

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.

  20. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and applying third-generation PacBio gene sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes show a high degree of homology, but also that the intron sequences are highly similar. Additionally, the regulatory region of the genes contained two 720 bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  1. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for expression determination and differentiation of paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  2. Analysis of regional climate strategies in the Barents region

    Energy Technology Data Exchange (ETDEWEB)

    Himanen, S.; Inkeroeinen, J.; Latola, K.; Vaisanen, T.; Alasaarela, E.

    2012-11-15

    Climate change is a global phenomenon with especially harsh effects on the Arctic and northern regions. The Arctic's average temperature has risen at almost twice the rate as elsewhere in the past few decades. Since 1966, the Arctic land area covered by snow in early summer has shrunk by almost a fifth. The Barents Region consists of the northern parts of Norway, Sweden, Finland and Russia (i.e. the European part of Russia). Climate change will cause serious impacts in the Barents Region because of its higher density of population living under harsh climatic conditions, thus setting it apart from other Arctic areas. In many cases, economic activities, like tourism, rely on certain weather conditions. For this reason, climate change and adaptation to it is of special urgency for the region. Regional climate change strategies are important tools for addressing mitigation and adaptation to climate change as they can be used to consolidate the efforts of different stakeholders of the public and private sectors. Regional strategies can be important factors in achieving the national and international goals. The study evaluated how the national climate change goals were implemented in the regional and local strategies and programmes in northern Finland. The specific goal was to describe the processes by which the regional strategies were prepared and implemented, and how the work was expanded to include the whole of northern Finland. Finally, the Finnish preparatory processes were compared to case examples of processes for preparing climate change strategies elsewhere in the Barents Region. This analysis provides examples of good practices in preparing a climate change strategy and implementing it. (orig.)

  3. Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method

    Science.gov (United States)

    Qiu, Lei; Yuan, Shenfang; Mei, Hanfei; Qian, Weifeng

    2013-01-01

    To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give out the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half. PMID:24084123

  4. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
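
    The assembly statistics quoted in this abstract can be made concrete with a short sketch. N50 is computed here under its usual definition (the length L such that sequences of length L or longer contain at least half of all assembled bases); the function name and the toy length list are hypothetical. The coverage figure is reproduced from the two numbers stated in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total 20, and 6 + 5 = 11 already covers half of it.
toy_n50 = n50([2, 3, 4, 5, 6])

# Coverage from the abstract: 568,887,315 bp assembled of an estimated 622 Mb.
coverage = 568_887_315 / 622_000_000
print(f"{coverage:.0%}")  # prints "91%"
```

    The same function applies unchanged to contig lengths and to scaffold lengths, which is why an assembly report usually quotes both values.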

  5. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    Directory of Open Access Journals (Sweden)

    Zhi Xu

    Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous, and its incidence and cause vary widely with geographical region, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in genes that are targeted by drugs in clinical use or under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7%) in our sample set. Thus, this study indicates the utility of a cost- and time-efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  6. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background: Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for this malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods: To provide a more comprehensive and complete transcriptome of An. sinensis, RNA from eggs, larvae, pupae, male adults and female adults was pooled for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed for genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results: Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content of 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified a number of An. sinensis unigenes that showed homology to, or were putative 1:1 orthologues of, genes in the genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions: Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic and genomic studies, and further vector control, of An. sinensis. PMID:25000941

  7. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism points to a rapid worldwide radiation in this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution of M. cephalus.

  8. Hydraulic fracturing and the Crooked Lake Sequences: Insights gleaned from regional seismic networks

    Science.gov (United States)

    Schultz, Ryan; Stern, Virginia; Novakovic, Mark; Atkinson, Gail; Gu, Yu Jeffrey

    2015-04-01

    Within central Alberta, Canada, a new sequence of earthquakes has been recognized as of 1 December 2013 in a region of previous seismic quiescence near Crooked Lake, ~30 km west of the town of Fox Creek. We utilize a cross-correlation detection algorithm to detect more than 160 events to the end of 2014, which are temporally distinguished into five subsequences. This observation is corroborated by the uniqueness of waveforms clustered by subsequence. The Crooked Lake Sequences have come under scrutiny due to their strong temporal correlation (>99.99%) to the timing of hydraulic fracturing operations in the Duvernay Formation. We assert that individual subsequences are related to fracturing stimulation and, despite adverse initial station geometry, double-difference techniques allow us to spatially relate each cluster back to a unique horizontal well. Overall, we find that seismicity in the Crooked Lake Sequences is consistent with first-order observations of hydraulic fracturing induced seismicity.

  9. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    Science.gov (United States)

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  10. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  11. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  12. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Science.gov (United States)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  13. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  14. DNAApp: a mobile application for sequencing data analysis.

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  15. DNAApp: a mobile application for sequencing data analysis

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  16. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  17. Sequence organization and control of transcription in the bacteriophage T4 tRNA region.

    Science.gov (United States)

    Broida, J; Abelson, J

    1985-10-05

    Bacteriophage T4 contains genes for eight transfer RNAs and two stable RNAs of unknown function. These are found in two clusters at 70 × 10^3 base-pairs on the T4 genetic map. To understand the control of transcription in this region we have completed the sequencing of 5000 base-pairs in this region. The sequence contains a part of gene 3, gene 1, gene 57, internal protein I, the tRNA genes and five open reading frames which most likely code for heretofore unidentified proteins. We have used subclones of the region to investigate the kinetics of transcription in vivo. The results show that transcription in this region consists of overlapping early, middle and late transcripts. Transcription is directed from two early promoters, one or two middle promoters and perhaps two late promoters. This region contains all of the features that are seen in T4 transcription and as such is a good place to study the phenomenon in more detail.

  18. Genome-Based Identification of Active Prophage Regions by Next Generation Sequencing in Bacillus licheniformis DSM13

    Science.gov (United States)

    Hertel, Robert; Rodríguez, David Pintor; Hollensteiner, Jacqueline; Dietrich, Sascha; Leimbach, Andreas; Hoppert, Michael; Liesegang, Heiko; Volland, Sonja

    2015-01-01

    Prophages are viruses, which have integrated their genomes into the genome of a bacterial host. The status of the prophage genome can vary from fully intact with the potential to form infective particles to a remnant state where only a few phage genes persist. Prophages have impact on the properties of their host and are therefore of great interest for genomic research and strain design. Here we present a genome- and next generation sequencing (NGS)-based approach for identification and activity evaluation of prophage regions. Seven prophage or prophage-like regions were identified in the genome of Bacillus licheniformis DSM13. Six of these regions show similarity to members of the Siphoviridae phage family. The remaining region encodes the B. licheniformis orthologue of the PBSX prophage from Bacillus subtilis. Analysis of isolated phage particles (induced by mitomycin C) from the wild-type strain and prophage deletion mutant strains revealed activity of the prophage regions BLi_Pp2 (PBSX-like), BLi_Pp3 and BLi_Pp6. In contrast to BLi_Pp2 and BLi_Pp3, neither phage DNA nor phage particles of BLi_Pp6 could be visualized. However, the ability of prophage BLi_Pp6 to generate particles could be confirmed by sequencing of particle-protected DNA mapping to prophage locus BLi_Pp6. The introduced NGS-based approach allows the investigation of prophage regions and their ability to form particles. Our results show that this approach increases the sensitivity of prophage activity analysis and can complement more conventional approaches such as transmission electron microscopy (TEM). PMID:25811873

  19. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    System modelling using the Unified Modelling Language (UML) is a task that must be solved in software development. The more complex software becomes, the higher the requirements placed on representing the system to be developed, especially its dynamic aspect, which UML expresses with a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. Because of its specific structure, the UML sequence diagram is selected for a deeper analysis of element layout. The authors' research presents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyses them according to criteria required for diagram perception.

  20. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network. In this network, triplets in the DNA sequence are vertices, and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pairs) periodicity of the DNA sequence. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in the 3-bp periodicity nor in the variable GC content.
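
    The triplet network described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether triplets are read in a fixed frame or with overlap, so the sketch assumes consecutive non-overlapping triplets in frame 0 (the hypothetical step parameter can be set to 1 for overlapping triplets), treats edges as undirected, ignores self-loops, and reports the average local clustering coefficient.

```python
from itertools import combinations


def triplet_network(seq, step=3):
    """Build an undirected graph whose vertices are DNA triplets.

    Two triplets are linked when they occur next to each other in the
    sequence. Assumption: triplets are read every `step` bases from
    position 0 (step=3 gives non-overlapping triplets).
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]
    adjacency = {t: set() for t in triplets}
    for a, b in zip(triplets, triplets[1:]):
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def average_clustering(adjacency):
    """Average local clustering coefficient over all vertices.

    Vertices with fewer than two neighbours contribute 0.
    """
    if not adjacency:
        return 0.0
    coefficients = []
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count neighbour pairs that are themselves connected.
        links = sum(1 for u, v in combinations(neighbours, 2)
                    if v in adjacency[u])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# Toy sequences (hypothetical): AAA-CCC-GGG-AAA closes a triangle,
# while AAA-CCC-GGG is a simple path with no triangles.
print(average_clustering(triplet_network("AAACCCGGGAAA")))
print(average_clustering(triplet_network("AAACCCGGG")))
```

    Comparing this value for a real genome against the same value for shuffled sequences with matched GC content would mirror the kind of random-network control the abstract describes.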

  1. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  2. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
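
    The idea of loading similarity search results into a relational database and summarizing them with a query can be illustrated as follows. This is only a sketch: the table name, column names and rows are hypothetical and do not reproduce the schema of seqdb_demo or search_demo, and SQLite (via Python's standard sqlite3 module) is used purely so the example is self-contained.

```python
import sqlite3


def kingdom_summary(conn, max_evalue):
    """Count distinct query proteins with a significant hit per kingdom."""
    return conn.execute(
        """
        SELECT subject_kingdom, COUNT(DISTINCT query_acc)
        FROM hit
        WHERE evalue <= ?
        GROUP BY subject_kingdom
        ORDER BY subject_kingdom
        """,
        (max_evalue,),
    ).fetchall()


# Hypothetical table of search hits: one row per query/subject pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hit ("
    "query_acc TEXT, subject_acc TEXT, subject_kingdom TEXT, evalue REAL)"
)
conn.executemany(
    "INSERT INTO hit VALUES (?, ?, ?, ?)",
    [
        ("Q1", "S1", "Archaea", 1e-30),
        ("Q1", "S2", "Eukaryota", 1e-5),
        ("Q2", "S3", "Eukaryota", 1e-50),
        ("Q2", "S4", "Eukaryota", 1e-20),
        ("Q3", "S5", "Archaea", 0.5),  # not significant at 1e-3
    ],
)

for kingdom, n_queries in kingdom_summary(conn, 1e-3):
    print(kingdom, n_queries)
```

    Keeping the significance threshold as a query parameter makes it easy to rerun the same summary at different E-value cutoffs without reloading the search results.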

  3. RNA2 of grapevine fanleaf virus: sequence analysis and coat protein cistron location.

    Science.gov (United States)

    Serghini, M A; Fuchs, M; Pinck, M; Reinbolt, J; Walter, B; Pinck, L

    1990-07-01

    The nucleotide sequence of the genomic RNA2 (3774 nucleotides) of grapevine fanleaf virus strain F13 was determined from overlapping cDNA clones and its genetic organization was deduced. Two rapid and efficient methods were used for cDNA cloning of the 5' region of RNA2. The complete sequence contained only one long open reading frame of 3555 nucleotides (1184 codons, 131K product). The analysis of the N-terminal sequence of purified coat protein (CP) and identification of its C-terminal residue have allowed the CP cistron to be precisely positioned within the polyprotein. The CP produced by proteolytic cleavage at the Arg/Gly site between residues 680 and 681 contains 504 amino acids (Mr 56019) and has hydrophobic properties. The Arg/Gly cleavage site deduced by N-terminal amino acid sequence analysis is the first for a nepovirus coat protein and for plant viruses expressing their genomic RNAs by polyprotein synthesis. Comparison of GFLV RNA2 with M RNA of cowpea mosaic comovirus and with RNA2 of two closely related nepoviruses, tomato black ring virus and Hungarian grapevine chrome mosaic virus, showed strong similarities among the 3' non-coding regions but less similarity among the 5' end non-coding sequences than reported among other nepovirus RNAs.

  4. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Advancements in sequencing techniques have resulted in huge volumes of sequence data being produced at an ever faster rate. It is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any useful conclusion. To cope with this burden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. The datacenter's hardware and software are collectively known as a cloud, which, when available publicly, is termed a public cloud. The datacenter's resources are provided in virtual mode to clients via a service provider such as Amazon, Google or Joyent, which charges on a pay-as-you-go basis. The workload is shifted to the provider, which maintains it through the required hardware and software upgrades. The service provider manages this by upgrading resources in virtual mode. Basically, a virtual environment is created according to the needs of the user, with permission obtained from the datacenter via the internet; the task is performed and the environment is deleted once the task is over. In this discussion, we focus on the basics of cloud computing and on the prerequisites and overall working of clouds. Furthermore, applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, are briefly discussed with reference to traditional workflows.

  5. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  6. Molecular Identification of Isolated Fungi from Unopened Containers of Greek Yogurt by DNA Sequencing of Internal Transcribed Spacer Region

    Directory of Open Access Journals (Sweden)

    Irshad M. Sulaiman

    2014-06-01

    In our previous study, we described the development of an internal transcribed spacer (ITS1) sequencing method, and used this protocol in species-identification of isolated fungi collected from the manufacturing areas of a compounding company known to have caused the multistate fungal meningitis outbreak in the United States. In this follow-up study, we have analyzed the unopened vials of Greek yogurt from the recalled batch to determine the possible cause of microbial contamination in the product. A total of 15 unopened vials of Greek yogurt belonging to the recalled batch were examined, following conventional microbiological protocols, for the presence of fungi known to cause foodborne illness. Fungi were isolated from all of the 15 Greek yogurt samples analyzed. The isolated fungi were genetically typed by DNA sequencing of the PCR-amplified ITS1 region of the rRNA gene. Analysis of the data confirmed all of the fungal isolates from the Greek yogurt to be Rhizomucor variabilis. The generated ITS1 sequences matched 100% with the published sequences available in GenBank. In addition, these yogurt samples were also tested for the presence of five types of bacteria (Salmonella, Listeria, Staphylococcus, Bacillus and Escherichia coli) causing foodborne disease in humans, and were found negative for all of them.

  7. Comparative In silico Study of Sex-Determining Region Y (SRY) Protein Sequences Involved in Sex-Determining.

    Science.gov (United States)

    Vakili Azghandi, Masoume; Nasiri, Mohammadreza; Shamsa, Ali; Jalali, Mohsen; Shariati, Mohammad Mahdi

    2016-04-01

    The SRY gene (SRY) provides instructions for making a transcription factor called the sex-determining region Y protein. The sex-determining region Y protein causes a fetus to develop as a male. In this study, SRY sequences from 15 species, namely human, chimpanzee, dog, pig, rat, cattle, buffalo, goat, sheep, horse, zebra, frog, urial, dolphin and killer whale, were used to determine bioinformatic differences. Nucleotide sequences of SRY were retrieved from the NCBI databank. Bioinformatic analysis of SRY was done with CLC Main Workbench version 5.5, ClustalW (http://www.ebi.ac.uk/clustalw/) and MEGA6 software. The multiple sequence alignment results indicated that SRY protein sequences from Orcinus orca (killer whale) and Tursiops aduncus (dolphin) have the least genetic distance, 0.33, among these 15 species and are 99.67% identical at the amino acid level. Homo sapiens and Pan troglodytes (chimpanzee) have the next lowest genetic distance, 1.35, and are 98.65% identical at the amino acid level. These findings indicate that the SRY proteins are conserved in the 15 species, and their evolutionary relationships are similar.
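
    The paired figures in this abstract (distance 0.33 with 99.67% identity, distance 1.35 with 98.65% identity) are consistent with a distance expressed as 100 minus percent identity. The sketch below shows that relationship for two aligned sequences. It is an assumption-laden illustration: the abstract does not state which distance model was used in MEGA6, the function names are hypothetical, and the toy peptides are not SRY sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def percent_distance(aligned_a, aligned_b):
    """Uncorrected distance as a percentage: 100 minus percent identity."""
    return 100.0 - percent_identity(aligned_a, aligned_b)


# Toy aligned peptides (hypothetical): one mismatch in six columns.
print(round(percent_identity("MQSYAS", "MQSYAT"), 2))
print(round(percent_distance("MQSYAS", "MQSYAT"), 2))
```

    Corrected distance models (for example Poisson-corrected distances) would give slightly larger values than this uncorrected measure for the same alignment.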

  8. Comparative In silico Study of Sex-Determining Region Y (SRY Protein Sequences Involved in Sex-Determining

    Directory of Open Access Journals (Sweden)

    Masoume Vakili Azghandi

    2016-05-01

    Background: The SRY gene (SRY) provides instructions for making a transcription factor called the sex-determining region Y protein. The sex-determining region Y protein causes a fetus to develop as a male. In this study, SRY sequences from 15 species, namely human, chimpanzee, dog, pig, rat, cattle, buffalo, goat, sheep, horse, zebra, frog, urial, dolphin and killer whale, were used to determine bioinformatic differences. Methods: Nucleotide sequences of SRY were retrieved from the NCBI databank. Bioinformatic analysis of SRY was done with CLC Main Workbench version 5.5, ClustalW (http://www.ebi.ac.uk/clustalw/) and MEGA6 software. Results: The multiple sequence alignment results indicated that SRY protein sequences from Orcinus orca (killer whale) and Tursiops aduncus (dolphin) have the least genetic distance, 0.33, among these 15 species and are 99.67% identical at the amino acid level. Homo sapiens and Pan troglodytes (chimpanzee) have the next lowest genetic distance, 1.35, and are 98.65% identical at the amino acid level. Conclusion: These findings indicate that the SRY proteins are conserved in the 15 species, and their evolutionary relationships are similar.

  9. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    OpenAIRE

    W.S.I. de Silva; M.M.N. Perera; K.L.N.S. Perera; A.M. Wickramasuriya; G.A.U. Jayasekera

    2017-01-01

    The promoter region of a drought and abscisic acid (ABA) inducible gene, osr40c1, was isolated from a salt-tolerant indica rice variety Pokkali, which is 670 bp upstream of the putative translation start codon. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involv...

  10. An Imaging And Graphics Workstation For Image Sequence Analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  11. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Background: The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola, which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results: The sequences of the 16S ribosomal RNA gene and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data were analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established, but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions: Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  12. Sirius PSB: a generic system for analysis of biological sequences.

    Science.gov (United States)

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  13. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  14. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Background: Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequence (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results: The results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations that had predicted that fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed that polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion: The use of HSVs like SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  15. An integrative variant analysis suite for whole exome next-generation sequencing data

    Directory of Open Access Journals (Sweden)

    Challis Danny

    2012-01-01

    Background: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.

  16. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A+) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonucleotides, a 48- and a 27-mer, were synthesized on the basis of amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from CHO cell protein. The 48-mer was used as a primer for synthesis of cDNA, which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with 32P-labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases in a northern blot analysis. The sequence of a 267 bp insert shows high homology with the CHO cDNA, with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved.

  17. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    Directory of Open Access Journals (Sweden)

    Sally Ali Tawfik

    2018-01-01

    The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using the Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials.

  18. Face processing regions are sensitive to distinct aspects of temporal sequence in facial dynamics.

    Science.gov (United States)

    Reinl, Maren; Bartels, Andreas

    2014-11-15

    Facial movement conveys important information for social interactions, yet its neural processing is poorly understood. Computational models propose that shape- and temporal sequence sensitive mechanisms interact in processing dynamic faces. While face processing regions are known to respond to facial movement, their sensitivity to particular temporal sequences has barely been studied. Here we used fMRI to examine the sensitivity of human face-processing regions to two aspects of directionality in facial movement trajectories. We presented genuine movie recordings of increasing and decreasing fear expressions, each of which were played in natural or reversed frame order. This two-by-two factorial design matched low-level visual properties, static content and motion energy within each factor, emotion-direction (increasing or decreasing emotion) and timeline (natural versus artificial). The results showed sensitivity for emotion-direction in FFA, which was timeline-dependent as it only occurred within the natural frame order, and sensitivity to timeline in the STS, which was emotion-direction-dependent as it only occurred for decreased fear. The occipital face area (OFA) was sensitive to the factor timeline. These findings reveal interacting temporal sequence sensitive mechanisms that are responsive to both ecological meaning and to prototypical unfolding of facial dynamics. These mechanisms are temporally directional, provide socially relevant information regarding emotional state or naturalness of behavior, and agree with predictions from modeling and predictive coding theory. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences

    International Nuclear Information System (INIS)

    Klimov, Eugene; Vinokourova, Svetlana; Moisjak, Elena; Rakhmanaliev, Elian; Kobseva, Vera; Laimins, Laimonis; Kisseljov, Fjodor; Sulimova, Galina

    2002-01-01

    In cervical tumours the integration of human papilloma viruses (HPV) transcripts often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data using a variety of techniques has demonstrated that HPV integration occurred without obvious specificity into human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences. Amplification of the INTs was done by the APOT technique. The APOT products were sequenced according to standard protocols. The analysis of the sequences was performed using BLASTN program and public databases. To localise the INTs PCR-based screening of GeneBridge4-RH-panel was used. Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and the RH-mapping techniques, mapping studies showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with the high levels of expression. Integration of HPV genomes occurs into the different human chromosomes but into regions that contain highly transcribed genes. One interpretation of these studies is that integration of HPV occurs into decondensed regions, which are more accessible for integration of foreign DNA

  20. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Science.gov (United States)

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, were mutually identical at every locus across all isolates. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. As recent studies show, using only the absolute spectrum in the informational spectrum analysis of protein sequences has proven insufficient. Therefore, CISAPS is developed to consider and provide results in three forms: absolute, real, and imaginary spectrum. Biologically related features in the analysis of influenza A subtypes, presented as a case study in this work, can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are likely related to the protein feature that the specific amino acid index represents. In addition, various technical issues such as zero-padding and windowing that may affect the analysis are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.

  2. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Background: Although a previous study demonstrated that the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences of full-length viral genomes of DENV-3. In our study, complete genomic sequences of DENV-3 strains collected from different geographical locations and isolation years were determined, and the sequence diversity as well as selection pressure sites in the DENV genome other than within the E gene were also analyzed. Results: Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that Taiwan's indigenous DENV-3 isolates from the 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case were of three different genotypes – I, II, and III, each associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among the different DENV-3 genotypes were further examined to understand global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54) and envelope protein gene regions (mean ± SD: 5.04 ± 0.32). Further analysis found that positive selection pressure of DENV-3 may occur in the non-structural protein 1 gene region, and the positive selection site was detected at position 178 of the NS1 gene. Conclusion: Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the non-structural protein along genotype II indicates that DENV-3 originating from Southeast Asia needs to be monitored for the emergence of strains with epidemic potential, for better epidemic prevention and vaccine development.

  3. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy, between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  4. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have a physical map of common carp. Conclusion: BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  5. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have a physical map of common carp. Conclusion: BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of
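
    The read statistics quoted in this record can be cross-checked with simple arithmetic. The sketch below uses only the three figures stated in the abstract (65,720 clean BES, 42,522,168 bp, 2.5% of the genome). The function names are hypothetical, and the implied genome size is a back-calculation from the rounded 2.5% figure, not a value reported in the abstract.

```python
def mean_read_length(total_bp: int, read_count: int) -> float:
    """Average read length in bp."""
    return total_bp / read_count


def implied_genome_size(total_bp: int, genome_fraction: float) -> float:
    """Genome size in bp implied by sequenced bases covering a given fraction."""
    return total_bp / genome_fraction


CLEAN_BES = 65_720          # clean BAC end sequences (from the abstract)
TOTAL_BP = 42_522_168       # total bases in those BES (from the abstract)
GENOME_FRACTION = 0.025     # "2.5% of common carp genome" (from the abstract)

avg_len = mean_read_length(TOTAL_BP, CLEAN_BES)
genome_bp = implied_genome_size(TOTAL_BP, GENOME_FRACTION)

print(f"mean read length: {avg_len:.1f} bp")               # rounds to the quoted 647 bp
print(f"implied genome size: {genome_bp / 1e9:.2f} Gb")    # roughly 1.7 Gb
```

    Because 2.5% is itself rounded, the implied genome size should be read as an order-of-magnitude consistency check only.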

  6. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
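
    As an illustration of what a measure "based solely on the k-mer frequencies" looks like, here is a minimal sketch that builds normalized k-mer frequency vectors for two sequences and takes their Euclidean distance. This is not CAFE's code and it is not one of the more expensive measures named above (CVTree, $d_2^*$, $d_2^S$). The function names, the default k, and the toy sequences are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_dissimilarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    all_kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in all_kmers))


# Toy sequences, made up for the example.
print(euclidean_dissimilarity("ACGTACGTAC", "ACGTACGTAC"))  # identical input gives 0.0
print(euclidean_dissimilarity("ACGTACGTAC", "TTTTGGGGCC"))  # unrelated input gives a larger value
```

    A pairwise matrix of such values over many genomes is the kind of input that a PHYLIP-format distance file or a dendrogram would be built from.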

  7. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  8. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    Directory of Open Access Journals (Sweden)

    W.S.I. de Silva

    2017-07-01

    The promoter region (670 bp upstream of the putative translation start codon) of a drought and abscisic acid (ABA) inducible gene, osr40c1, was isolated from a salt-tolerant indica rice variety, Pokkali. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involved in regulation of osr40c1 expression under different conditions were found in the 5′-upstream region of osr40c1. These are the ABA-responsive element, light-responsive elements (ATCT-motif, Box I, G-box, GT1-motif, Gap-box and Sp1), myeloblastosis oncogene response element (CCAAT-box), auxin-responsive element (TGA-element), gibberellin-responsive element (GARE-motif) and fungal-elicitor responsive elements (Box E and Box-W1). A putative regulatory element required for an endosperm-specific pattern of gene expression, designated the Skn-1 motif, was also detected in the Pokkali osr40c1 promoter region. In conclusion, the bioinformatic analysis of the osr40c1 promoter region isolated from indica rice variety Pokkali led to the identification of several important stress-responsive cis-acting regulatory elements, and therefore the isolated promoter sequence could be employed in rice genetic transformation to mediate expression of abiotic stress induced genes.
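
    The kind of in silico motif scan described in this abstract can be sketched with plain regular expressions. The patterns below are simplified, illustrative consensus strings chosen for this example (assumptions, not the motif definitions or the database used in the study), and the function name is hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions for this sketch only).
MOTIFS = {
    "TATA-box": r"TATA[AT]A",
    "CAAT-box": r"CAAT",
    "G-box": r"CACGTG",
}


def scan_motifs(seq, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} on the given strand."""
    seq = seq.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


if __name__ == "__main__":
    promoter = "GGCAATCCTATAAAGGCACGTGTT"  # toy sequence, not real data
    for name, positions in scan_motifs(promoter).items():
        print(name, positions)
```

    A fuller scan would also search the reverse complement and use curated motif definitions rather than hand-written patterns.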

  9. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus

  10. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
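
    The sliding window analysis mentioned above can be pictured as cutting an alignment into fixed-length, overlapping column ranges and repeating the clustering analysis on each one. The sketch below only generates the window coordinates; the step size and alignment length are hypothetical values, since the abstract gives the window sizes and counts but not the step.

```python
def sliding_windows(length, window, step):
    """Yield half-open (start, end) column ranges covering an alignment.

    Only full-length windows are produced; a trailing partial window
    is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, length - window + 1, step):
        yield start, start + window


if __name__ == "__main__":
    alignment_length = 9000  # hypothetical alignment length in columns
    for start, end in sliding_windows(alignment_length, window=1000, step=2000):
        # Each range would be sliced out of the alignment and analysed
        # separately, e.g. by building an ML tree per window.
        print(start, end)
```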

  11. OPAL: prediction of MoRF regions in intrinsically disordered protein sequences.

    Science.gov (United States)

    Sharma, Ronesh; Raicar, Gaurav; Tsunoda, Tatsuhiko; Patil, Ashwini; Sharma, Alok

    2018-06-01

    Intrinsically disordered proteins lack stable 3-dimensional structure and play a crucial role in performing various biological functions. Key to their biological function are the molecular recognition features (MoRFs) located within long disordered regions. Computationally identifying these MoRFs from disordered protein sequences is a challenging task. In this study, we present a new MoRF predictor, OPAL, to identify MoRFs in disordered protein sequences. OPAL utilizes two independent sources of information computed using different component predictors. The scores are processed and combined using a common averaging method. The first score is computed using a component MoRF predictor which utilizes composition and sequence similarity of MoRF and non-MoRF regions to detect MoRFs. The second score is calculated using half-sphere exposure (HSE), solvent accessible surface area (ASA) and backbone angle information of the disordered protein sequence, using information from the amino acid properties of flanks surrounding the MoRFs to distinguish MoRF and non-MoRF residues. OPAL is evaluated using test sets that were previously used to evaluate the MoRF predictors MoRFpred, MoRFchibi and MoRFchibi-web. The results demonstrate that OPAL outperforms all the available MoRF predictors and is the most accurate predictor available for MoRF prediction. It is available at http://www.alok-ai-lab.com/tools/opal/. Contact: ashwini@hgc.jp or alok.sharma@griffith.edu.au. Supplementary data are available at Bioinformatics online.

  12. Complete genome sequencing and phylogenetic analysis of dengue type 1 virus isolated from Jeddah, Saudi Arabia.

    Science.gov (United States)

    Azhar, Esam I; Hashem, Anwar M; El-Kafrawy, Sherif A; Abol-Ela, Said; Abd-Alla, Adly M M; Sohrab, Sayed Sartaj; Farraj, Suha A; Othman, Norah A; Ben-Helaby, Huda G; Ashshi, Ahmed; Madani, Tariq A; Jamjoom, Ghazi

    2015-01-16

    Dengue viruses (DENVs) are mosquito-borne viruses which can cause disease ranging from mild fever to severe dengue infection. These viruses are endemic in several tropical and subtropical regions. Multiple outbreaks of DENV serotypes 1, 2 and 3 (DENV-1, DENV-2 and DENV-3) have been reported from the western region of Saudi Arabia since 1994. Strains from at least two genotypes of DENV-1 (Asia and America/Africa genotypes) were circulating in western Saudi Arabia until 2006. However, all previous studies reported from Saudi Arabia were based on partial sequencing data of the envelope (E) gene, without any reports of full genome sequences for any DENV serotypes circulating in Saudi Arabia. Here, we report the isolation and the first complete genome sequence of a DENV-1 strain (DENV-1-Jeddah-1-2011) isolated from a patient from Jeddah, Saudi Arabia in 2011. Whole genome sequence alignment and phylogenetic analysis showed high similarity between the DENV-1-Jeddah-1-2011 strain and the D1/H/IMTSSA/98/606 isolate (Asian genotype) reported from Djibouti in 1998. Further analysis of the full envelope gene revealed a close relationship between the DENV-1-Jeddah-1-2011 strain and isolates reported between 2004 and 2006 from Jeddah as well as recent isolates from Somalia, suggesting the wide spread of the Asian genotype in this region. These data suggest that strains belonging to the Asian genotype might have been introduced into Saudi Arabia long before 2004, most probably by African pilgrims, and continued to circulate in western Saudi Arabia at least until 2011. Most importantly, these results indicate that pilgrims from dengue endemic regions can play an important role in the spread of new DENVs in Saudi Arabia and the rest of the world. Therefore, availability of complete genome sequences would serve as a reference for future epidemiological studies of DENV-1 viruses.

  13. A unique genomic sequence in the Wolf-Hirschhorn syndrome [WHS] region of humans is conserved in the great apes.

    Science.gov (United States)

    Tarzami, S T; Kringstein, A M; Conte, R A; Verma, R S

    1996-10-01

    The Wolf-Hirschhorn syndrome (WHS) is caused by a partial deletion in the short arm of chromosome 4, band 16.3 (4p16.3). A unique-sequence human DNA probe (39 kb) localized within this region has been used to search for sequence homology in the apes' equivalent chromosome 3 by the FISH technique. The WHS loci are conserved in higher primates at the expected position. Nevertheless, a control probe, which detects alphoid sequences of the pericentromeric region of humans, has diverged in chimpanzee, gorilla, and orangutan. The conservation of WHS loci and divergence of DNA alphoid sequences have further added to the controversy concerning human descent.

  14. Phylogenetic relationships of Malaysia's pig-tailed macaque Macaca nemestrina based on D-loop region sequences

    Science.gov (United States)

    Abdul-Latiff M. A., B.; Ampeng, A.; Yaakop, S.; Md-Zain B., M.

    2014-09-01

    Phylogenetic relationships among Malaysian pig-tailed macaques have never been established even though the data are crucial in aiding conservation plans for the species. The aim of this study is to establish the phylogenetic relationships of Macaca nemestrina in Malaysia. A total of 21 genetic samples of M. nemestrina yielding 458 bp of D-loop sequences were used in phylogenetic analyses, in addition to one sample of M. fascicularis which was used as an outgroup. Sequence character analysis revealed that the D-loop locus contains 23% parsimony-informative characters among the ingroups. Further analysis indicated a clear separation between populations originating from different regions: the Malay Peninsula populations are separated from the Borneo Insular population, and the Perak population formed a distinctive clade within the Peninsular Malaysia populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm, as the Borneo population was distinguished from the Peninsula population (100% bootstrap value in the NJ and MP trees, 1.00 posterior probability in the Bayesian tree). Perak's population was separated from other Peninsula populations (100% in NJ, 99% in MP and 1.00 in Bayesian). The D-loop region of mtDNA is proven to be a suitable locus for studying the separation of M. nemestrina at population level. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.

  15. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalents (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs

  16. Multilocus sequence analysis of nectar pseudomonads reveals high genetic diversity and contrasting recombination patterns.

    Science.gov (United States)

    Alvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas 'sensu stricto' isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria.

  17. Multilocus Sequence Analysis of Nectar Pseudomonads Reveals High Genetic Diversity and Contrasting Recombination Patterns

    Science.gov (United States)

    Álvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M.

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas ‘sensu stricto’ isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria. PMID:24116076

  18. Optimal depth-based regional frequency analysis

    Directory of Open Access Journals (Sweden)

    H. Wazneh

    2013-06-01

    Classical methods of regional frequency analysis (RFA) of hydrological variables face two drawbacks: (1) the restriction to a particular region which can lead to a loss of some information and (2) the definition of a region that generates a border effect. To reduce the impact of these drawbacks on regional modeling performance, an iterative method was proposed recently, based on the statistical notion of the depth function and a weight function φ. This depth-based RFA (DBRFA) approach was shown to be superior to traditional approaches in terms of flexibility, generality and performance. The main difficulty of the DBRFA approach is the optimal choice of the weight function φ (e.g., φ minimizing estimation errors). In order to avoid a subjective choice and naïve selection procedures of φ, the aim of the present paper is to propose an algorithm-based procedure to optimize the DBRFA and automate the choice of φ according to objective performance criteria. This procedure is applied to estimate flood quantiles in three different regions in North America. One of the findings from the application is that the optimal weight function depends on the considered region and can also quantify the region's homogeneity. By comparing the DBRFA to the canonical correlation analysis (CCA) method, results show that the DBRFA approach leads to better performances both in terms of relative bias and mean square error.

  19. Optimal depth-based regional frequency analysis

    Science.gov (United States)

    Wazneh, H.; Chebana, F.; Ouarda, T. B. M. J.

    2013-06-01

    Classical methods of regional frequency analysis (RFA) of hydrological variables face two drawbacks: (1) the restriction to a particular region which can lead to a loss of some information and (2) the definition of a region that generates a border effect. To reduce the impact of these drawbacks on regional modeling performance, an iterative method was proposed recently, based on the statistical notion of the depth function and a weight function φ. This depth-based RFA (DBRFA) approach was shown to be superior to traditional approaches in terms of flexibility, generality and performance. The main difficulty of the DBRFA approach is the optimal choice of the weight function φ (e.g., φ minimizing estimation errors). In order to avoid a subjective choice and naïve selection procedures of φ, the aim of the present paper is to propose an algorithm-based procedure to optimize the DBRFA and automate the choice of φ according to objective performance criteria. This procedure is applied to estimate flood quantiles in three different regions in North America. One of the findings from the application is that the optimal weight function depends on the considered region and can also quantify the region's homogeneity. By comparing the DBRFA to the canonical correlation analysis (CCA) method, results show that the DBRFA approach leads to better performances both in terms of relative bias and mean square error.

  20. Regional Convergence of Income: Spatial Analysis

    Directory of Open Access Journals (Sweden)

    Vera Ivanovna Ivanova

    2014-12-01

    Russia has a huge territory and strong interregional heterogeneity, so we can assume that geographical factors have a significant impact on the pace of economic growth in Russian regions. Therefore the article focuses on the following issues: (1) the correlation between comparative advantages of geographical location and differences in growth rates; (2) the impact of more developed regions on their neighbors; and (3) the correlation between economic growth of regions and their spatial interaction. The article is devoted to the empirical analysis of regional per capita incomes from 1996 to 2012 and explores the dynamics of the spatial autocorrelation of the regional development indicator. It is shown that there is a problem of measuring the intensity of spatial dependence: the value of Moran’s index varies greatly depending on the choice of the distance matrix. In addition, with the help of spatial econometrics the author tests the following hypotheses: (1) there is convergence between regions for the specified period; (2) the process of beta convergence is explained by the spatial arrangement of regions; and (3) there is a positive impact of market size on regional growth. The author empirically confirmed all three hypotheses.
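
    To make the dependence of Moran's index on the spatial weights concrete, the sketch below computes the global Moran's I from its standard definition, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The toy values and the two weight matrices are invented for illustration and are not data from the article.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    weights[i][j] is the spatial weight between regions i and j;
    diagonal entries are ignored.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)
    num = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


if __name__ == "__main__":
    income = [1.0, 2.0, 3.0, 4.0]  # toy values for four regions on a line
    # Binary contiguity: only immediate neighbours interact.
    contiguity = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    # Inverse distance: every pair interacts, weighted by 1 / |i - j|.
    inverse = [[0 if i == j else 1 / abs(i - j) for j in range(4)] for i in range(4)]
    print(morans_i(income, contiguity), morans_i(income, inverse))
```

    Running the two calls side by side shows the same data producing different index values under different weight matrices, which is the measurement problem the abstract points to.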

  1. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  2. Complete Chloroplast Genome Sequences and Comparative Analysis of Chenopodium quinoa and C. album.

    Science.gov (United States)

    Hong, Su-Young; Cheon, Kyeong-Sik; Yoo, Ki-Oug; Lee, Hyun-Oh; Cho, Kwang-Soo; Suh, Jong-Taek; Kim, Su-Jeong; Nam, Jeong-Hwan; Sohn, Hwang-Bae; Kim, Yul-Ho

    2017-01-01

    The Chenopodium genus comprises ~150 species, including Chenopodium quinoa and Chenopodium album, two important crops with high nutritional value. To elucidate the phylogenetic relationship between the two species, the complete chloroplast (cp) genomes of these species were obtained by next generation sequencing. We performed comparative analysis of the sequences and, using InDel markers, inferred phylogeny and genetic diversity of the Chenopodium genus. The cp genome is 152,099 bp (C. quinoa) and 152,167 bp (C. album) long. In total, 119 genes (78 protein-coding, 37 tRNA, and 4 rRNA) were identified. We found 14 (C. quinoa) and 15 (C. album) tandem repeats (TRs); 14 TRs were present in both species and C. album and C. quinoa each had one species-specific TR. The trnI-GAU intron sequences contained one (C. quinoa) or two (C. album) copies of TRs (66 bp); the InDel marker was designed based on the copy number variation in TRs. Using the InDel markers, we detected this variation in the TR copy number in four species, Chenopodium hybridum, Chenopodium pumilio, Chenopodium ficifolium, and Chenopodium koraiense, but not in Chenopodium glaucum. A comparison of coding and non-coding regions between C. quinoa and C. album revealed divergent sites. Nucleotide diversity >0.025 was found in 17 regions: 14 were located in the large single copy region (LSC), one in the inverted repeats, and two in the small single copy region (SSC). A phylogenetic analysis based on 59 protein-coding genes from 25 taxa resolved Chenopodioideae monophyletic and sister to Betoideae. The complete plastid genome sequences and molecular markers based on divergence hotspot regions in the two Chenopodium taxa will help to resolve the phylogenetic relationships of Chenopodium.

  3. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    Aghomotsegin

    2015-07-01

    Jul 1, 2015 ... population genetics research, studies based on mitochondrial DNA (mtDNA) and Y-chromosome DNA are an excellent way of illustrating population structure .... avoid landing investigators into serious situations of medical genetic privacy and ethnics, especially for. mtDNA coding area whose mutation often ...

  4. Genetic distance of Malaysian mousedeer based on mitochondrial DNA cytochrome oxidase I (COI) and D-loop region sequences

    Science.gov (United States)

    Bakar, Mohamad-Azam Akmal Abu; Rovie-Ryan, Jeffrine Japning; Ampeng, Ahmad; Yaakop, Salmah; Nor, Shukor Md; Md-Zain, Badrul Munir

    2018-04-01

    The mousedeer is one of the primitive mammals found mainly in the Southeast Asia region. There are two species of mousedeer in Malaysia, Tragulus kanchil and Tragulus napu. The two species can be distinguished by size, coat coloration, and throat pattern, but a clear diagnosis still cannot be made. The objective of the study is to show the genetic distance relationship between T. kanchil and T. napu and their populations based on the mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) and D-loop regions. A total of 42 mousedeer samples, collected by PERHILITAN from different localities, were used in this study. Another 29 D-loop sequences were retrieved from GenBank for comparative analysis. All samples were amplified by PCR using universal and species-specific primers for the COI and D-loop genes. The amplified sequences were analyzed to determine the genetic distance between T. kanchil and T. napu. From the analysis, the average genetic distances between T. kanchil and T. napu based on the COI and D-loop loci were 0.145 and 0.128, respectively. The genetic distance between populations of T. kanchil based on the COI locus was between 0.003 and 0.013. For the D-loop locus, genetic distance analysis showed a distant relationship between the west-coast and east-coast populations of T. kanchil. The COI and D-loop mtDNA regions provided a clear picture of the relationships within the mousedeer species. Finally, conservation efforts to protect these species and prevent their extinction can be supported by studying their molecular genetics.
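
    For readers unfamiliar with pairwise genetic distances, the sketch below computes the simplest one, the uncorrected p-distance (the proportion of differing sites between two aligned sequences). The abstract does not say which distance model was used, so treating the reported values as p-distances would be an assumption; the function name and the toy sequences are also illustrative only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # toy aligned fragments
```

    Model-based distances such as Kimura 2-parameter correct this raw proportion for multiple substitutions at the same site, so they are slightly larger than the p-distance for divergent sequences.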

  5. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Science.gov (United States)

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.

  6. Sequences within the 5' untranslated region regulate the levels of a kinetoplast DNA topoisomerase mRNA during the cell cycle.

    Science.gov (United States)

    Pasion, S G; Hines, J C; Ou, X; Mahmood, R; Ray, D S

    1996-12-01

    Gene expression in trypanosomatids appears to be regulated largely at the posttranscriptional level and involves maturation of mRNA precursors by trans splicing of a 39-nucleotide miniexon sequence to the 5' end of the mRNA and cleavage and polyadenylation at the 3' end of the mRNA. To initiate the identification of sequences involved in the periodic expression of DNA replication genes in trypanosomatids, we have mapped splice acceptor sites in the 5' flanking region of the TOP2 gene, which encodes the kinetoplast DNA topoisomerase, and have carried out deletion analysis of this region on a plasmid-encoded TOP2 gene. Block deletions within the 5' untranslated region (UTR) identified two regions (-608 to -388 and -387 to -186) responsible for periodic accumulation of the mRNA. Deletion of one or the other of these sequences had no effect on periodic expression of the mRNA, while deletion of both regions resulted in constitutive expression of the mRNA throughout the cell cycle. Subcloning of these sequences into the 5' UTR of a construct lacking both regions of the TOP2 5' UTR has shown that an octamer consensus sequence present in the 5' UTR of the TOP2, RPA1, and DHFR-TS mRNAs is required for normal cycling of the TOP2 mRNA. Mutation of the consensus octamer sequence in the TOP2 5' UTR in a plasmid construct containing only a single consensus octamer and that shows normal cycling of the plasmid-encoded TOP2 mRNA resulted in substantial reduction of the cycling of the mRNA level. These results imply a negative regulation of TOP2 mRNA during the cell cycle by a mechanism involving redundant elements containing one or more copies of a conserved octamer sequence within the 5' UTR of TOP2 mRNA.

  7. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    We conducted an SSR density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava’s SSR genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). It was found that coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Motif analysis of the cassava genome SSRs showed that the most abundant motif was AT/AT, while in intron sequences and UTRs it was AG/CT. In coding sequences, the motif AAG/CTT occurred most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution.
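
    An SSR density in SSRs/Mbp is simply the number of detected repeats divided by the sequence length in megabases. The sketch below finds perfect tandem repeats with a regular expression and reports that density. The thresholds (unit length 1 to 6 bp, at least 5 copies) and the function names are assumptions for illustration; the study's own detection tool and cut-offs are not given in the abstract.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Find perfect tandem repeats.

    Returns a list of (0-based start, repeat unit, copy number).
    The lazy unit match makes the shortest repeating unit win.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]


def ssr_density(seq, **kwargs):
    """Number of detected SSRs per megabase of sequence."""
    return len(find_ssrs(seq, **kwargs)) / (len(seq) / 1e6)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 6 + "CCT" + "AAG" * 5 + "TT"  # toy sequence
    print(find_ssrs(toy))
    print(ssr_density(toy))
```

    Applying the same function separately to coding sequences, introns and UTRs would give per-region densities comparable in form to those quoted in the abstract.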

  8. Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir.

    Science.gov (United States)

    Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat

    2017-07-01

    The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignments of the full-length CR sequences (1268 bp) showed that they comprise three domains, as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, phylogenetic analysis using a median-joining network clearly showed three clades, consistent with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between the first and second conserved sequence blocks, similar to the Brazilian tapir. The most polymorphic site was located in the extended termination associated sequences domain. The results could be applied to future genetic management of captive and wild populations to support stable populations.

  9. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  10. [Analysis of COX1 sequences of Taenia isolates from four areas of Guangxi].

    Science.gov (United States)

    Yang, Yi-Chao; Ou-Yang, Yi; Su, Ai-Rong; Wan, Xiao-Ling; Li, Shu-Lin

    2012-06-01

    To analyze the COX1 sequences of Taenia isolates from four areas of Guangxi Zhuang Autonomous Region, and to understand the distribution of Taenia asiatica in Guangxi. Patients with taeniasis in Luzhai, Rongshui, Tiandong and Sanjiang in Guangxi were treated by deworming, and the Taenia isolates were collected. Cytochrome c oxidase subunit 1 (COX1) sequences of these isolates were amplified by PCR, and the PCR products were sequenced after T-A cloning. The homogeneities (sequence identities) and genetic distances were calculated and analyzed, and phylogenetic trees were constructed with software. The COX1 sequences of the isolates from the 4 areas were also compared separately with the sequences of Taenia species in GenBank. The COX1 sequences of the 5 Taenia isolates collected all had the same length of 444 bp. There were 5 variable positions between the Luzhai isolate and Taenia asiatica; the homogeneity was 98.87% and the genetic distance was 0.011. Phylogenetic tree analysis showed that the Luzhai isolate and Taenia asiatica were located at the same node and were closely related. The homogeneity between Rongshui isolate A and Taenia solium was 100%, while the homogeneities of Rongshui isolate B with Taenia saginata and Taenia asiatica were 98.20% and 96.17%, respectively. The homogeneities of the Tiandong and Sanjiang isolates with Taenia solium were 99.55% and 96.40%, respectively, and the genetic distances were 0.005 and 0.037, respectively. The homogeneity between the Luzhai isolate and Taenia saginata was 96.40%. Taenia asiatica exists in Luzhai, and Taenia solium and Taenia saginata coexist in Rongshui, Guangxi Zhuang Autonomous Region.
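
    The Luzhai figures can be checked with simple arithmetic: 5 differing positions in 444 aligned bases give 439/444 = 98.87% homogeneity and 5/444 = 0.011 as an uncorrected distance. The Python sketch below reproduces that calculation on synthetic sequences. It assumes an ungapped alignment and a plain p-distance; the abstract does not state which distance model was used, and the sequences here are placeholders, not the study's data.

```python
def identity_and_p_distance(seq_a, seq_b):
    """Percent identity and uncorrected p-distance of two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    length = len(seq_a)
    return 100.0 * (length - differences) / length, differences / length


# Synthetic 444 bp reference and a copy with 5 substituted positions.
reference = "ACGT" * 111
substitute = {"A": "C", "C": "G", "G": "T", "T": "A"}
bases = list(reference)
for position in (10, 100, 200, 300, 400):
    bases[position] = substitute[bases[position]]
isolate = "".join(bases)

identity, distance = identity_and_p_distance(reference, isolate)
# round(identity, 2) is 98.87 and round(distance, 3) is 0.011
```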

  11. Assessment of homogeneity of regions for regional flood frequency analysis

    Science.gov (United States)

    Lee, Jeong Eun; Kim, Nam Won

    2016-04-01

    This paper analyzed the effect of rainfall on hydrological similarity, the assessment of which is an important step in regional flood frequency analysis (RFFA). For the RFFA, the storage function method (SFM) with a spatial extension technique was applied to the 22 sub-catchments partitioned from the Chungju dam watershed in the Republic of Korea. We used the SFM to generate the annual maximum floods for the 22 sub-catchments using annual maximum storm events (1986-2010) as input data. Then the quantiles of rainfall and flood were estimated using the annual maximum series for the 22 sub-catchments. Finally, spatial variations in the two quantiles were analyzed. As a result, there was a significant correlation between the spatial variations of the two quantiles. This result demonstrates that the spatial variation of rainfall is an important factor in explaining the homogeneity of regions when applying RFFA. Acknowledgements: This research was supported by a grant (11-TI-C06) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

  12. The 2012 Ferrara seismic sequence: Regional crustal structure, earthquake sources, and seismic hazard

    Science.gov (United States)

    Malagnini, Luca; Herrmann, Robert B.; Munafò, Irene; Buttinelli, Mauro; Anselmi, Mario; Akinci, Aybige; Boschi, E.

    2012-10-01

    Inadequate seismic design codes can be dangerous, particularly when they underestimate the true hazard. In this study we use data from a sequence of moderate-sized earthquakes in northeast Italy to validate and test a regional wave propagation model which, in turn, is used to understand some weaknesses of the current design spectra. Our velocity model, while regionalized and somewhat ad hoc, is consistent with geophysical observations and the local geology. In the 0.02-0.1 Hz band, this model is validated by using it to calculate moment tensor solutions of 20 earthquakes (5.6 ≥ MW ≥ 3.2) in the 2012 Ferrara, Italy, seismic sequence. The seismic spectra observed for the relatively small main shock significantly exceeded the design spectra to be used in the area for critical structures. Observations and synthetics reveal that the ground motions are dominated by long-duration surface waves, which, apparently, the design codes do not adequately anticipate. In light of our results, the present seismic hazard assessment in the entire Pianura Padana, including the city of Milan, needs to be re-evaluated.

  13. [Study on sequence characterized amplified region (SCAR) markers of Cornus officinalis].

    Science.gov (United States)

    Chen, Suiqing; Lu, Xiaolei; Wang, Lili

    2011-05-01

    To establish sequence characterized amplified region (SCAR) markers of Cornus officinalis and provide a scientific basis for molecular identification of C. officinalis. Random primers were screened by RAPD to obtain specific RAPD marker bands. The RAPD marker bands were separated, extracted, cloned and sequenced, and both ends of the sequences of the RAPD marker bands were determined. A pair of specific primers was designed for conventional PCR, and a SCAR marker was obtained. Four pairs of primers were designed based on the sequences of the RAPD marker bands. The DNA of the seven varieties of C. officinalis was amplified using primers YST38 and YST43. The results showed that all seven varieties of C. officinalis produced a single PCR product, which is an effective way to identify C. officinalis. The varieties with cylindrical and long-pear shaped fruits amplified by YST38 showed a specific band, which could be used as evidence for variety identification. Seven varieties of C. officinalis were amplified using primer YST39. However, the band of the variety with spindly shaped fruit (35,0400 bp) was about 300 bp, which was shorter than those of the varieties with other fruit shapes (650-700 bp). The variety with spindly shaped fruit could be identified through this difference. The primer YST92 produced a fragment of 600-700 bp in the varieties with cylindrical and long-pear shaped fruits, a fragment of 200-300 bp in the varieties with oval and short-cylindrical shaped fruits, and no fragment in the varieties with long cylindrical, elliptic and short-pear shaped fruits, which could be used to distinguish the different fruit shapes of C. officinalis. SCAR markers were established and can be used as a basis for breeding and for distinguishing the varieties of C. officinalis.

  14. Quick regional centroid moment tensor solutions for the Emilia 2012 (northern Italy) seismic sequence

    Directory of Open Access Journals (Sweden)

    Silvia Pondrelli

    2012-10-01

    In May 2012, a seismic sequence struck the Emilia region (northern Italy). The mainshock, of Ml 5.9, occurred on May 20, 2012, at 02:03 UTC. This was preceded by a smaller Ml 4.1 foreshock some hours before (23:13 UTC on May 19, 2012) and followed by more than 2,500 earthquakes in the magnitude range from Ml 0.7 to 5.2. In addition, on May 29, 2012, three further strong earthquakes occurred, all with magnitude Ml ≥5.2: a Ml 5.8 earthquake in the morning (07:00 UTC), followed by two events within just 5 min of each other, one at 10:55 UTC (Ml 5.3) and the second at 11:00 UTC (Ml 5.2). For all of the Ml ≥4.0 earthquakes in Italy and for all of the Ml ≥4.5 in the Mediterranean area, an automatic procedure for the computation of a regional centroid moment tensor (RCMT) is triggered by an email alert. Within 1 h of the event, a manually revised quick RCMT (QRCMT) can be published on the website if the solution is considered stable. In particular, for the Emilia seismic sequence, 13 QRCMTs were determined and for three of them, those with M >5.5, the automatically computed QRCMTs fitted the criteria for publication without manual revision. Using this seismic sequence as a test, we can then identify the magnitude threshold for automatic publication of our QRCMTs.

  15. Recent Vs. Historical Seismicity Analysis For Banat Seismic Region (Western Part Of Romania)

    OpenAIRE

    Oros Eugen; Diaconescu Mihai

    2015-01-01

    The present day seismic activity from a region reflects the active tectonics and can confirm the seismic potential of the seismogenic sources as they are modelled using the historical seismicity. This paper makes a comparative analysis of the last decade seismicity recorded in the Banat Seismic Region (western part of Romania) and the historical seismicity of the region (Mw≥4.0). Four significant earthquake sequences have been recently localized in the region, three of them nearby the city of...

  16. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    2010-07-19

    Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.

  17. Draft genome sequences of three virulent Streptococcus thermophilus bacteriophages isolated from the dairy environment in the Veneto region of Italy

    DEFF Research Database (Denmark)

    Duarte, Vinícius da Silva; Giaretta, Sabrina; Treu, Laura

    2018-01-01

    Streptococcus thermophilus, a very important dairy species, is constantly threatened by phage infection. We report the genome sequences of three S. thermophilus bacteriophages isolated from a dairy environment in the Veneto region of Italy. These sequences will be used for the development of new ...

  18. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes

    NARCIS (Netherlands)

    Skaletsky, Helen; Kuroda-Kawaguchi, Tomoko; Minx, Patrick J.; Cordum, Holland S.; Hillier, LaDeana; Brown, Laura G.; Repping, Sjoerd; Pyntikova, Tatyana; Ali, Johar; Bieri, Tamberlyn; Chinwalla, Asif; Delehaunty, Andrew; Delehaunty, Kim; Du, Hui; Fewell, Ginger; Fulton, Lucinda; Fulton, Robert; Graves, Tina; Hou, Shun-Fang; Latrielle, Philip; Leonard, Shawn; Mardis, Elaine; Maupin, Rachel; McPherson, John; Miner, Tracie; Nash, William; Nguyen, Christine; Ozersky, Philip; Pepin, Kymberlie; Rock, Susan; Rohlfing, Tracy; Scott, Kelsi; Schultz, Brian; Strong, Cindy; Tin-Wollam, Aye; Yang, Shiaw-Pyng; Waterston, Robert H.; Wilson, Richard K.; Rozen, Steve; Page, David C.

    2003-01-01

    The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes

  19. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
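
    The claim that a Markov-model Bayes classifier has linear class boundaries in word occurrences can be made concrete. For first-order Markov chains, the log-likelihood ratio of two classes is a weighted sum of 2-letter word counts, with weights equal to the log ratio of the transition probabilities. The Python sketch below illustrates this under stated assumptions: a DNA alphabet, equal class priors, the initial-state term ignored, add-one smoothing, and toy training sequences invented for the example. It is not the authors' implementation.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; input sequences must use only these letters


def word_counts(seq, k=2):
    """Occurrences of each overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_transitions(seqs, pseudo=1.0):
    """First-order transition probabilities P(b | a), keyed by the 2-letter word ab."""
    counts = Counter()
    for seq in seqs:
        counts.update(word_counts(seq, 2))
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[a + b] for b in ALPHABET) + pseudo * len(ALPHABET)
        for b in ALPHABET:
            probs[a + b] = (counts[a + b] + pseudo) / row_total
    return probs


def linear_weights(probs_class0, probs_class1):
    """Weight per word: log ratio of class-1 to class-0 transition probability."""
    return {w: math.log(probs_class1[w] / probs_class0[w]) for w in probs_class0}


def score(seq, weights):
    """Linear function of word occurrences (log-likelihood ratio of the transitions)."""
    return sum(weights[w] * n for w, n in word_counts(seq, 2).items())


def classify(seq, weights, bias=0.0):
    """Return 1 if the linear score favours class 1, else 0 (bias 0 = equal priors)."""
    return 1 if score(seq, weights) + bias > 0 else 0


# Toy training data, invented for illustration only.
weights = linear_weights(
    fit_transitions(["ACACACACAC", "CACACACA"]),
    fit_transitions(["GTGTGTGT", "TGTGTGTG"]),
)
```

    With these toy classes, `classify("ACACAC", weights)` falls on the class-0 side of the boundary and `classify("GTGTG", weights)` on the class-1 side. Higher-order models or word lengths chosen by cross-validation, as the abstract describes, would change `k` but not the linear form.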

  20. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society

  1. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  2. Analysis of the 2005-2016 Earthquake Sequence in Northern Iran Using the Visibility Graph Method

    Science.gov (United States)

    Khoshnevis, Naeem; Taborda, Ricardo; Azizzadeh-Roodpish, Shima; Telesca, Luciano

    2017-11-01

    We present an analysis of the seismicity of northern Iran in the period between 2005 and 2016 using a recently introduced method based on concepts of graph theory. The method relies on the inter-event visibility defined in terms of a connectivity degree parameter, k, which is correlated with the earthquake magnitude, M. Previous studies show that the slope m of the line fitting the k-M plot by the least squares method also observes a relationship with the b value from the Gutenberg-Richter law, thus rendering the graph analysis useful to examine the seismicity of a region. These correlations seem to hold for the analysis of relatively small sequences of earthquakes, offering the possibility of studying seismicity parameters in time. We apply this approach to the case of the seismicity of northern Iran, using an earthquake catalog for the tectonic seismic regions of Azerbaijan, Alborz, and Kopeh Dagh. We use results drawn for this region with the visibility graph approach in combination with results from other similar studies to further improve the universal relationship between m and b, and show that the visibility graph approach can be considered as a valid alternative for analyzing regional seismicity properties and earthquake sequences.
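
    As an illustration of the connectivity degree k and the slope m, the Python sketch below builds a natural visibility graph from a short magnitude series and fits k against M by least squares. It assumes the standard visibility criterion (two events are linked when every intermediate event lies strictly below the straight line joining them), strictly increasing event times, and M on the horizontal axis of the k-M plot. The four-event series is invented for the example and is not catalog data.

```python
def visibility_degrees(times, magnitudes):
    """Connectivity degree k of each event in the natural visibility graph."""
    n = len(times)
    degree = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = True
            for c in range(i + 1, j):
                # Height at time c of the straight line joining events i and j.
                line = magnitudes[j] + (magnitudes[i] - magnitudes[j]) * (
                    (times[j] - times[c]) / (times[j] - times[i]))
                if magnitudes[c] >= line:
                    visible = False
                    break
            if visible:
                degree[i] += 1
                degree[j] += 1
    return degree


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Invented example: four events at unit time steps.
times = [0, 1, 2, 3]
magnitudes = [1.0, 3.0, 1.0, 2.0]
k = visibility_degrees(times, magnitudes)  # [1, 3, 2, 2]
m = least_squares_slope(magnitudes, k)     # 8/11, about 0.727
```

    The triple loop is cubic in the number of events, which is acceptable for a sketch; a real catalog would call for a faster construction.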

  3. Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India

    Science.gov (United States)

    Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

    2017-12-01

    The seismic sequence of the 2001 Mw 7.7 Bhuj mainshock in the Kachchh area, spanning 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis techniques. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade after the 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long sequence has been investigated for the first time for a completeness magnitude Mc of 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using an effective window length of 200 earthquakes, with the window moved by 20 events so that consecutive windows overlap by 180 events. The robustness of the analysis has been tested by applying a magnitude completeness correction term of 0.2 to Mc 3.0, giving Mc 3.2, and we have tested the error in the calculation of Dq for each magnitude threshold. In addition, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows that the multi-fractal dimension spectrum Dq decreases with increasing clustering of events in time before a moderate-magnitude earthquake in the sequence, which in turn accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe and warns of the proximity of a damaging seismic event in an area.
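
    The windowing scheme described above (200 events per window, advanced by 20 events, so that neighbouring windows share 180 events) can be sketched in Python as follows. The function name and the idea of returning index pairs are assumptions made for illustration; the Dq calculation applied inside each window is not shown.

```python
def sliding_windows(n_events, window=200, step=20):
    """Return (start, end) index pairs of event windows; end is exclusive."""
    return [(start, start + window)
            for start in range(0, n_events - window + 1, step)]


# A catalog of 260 events yields four windows, each sharing 180 events with the next.
windows = sliding_windows(260)  # [(0, 200), (20, 220), (40, 240), (60, 260)]
```

    Each pair can be used to slice a time-ordered catalog, for example `catalog[start:end]`, before computing the spectrum for that window.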

  4. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Science.gov (United States)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates a protein's half-life to the identity of its N-terminal residue. Destabilizing N-terminal residues are created by enzymatic reactions or chemical modifications. The destabilized substrate is recognized by the PROTEOLYSIS 6 (PRT6) protein, an E3 ligase, which results in degradation of the substrate by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but has not yet been studied in fleshy fruit plants. Hence, this study was carried out in tomato, which is known as the model for fleshy fruit plants. BLASTX analysis identified Solyc09g010830 as the gene encoding PRT6 in tomato, based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that the PRT6 gene was highly expressed in tomato fruits at the breaker +5 stage. Co-expression analysis shows that PRT6 may be involved not only in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize the PRT6 gene in tomato.

  5. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  6. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  7. [Using exon combined target region capture sequencing chip to detect the disease-causing genes of retinitis pigmentosa].

    Science.gov (United States)

    Rong, Weining; Chen, Xuejuan; Li, Huiping; Liu, Yani; Sheng, Xunlun

    2014-06-01

    To detect the disease-causing genes in 10 retinitis pigmentosa (RP) pedigrees using an exon combined target region capture sequencing chip. This was a pedigree investigation study. From October 2010 to December 2013, 10 RP pedigrees were recruited for this study at Ningxia Eye Hospital. All patients and family members received complete ophthalmic examinations. DNA was extracted from patients, family members and controls. An exon combined target region capture sequencing chip was used to screen for candidate disease-causing mutations. Polymerase chain reaction (PCR) and direct sequencing were used to confirm the disease-causing mutations. Seventy patients and 23 normal family members were recruited from the 10 pedigrees. Among the 10 RP pedigrees, 1 was an autosomal dominant pedigree and 9 were autosomal recessive pedigrees. Seven mutations in 5 genes were detected in 5 pedigrees. A frameshift mutation in the BBS7 gene was detected in pedigree No. 2; the patients of this pedigree also had central obesity, polydactyly and mental handicap, and pedigree No. 2 was finally diagnosed with Bardet-Biedl syndrome. A missense mutation was detected in each of pedigrees No. 7 and No. 10. Because these patients also suffered from deafness, the final diagnosis was Usher syndrome. A missense mutation in the C3 gene, which is related to age-related macular degeneration, was also detected in pedigree No. 7. A nonsense mutation and a missense mutation in the CRB1 gene were detected in pedigree No. 1, and a splice-site mutation in the PROM1 gene was detected in pedigree No. 5. Retinitis pigmentosa is a genetic eye disease with diverse clinical phenotypes. Rapid and effective genetic diagnosis technology combined with analysis of clinical characteristics helps to improve the clinical diagnosis of RP.

  8. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive for rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced on an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil and French Guiana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting a common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Chronology of Eocene-Miocene sequences on the New Jersey shallow shelf: implications for regional, interregional, and global correlations

    Science.gov (United States)

    Browning, James V.; Miller, Kenneth G.; Sugarman, Peter J.; Barron, John; McCarthy, Francine M.G.; Kulhanek, Denise K.; Katz, Miriam E.; Feigenson, Mark D.

    2013-01-01

    Integrated Ocean Drilling Program Expedition 313 continuously cored and logged latest Eocene to early-middle Miocene sequences at three sites (M27, M28, and M29) on the inner-middle continental shelf offshore New Jersey, providing an opportunity to evaluate the ages, global correlations, and significance of sequence boundaries. We provide a chronology for these sequences using integrated strontium isotopic stratigraphy and biostratigraphy (primarily calcareous nannoplankton, diatoms, and dinocysts [dinoflagellate cysts]). Despite challenges posed by shallow-water sediments, age resolution is typically ±0.5 m.y. and in many sequences is as good as ±0.25 m.y. Three Oligocene sequences were sampled at Site M27 on sequence bottomsets. Fifteen early to early-middle Miocene sequences were dated at Sites M27, M28, and M29 across clinothems in topsets, foresets (where the sequences are thickest), and bottomsets. A few sequences have coarse (∼1 m.y.) or little age constraint due to barren zones; we constrain the age estimates of these less well dated sequences by applying the principle of superposition, i.e., sediments above sequence boundaries in any site are younger than the sediments below the sequence boundaries at other sites. Our age control provides constraints on the timing of deposition in the clinothem; sequences on the topsets are generally the youngest in the clinothem, whereas the bottomsets generally are the oldest. The greatest amount of time is represented on foresets, although we have no evidence for a correlative conformity. Our chronology provides a baseline for regional and interregional correlations and sea-level reconstructions: (1) we correlate a major increase in sedimentation rate precisely with the timing of the middle Miocene climate changes associated with the development of a permanent East Antarctic Ice Sheet; and (2) the timing of sequence boundaries matches the deep-sea oxygen isotopic record, implicating glacioeustasy as a major driver

  10. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
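
    The core idea, processing each read as soon as it arrives instead of waiting for the whole file, can be sketched with a Python generator. This is not the elastream package or its API: the function names are hypothetical, the input is assumed to be well-formed 4-line FASTQ records, and GC content stands in for any per-read analysis that can run independently.

```python
import io


def fastq_records(stream):
    """Yield (header, sequence, quality) as soon as each 4-line record has been read."""
    while True:
        header = stream.readline()
        if not header:  # end of stream
            return
        sequence = stream.readline()
        stream.readline()  # '+' separator line, ignored
        quality = stream.readline()
        yield header.strip(), sequence.strip(), quality.strip()


def gc_fraction(sequence):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def stream_gc(stream):
    """Process reads one at a time while the stream is still being received."""
    for header, sequence, _quality in fastq_records(stream):
        yield header, gc_fraction(sequence)


# An in-memory stream stands in for data arriving over the network.
incoming = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
results = list(stream_gc(incoming))  # [("@r1", 1.0), ("@r2", 0.0)]
```

    Because the generator holds only one record at a time, the same loop would work on a socket or pipe file object, which is what lets analysis overlap with transfer.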

  11. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  12. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.
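
    The abstract does not spell out how reads are assigned to a species of origin. One common way to do it, shown here purely as a hedged sketch, is to align each read to both reference genomes and keep the better-scoring assignment. The score inputs, the `margin` parameter, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Tuple


def assign_species(human_score: Optional[int],
                   mouse_score: Optional[int],
                   margin: int = 0) -> str:
    """Assign one read from its best alignment score against each genome.

    A score of None means the read did not align to that genome.
    `margin` is a hypothetical tie-breaking threshold.
    """
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score > margin:
        return "human"
    if mouse_score - human_score > margin:
        return "mouse"
    return "ambiguous"


def mouse_fraction(scores: Dict[str, Tuple[Optional[int], Optional[int]]]) -> float:
    """Fraction of confidently assigned reads that were called mouse."""
    calls = [assign_species(h, m) for h, m in scores.values()]
    assigned = [c for c in calls if c in ("human", "mouse")]
    return assigned.count("mouse") / len(assigned) if assigned else 0.0
```

    A summary such as `mouse_fraction` corresponds to the kind of figure reported above (less than 4% of exome-capture reads aligning to mouse), although the study's own counting rules may differ.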

  13. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  14. Identification of a nuclear matrix attachment region like sequence in the last intron of PI3Kγ

    International Nuclear Information System (INIS)

    Dai Bingbing; Ying Lei; Cai Rong; Li Ying; Zhang Xingqian; Lu Jian; Qian Guanxiang

    2006-01-01

    Matrix attachment regions (MARs) are not only the structural basis of higher-order chromatin structure but also have considerable biological significance. In this study, the whole sequence of about 100 kb in length from BAC clone GS1-223D4 (GI: 5931478), in which the human PI3Kγ gene is localized, was analyzed by two online computer programs, MARFinder and SMARTest. A strong potential MAR was predicted in the last and largest intron of PI3Kγ. The predicted 2 kb MAR, which we refer to as PIMAR, was further analyzed through biochemical methods in vitro and in vivo. The results showed that the PIMAR could be associated with nuclear matrices from HeLa cells both in vitro and in vivo. Further reporter gene analysis showed that in transient transfection the expression of a reporter gene linked with reversed PIMAR was repressed slightly, while in the stably integrated state, expression of the luciferase reporter linked with either reversed or orientated PIMAR was enhanced greatly in NIH-3T3 and K-562 cells. These results suggest that the PIMAR may have the capacity to shield an integrated heterogeneous gene from the chromatin position effect. By combining computer program analysis with confirmation by biochemical methods, we identified, for the first time, a 2 kb matrix attachment region like sequence in the last intron of human PI3Kγ.

  15. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Directory of Open Access Journals (Sweden)

    Jin Hee-Jeong

    2006-01-01

    Microarray study enables us to obtain hundreds of thousands of expressions of genes or genotypes at once, and it is an indispensable technology for genome research. The first step is the analysis of scanned microarray images. This is the most important procedure for obtaining biologically reliable data. Currently most microarray image processing systems require burdensome manual block/spot indexing work. Since the amount of experimental data is increasing very quickly, automated microarray image analysis software becomes important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, to allow reliable and efficient batch processing for a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than the commercial tools.

  16. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    OpenAIRE

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  17. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    Directory of Open Access Journals (Sweden)

    Juan Ning

    BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.

  18. Selection of mRNA 5'-untranslated region sequence with high translation efficiency through ribosome display

    International Nuclear Information System (INIS)

    Mie, Masayasu; Shimizu, Shun; Takahashi, Fumio; Kobatake, Eiry

    2008-01-01

    The 5'-untranslated region (5'-UTR) of mRNAs functions as a translation enhancer, promoting translation efficiency. Many in vitro translation systems exhibit a reduced efficiency in protein translation due to decreased translation initiation. The use of a 5'-UTR sequence with high translation efficiency greatly enhances protein production in these systems. In this study, we have developed an in vitro selection system that favors 5'-UTRs with high translation efficiency using a ribosome display technique. A 5'-UTR random library, comprising 5'-UTRs tagged with a His-tag and Renilla luciferase (R-luc) fusion, was translated in vitro in rabbit reticulocytes. By limiting the translation period, only mRNAs with high translation efficiency were translated. During translation, mRNA, ribosome and translated R-luc with His-tag formed ternary complexes. These were collected via the translated His-tag using Ni-particles. mRNA extracted from the ternary complexes was amplified using RT-PCR and sequenced. Finally, 5'-UTRs with high translation efficiency were obtained from the random 5'-UTR library.

  19. Regional analysis of the nuclear-electricity

    International Nuclear Information System (INIS)

    Parera, M. D.

    2011-11-01

    This study presents a regional analysis of the Argentinean electricity market that considers the effects of regional cooperation and of internal and international interconnections. The possibilities for introducing new nuclear power stations in different regions of the country were evaluated, indicating the most appropriate areas for these facilities in order to increase the penetration of nuclear energy in the national energy matrix. The interconnection of the electricity and natural gas markets was also studied, because of the existing link between the two energy forms. For this purpose the program MESSAGE (Model for Energy Supply Strategy Alternatives and their General Environmental Impacts), promoted by the International Atomic Energy Agency, was used. This model carries out a country-level economic optimization and yields the minimum cost for the modeled system. The regional division defined by the Compania Administradora del Mercado Mayorista Electrico (CAMMESA), which divides the country into eight regions, was used. The characteristics and needs of each region were considered, including their respective demand for and supply of electric power and natural gas, as well as their existing and projected interconnections, consisting of electric lines and gas pipelines. According to the results obtained with the model, nuclear electricity is a competitive option. (Author)

  20. Geographical data structures supporting regional analysis

    International Nuclear Information System (INIS)

    Edwards, R.G.; Durfee, R.C.

    1978-01-01

    In recent years the computer has become a valuable aid in solving regional environmental problems. Over a hundred different geographic information systems have been developed to digitize, store, analyze, and display spatially distributed data. One important aspect of these systems is the data structure (e.g. grids, polygons, segments) used to model the environment being studied. This paper presents eight common geographic data structures and their use in studies of coal resources, power plant siting, population distributions, LANDSAT imagery analysis, and landuse analysis

  1. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  2. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Directory of Open Access Journals (Sweden)

    Hsiao Yu-Yun

    2011-01-01

    Background: Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC) end sequences (BESs) can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results: We used two BAC libraries (constructed using the BamHI and HindIII restriction enzymes) of Phalaenopsis equestris to generate pair-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively), at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp) in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, it was found that 641 (11.6%) were predicted to represent protein-encoding regions, whereas 1,272 (23.0%) contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively), whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs) were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6%) of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequences of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome, and will help clarify similarities and differences in genome composition between orchids and other plant species. Conclusion: Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive
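
    The SSR search and the summary figures in this abstract lend themselves to a short illustration. The Python sketch below finds dinucleotide repeats with a regular expression and recomputes two of the reported totals. The minimum repeat count is an assumed threshold (the abstract does not give the detection criteria), and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def find_dinucleotide_ssrs(seq: str, min_repeats: int = 6) -> List[Tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each dinucleotide SSR.

    `min_repeats` is an assumed threshold, not one stated in the abstract.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # skip homopolymer runs such as AAAA...
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


def total_megabases(n_sequences: int, mean_length_bp: float) -> float:
    """Total sequence length in Mb from a count and a mean read length."""
    return n_sequences * mean_length_bp / 1e6


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * part / whole, 1)


# Figures quoted in the abstract: 5,535 BESs averaging 821 bp, 253 of 950 SSRs.
bes_total_mb = total_megabases(5535, 821)   # about 4.5 Mb
at_ta_share = percent(253, 950)             # 26.6
```

    The recomputed total (roughly 4.5 Mb) and the AT/TA share (26.6%) agree with the values stated in the abstract.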

  3. Multilocus Sequence Analysis of Cercospora spp. from Different Host Plant Families

    Directory of Open Access Journals (Sweden)

    Floreta Fiska Yuliarni

    2014-06-01

    Identification of the genus Cercospora is still complicated due to the host preferences often being used as the main criteria to propose a new name. We determined the relationship between host plants and multilocus sequence variations (ITS rDNA including 5.8S rDNA, elongation factor 1-α, and calmodulin) in Cercospora spp. to investigate the host specificity. We used 53 strains of Cercospora spp. infecting 12 plant families for phylogenetic analysis. The sequences of 23 strains of Cercospora spp. infecting the plant families of Asteraceae, Cucurbitaceae, and Solanaceae were determined in this study. The sequences of 30 strains of Cercospora spp. infecting the plant families of Fabaceae, Amaranthaceae, Apiaceae, Plumbaginaceae, Malvaceae, Cistaceae, Plantaginaceae, Lamiaceae, and Poaceae were obtained from GenBank. The molecular phylogenetic analysis revealed that the majority of Cercospora species lack host specificity, and only C. zinniicola, C. zeina, C. zeae-maydis, C. cocciniae, and C. mikaniicola were found to be host-specific. Closely related species of Cercospora could not be distinguished using molecular analyses of ITS, EF, and CAL gene regions. The phylogenetic tree based on the CAL gene showed a better topology and Cercospora species separation than the trees based on the ITS rDNA region or the EF gene.

  4. Region-wide and ecotype-specific differences in demographic histories of threespine stickleback populations, estimated from whole genome sequences.

    Science.gov (United States)

    Liu, Shenglin; Hansen, Michael M; Jacobsen, Magnus W

    2016-10-01

    We analysed 81 whole genome sequences of threespine sticklebacks from Pacific North America, Greenland and Northern Europe, representing 16 populations. Principal component analysis of nuclear SNPs grouped populations according to geographical location, with Pacific populations being more divergent from each other relative to European and Greenlandic populations. Analysis of mitogenome sequences showed Northern European populations to represent a single phylogeographical lineage, whereas Greenlandic and particularly Pacific populations showed admixture between lineages. We estimated demographic history using a genomewide coalescence with recombination approach. The Pacific populations showed gradual population expansion starting >100 Kya, possibly reflecting persistence in cryptic refuges near the present distributional range, although we do not rule out possible influence of ancient admixture. Sharp population declines ca. 14-15 Kya were suggested to reflect founding of freshwater populations by marine ancestors. In Greenland and Northern Europe, demographic expansion started ca. 20-25 Kya coinciding with the end of the Last Glacial Maximum. In both regions, marine and freshwater populations started to show different demographic trajectories ca. 8-9 Kya, suggesting that this was the time of recolonization. In Northern Europe, this estimate was surprisingly late, but found support in subfossil evidence for presence of several freshwater fish species but not sticklebacks 12 Kya. The results demonstrate distinctly different demographic histories across geographical regions with potential consequences for adaptive processes. They also provide empirical support for previous assumptions about freshwater populations being founded independently from large, coherent marine populations, a key element in the Transporter Hypothesis invoked to explain the widespread occurrence of parallel evolution across freshwater stickleback populations. © 2016 John Wiley & Sons Ltd.

  5. Seismic stratigraphy and regional unconformity analysis of Chukchi Sea Basins

    Science.gov (United States)

    Agasheva, Mariia; Karpov, Yury; Stoupakova, Antonina; Suslova, Anna

    2017-04-01

    The Russian Chukchi Sea shelf is a province with petroleum potential and remains one of the least investigated areas. The North and South Chukchi troughs, which are separated by the Wrangel-Herald Arch, have different origins. The main challenge is to determine the stratigraphic sequences that fill the North and South Chukchi basins. The shared tectonic evolution of the territory, including the opening of the Canada basin and the Brooks Range-Wrangel-Herald orogenic events, makes it reasonable to expect analogous stratigraphic sequences in the Russian part. Analysis of 2D seismic data from the Russian and American Chukchi Sea reveals the major seismic reflectors that can be traced throughout the basins. According to these data, the North Chukchi basin includes four seismic stratigraphic sequences: Franklinian (pre-Mississippian), Ellesmerian (Upper Devonian-Jurassic), Beaufortian (Jurassic-Lower Cretaceous) and Brookian (Lower Cretaceous-Cenozoic), as on the North Slope of Alaska [1]. The South Chukchi basin has a different tectonic nature and contains only the Franklinian basement and Brookian sequences. The sedimentary cover of the North Chukchi basin starts with the Ellesmerian sequence, which is marked by a bright reflector that separates it from the chaotic, folded Franklinian sequence. The lower Ellesmerian sequence fills grabens that formed during Upper Devonian rifting. The Devonian extension event was initiated as a result of post-Caledonian orogenic collapse, terminating with the opening of Arctic oceans. The Beaufortian sequence is distinguished in the Colville basin and Hanna Trough by seismically defined clinoforms. Paleozoic and Mesozoic strata are eroded by the regional Lower Cretaceous Unconformity (LCU), which is linked with the opening of the Canada basin. On seismic data the LCU is defined by an angular unconformity that can be traced in most Arctic basins. The Lower Cretaceous erosion and uplift event is of Hauterivian to Aptian age in the Brooks Range, and the Loppa High uplift is dated to the early Barremian. The Lower Cretaceous clinoform complex downlaps onto the LCU horizon and fills the North Chukchi basin (as in the Colville basin, Alaska

  6. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    The adaptation tipping point and adaptation pathway approaches developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. The two approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criteria. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.
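
    The contrast between Net Present Valuation and Real Options can be shown with a toy calculation. The Python sketch below is a simplified illustration, not the paper's model: the cash flows, probabilities and discount rate are hypothetical, and the decision-maker is assumed to learn the scenario before deciding whether to expand.

```python
from typing import List, Tuple

Scenario = Tuple[float, List[float]]  # (probability, yearly cash flows)


def npv(cash_flows: List[float], rate: float) -> float:
    """Net present value of yearly cash flows; index 0 is today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def committed_value(scenarios: List[Scenario], rate: float) -> float:
    """Expected NPV when the expansion is built in every scenario."""
    return sum(p * npv(flows, rate) for p, flows in scenarios)


def flexible_value(scenarios: List[Scenario], rate: float) -> float:
    """Expected NPV when the expansion is built only where it pays off."""
    return sum(p * max(npv(flows, rate), 0.0) for p, flows in scenarios)


# Hypothetical expansion: cost 50 in year 1, benefit 66 or 44 in year 2.
scenarios = [(0.5, [0, -50, 66]), (0.5, [0, -50, 44])]
rate = 0.10
value_of_flexibility = flexible_value(scenarios, rate) - committed_value(scenarios, rate)
```

    In this example the committed plan has an expected NPV of about zero, while the option to skip the expansion in the unfavourable scenario is worth about 4.5. That difference is the kind of flexibility value a plain NPV comparison does not capture.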

  7. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly random amino acid sequences. In the present paper, by using a modified recurrence plot that we proposed previously, we show that these amino acid sequences in fact have hidden repetitions. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.
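
    The authors' modified recurrence plot is not described in this abstract. As a rough illustration of the general idea only, the Python sketch below builds a plain recurrence matrix from sliding windows and reads a repeat period off its diagonals. The window length, the match threshold, and the function names are assumptions.

```python
from typing import Dict, List


def recurrence_matrix(seq: str, window: int = 3, min_matches: int = 3) -> List[List[int]]:
    """R[i][j] = 1 when the windows starting at i and j agree in at least
    `min_matches` positions (a plain recurrence plot, not the modified one)."""
    n = len(seq) - window + 1
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            matches = sum(a == b for a, b in zip(seq[i:i + window], seq[j:j + window]))
            row.append(int(matches >= min_matches))
        matrix.append(row)
    return matrix


def diagonal_density(matrix: List[List[int]]) -> Dict[int, float]:
    """Fraction of recurrent points on each off-diagonal at offset d."""
    n = len(matrix)
    return {d: sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
            for d in range(1, n)}


def dominant_period(seq: str, window: int = 3, min_matches: int = 3) -> int:
    """Smallest offset whose diagonal is most densely populated."""
    density = diagonal_density(recurrence_matrix(seq, window, min_matches))
    best = max(density.values())
    return min(d for d, value in density.items() if value == best)
```

    A densely populated diagonal at offset d indicates that the sequence tends to repeat every d residues. Lowering `min_matches` below the window length tolerates substitutions, which is one way degraded repeats might be picked up.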

  8. Genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia

    Directory of Open Access Journals (Sweden)

    Anastasiia Kovaliova

    2017-03-01

    Here we report the draft genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia. The draft genome has a size of 4.9 Mb and encodes multiple K+-transporters and proton-consuming decarboxylases. The phylogenetic analysis based on concatenated ribosomal proteins revealed that strain DV clusters together with the acid-tolerant Desulfovibrio sp. TomC and Desulfovibrio magneticus. The draft genome sequence and annotation have been deposited at GenBank under the accession number MLBG00000000.

  9. Sequence evolution of the hypervariable region in the putative envelope region E2/NS1 of hepatitis C virus is correlated with specific humoral immune responses.

    OpenAIRE

    van Doorn, L J; Capriles, I; Maertens, G; DeLeys, R; Murray, K; Kos, T; Schellekens, H; Quint, W

    1995-01-01

    Sequence evolution of the hypervariable region 1 (HVR1) in the N terminus of E2/NS1 of hepatitis C virus (HCV) was studied retrospectively in six chimpanzees inoculated with the same genotype 1b strain, containing a unique predominant HVR1 sequence. Immediately after inoculation, all animals contained the same HVR predominant sequence. Two animals developed an acute self-limiting infection. Anti-HVR1 immunoglobulin G (IgG) was produced 40 to 60 days after inoculation and rapidly disappeared a...

  10. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitate a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  11. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    KAUST Repository

    Domina, Maria; Lanza Cariccio, Veronica; Benfatto, Salvatore; D'Aliberti, Deborah; Venza, Mario; Borgogni, Erica; Castellino, Flora; Biondo, Carmelo; D'Andrea, Daniel; Grassi, Luigi; Tramontano, Anna; Teti, Giuseppe; Felici, Franco; Beninati, Concetta

    2014-01-01

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also susceptible to find important applications in other fields covered by traditional quantitative serology.

  12. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    Directory of Open Access Journals (Sweden)

    Maria Domina

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also susceptible to find important applications in other fields covered by traditional quantitative serology.

  13. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    KAUST Repository

    Domina, Maria

    2014-12-04

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also susceptible to find important applications in other fields covered by traditional quantitative serology.

  14. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  15. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  16. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    Science.gov (United States)

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
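
    As a rough illustration of the kind of problem described above (locating a contiguous region of low scores without fixing its length in advance), the following minimal sketch finds the run whose mean-centred sum is smallest. This is a swapped-in textbook technique (a minimum-sum subarray scan), not the authors' published method, and it does not attach any statistical score to the region; the function name and the toy scores are hypothetical.

```python
from statistics import mean


def lowest_scoring_region(scores):
    """Return (start, end, total) for the contiguous run with the smallest
    mean-centred sum. start is inclusive, end is exclusive.

    Assumption: "low" is judged relative to the mean of the whole sequence.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    centre = mean(scores)
    best_sum = float("inf")
    best_start = best_end = 0
    run_sum = 0.0
    run_start = 0
    for i, score in enumerate(scores):
        # A positive running sum can only make a later run worse, so restart.
        if run_sum > 0:
            run_sum = 0.0
            run_start = i
        run_sum += score - centre
        if run_sum < best_sum:
            best_sum = run_sum
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_sum


# Toy per-position scores with an obvious dip at positions 3..5.
toy_scores = [5.0, 5.2, 4.9, 1.0, 0.8, 1.1, 5.1, 5.0]
region = lowest_scoring_region(toy_scores)
print(region[:2])  # (3, 6)
```

    The scan is linear in the sequence length and makes no assumption about the length of the region, but a real analysis would still need a significance measure such as the one the abstract describes.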

  17. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  18. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  19. A Quantitative Accident Sequence Analysis for a VHTR

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)

    2016-05-15

    In Korea, the basic design features of the VHTR are currently under discussion across various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method to assess the risks of a large and complex engineered system, such as a nuclear power plant. It will be introduced at an early stage in the design, and will be upgraded at various design and licensing stages as the design matures and the design details are defined. Risk insights to be developed from the PRA are viewed as essential to developing a design that is optimized in meeting safety objectives and in interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events which may occur in VHTRs were selected through the MLD method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating event frequencies and safety system failure rates were calculated using reliability data obtained from the available sources and fault tree analysis. After quantification, uncertainty analysis was conducted. The SR and LR frequencies were calculated as 7.52E-10/RY and 7.91E-16/RY, respectively, which are lower than the core damage frequency of LWRs.

  20. Prediction of HIV-1 coreceptor usage (tropism) by sequence analysis using a genotypic approach.

    Science.gov (United States)

    Sierra, Saleta; Kaiser, Rolf; Lübke, Nadine; Thielen, Alexander; Schuelter, Eugen; Heger, Eva; Däumer, Martin; Reuter, Stefan; Esser, Stefan; Fätkenheuer, Gerd; Pfister, Herbert; Oette, Mark; Lengauer, Thomas

    2011-12-01

    Maraviroc (MVC) is the first licensed antiretroviral drug from the class of coreceptor antagonists. It binds to the host coreceptor CCR5, which is used by the majority of HIV strains in order to infect human immune cells (Fig. 1). Other HIV isolates use a different coreceptor, CXCR4. Which receptor is used is determined in the virus by the Env protein (Fig. 2). Depending on the coreceptor used, the viruses are classified as R5 or X4, respectively. MVC binds to the CCR5 receptor, inhibiting the entry of R5 viruses into the target cell. During the course of disease, X4 viruses may emerge and outgrow the R5 viruses. Determination of coreceptor usage (also called tropism) is therefore mandatory prior to administration of MVC, as demanded by EMA and FDA. The MVC efficacy studies MOTIVATE, MERIT and 1029 were performed with the Trofile assay from Monogram, San Francisco, U.S.A. This is a high quality assay based on sophisticated recombinant tests. Acceptance of this test for daily routine is rather low outside the U.S.A., since European physicians tend to work with decentralized expert laboratories, which also provide concomitant resistance testing. These laboratories have undergone several quality assurance evaluations, the last one being presented in 2011. For several years now, we have performed tropism determinations based on sequence analysis of the HIV env-V3 gene region (V3). This region carries enough information to perform a reliable prediction. The genotypic determination of coreceptor usage presents advantages such as: shorter turnaround time (equivalent to resistance testing), lower costs, the possibility of adapting the results to the patients' needs and the possibility of analysing clinical samples with very low or even undetectable viral load (VL), particularly since the number of samples analysed with VL < 1000 copies/μl has increased in recent years (Fig. 3). The main steps for tropism testing (Fig. 4) demonstrated in

  1. DNA-PK dependent targeting of DNA-ends to a protein complex assembled on matrix attachment region DNA sequences

    International Nuclear Information System (INIS)

    Mauldin, S.K.; Getts, R.C.; Perez, M.L.; DiRienzo, S.; Stamato, T.D.

    2003-01-01

    We find that nuclear protein extracts from mammalian cells contain an activity that allows DNA ends to associate with circular pUC18 plasmid DNA. This activity requires the catalytic subunit of DNA-PK (DNA-PKcs) and Ku, since it was not observed in mutants lacking Ku or DNA-PKcs but was observed when purified Ku/DNA-PKcs was added to these mutant extracts. Competition experiments between pUC18 and pUC18 plasmids containing various nuclear matrix attachment region (MAR) sequences suggest that DNA ends preferentially associate with plasmids containing MAR DNA sequences. At a 1:5 mass ratio of MAR to pUC18, approximately equal amounts of DNA end binding to the two plasmids were observed, while at a 1:1 ratio no pUC18 end-binding was observed. Calculation of relative binding activities indicates that DNA-end binding activity to MAR sequences was 7- to 21-fold higher than to pUC18. Western analysis of proteins bound to pUC18 and MAR plasmids indicates that XRCC4, DNA ligase IV, scaffold attachment factor A, topoisomerase II, and poly(ADP-ribose) polymerase preferentially associate with the MAR plasmid in the absence or presence of DNA ends. In contrast, Ku and DNA-PKcs were found on the MAR plasmid only in the presence of DNA ends. After electroporation of a 32P-labeled DNA probe into human cells and cell fractionation, 87% of the total intercellular radioactivity remained in nuclei after a 0.5M NaCl extraction, suggesting the probe was strongly bound in the nucleus. The above observations raise the possibility that DNA-PK targets DNA ends to a repair and/or DNA damage signaling complex which is assembled on MAR sites in the nucleus.

  2. Highly conserved intragenic HSV-2 sequences: Results from next-generation sequencing of HSV-2 UL and US regions from genital swabs collected from 3 continents.

    Science.gov (United States)

    Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M

    2017-10-01

    Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10^7 log10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (UL_US) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated UL_US segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  4. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  5. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    Science.gov (United States)

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete sequence of the DenLFY gene was 3,575 bp and contained three exons and two introns. BLAST comparison of the exons of LFY homologous genes indicated that the DenLFY gene had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). In the MP analysis, Dendrobium was found to be sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used for authentication of Dendrobium, based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  6. In silico analysis of Simple Sequence Repeats from chloroplast genomes of Solanaceae species

    Directory of Open Access Journals (Sweden)

    Evandro Vagner Tambarussi

    2009-01-01

    The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than that of genic cpSSRs. Mononucleotide repeats were the most frequent in the studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSR sequences from Solanaceae species made the identification of nucleotide variability possible, and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analysis and biotechnological approaches.

  7. An integrated tool to study MHC region: accurate SNV detection and HLA genes typing in human MHC region using targeted high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Hongzhi Cao

    The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost effective and high coverage protocol of clinical quality and reliability. In this paper, we presented such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb) of the extended MHC region. We deployed our probes upon three genetically diverse human samples for probe set evaluation, and sequencing data show that ∼97% of the MHC region, and over 99% of the genes in the MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4-digit resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated disease studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community.

  8. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biologists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and A, T, G, C base contents with their respective percentages) and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
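
    To make the two core operations named in this abstract concrete, here is a minimal sketch of reverse complement generation and of the basic Nussinov recurrence, which maximizes the number of complementary base pairs. It is written in Python rather than the tool's C#, it is the plain textbook recurrence rather than RDNAnalyzer's extended version, and the function names and the min_loop parameter are assumptions made for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse the sequence and swap each base for its Watson-Crick partner."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def nussinov_max_pairs(seq, min_loop=0):
    """Maximum number of nested A-T / G-C pairs (basic Nussinov recurrence).

    min_loop is a hypothetical knob: positions i and j may pair only
    if j - i > min_loop.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if COMPLEMENT.get(seq[i]) == seq[j] and j - i > min_loop:
                inner = dp[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(reverse_complement("ATGC"))       # GCAT
print(nussinov_max_pairs("GGGAAACCC"))  # 3
```

    The table fill is cubic in the sequence length, which is acceptable for short sequences; recovering the actual pairings would require a traceback over the same table.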

  9. Identification of Dendrobium species by a candidate DNA barcode sequence: the chloroplast psbA-trnH intergenic region.

    Science.gov (United States)

    Yao, Hui; Song, Jing-Yuan; Ma, Xin-Ye; Liu, Chang; Li, Ying; Xu, Hong-Xi; Han, Jian-Ping; Duan, Li-Sheng; Chen, Shi-Lin

    2009-05-01

    DNA barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Although a consensus has not been reached regarding which DNA sequences can be used as the best plant barcodes, the psbA-trnH spacer region has been tested extensively in recent years. In this study, we hypothesize that the psbA-trnH spacer regions are also effective barcodes for Dendrobium species. We have sequenced the chloroplast psbA-trnH intergenic spacers of 17 Dendrobium species to test this hypothesis. The sequences were found to be significantly different from those of other species, with percentages of variation ranging from 0.3 % to 2.3 % and an average of 1.2 %. In contrast, the intraspecific variation among the Dendrobium species studied ranged from 0 % to 0.1 %. The sequence difference between the psbA-trnH sequences of 17 Dendrobium species and one Bulbophyllum odoratissimum ranged from 2.0 % to 3.1 %, with an average of 2.5 %. Our results support the notion that the psbA-trnH intergenic spacer region could be used as a barcode to distinguish various Dendrobium species and to differentiate Dendrobium species from other adulterating species. Copyright Georg Thieme Verlag KG Stuttgart. New York.
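
    The percentages of variation quoted above can be pictured as pairwise differences between aligned sequences. The abstract does not say how the figures were computed, so the following is only a sketch under the assumption of a simple uncorrected p-distance that skips gapped columns; the function name and the toy sequences are hypothetical.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Uncorrected pairwise difference (in percent) between two aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption;
    other conventions count gaps as differences).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


# Toy aligned fragments differing at one of ten sites.
print(percent_difference("ACGTACGTAC", "ACGTACGTAT"))  # 10.0
```

    A barcode is useful when such distances between species clearly exceed those within a species, which is the pattern the abstract reports.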

  10. Frame sequences analysis technique of linear objects movement

    Science.gov (United States)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Data obtained by noninvasive methods are often needed in many fields of science and engineering. Such data can be acquired by video recording at various frame rates and in various light spectra. Quantitative analysis of the movement of the objects under study then becomes an important component of the research. This work discusses the analysis of the motion of linear objects in a two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence of 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. The task was to determine the average velocity of the objects' motion. This velocity was found as an average over 8-12 objects, with an error of 15%. After processing, the dependencies of the average velocity on the control parameters were found. The processing was performed in the GMimPro software environment, with subsequent approximation of the data using the Hill equation.

  11. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced, or be experiencing, positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
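
    The N50 length reported for the assembly is a standard summary statistic: the contig length at which contigs of that length or longer contain at least half of all assembled bases. The sketch below shows the usual definition on toy lengths; the function name and the example values are hypothetical and unrelated to the silkworm data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


# Toy contig lengths: total 20, and 6 + 5 = 11 covers at least half.
print(n50([2, 3, 4, 5, 6]))  # 5
```

    Because a few long contigs dominate the statistic, N50 is typically larger than the mean contig length, as in the figures quoted in this record.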

  12. Evolution and strengthening of the Calabrian Regional Seismic Network during the Pollino sequence

    Science.gov (United States)

    D'Alessandro, Antonino; Gervasi, Anna; Guerra, Ignazio

    2013-04-01

    Over the last three years, the Calabria-Lucania border area has been affected by intense seismic activity generated by the activation of geological structures that are the seat of clusters of microearthquakes, with energy release sufficient to be felt and to cause alarm and disturbance. Besides the historical memory of the inhabitants of Mormanno (the town most affected by macroseismic effects), some historical documents indicate the occurrence of a similar seismic crisis in 1888. A more recent seismic sequence, the first monitored by seismic instruments, occurred in 1973-1974. The latest activity started in early 2010 and is still ongoing. The two shocks of ML = 4.3 and 5.0 and the very long duration distinguish this crisis from the previous ones. Given this background, a seismic station (MMN) belonging to the Regional Seismic Network of the University of Calabria (RSRC) was installed at Mormanno in 1981; it is now also a station of the Italian National Seismic Network of the Istituto Nazionale di Geofisica e Vulcanologia (INSN-INGV). This station made it possible to follow the evolution of seismicity in this area, in particular the progressive increase in seismic activity that started in 2010. Since 2010, several stand-alone 3D stations have been installed by the University of Calabria. Further INGV stations were installed in November 2011 after a sharp increase in energy release, and subsequently by the INGV and the GeoForschungsZentrum (Potsdam) after the main shock of the whole sequence. Seismic networks are powerful tools for understanding active tectonic processes in a monitored seismically active region. However, optimal monitoring of a seismic region requires assessing the capability of the seismic network to identify seismogenic areas that are not adequately covered, and quantifying measures that will allow the network to be improved. In this paper we examine in detail the evolution and the strengthening of the RSRC in the last years analyzing the

  13. Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software

    Directory of Open Access Journals (Sweden)

    Axel Poulet

    2016-01-01

    Screening for BRCA mutations in women with familial risk of breast or ovarian cancer is an ideal situation for high-throughput sequencing, providing large amounts of low cost data. However, 454 (Roche) and Ion Torrent (Thermo Fisher) technologies produce homopolymer-associated indel errors, complicating their use in routine diagnostics. We developed software, named AGSA, which helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations of the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed with the Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, an additional 250 patients were analysed using microfluidic amplification of regions of interest (Access Array Fluidigm) of the BRCA genes, followed by 454 sequencing and AGSA analysis. AGSA complements a complete line of high-throughput diagnostic sequence analysis, reducing time and costs while increasing reliability, notably for homopolymer tracts.
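
    Since the errors discussed above concentrate in homopolymer tracts, a first practical step is simply to locate those tracts in a reference amplicon so that indel calls falling inside them can be reviewed more carefully. The following minimal sketch does only that; it is not AGSA's algorithm, and the function name, the min_len threshold and the toy sequence are assumptions for illustration.

```python
import re


def homopolymer_tracts(seq, min_len=4):
    """Return (start, end, base) for each run of one base at least min_len long.

    Coordinates are 0-based, with start inclusive and end exclusive.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base followed by at least (min_len - 1) repeats of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Toy amplicon with a T5 tract and a C4 tract.
print(homopolymer_tracts("ACGTTTTTACCCCG"))  # [(3, 8, 'T'), (9, 13, 'C')]
```

    Variant calls overlapping the returned intervals could then be flagged for confirmation, for example by inspecting per-read length distributions as the abstract describes.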

  14. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70 °C, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  15. Genetic relatedness among indigenous rice varieties in the Eastern Himalayan region based on nucleotide sequences of the Waxy gene.

    Science.gov (United States)

    Choudhury, Baharul I; Khan, Mohammed L; Dayanandan, Selvadurai

    2014-12-29

    Indigenous rice varieties in the Eastern Himalayan region of Northeast India are traditionally classified into sali, boro and jum ecotypes based on geographical locality and the season of cultivation. In this study, we used DNA sequence data from the Waxy (Wx) gene to infer the genetic relatedness among indigenous rice varieties in Northeast India and to assess the genetic distinctiveness of ecotypes. The results of all three analyses (Bayesian, Maximum Parsimony and Neighbor Joining) were congruent and revealed two genetically distinct clusters of rice varieties in the region. The large group comprised several varieties of sali and boro ecotypes, and all agronomically improved varieties. The small group consisted of only traditionally cultivated indigenous rice varieties, which included one boro, few sali and all jum varieties. The fixation index analysis revealed a very low level of differentiation between sali and boro (F(ST) = 0.005), moderate differentiation between sali and jum (F(ST) = 0.108) and high differentiation between jum and boro (F(ST) = 0.230) ecotypes. The genetic relatedness analyses revealed that sali, boro and jum ecotypes are genetically heterogeneous, and the current classification based on cultivation type is not congruent with the genetic background of rice varieties. Indigenous rice varieties chosen from genetically distinct clusters could be used in breeding programs to improve genetic gain through heterosis, while maintaining high genetic diversity.

  16. Rapid sequence divergence rates in the 5 prime regulatory regions of young Drosophila melanogaster duplicate gene pairs

    Directory of Open Access Journals (Sweden)

    Michael H. Kohn

    2008-01-01

    Full Text Available While it remains a matter of some debate, rapid sequence evolution of the coding sequences of duplicate genes is characteristic for early phases past duplication, but long established duplicates generally evolve under constraint, much like the rest of the coding genome. As for coding sequences, it may be possible to infer evolutionary rate, selection, and constraint via contrasts between duplicate gene divergence in the 5 prime regions and in the corresponding synonymous site divergence in the coding regions. Finding elevated rates for the 5 prime regions of duplicated genes, in addition to the coding regions, would enable statements regarding the early processes of duplicate gene evolution. Here, 1 kb of each of the 5 prime regulatory regions of Drosophila melanogaster duplicate gene pairs were mapped onto one another to isolate shared sequence blocks. Genetic distances within shared sequence blocks (d5’) were found to increase as a function of synonymous (dS), and to a lesser extent, amino-acid (dA) site divergence between duplicates. The rate d5’/dS was found to rapidly decay from values > 1 in young duplicate pairs (dS 0.8. Such rapid rates of 5 prime evolution exceeding 1 (~neutral) predominantly were found to occur in duplicate pairs with low amino-acid site divergence and that tended to be co-regulated when assayed on microarrays. Conceivably, functional redundancy and relaxation of selective constraint facilitate subsequent positive selection on the 5 prime regions of young duplicate genes. This might promote the evolution of new functions (neofunctionalization) or division of labor among duplicate genes (subfunctionalization). In contrast, similar to the vast portion of the non-coding genome, the 5 prime regions of long-established gene duplicates appear to evolve under selective constraint, indicating that these long-established gene duplicates have assumed critical functions.

  17. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
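
    The separation of a symbolic DNA sequence into four indicator channels, mentioned in the abstract above, can be sketched as follows. This is only an illustration of the general idea: the function names are hypothetical, and the fixed weights passed to `weighted_signal` are a stand-in assumption for the paper's adaptive symbol-to-number mapping, which is not described here in enough detail to reproduce.

```python
def indicator_sequences(seq: str) -> dict[str, list[int]]:
    """Split a DNA string into four binary channels, one per base.

    Channel X holds 1 where the sequence has base X and 0 elsewhere.
    """
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def weighted_signal(seq: str, weights: dict[str, float]) -> list[float]:
    """Combine the four channels into one numeric signal.

    `weights` is a hypothetical fixed mapping; an adaptive scheme would
    choose these values from the data instead.
    """
    channels = indicator_sequences(seq)
    return [
        sum(weights[base] * channels[base][i] for base in "ACGT")
        for i in range(len(seq))
    ]


if __name__ == "__main__":
    print(indicator_sequences("ACGTA")["A"])
    print(weighted_signal("ACGT", {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}))
```

    The resulting numeric signal is the kind of input a wavelet transform would then operate on; characters other than A, C, G and T simply contribute zero in every channel.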

  18. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    Science.gov (United States)

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known
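
    Tajima's D, the neutrality statistic referred to above, compares two estimates of genetic diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The sketch below implements the standard formula for a small alignment of individual sequences; it is an assumption-laden illustration, since pooled sequencing yields allele frequencies rather than haplotypes and would need an estimator adapted to pooled data. The function name and the toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences (needs at least 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of alignment columns with more than one distinct character.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from the standard variance derivation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))  # one rare variant
    print(tajimas_d(["AAAA", "AAAA", "AAAT", "AAAT"]))  # one balanced variant
```

    A single rare variant gives a negative value and an intermediate-frequency variant gives a positive one, which is why positive values such as those reported for the antigen genes are commonly read as a sign of balancing selection.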

  19. Comparative genomic sequence analysis of strawberry and other rosids reveals significant microsynteny

    Directory of Open Access Journals (Sweden)

    Abbott Albert

    2010-06-01

    Full Text Available Abstract Background Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. Results In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I) and Vitis (basal rosid). One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs) with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we have generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. Conclusions Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis, and Arabidopsis to a lesser degree. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared.

  20. Comparison of variable region 3 sequences of human immunodeficiency virus type 1 from infected children with the RNA and DNA sequences of the virus populations of their mothers.

    Science.gov (United States)

    Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M

    1993-01-01

    We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584

  1. Two‐phase designs for joint quantitative‐trait‐dependent and genotype‐dependent sampling in post‐GWAS regional sequencing

    Science.gov (United States)

    Espin‐Garcia, Osvaldo; Craiu, Radu V.

    2017-01-01

    ABSTRACT We evaluate two‐phase designs to follow‐up findings from genome‐wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation‐maximization‐based inference under a semiparametric maximum likelihood formulation tailored for post‐GWAS inference. A GWAS‐SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT‐SNP‐dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme‐QT strata yields significant power improvements compared to marginal QT‐ or SNP‐based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure. PMID:29239496

  2. Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

    Science.gov (United States)

    Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2013-07-01

    Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. Complete genome sequencing of CMH-S011-11 was performed and the sequence analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 was grouped within the HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  4. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  5. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  6. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' mode of CPU frequency by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  7. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  8. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence indicated that U. prolifera had the material and energy foundation for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  9. Analysis of trade condition in Ras region

    Directory of Open Access Journals (Sweden)

    Andelić Slavica

    2017-01-01

    Full Text Available Modern academic literature on trade at the macroeconomic and mesoeconomic levels tries to shed light on the data that define exchange flows within and between countries. This study is based on a database compiled from state registers; by sizing and analyzing these data, we gain deeper insight into the condition of the market channels of the Ras region and its relationship with its surroundings. The aim of this work is a meticulous interpretation of trade patterns as a result of macro- and meso-level trade policy, which could serve as an incentive for local and governmental structures to develop the commercial potential of the southern part of our country.

  10. Seismic fault analysis of Chicoutimi region

    International Nuclear Information System (INIS)

    Woussen, G.; Ngandee, S.

    1996-01-01

    On November 25, 1988, an earthquake measuring 6.5 on the Richter Scale occurred at a depth of 29 km in Precambrian bedrock in the Saguenay Region (Quebec). Given that the seismic event was located near a major zone of normal faults, it is important to determine if the earthquake could be associated with this large structure or with faults associated with this structure. This is discussed through a compilation and interpretation of structural discontinuities on key outcrops in the vicinity of the epicenter. The report is divided into four parts. The first part gives a brief overview of the geology in order to provide a geologic context for the structural measurements. The second comprises an analysis of fractures in each of the three lithotectonic units defined in the first part. The third part discusses the data and the fourth provides a conclusion. 30 refs., 53 figs

  11. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    Science.gov (United States)

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.

  12. Sequencing chemotherapy and radiotherapy in locoregional advanced breast cancer patients after mastectomy – a retrospective analysis

    International Nuclear Information System (INIS)

    Piroth, Marc D; Pinkawa, Michael; Gagel, Bernd; Stanzel, Sven; Asadpour, Branka; Eble, Michael J

    2008-01-01

    Combined chemo- and radiotherapy are established in breast cancer treatment. Chemotherapy is recommended prior to radiotherapy but decisive data on the optimal sequence are rare. This retrospective analysis aimed to assess the role of sequencing in patients after mastectomy because of advanced locoregional disease. A total of 212 eligible patients had a stage III breast cancer and had adjuvant chemotherapy and radiotherapy after mastectomy and axillary dissection between 1996 and 2004. According to concerted multi-modality treatment strategies, 86 patients were treated sequentially (chemotherapy followed by radiotherapy) (SEQ-group), 70 patients had a sandwich treatment (SW-group) and 56 patients had simultaneous chemoradiation (SIM-group) during that time period. Radiotherapy comprised the thoracic wall and/or regional lymph nodes. The total dose was 45–50.4 Gray. With simultaneous chemoradiation, CMF was given in 95.4% of patients, while with sequential or sandwich application an anthracycline-based chemotherapy was given in 86% and 87.1% of patients, respectively. Concerning the parameters nodal involvement, lymphovascular invasion, extracapsular spread and extension of the irradiated region, the three treatment groups were significantly imbalanced. The other parameters, e.g. age, pathological tumor stage, grading and receptor status, were homogeneously distributed. Looking at the two groups with an equally effective chemotherapy (EC, FEC), the SEQ- and SW-group, the sole imbalance was the extension of LVI (57.1 vs. 25.6%, p < 0.0001). 5-year overall and disease-free survival were 53.2%/56%, 38.1%/32% and 64.2%/50% for the sequential, sandwich and simultaneous regime, respectively, which differed significantly in the univariate analysis (p = 0.04 and p = 0.03, log-rank test). Also the 5-year locoregional or distant recurrence free survival showed no significant differences according to the sequence of chemo- and radiotherapy. In the multivariate analyses the sequence had no

  13. Applications of statistical physics and information theory to the analysis of DNA sequences

    Science.gov (United States)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
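
    The mutual information function discussed above measures how much knowing the base at one position tells you about the base k positions downstream. The following is a minimal sketch of a plug-in estimate from pair counts, assuming the usual definition I(k) = sum over base pairs of p_ab(k) log2[p_ab(k) / (p_a p_b)]; the function name is hypothetical, and the thesis may use finite-sample corrections that are not shown here.

```python
from collections import Counter
from math import log2


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate of the mutual information (in bits) at distance k."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        raise ValueError("sequence is too short for this distance")

    total = len(pairs)
    joint = Counter(pairs)                # counts of (base, base k later)
    left = Counter(a for a, _ in pairs)   # marginal counts of the first base
    right = Counter(b for _, b in pairs)  # marginal counts of the second base

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total
        mi += p_ab * log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi


if __name__ == "__main__":
    toy = "ACGT" * 50  # a hypothetical, perfectly periodic sequence
    for k in range(1, 7):
        print(k, round(mutual_information(toy, k), 4))
```

    Plotting this quantity against k for a real coding sequence is how a period-three oscillation would become visible; averaging it over a range of distances gives one possible reading of the average mutual information mentioned in the abstract.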

  14. Genetic characterization of UCS region of Pneumocystis jirovecii and construction of allelic profiles of Indian isolates based on sequence typing at three regions.

    Science.gov (United States)

    Gupta, Rashmi; Mirdha, Bijay Ranjan; Guleria, Randeep; Kumar, Lalit; Luthra, Kalpana; Agarwal, Sanjay Kumar; Sreenivas, Vishnubhatla

    2013-01-01

    Pneumocystis jirovecii is an opportunistic pathogen that causes severe pneumonia in immunocompromised patients. To study the genetic diversity of P. jirovecii in India the upstream conserved sequence (UCS) region of Pneumocystis genome was amplified, sequenced and genotyped from a set of respiratory specimens obtained from 50 patients with a positive result for nested mitochondrial large subunit ribosomal RNA (mtLSU rRNA) PCR during the years 2005-2008. Of these 50 cases, 45 showed a positive PCR for UCS region. Variations in the tandem repeats in UCS region were characterized by sequencing all the positive cases. Of the 45 cases, one case showed five repeats, 11 cases showed four repeats, 29 cases showed three repeats and four cases showed two repeats. By running amplified DNA from all these cases on a high-resolution gel, mixed infection was observed in 12 cases (26.7%, 12/45). Forty three of 45 cases included in this study had previously been typed at mtLSU rRNA and internal transcribed spacer (ITS) region by our group. In the present study, the genotypes at those two regions were combined with UCS repeat patterns to construct allelic profiles of 43 cases. A total of 36 allelic profiles were observed in 43 isolates indicating high genetic variability. A statistically significant association was observed between mtLSU rRNA genotype 1, ITS type Ea and UCS repeat pattern 4. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Implementing targeted region capture sequencing for the clinical detection of Alagille syndrome: An efficient and cost‑effective method.

    Science.gov (United States)

    Huang, Tianhong; Yang, Guilin; Dang, Xiao; Ao, Feijian; Li, Jiankang; He, Yizhou; Tang, Qiyuan; He, Qing

    2017-11-01

    Alagille syndrome (AGS) is a highly variable, autosomal dominant disease that affects multiple structures including the liver, heart, eyes, bones and face. Targeted region capture sequencing focuses on a panel of known pathogenic genes and provides a rapid, cost‑effective and accurate method for molecular diagnosis. In a Chinese family, this method was used on the proband and Sanger sequencing was applied to validate the candidate mutation. A de novo heterozygous mutation (c.3254_3255insT p.Leu1085PhefsX24) of the jagged 1 gene was identified as the potential disease‑causing gene mutation. In conclusion, the present study suggested that target region capture sequencing is an efficient, reliable and accurate approach for the clinical diagnosis of AGS. Furthermore, these results expand on the understanding of the pathogenesis of AGS.

  16. Phylogenetic relationships in Solanaceae and related species based on cpDNA sequence from plastid trnE-trnT region

    Directory of Open Access Journals (Sweden)

    Danila Montewka Melotto-Passarin

    2008-01-01

    Full Text Available Intergenic spacers of chloroplast DNA (cpDNA) are very useful in phylogenetic and population genetic studies of plant species, to study their potential integration in phylogenetic analysis. The non-coding trnE-trnT intergenic spacer of cpDNA was analyzed to assess the nucleotide sequence polymorphism of 16 Solanaceae species and to estimate its ability to contribute to the resolution of phylogenetic studies of this group. Multiple alignments of DNA sequences of trnE-trnT intergenic spacer made the identification of nucleotide variability in this region possible and the phylogeny was estimated by maximum parsimony and rooted with Convolvulaceae Ipomoea batatas, the most closely related family. Besides, this intergenic spacer was tested for the phylogenetic ability to differentiate taxonomic levels. For this purpose, species from four other families were analyzed and compared with Solanaceae species. Results confirmed polymorphism in the trnE-trnT region at different taxonomic levels.

  17. Identification and verification of hybridoma-derived monoclonal antibody variable region sequences using recombinant DNA technology and mass spectrometry

    Science.gov (United States)

    Antibody engineering requires the identification of antigen binding domains or variable regions (VR) unique to each antibody. It is the VR that define the unique antigen binding properties and proper sequence identification is essential for functional evaluation and performance of recombinant antibo...

  18. UBV(RI)sub(c) photometry of some standard sequences in the Harvard F regions and in the Magellanic Clouds

    International Nuclear Information System (INIS)

    Menzies, J.W.; Laing, J.D.

    1988-01-01

    This paper presents the results of a photometric programme aimed at improving the UBV(RI)sub(c) standard sequences in the Harvard F regions and in the Small and Large Magellanic Clouds. Magnitudes and colours are given for 99 stars, and they are compared with the current values which were obtained or compiled by a previous author. (author)

  19. Identification and positional distribution analysis of transcription factor binding sites for genes from the wheat fl-cDNA sequences.

    Science.gov (United States)

    Chen, Zhen-Yong; Guo, Xiao-Jiang; Chen, Zhong-Xu; Chen, Wei-Ying; Wang, Ji-Rui

    2017-06-01

    The binding sites of transcription factors (TFs) in upstream DNA regions are called transcription factor binding sites (TFBSs). TFBSs are important elements for regulating gene expression. To date, there have been few studies on the profiles of TFBSs in plants. In total, 4,873 sequences with 5' upstream regions from 8530 wheat fl-cDNA sequences were used to predict TFBSs. We found 4572 TFBSs for the MADS TF family, which was twice as many as for bHLH (1951), B3 (1951), HB superfamily (1914), ERF (1820), and AP2/ERF (1725) TFs, and was approximately four times higher than the remaining TFBS types. The percentage of TFBSs and TF members showed a distinct distribution in different tissues. Overall, the distribution of TFBSs in the upstream regions of wheat fl-cDNA sequences differed significantly. Meanwhile, high frequencies of some types of TFBSs were found in specific regions in the upstream sequences. Both TFs and fl-cDNA with TFBSs predicted in the same tissues exhibited specific distribution preferences for regulating gene expression. The tissue-specific analysis of TFs and fl-cDNA with TFBSs provides useful information for functional research, and can be used to identify relationships between tissue-specific TFs and fl-cDNA with TFBSs. Moreover, the positional distribution of TFBSs indicates that some types of wheat TFBS have different positional distribution preferences in the upstream regions of genes.

  20. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

    Directory of Open Access Journals (Sweden)

    Inkyu Park

    Aconitum species (belonging to the Ranunculaceae) are well-known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.

  1. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

    Science.gov (United States)

    Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol

    2017-01-01

    Aconitum species (belonging to the Ranunculaceae) are well-known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.
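
    The GC content quoted above (38% for both cp genomes) is a simple base-composition statistic. As a minimal sketch, assuming a plain nucleotide string as input, it could be computed as follows; the function name `gc_content` and the toy sequence are illustrative and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    # Ambiguity codes such as N are ignored rather than counted as non-GC.
    return gc / total


# Toy sequence (hypothetical, not real Aconitum data): 4 of 12 bases are G or C.
toy = "ATGCATTTAAGC"
print(f"GC content: {gc_content(toy):.1%}")
```

    Ignoring ambiguous bases is one possible design choice; a tool could equally report them separately.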

  2. Species-level analysis of DNA sequence data from the NIH Human Microbiome Project.

    Science.gov (United States)

    Conlan, Sean; Kong, Heidi H; Segre, Julia A

    2012-01-01

    Outbreaks of antibiotic-resistant bacterial infections emphasize the importance of surveillance of potentially pathogenic bacteria. Genomic sequencing of clinical microbiological specimens expands our capacity to study cultivable, fastidious and uncultivable members of the bacterial community. Herein, we compared the primary data collected by the NIH's Human Microbiome Project (HMP) with published epidemiological surveillance data of Staphylococcus aureus. The HMP's initial dataset contained microbial survey data from five body regions (skin, nares, oral cavity, gut and vagina) of 242 healthy volunteers. A significant component of the HMP dataset was deep sequencing of the 16S ribosomal RNA gene, which contains variable regions enabling taxonomic classification. Since species-level identification is essential in clinical microbiology, we built a reference database and used phylogenetic placement followed by most recent common ancestor classification to look at the species distribution for Staphylococcus, Klebsiella and Enterococcus. We show that selecting the accurate region of the 16S rRNA gene to sequence is analogous to carefully selecting culture conditions to distinguish closely related bacterial species. Analysis of the HMP data showed that Staphylococcus aureus was present in the nares of 36% of healthy volunteers, consistent with culture-based epidemiological data. Klebsiella pneumoniae and Enterococcus faecalis were found less frequently, but across many habitats. This work demonstrates that large 16S rRNA survey studies can be used to support epidemiological goals in the context of an increasing awareness that microbes flourish and compete within a larger bacterial community. This study demonstrates how genomic techniques and information could be critically important to trace microbial evolution and implement hospital infection control.

  3. Species-level analysis of DNA sequence data from the NIH Human Microbiome Project.

    Directory of Open Access Journals (Sweden)

    Sean Conlan

    BACKGROUND: Outbreaks of antibiotic-resistant bacterial infections emphasize the importance of surveillance of potentially pathogenic bacteria. Genomic sequencing of clinical microbiological specimens expands our capacity to study cultivable, fastidious and uncultivable members of the bacterial community. Herein, we compared the primary data collected by the NIH's Human Microbiome Project (HMP) with published epidemiological surveillance data of Staphylococcus aureus. METHODS: The HMP's initial dataset contained microbial survey data from five body regions (skin, nares, oral cavity, gut and vagina) of 242 healthy volunteers. A significant component of the HMP dataset was deep sequencing of the 16S ribosomal RNA gene, which contains variable regions enabling taxonomic classification. Since species-level identification is essential in clinical microbiology, we built a reference database and used phylogenetic placement followed by most recent common ancestor classification to look at the species distribution for Staphylococcus, Klebsiella and Enterococcus. MAIN RESULTS: We show that selecting the accurate region of the 16S rRNA gene to sequence is analogous to carefully selecting culture conditions to distinguish closely related bacterial species. Analysis of the HMP data showed that Staphylococcus aureus was present in the nares of 36% of healthy volunteers, consistent with culture-based epidemiological data. Klebsiella pneumoniae and Enterococcus faecalis were found less frequently, but across many habitats. CONCLUSIONS: This work demonstrates that large 16S rRNA survey studies can be used to support epidemiological goals in the context of an increasing awareness that microbes flourish and compete within a larger bacterial community. This study demonstrates how genomic techniques and information could be critically important to trace microbial evolution and implement hospital infection control.

  4. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

    Science.gov (United States)

    Redwan, R M; Saidin, A; Kumar, S V

    2015-08-12

    Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of the PacBio long reads of A. comosus total genomic DNA was reduced by using highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, has the conserved quadripartite structure of chloroplast genomes, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, 30 of which were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast (P < 0.05). Phylogenetic analysis confirmed the recent taxonomical relation among the member of
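
    The region sizes reported above can be checked against the total genome length, since a quadripartite plastome consists of one LSC, one SSC and two copies of the inverted repeat. The short sketch below does this arithmetic with the figures from the abstract; the function name `plastome_length` is an illustrative assumption.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for the MD-2 pineapple chloroplast genome.
total = plastome_length(lsc=87_482, ssc=18_622, ir=26_766)
print(total)  # 87,482 + 18,622 + 2 * 26,766 = 159,636
```

    The result matches the reported genome length of 159,636 bp.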

  5. Different region analysis for genotyping Yersinia pestis isolates from China.

    Directory of Open Access Journals (Sweden)

    Yanjun Li

    BACKGROUND: DFR (different region) analysis was developed for typing Yersinia pestis in our previous study. In this study, we extended the method by using 23 DFRs to investigate 909 Chinese Y. pestis strains, in order to validate the DFR-based genotyping method and better understand the adaptive microevolution of Y. pestis. METHODOLOGY/PRINCIPAL FINDINGS: On the basis of PCR and Bionumerics data analysis, 909 Y. pestis strains were genotyped into 32 genomovars according to their DFR profiles. New terms, Major genomovar and Minor genomovar, were coined to illustrate the evolutionary relationships between Y. pestis strains from different plague foci and different hosts. In silico DFR profiling of the completed or draft genomes shed light on the evolutionary scenario of Y. pestis from Y. pseudotuberculosis. Notably, several sequenced Y. pestis strains share the same DFR profiles with Chinese strains, providing data for revealing the global plague foci expansion. CONCLUSIONS/SIGNIFICANCE: Distribution of Y. pestis genomovars is plague focus-specific. Microevolution of biovar Orientalis was deduced according to DFR profiles. DFR analysis turns out to be an efficient and inexpensive method to portray the genome plasticity of Y. pestis based on horizontal gene transfer (HGT). DFR analysis can also be used as a tool in comparative and evolutionary genomic research for other bacteria with similar genome plasticity.
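
    Genotyping by DFR profile amounts to grouping strains that share the same presence/absence pattern across the tested regions. The sketch below illustrates that grouping step only, under the assumption that each profile is a tuple of 0/1 values; the strain names and the four-locus profiles are invented for illustration (the study used 23 DFRs and 909 strains), and this is not the authors' pipeline.

```python
from collections import defaultdict


def group_by_profile(profiles: dict) -> dict:
    """Group strain names by identical presence/absence profile."""
    groups = defaultdict(list)
    for strain, profile in profiles.items():
        groups[tuple(profile)].append(strain)
    return dict(groups)


# Hypothetical PCR results: 1 = region amplified, 0 = region absent.
toy_profiles = {
    "strain_A": (1, 1, 0, 1),
    "strain_B": (1, 1, 0, 1),
    "strain_C": (0, 1, 1, 1),
    "strain_D": (1, 0, 0, 1),
}

genotypes = group_by_profile(toy_profiles)
for profile, strains in genotypes.items():
    print(profile, strains)
```

    Each distinct key in `genotypes` plays the role of one genotype in this toy setting.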

  6. SEQATOMS: a web tool for identifying missing regions in PDB in sequence context

    NARCIS (Netherlands)

    Brandt, B.W.; Heringa, J.; Leunissen, J.A.M.

    2008-01-01

    With over 46 000 proteins, the Protein Data Bank (PDB) is the most important database with structural information of biological macromolecules. PDB files contain sequence and coordinate information. Residues present in the sequence can be absent from the coordinate section, which means their

  7. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  8. Delineation and analysis of chromosomal regions specifying Yersinia pestis.

    Science.gov (United States)

    Derbise, Anne; Chenal-Francisque, Viviane; Huon, Christèle; Fayolle, Corinne; Demeure, Christian E; Chane-Woon-Ming, Béatrice; Médigue, Claudine; Hinnebusch, B Joseph; Carniel, Elisabeth

    2010-09-01

    Yersinia pestis, the causative agent of plague, has recently diverged from the less virulent enteropathogen Yersinia pseudotuberculosis. Its emergence has been characterized by massive genetic loss and inactivation and limited gene acquisition. The acquired genes include two plasmids, a filamentous phage, and a few chromosomal loci. The aim of this study was to characterize the chromosomal regions acquired by Y. pestis. Following in silico comparative analysis and PCR screening of 98 strains of Y. pseudotuberculosis and Y. pestis, we found that eight chromosomal loci (six regions [R1pe to R6pe] and two coding sequences [CDS1pe and CDS2pe]) specified Y. pestis. Signatures of integration by site specific or homologous recombination were identified for most of them. These acquisitions and the loss of ancestral DNA sequences were concentrated in a chromosomal region opposite to the origin of replication. The specific regions were acquired very early during Y. pestis evolution and were retained during its microevolution, suggesting that they might bring some selective advantages. Only one region (R3pe), predicted to carry a lambdoid prophage, is most likely no longer functional because of mutations. With the exception of R1pe and R2pe, which have the potential to encode a restriction/modification and a sugar transport system, respectively, no functions could be predicted for the other Y. pestis-specific loci. To determine the role of the eight chromosomal loci in the physiology and pathogenicity of the plague bacillus, each of them was individually deleted from the bacterial chromosome. None of the deletants exhibited defects during growth in vitro. Using the Xenopsylla cheopis flea model, all deletants retained the capacity to produce a stable and persistent infection and to block fleas. Similarly, none of the deletants caused any acute flea toxicity. In the mouse model of infection, all deletants were fully virulent upon subcutaneous or aerosol infections. Therefore

  9. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  10. Genetic Diversity in Passiflora Species Assessed by Morphological and ITS Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shiamala Devi Ramaiya

    2014-01-01

    This study used morphological characterization and phylogenetic analysis of the internal transcribed spacer (ITS) region of nuclear ribosomal DNA to investigate the phylogeny of Passiflora species. The samples were collected from various regions of East Malaysia, and discriminant function analysis based on linear combinations of morphological variables was used to classify the Passiflora species. The biplots generated five distinct groups discriminated by morphological variables. The group consisted of cultivars of P. edulis with high levels of genetic similarity; in contrast, P. foetida was highly divergent from other species in the morphological biplots. The final dataset of aligned sequences from nine studied Passiflora accessions and 30 other individuals obtained from the GenBank database (NCBI) yielded one most parsimonious tree with two strongly supported clades. The maximum parsimony (MP) tree showed that the phylogenetic relationships within the subgenus Passiflora support the classification at the series level. The constructed phylogenetic tree also confirmed the divergence of P. foetida from all other species and the closeness of wild and cultivated species. The phylogenetic relationships were consistent with results of morphological assessments. The results of this study indicate that ITS region analysis represents a useful tool for evaluating genetic diversity in Passiflora at the species level.

  11. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    Science.gov (United States)

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  12. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. ... been widely used for phylogenetic studies and sequence differences in ... In order to fill up the internal gap, a new set.

  13. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  14. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
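
    The abstract benchmarks DELIMINATE against gzip, bzip2 and lzma. The sketch below does not implement DELIMINATE itself; it only shows, under stated assumptions, how such a general-purpose baseline might be measured with Python's standard library. The random toy sequence and the function name `compressed_sizes` are illustrative assumptions, and real genomes will compress differently from random data.

```python
import bz2
import gzip
import lzma
import random


def compressed_sizes(data: bytes) -> dict:
    """Compress data with three general-purpose codecs and report sizes in bytes."""
    sizes = {"raw": len(data)}
    for name, module in (("gzip", gzip), ("bzip2", bz2), ("lzma", lzma)):
        packed = module.compress(data)
        # Loss-less means decompression must restore the input exactly.
        assert module.decompress(packed) == data
        sizes[name] = len(packed)
    return sizes


# Toy input: 10,000 random bases (hypothetical stand-in for a genome).
random.seed(0)
toy_genome = "".join(random.choice("ACGT") for _ in range(10_000)).encode("ascii")

sizes = compressed_sizes(toy_genome)
for name, size in sizes.items():
    print(f"{name:>5}: {size} bytes")
```

    A four-letter alphabet needs at most two bits per base, so general-purpose codecs operating on one byte per base already shrink the input substantially; a specialized method would be compared against these numbers.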

  15. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  16. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
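
    Two of the thresholds listed above (more than six PR or 18 RT ambiguous bases, and more than zero stop codons) are simple enough to sketch. The example below is not SQUAT, which runs in R and also aligns sequences; it is a hypothetical Python illustration that assumes the input is already in reading frame, and the names `quality_flags` and `AMBIGUOUS_LIMIT` are invented here.

```python
# Thresholds taken from the abstract; everything else is an assumption.
AMBIGUOUS_LIMIT = {"PR": 6, "RT": 18}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def quality_flags(seq: str, region: str) -> list:
    """Return reasons to flag an in-frame PR or RT nucleotide sequence."""
    seq = seq.upper()
    flags = []

    # Any character other than A, C, G or T counts as ambiguous in this sketch.
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    if ambiguous > AMBIGUOUS_LIMIT[region]:
        flags.append(f"{ambiguous} ambiguous bases")

    # Split into complete codons, assuming position 0 starts a codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    stops = sum(1 for codon in codons if codon in STOP_CODONS)
    if stops > 0:
        flags.append(f"{stops} stop codon(s)")

    return flags


print(quality_flags("ATGAAATTT", "PR"))   # clean toy sequence
print(quality_flags("ATGTAAGGG", "PR"))   # contains an in-frame TAA
```

    Keeping the limits in a dictionary mirrors the abstract's note that thresholds are user modifiable.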

  17. Clonal study of avian Escherichia coli strains by fliC conserved-DNA-sequence regions analysis

    Directory of Open Access Journals (Sweden)

    Tatiana Amabile de Campos

    2008-10-01

    The clonal relationship among avian Escherichia coli strains and their genetic proximity to human pathogenic E. coli, Salmonella enterica, Yersinia enterocolitica and Proteus mirabilis was determined by DNA sequencing of the conserved 5' and 3' regions of the fliC gene (the flagellin-encoding gene). Among 30 commensal avian E. coli strains and 49 pathogenic avian E. coli (APEC) strains, 24 commensal and 39 APEC strains harbored the fliC gene, with fragment sizes varying from 670 bp to 1,900 bp. The comparative analysis of these regions allowed the construction of a similarity dendrogram with two main clusters: one composed mainly of APEC strains and H-antigens from human E. coli, and another composed of commensal avian E. coli strains, S. enterica, and other H-antigens from human E. coli. Overall, this work demonstrated that fliC conserved regions may be associated with pathogenic clones of APEC strains, and also showed great similarity between APEC and H-antigens of E. coli strains isolated from humans. These data add evidence that APEC strains may pose a zoonotic risk.

  18. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    Directory of Open Access Journals (Sweden)

    Maria Eguiluz

    2017-11-01

    Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential for fruit production. Due to the high content of essential oils in its leaves and of anthocyanins in its fruits, there is also increasing interest from the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with a closer relationship to Syzygium cumini than to Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolving the complex taxonomy of this species and filling the gap in genetic characterization.

  19. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    Science.gov (United States)

    Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

    2017-01-01

    Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential for fruit production. Due to the high content of essential oils in its leaves and of anthocyanins in its fruits, there is also increasing interest from the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with a closer relationship to Syzygium cumini than to Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolving the complex taxonomy of this species and filling the gap in genetic characterization. PMID: 29111566

  20. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora.

    Science.gov (United States)

    Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

    2017-01-01

    Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential for fruit production. Due to the high content of essential oils in its leaves and of anthocyanins in its fruits, there is also increasing interest from the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with a closer relationship to Syzygium cumini than to Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolving the complex taxonomy of this species and filling the gap in genetic characterization.

  1. Genetic Analysis Using Partial Sequencing of Melanocortin 4 Receptor (MC4R) Gene in Bligon Goat

    Directory of Open Access Journals (Sweden)

    Latifah Latifah

    2017-08-01

    Melanocortin 4 Receptor (MC4R) gene is involved in sympathetic nerve activity and adrenal and thyroid functions, and mediates the action of leptin in regulating energy balance and homeostasis. The aim of this research was to perform genetic analysis of MC4R gene sequences from Bligon goats. Forty blood samples of Bligon does were used for DNA extraction. The primers were designed after alignment of 12 DNA sequences of the MC4R gene from goat, sheep, and cattle. The primers were constructed on the Capra hircus MC4R gene sequence from GenBank (accession No. NM_001285591). Two DNA polymorphisms of MC4R were revealed in the exon region (g.998 A/G and g.1079 C/T). The SNP g.998 A/G was a non-synonymous polymorphism, i.e., it changed the amino acid from methionine (Met) to isoleucine (Ile). The SNP g.1079 C/T was a synonymous polymorphism. Restriction enzyme mapping of the Bligon goat MC4R gene revealed three restriction enzymes, RsaI (GT’AC), Acc65I (G’GTAC_C), and KpnI (G_GTAC’C), which can recognize the SNP at g.1079 C/T. These restriction enzymes may be used for genotyping the target gene by the PCR-RFLP method in future research.
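
    The idea behind PCR-RFLP genotyping is that a SNP creates or destroys a restriction site, so the two alleles give different digestion patterns. The sketch below illustrates that on two made-up allele fragments. The recognition sequences are the standard ones for these enzymes, but the fragments and the position of the variable base inside the site are assumptions for illustration only, not data from the study.

```python
# Recognition sequences (5'->3') of the three enzymes named in the abstract.
RECOGNITION_SITES = {"RsaI": "GTAC", "Acc65I": "GGTACC", "KpnI": "GGTACC"}

# Hypothetical fragments that differ by a single C/T base (index 8).
ALLELE_C = "ATTGGGTACCTTGA"
ALLELE_T = "ATTGGGTATCTTGA"


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


def discriminating_enzymes(allele_a: str, allele_b: str) -> list[str]:
    """Enzymes whose site positions differ between the two alleles."""
    return sorted(
        name
        for name, site in RECOGNITION_SITES.items()
        if find_sites(allele_a, site) != find_sites(allele_b, site)
    )


print(discriminating_enzymes(ALLELE_C, ALLELE_T))
```

    In this toy case the T allele loses the GTAC core, so all three enzymes cut only the C allele. Which allele is actually cut in the goat sequence depends on the real flanking bases, which the abstract does not give.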

  2. Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

    Science.gov (United States)

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

    2016-01-01

    Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
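
    For readers unfamiliar with the Ti/Tv ratio mentioned above, a minimal sketch of how it can be computed from a list of single-base substitutions follows. A transition swaps a purine for a purine or a pyrimidine for a pyrimidine, and a transversion swaps one class for the other. The input list is invented for illustration and the function names are not taken from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_ratio(substitutions: list[tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in substitutions:
        counts[classify_substitution(ref, alt)] += 1
    if counts["transversion"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["transition"] / counts["transversion"]


# Hypothetical variant calls: two transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example))
```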

  3. Sequences of the joining region genes for immunoglobulin heavy chains and their role in generation of antibody diversity.

    OpenAIRE

    Gough, N M; Bernard, O

    1981-01-01

    To assess the contribution to immunoglobulin heavy chain diversity made by recombination between variable region (VH) genes and joining region (JH) genes, we have determined the sequence of about 2000 nucleotides spanning the rearranged JH gene cluster associated with the VH gene expressed in plasmacytoma HPC76. The active VH76 gene has recombined with the second germ-line JH gene. The region we have studied contains two other JH genes, designated JH3 and JH4. No other JH gene was found withi...

  4. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing

    Science.gov (United States)

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O’Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M.; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J.; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A.; Turner, Daniel J.; Rubio, Valentin Ruano; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C.; Ferdig, Michael T.; Amambua-Ngwa, Alfred; Conway, David J.; Takala-Harrison, Shannon; Plowe, Christopher V.; Rayner, Julian C.; Rockett, Kirk A.; Clark, Taane G.; Newbold, Chris I.; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P.

    2013-01-01

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short term culture. Analysis of 86,158 exonic SNPs that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome. PMID:22722859

  5. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ‘tsunami’ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is fewer than that of the related species, Fusarium oxysporum, but more than that of Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  6. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
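
    The renaming step described in this abstract can be illustrated with a few lines of Python. This is only a sketch of the general idea, not the REFGEN implementation: it assumes, hypothetically, that each FASTA header has the form ">ACCESSION Genus species free text", and the accession numbers in the example are invented.

```python
import re


def sanitize(text: str) -> str:
    """Replace every run of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def make_label(species: str, accession: str) -> str:
    """Build a tree-program-friendly label from species name and accession."""
    return "_".join(part for part in (sanitize(species), sanitize(accession)) if part)


def rename_fasta(lines: list[str]) -> list[str]:
    """Rewrite header lines; sequence lines are passed through unchanged."""
    renamed = []
    for line in lines:
        fields = line[1:].split() if line.startswith(">") else []
        if fields:
            accession = fields[0]              # assumed: first token is the accession
            species = " ".join(fields[1:3])    # assumed: next two tokens are the binomial
            renamed.append(">" + make_label(species, accession))
        else:
            renamed.append(line)
    return renamed


# Hypothetical input records.
records = [
    ">XX000001.1 Homo sapiens hypothetical protein",
    "MKTAYIAK",
    ">XX000002.1 Mus musculus hypothetical protein",
    "MKSAYIAR",
]
print("\n".join(rename_fasta(records)))
```

    Keeping the accession in the label means each renamed sequence can still be traced back to its database record after tree building.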

  7. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.

  8. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest in the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  9. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    2008-12-17

    Dec 17, 2008 ... chain reaction (PCR). Japanese Spanish ... mainly covered general ecology and fishery biology. No study concerning the ... Conserved sequence blocks and the repeat units are indicated by boxes. performed using the exact ...

  10. New tool to assemble repetitive regions using next-generation sequencing data

    Science.gov (United States)

    Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz

    2017-08-01

    The next generation sequencing techniques produce a large amount of sequencing data. Some parts of the genome are composed of repetitive DNA sequences, which are very problematic for existing genome assemblers. We propose a modification of the algorithm for DNA assembly, which uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested; as a demonstration of the capability of our software, we present some results for model organisms. The new implementation uses a three-layer software architecture, in which the presentation layer, data processing layer, and data storage layer are kept separate. Source code, as well as a demo application with web interface and the additional data, are available at the project web-page: http://dnaasm.sourceforge.net.

  11. Inferring Invasion History of Red Swamp Crayfish (Procambarus clarkii) in China from Mitochondrial Control Region and Nuclear Intron Sequences

    Science.gov (United States)

    Li, Yanhe; Guo, Xianwu; Chen, Liping; Bai, Xiaohui; Wei, Xinlan; Zhou, Xiaoyun; Huang, Songqian; Wang, Weimin

    2015-01-01

    Identifying the dispersal pathways of an invasive species is useful for adopting the appropriate strategies to prevent and control its spread. However, these processes are exceedingly complex. So, it is necessary to apply new technology and collect representative samples for analysis. This study used Approximate Bayesian Computation (ABC) in combination with traditional genetic tools to examine extensive sample data and historical records to infer the invasion history of the red swamp crayfish, Procambarus clarkii, in China. The sequences of the mitochondrial control region and the proPOx intron in the nuclear genome of samples from 37 sites (35 in China and one each in Japan and the USA) were analyzed. The results of combined scenarios testing and historical records revealed a much more complex invasion history in China than previously believed. P. clarkii was most likely originally introduced into China from Japan from an unsampled source, and the species then expanded its range primarily into the middle and lower reaches and, to a lesser extent, into the upper reaches of the Changjiang River in China. No transfer was observed from the upper reaches to the middle and lower reaches of the Changjiang River. Human-mediated jump dispersal was an important dispersal pathway for P. clarkii. The results provide a better understanding of the evolutionary scenarios involved in the rapid invasion of P. clarkii in China. PMID:26132567

  12. Inferring Invasion History of Red Swamp Crayfish (Procambarus clarkii) in China from Mitochondrial Control Region and Nuclear Intron Sequences

    Directory of Open Access Journals (Sweden)

    Yanhe Li

    2015-06-01

    Identifying the dispersal pathways of an invasive species is useful for adopting the appropriate strategies to prevent and control its spread. However, these processes are exceedingly complex. So, it is necessary to apply new technology and collect representative samples for analysis. This study used Approximate Bayesian Computation (ABC) in combination with traditional genetic tools to examine extensive sample data and historical records to infer the invasion history of the red swamp crayfish, Procambarus clarkii, in China. The sequences of the mitochondrial control region and the proPOx intron in the nuclear genome of samples from 37 sites (35 in China and one each in Japan and the USA) were analyzed. The results of combined scenarios testing and historical records revealed a much more complex invasion history in China than previously believed. P. clarkii was most likely originally introduced into China from Japan from an unsampled source, and the species then expanded its range primarily into the middle and lower reaches and, to a lesser extent, into the upper reaches of the Changjiang River in China. No transfer was observed from the upper reaches to the middle and lower reaches of the Changjiang River. Human-mediated jump dispersal was an important dispersal pathway for P. clarkii. The results provide a better understanding of the evolutionary scenarios involved in the rapid invasion of P. clarkii in China.

  13. Partial Sequence Analysis of Merozoite Surface Protein-3α Gene in Plasmodium vivax Isolates from Malarious Areas of Iran

    Directory of Open Access Journals (Sweden)

    H Mirhendi

    2008-12-01

    Background: Approximately 85-90% of malaria infections in Iran are attributed to Plasmodium vivax, while little is known about the genetics of the parasite and its strain types in this region. This study was designed and performed to describe the genetic characteristics of the Plasmodium vivax population of Iran based on the merozoite surface protein-3α gene sequence. Methods: Through a descriptive study we analyzed partial P. vivax merozoite surface protein-3α gene sequences from 17 clinical P. vivax isolates collected from malarious areas of Iran. Genomic DNA was extracted with the QIAamp® DNA blood mini kit and amplified through nested PCR for a partial nucleotide sequence of the PvMSP-3 gene in P. vivax. PCR-amplified products were sequenced with an ABI Prism Perkin-Elmer 310 sequencer machine and the data were analyzed with Clustal W software. Results: Analysis of PvMSP-3 gene sequences demonstrated extensive polymorphisms, but the sequence identity between isolates of the same type was relatively high. We identified specific insertions and deletions for the type A, B and C variants of P. vivax in our isolates. In phylogenetic comparison of geographically separated isolates, there was no significant geographical branching of the parasite populations. Conclusion: The highly polymorphic nature of isolates suggests that more investigations of the PvMSP-3 gene are needed to explore its vaccine potential.

  14. Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.

    Science.gov (United States)

    Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia

    2017-07-24

    Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed for infection with T. vaginalis; a total of 110 sequences were identified when the T. vaginalis 18S ribosomal RNA gene was sequenced. These sequences were used to prepare a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences were in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. The low migration rate of local people in this province may contribute to the genetic conservation of T. vaginalis. The unique genetic feature of our isolates may suggest a different clinical presentation of trichomoniasis, including metronidazole susceptibility and T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis is of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.

  15. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    Science.gov (United States)

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  16. Whole genome sequencing and evolutionary analysis of human respiratory syncytial virus A and B from Milwaukee, WI 1998-2010.

    Directory of Open Access Journals (Sweden)

    Cecilia Rebuffo-Scheer

    BACKGROUND: Respiratory Syncytial Virus (RSV) is the leading cause of lower respiratory-tract infections in infants and young children worldwide. Despite this, only six complete genome sequences of original strains have been previously published, the most recent of which dates back 35 and 26 years for RSV group A and group B respectively. METHODOLOGY/PRINCIPAL FINDINGS: We present a semi-automated sequencing method allowing for the sequencing of four RSV whole genomes simultaneously. We were able to sequence the complete coding sequences of 13 RSV A and 4 RSV B strains from Milwaukee collected from 1998-2010. Another 12 RSV A and 5 RSV B strains sequenced in this study cover the majority of the genome. All RSV A and RSV B sequences were analyzed by neighbor-joining, maximum parsimony and Bayesian phylogeny methods. Genetic diversity was high among RSV A viruses in Milwaukee, including the circulation of multiple genotypes (GA1, GA2, GA5, GA7), with GA2 persisting throughout the 13 years of the study. However, RSV B genomes showed little variation, with all belonging to the BA genotype. For RSV A, the same evolutionary patterns and clades were seen consistently across the whole genome, including all intergenic, coding, and non-coding region sequences. CONCLUSIONS/SIGNIFICANCE: The sequencing strategy presented in this work allows for RSV A and B genomes to be sequenced simultaneously in two working days and at low cost. We have significantly increased the amount of genomic data that is available for both RSV A and B, providing the basic molecular characteristics of RSV strains circulating in Milwaukee over the last 13 years. This information can be used for comparative analysis with strains circulating in other communities around the world, which should also help with the development of new strategies for control of RSV, specifically vaccine development and improvement of RSV diagnostics.

  17. Regional analysis and environmental impact assessment

    International Nuclear Information System (INIS)

    Parzyck, D.C.; Brocksen, R.W.; Emanuel, W.R.

    1976-01-01

    This paper presents a number of techniques that can be used to assess environmental impacts on a regional scale. Regional methodologies have been developed which examine impacts upon aquatic and terrestrial biota in regions through consideration of changes in land use, land cover, air quality, water resource use, and water quality. Techniques used to assess long-range atmospheric transport, water resources, effects on sensitive forest and animal species, and impacts on man are presented in this paper, along with an optimization approach which serves to integrate the analytical techniques in an overall assessment framework. A brief review of the research approach and certain modeling techniques used within one regional studies program is provided. While it is not an all inclusive report on regional analyses, it does present an illustration of the types of analyses that can be performed on a regional scale

  18. Characterization of shark complement factor I gene(s): genomic analysis of a novel shark-specific sequence.

    Science.gov (United States)

    Shin, Dong-Ho; Webb, Barbara M; Nakao, Miki; Smith, Sylvia L

    2009-07-01

    Complement factor I is a crucial regulator of mammalian complement activity. Very little is known of complement regulators in non-mammalian species. We isolated and sequenced four highly similar complement factor I cDNAs from the liver of the nurse shark (Ginglymostoma cirratum), designated as GcIf-1, GcIf-2, GcIf-3 and GcIf-4 (previously referred to as nsFI-a, -b, -c and -d) which encode 689, 673, 673 and 657 amino acid residues, respectively. They share 95% (shark-specific sequence between the leader peptide (LP) and the factor I membrane attack complex (FIMAC) domain. The cDNA sequences differ only in the size and composition of the shark-specific region (SSR). Sequence analysis of each SSR has identified within the region two novel short sequences (SS1 and SS2) and three repeat sequences (RS1-3). Genomic analysis has revealed the existence of three introns between the leader peptide and the FIMAC domain, tentatively designated intron 1, intron 2, and intron 3 which span 4067, 2293 and 2082bp, respectively. Southern blot analysis suggests the presence of a single gene copy for each cDNA type. Phylogenetic analysis suggests that complement factor I of cartilaginous fish diverged prior to the emergence of mammals. All four GcIf cDNA species are expressed in four different tissues and the liver is the main tissue in which expression level of all four is high. This suggests that the expression of GcIf isotypes is tissue-dependent.

  19. Comparative genome sequence analysis of Choristoneura occidentalis Freeman and C. rosaceana Harris (Lepidoptera: Tortricidae) alphabaculoviruses.

    Directory of Open Access Journals (Sweden)

    David K Thumbi

    The complete genome sequences of Choristoneura occidentalis and C. rosaceana nucleopolyhedroviruses (ChocNPV and ChroNPV, respectively) (Baculoviridae: Alphabaculovirus) were determined and compared with each other and with those of other baculoviruses, including the genome of the closely related C. fumiferana NPV (CfMNPV). The ChocNPV genome was 128,446 bp in length (1147 bp smaller than that of CfMNPV), had a G+C content of 50.1%, and contained 148 open reading frames (ORFs). In comparison, the ChroNPV genome was 129,052 bp in length, had a G+C content of 48.6% and contained 149 ORFs. ChocNPV and ChroNPV shared 144 ORFs in common, and had a 77% sequence identity with each other and 96.5% and 77.8% sequence identity, respectively, with CfMNPV. Five homologous regions (hrs), with sequence similarities to those of CfMNPV, were identified in ChocNPV, whereas the ChroNPV genome contained three hrs featuring up to 14 repeats. Both genomes encoded three inhibitors of apoptosis (IAP-1, IAP-2, and IAP-3), as reported for CfMNPV, and the ChocNPV IAP-3 gene represented the most divergent functional region of this genome relative to CfMNPV. Two ORFs were unique to ChocNPV, and four were unique to ChroNPV. ChroNPV ORF chronpv38 is a eukaryotic initiation factor 5 (eIF-5) homolog that has also been identified in the C. occidentalis granulovirus (ChocGV) and is believed to be the product of horizontal gene transfer from the host. Based on levels of sequence identity and phylogenetic analysis, both ChocNPV and ChroNPV fall within group I alphabaculoviruses, where ChocNPV appears to be more closely related to CfMNPV than does ChroNPV. Our analyses suggest that it may be appropriate to consider ChocNPV and CfMNPV as variants of the same virus species.

  20. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Background: Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results: In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tag (EST) sequences. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability to peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a low level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions: We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.

  1. Is the extraction by Whatman FTA filter matrix technology and sequencing of large ribosomal subunit D1-D2 region sufficient for identification of clinical fungi?

    Science.gov (United States)

    Kiraz, Nuri; Oz, Yasemin; Aslan, Huseyin; Erturan, Zayre; Ener, Beyza; Akdagli, Sevtap Arikan; Muslumanoglu, Hamza; Cetinkaya, Zafer

    2015-10-01

    Although conventional identification of pathogenic fungi is based on a combination of tests evaluating their morphological and biochemical characteristics, these tests can fail to identify less common species or to differentiate closely related species. In addition, they are time-consuming and labour-intensive and require experienced personnel. We evaluated the feasibility and sufficiency of DNA extraction by Whatman FTA filter matrix technology and DNA sequencing of the D1-D2 region of the large ribosomal subunit gene for identification of clinical isolates of 21 yeasts and 160 moulds in our clinical mycology laboratory. While the yeast isolates were identified at species level with 100% homology, 102 (63.75%) clinically important mould isolates were identified at species level and 56 (35%) isolates at genus level against fungal sequences existing in DNA databases, and two (1.25%) isolates could not be identified. Consequently, Whatman FTA filter matrix technology was a useful method for extraction of fungal DNA: extremely rapid, practical and successful. The sequence analysis strategy for the D1-D2 region of the large ribosomal subunit gene was found to be largely sufficient for identification to genus level for most clinical fungi. However, identification to species level, and especially discrimination of closely related species, may require additional analysis. © 2015 Blackwell Verlag GmbH.

  2. Characterization of the HLA-DRβ1 third hypervariable region amino acid sequence according to charge and parental inheritance in systemic sclerosis.

    Science.gov (United States)

    Gentil, Coline A; Gammill, Hilary S; Luu, Christine T; Mayes, Maureen D; Furst, Dan E; Nelson, J Lee

    2017-03-07

    Specific HLA class II alleles are associated with systemic sclerosis (SSc) risk, clinical characteristics, and autoantibodies. HLA nomenclature initially developed with antibodies as typing reagents defining DRB1 allele groups. However, alleles from different DRB1 allele groups encode the same third hypervariable region (3rd HVR) sequence, the primary T-cell recognition site, and 3rd HVR charge differences can affect interactions with T cells. We considered 3rd HVR sequences (amino acids 67-74) irrespective of the allele group and analyzed parental inheritance considered according to the 3rd HVR charge, comparing SSc patients with controls. In total, 306 families (121 SSc and 185 controls) were HLA genotyped and parental HLA-haplotype origin was determined. Analysis was conducted according to DRβ1 3rd HVR sequence, charge, and parental inheritance. The distribution of 3rd HVR sequences differed in SSc patients versus controls (p = 0.007), primarily due to an increase of specific DRB1*11 alleles, in accord with previous observations. The 3rd HVR sequences were next analyzed according to charge and parental inheritance. Paternal transmission of DRB1 alleles encoding a +2 charge 3rd HVR was significantly reduced in SSc patients compared with maternal transmission (p = 0.0003, corrected for analysis of four charge categories p = 0.001). To a lesser extent, paternal transmission was increased when charge was 0 (p = 0.021, corrected for multiple comparisons p = 0.084). In contrast, paternal versus maternal inheritance was similar in controls. SSc patients differed from controls when DRB1 alleles were categorized according to 3rd HVR sequences. Skewed parental inheritance was observed in SSc patients but not in controls when the DRβ1 3rd HVR was considered according to charge. These observations suggest that epigenetic modulation of HLA merits investigation in SSc.
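
    The corrected p values quoted above are consistent with multiplying each raw p value by the four charge categories, that is, a Bonferroni-style correction. The abstract does not name the method, so the sketch below is an assumption that simply reproduces the arithmetic.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value: raw p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

# Raw p values from the abstract, corrected for four charge categories.
print(round(bonferroni(0.0003, 4), 3))  # +2 charge, paternal vs maternal
print(round(bonferroni(0.021, 4), 3))   # charge 0
```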

  3. Siberian Regional Identity in the Context of Historical Consciousness (Content Analysis of Tomsk Regional Media)

    Directory of Open Access Journals (Sweden)

    A V Bocharov

    2011-12-01

    The article presents a model to study the Siberian regional identity in the context of historical consciousness, as well as the results of its practical application in the content analysis of the publications by the Tomsk regional media. On the basis of the content analysis procedures the author demonstrates how, through historical memory, the regional identity is formed and manifested in the regional media in various spheres of society.

  4. The evolutionary rates of HCV estimated with subtype 1a and 1b sequences over the ORF length and in different genomic regions.

    Directory of Open Access Journals (Sweden)

    Manqiong Yuan

    Considerable progress has been made in the HCV evolutionary analysis, since the software BEAST was released. However, prior information, especially the prior evolutionary rate, which plays a critical role in BEAST analysis, is always difficult to ascertain due to various uncertainties. Providing a proper prior HCV evolutionary rate is thus of great importance. 176 full-length sequences of HCV subtype 1a and 144 of 1b were assembled by taking into consideration the balance of the sampling dates and the even dispersion in phylogenetic trees. According to the HCV genomic organization and biological functions, each dataset was partitioned into nine genomic regions and two routinely amplified regions. A uniform prior rate was applied to the BEAST analysis for each region and also the entire ORF. All the obtained posterior rates for 1a are of a magnitude of 10^-3 substitutions/site/year and in a bell-shaped distribution. Significantly lower rates were estimated for 1b and some of the rate distribution curves resulted in a one-sided truncation, particularly under the exponential model. This indicates that some of the rates for subtype 1b are less accurate, so they were adjusted by including more sequences to improve the temporal structure. Among the various HCV subtypes and genomic regions, the evolutionary patterns are dissimilar. Therefore, an applied estimation of the HCV epidemic history requires the proper selection of the rate priors, which should match the actual dataset so that they can fit for the subtype, the genomic region and even the length. By referencing the findings here, future evolutionary analysis of the HCV subtype 1a and 1b datasets may become more accurate and hence prove useful for tracing their patterns.

  5. The HIVToolbox 2 web system integrates sequence, structure, function and mutation analysis.

    Directory of Open Access Journals (Sweden)

    David P Sargeant

    There is enormous interest in studying HIV pathogenesis for improving the treatment of patients with HIV infection. HIV infection has become one of the best-studied systems for understanding how a virus can hijack a cell. To help facilitate discovery, we previously built HIVToolbox, a web system for visual data mining. The original HIVToolbox integrated information for HIV protein sequence, structure, functional sites, and sequence conservation. This web system has been used for almost 40,000 searches. We report improvements to HIVToolbox including new functions and workflows, data updates, and updates for ease of use. HIVToolbox2 is an improvement over HIVToolbox, with new functionalities focused on HIV pathogenesis including drug-binding sites, drug-resistance mutations, and immune epitopes. The integrated, interactive view enables visual mining to generate hypotheses that are not readily revealed by other approaches. Most HIV proteins form multimers, and there are posttranslational modification and protein-protein interaction sites at many of these multimerization interfaces. Analysis of protease drug binding sites reveals an anatomy of drug resistance with different types of drug-resistance mutations regionally localized on the surface of protease. Some of these drug-resistance mutations have a high prevalence in specific HIV-1 M subtypes. Finally, consolidation of Tat functional sites reveals a hotspot region where there appear to be 30 interactions or posttranslational modifications. A cursory analysis with HIVToolbox2 has helped to identify several global patterns for HIV proteins. An initial analysis with this tool identifies homomultimerization of almost all HIV proteins, functional sites that overlap with multimerization sites, a global drug resistance anatomy for HIV protease, and specific distributions of some DRMs in specific HIV M subtypes. HIVToolbox2 is an open-access web application available at

  6. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    International Nuclear Information System (INIS)

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study suggests the sequence tree model to analyze accident sequences systematically. Using the sequence tree model, all possible scenarios that need a specific safety action to prevent core damage can be identified, and the success conditions of the safety action under complicated situations such as a combined accident can also be identified. The sequence tree is a branch model that divides plant conditions considering the plant dynamics. Since the sequence tree model can reflect the plant dynamics arising from the interaction of different accident timings and plant conditions, and from the interaction between the operator action, mitigation system, and the indicators for operation, it can be used to develop the dynamic event tree model easily. The target safety action for this study is a feed-and-bleed (F and B) operation. An F and B operation directly cools down the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, second accident timings were categorized according to plant condition. Indicators were selected as branch points using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution for the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  7. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    To determine the genetic relationship of the HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, a DNA fragment of 1100 bp corresponding to gG from six HSV-2 strains was isolated from infected human sera samples in Iran, amplified by PCR, and sequenced for determining ...

  8. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economic value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  9. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... nucleotide alignment of both native buffalo and cattle CSRP3 cDNAs sequences ..... Exon III, Identities = 71/75 (94%), Gaps = 1/75 (1%) Strand=Plus/Plus ... Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J,.

  10. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters

  11. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  12. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  13. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  14. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    Evelyn Silva

    2006-01-01

    Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lower percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  15. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  16. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  17. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    AJL

    2012-04-19

    Apr 19, 2012 ... The primers were based on the rBAT sequences of other animals deposited in GenBank. .... fragment; M1, 2000 bp DNA ladder; M2, 1000 bp DNA ladder. spliced to obtain the ..... A traffic signal for heterodimeric amino acid.

  18. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  19. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  20. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  1. Recent Vs. Historical Seismicity Analysis For Banat Seismic Region (Western Part Of Romania)

    Directory of Open Access Journals (Sweden)

    Oros Eugen

    2015-03-01

    The present-day seismic activity of a region reflects the active tectonics and can confirm the seismic potential of the seismogenic sources as they are modelled using the historical seismicity. This paper makes a comparative analysis of the last decade of seismicity recorded in the Banat Seismic Region (western part of Romania) and the historical seismicity of the region (Mw≥4.0). Four significant earthquake sequences have been recently localized in the region, three of them near the city of Timisoara (January 2012 and March 2013) and the fourth within Hateg Basin, South Carpathians (October 2013). These sequences occurred within the epicentral areas of some strong historical earthquakes (Mw≥5.0). The main events had some macroseismic effects on people up to a few kilometers from the epicenters. Our results update the Romanian earthquake catalogue and bring new information about the local seismic hazard source models and seismotectonics.

  2. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    Science.gov (United States)

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
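
    The hexanucleotide survey described above can be pictured with a short sketch. It counts, for each hexamer, how many sequences contain it in the window 15 to 30 nt upstream of the poly(A) site. The function name `hexamer_counts`, the input layout (a sequence plus the 0-based index of the poly(A) site), and the toy records are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

def hexamer_counts(records, upstream=(15, 30), k=6):
    """Count how many sequences contain each k-mer in a window upstream of the poly(A) site.

    records: iterable of (sequence, polyA_index) pairs; polyA_index is 0-based.
    """
    near, far = upstream
    counts = Counter()
    for seq, site in records:
        # Window spanning `far` to `near` nt upstream of the site, written as RNA.
        window = seq[max(0, site - far):max(0, site - near)].upper().replace("T", "U")
        # A set ensures each k-mer is counted once per sequence.
        kmers = {window[i:i + k] for i in range(len(window) - k + 1)}
        counts.update(kmers)
    return counts

# Toy input (hypothetical): one sequence with AATGAA in the window, one without.
records = [
    ("C" * 10 + "AATGAA" + "C" * 4 + "G" * 15 + "TTTT", 35),
    ("G" * 40, 35),
]
counts = hexamer_counts(records)
print(counts["AAUGAA"] / len(records))  # fraction of sequences carrying AAUGAA
```

    Dividing a hexamer's count by the number of sequences gives the kind of per-transcript frequency the abstract reports for AAUGAA.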

  3. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  4. Genome sequencing and analysis of BCG vaccine strains.

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    BACKGROUND: Although the Bacillus Calmette-Guérin (BCG) vaccine against tuberculosis (TB) has been available for more than 75 years, one third of the world's population is still infected with Mycobacterium tuberculosis and approximately 2 million people die of TB every year. To reduce this immense TB burden, a clearer understanding of the functional genes underlying the action of BCG and the development of new vaccines are urgently needed. METHODS AND FINDINGS: Comparative genomic analysis of 19 M. tuberculosis complex strains showed that BCG strains underwent repeated human manipulation, had higher region of deletion rates than those of natural M. tuberculosis strains, and lost several essential components such as T-cell epitopes. A total of 188 BCG strain T-cell epitopes were lost to various degrees. The non-virulent BCG Tokyo strain, which has the largest number of T-cell epitopes (359), lost 124. Here we propose that BCG strain protection variability results from different epitopes. This study is the first to present BCG as a model organism for genetics research. BCG strains have a very well-documented history and now detailed genome information. Genome comparison revealed the selection process of BCG strains under human manipulation (1908-1966). CONCLUSIONS: Our results revealed the cause of BCG vaccine strain protection variability at the genome level and supported the hypothesis that the restoration of lost BCG Tokyo epitopes is a useful future vaccine development strategy. Furthermore, these detailed BCG vaccine genome investigation results will be useful in microbial genetics, microbial engineering and other research fields.

  5. Genotyping-by-Sequencing Analysis for Determining Population Structure of Finger Millet Germplasm of Diverse Origins

    Directory of Open Access Journals (Sweden)

    Anil Kumar

    2016-07-01

    Finger millet [Eleusine coracana (L.) Gaertn.] is grown mainly by subsistence farmers in arid and semiarid regions of the world. To broaden its genetic base and to boost its production, it is of paramount importance to characterize and genotype the diverse gene pool of this important food and nutritional security crop. However, because the genome sequence of finger millet has not been available, little progress has been made in understanding the molecular basis of the crop's unique qualities. In the present investigation, attempts have been made to characterize the genetically diverse collection of 113 finger millet accessions through whole-genome genotyping-by-sequencing (GBS), which resulted in a genome-wide set of 23,000 single-nucleotide polymorphisms (SNPs) segregating across the entire collection and several thousand SNPs segregating within every accession. A model-based population structure analysis reveals the presence of three subpopulations among the finger millet accessions, which are in parallel with the results of phylogenetic analysis. The observed population structure is consistent with the hypothesis that finger millet was domesticated first in Africa, and from there it was introduced to India some 3000 yr ago. A total of 1128 gene ontology (GO) terms were assigned to SNP-carrying genes for three main categories: biological process, cellular component, and molecular function. Facilitated access to high-throughput genotyping and sequencing technologies is likely to improve the breeding process in developing countries, and as such, these data will be very useful to breeders who are working for the genetic improvement of finger millet.

  6. Genotyping-by-Sequencing Analysis for Determining Population Structure of Finger Millet Germplasm of Diverse Origins.

    Science.gov (United States)

    Kumar, Anil; Sharma, Divya; Tiwari, Apoorv; Jaiswal, J P; Singh, N K; Sood, Salej

    2016-07-01

    Finger millet [Eleusine coracana (L.) Gaertn.] is grown mainly by subsistence farmers in arid and semiarid regions of the world. To broaden its genetic base and to boost its production, it is of paramount importance to characterize and genotype the diverse gene pool of this important food and nutritional security crop. However, because the genome sequence of finger millet has not been available, little progress has been made in understanding the molecular basis of the crop's unique qualities. In the present investigation, attempts have been made to characterize the genetically diverse collection of 113 finger millet accessions through whole-genome genotyping-by-sequencing (GBS), which resulted in a genome-wide set of 23,000 single-nucleotide polymorphisms (SNPs) segregating across the entire collection and several thousand SNPs segregating within every accession. A model-based population structure analysis reveals the presence of three subpopulations among the finger millet accessions, which are in parallel with the results of phylogenetic analysis. The observed population structure is consistent with the hypothesis that finger millet was domesticated first in Africa, and from there it was introduced to India some 3000 yr ago. A total of 1128 gene ontology (GO) terms were assigned to SNP-carrying genes for three main categories: biological process, cellular component, and molecular function. Facilitated access to high-throughput genotyping and sequencing technologies is likely to improve the breeding process in developing countries, and as such, these data will be very useful to breeders who are working for the genetic improvement of finger millet. Copyright © 2016 Crop Science Society of America.

  7. Utilization of a cloned alphoid repeating sequence of human DNA in the study of polymorphism of chromosomal heterochromatin regions

    International Nuclear Information System (INIS)

    Kruminya, A.R.; Kroshkina, V.G.; Yurov, Yu.B.; Aleksandrov, I.A.; Mitkevich, S.P.; Gindilis, V.M.

    1988-01-01

    The chromosomal distribution of the cloned PHS05 fragment of human alphoid DNA was studied by in situ hybridization in 38 individuals. It was shown that this DNA fraction is primarily localized in the pericentric regions of practically all chromosomes of the set. Significant interchromosomal differences and a weakly expressed interindividual polymorphism were discovered in the copying ability of this class of repeating DNA sequences; associations were not found between the results of hybridization and the pattern of Q-polymorphism

  8. MOVES2010a regional level sensitivity analysis

    Science.gov (United States)

    2012-12-10

    This document discusses the sensitivity of emission rates to various input parameters, using the US Environmental Protection Agency's (EPA's) MOVES2010a model at the regional level. Pollutants included in the study are carbon monoxide (CO),...

  9. A methodology for the data energy regional consumption consistency analysis

    International Nuclear Information System (INIS)

    Canavarros, Otacilio Borges; Silva, Ennio Peres da

    1999-01-01

    The article introduces a methodology for analyzing the consistency of regional energy consumption data. The work was based on recent studies by several cited authors and addressed Brazilian energy matrices and Brazilian regional energy balances. The results are compared and analyzed.

  10. Sequence analysis of dolphin ferritin H and L subunits and possible iron-dependent translational control of dolphin ferritin gene

    Directory of Open Access Journals (Sweden)

    Sasaki Yukako

    2008-10-01

    Background: The iron-storage protein ferritin plays a central role in iron metabolism. Ferritin has a dual function: it stores iron and sequesters it for protection against iron-catalyzed reactive oxygen species. Tissue ferritin is composed of two kinds of subunits (H: heavy chain or heart-type subunit; L: light chain or liver-type subunit). Ferritin gene expression is controlled at the translational level in an iron-dependent manner or at the transcriptional level in an iron-independent manner. However, sequencing analysis of marine mammalian ferritin subunits has not yet been performed fully. The purpose of this study is to reveal cDNA-derived amino acid sequences of cetacean ferritin H and L subunits, and to demonstrate the possibility of iron-regulated expression of these subunits, especially the H subunit. Methods: Sequence analyses of cetacean ferritin H and L subunits were performed by direct sequencing of polymerase chain reaction (PCR) fragments from cDNAs generated via reverse transcription-PCR of leukocyte total RNA prepared from blood samples of six different dolphin species (Pseudorca crassidens, Lagenorhynchus obliquidens, Grampus griseus, Globicephala macrorhynchus, Tursiops truncatus, and Delphinapterus leucas). The putative iron-responsive element sequence in the 5'-untranslated region of the six different dolphin species was revealed by direct sequencing of PCR fragments obtained using leukocyte genomic DNA. Results: Dolphin H and L subunits consist of 182 and 174 amino acids, respectively, and amino acid sequence identities of ferritin subunits among these dolphins are highly conserved (H: 99–100%, (99→98); L: 98–100%). The conserved 28 bp IRE sequence was located -144 bp upstream from the initiation codon in the six different dolphin species. Conclusion: These results indicate that six different dolphin species have conserved ferritin sequences, and suggest that these genes are iron-dependently expressed.

  11. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins.  Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...

  12. An atlas of over 90.000 conserved noncoding sequences provides insight into crucifer regulatory regions

    NARCIS (Netherlands)

    Haudry, A.; Platts, A.E.; Vello, E.; Hoen, D.R.; Leclerq, M.; Williamson, R.J.; Forczek, E.; Joly-Lopez, Z.; Steffen, J.G.; Hazzouri, K.M.; Dewar, K.; Stinchcombe, J.R.; Schoen, D.J.; Wang, X.; Schmutz, J.; Town, C.D.; Edger, P.P.; Pires, J.C.; Schumaker, K.S.; Jarvis, D.E.; Mandakova, T.; Lysak, M.; Bergh, van den E.; Schranz, M.E.; Harrison, P.M.

    2013-01-01

    Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica,

  13. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  14. The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences

    Directory of Open Access Journals (Sweden)

    Ivan V. Stepanyan

    2017-01-01

    Full Text Available The article is devoted to a matrix method of comparative analysis of long nucleotide sequences by means of presenting each sequence in the form of three digital binary sequences. This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.
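    The idea of presenting a nucleotide sequence as three binary sequences can be sketched as follows. This is only an illustration, not the article's implementation: the abstract does not name the attributes, so the three groupings used here (purine vs. pyrimidine, amino vs. keto, strong vs. weak hydrogen bonding) are an assumption, and the function name binary_tracks is hypothetical.

```python
# Assumed attribute classes (common biochemical groupings of nucleotides;
# the abstract itself does not say which attributes the method uses).
PURINES = frozenset("AG")   # the other two bases (C, T) are pyrimidines
AMINO = frozenset("AC")     # the other two bases (G, T) are keto
STRONG = frozenset("CG")    # three hydrogen bonds; A and T form two


def binary_tracks(seq):
    """Return three 0/1 lists, one per attribute, for a DNA string."""
    seq = seq.upper()
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    purine = [1 if base in PURINES else 0 for base in seq]
    amino = [1 if base in AMINO else 0 for base in seq]
    strong = [1 if base in STRONG else 0 for base in seq]
    return purine, amino, strong


if __name__ == "__main__":
    for name, track in zip(("purine", "amino", "strong"), binary_tracks("ACGT")):
        print(name, track)
```

    Each of the three lists can then be laid out row by row as a black-and-white image, which is one simple way to obtain a binary mosaic of the kind the abstract mentions.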

  15. Ribosomal DNA sequence analysis of different geographically distributed Aloe Vera plants: Comparison with clonally regenerated plants

    International Nuclear Information System (INIS)

    Yagi, A.; Sato, Y.; Miwa, Y.; Kabbash, A.; Moustafa, S.; Shimomura, K.; El-Bassuony, A.

    2006-01-01

    A comparison of the sequences in the internal transcribed spacer (ITS) 1 region of rDNA between clonally regenerated A. vera and the same species in Japan, the USA and Egypt revealed two types of nucleotide sequences, of 252 and 254 bp. Based on the findings in the ITS 1 region, A. vera having 252 and 254 bp clearly showed a stable sequence similarity, suggesting high conservation of the base peak sequence in the ITS 1 region. However, frequent base substitutions were observed around nucleotide positions 66, 99 and 199-201 in the 252 bp samples of leaves that came from callus tissue and micropropagated plants. The minor deviation in clonally regenerated A. vera may be due to the stage of regeneration and cell specification in the case of the callus tissue. In the present study, the base peak sequence of the ITS 1 region of rDNA was adopted as a molecular marker for differentiating A. vera plants among geographically distributed and clonally regenerated A. vera plants, and it was suggested that the base peak substitutions in the ITS 1 region may arise from different nutritional and environmental factors in cultivation and from plant growth stages. (author)

  16. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern because the amount of data generated at several locations is growing exponentially. There is therefore a need for techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA in a distributed bioinformatics computing system for the analysis of DNA sequences. The algorithm was used to store DNA sequences of different lengths, from very small to very large, in a database. Storage size and response time were both calculated. The efficiency and performance of the algorithm are high (in size calculation with percentage) when compared with other known sequential approaches.

  17. Progenitor-derivative relationships of Hordeum polyploids (Poaceae, Triticeae) inferred from sequences of TOPO6, a nuclear low-copy gene region.

    Directory of Open Access Journals (Sweden)

    Jonathan Brassac

    Full Text Available Polyploidization is a major mechanism of speciation in plants. Within the barley genus Hordeum, approximately half of the taxa are polyploids. While for diploid species a good hypothesis of phylogenetic relationships exists, there is little information available for the polyploids (4×, 6×) of Hordeum. Relationships among all 33 diploid and polyploid Hordeum species were analyzed with the low-copy nuclear marker region TOPO6 for 341 Hordeum individuals and eight outgroup species. PCR products were either directly sequenced or cloned and on average 12 clones per individual were included in phylogenetic analyses. In most diploid Hordeum species TOPO6 is probably a single-copy locus. Most sequences found in polyploid individuals phylogenetically cluster together with sequences derived from diploid species and thus allow the identification of parental taxa of polyploids. Four groups of sequences occurring only in polyploid taxa are interpreted as footprints of extinct diploid taxa, which contributed to allopolyploid evolution. Our analysis identifies three key species involved in the evolution of the American polyploids of the genus. (i) All but one of the American tetraploids have a TOPO6 copy originating from the Central Asian diploid H. roshevitzii, the second copy clustering with different American diploid species. (ii) All hexaploid species from the New World have a copy of an extinct close relative of H. californicum and (iii) possess the TOPO6 sequence pattern of tetraploid H. jubatum, each with an additional copy derived from different American diploids. Tetraploid H. bulbosum is an autopolyploid, while the assumed autopolyploid H. brevisubulatum (4×, 6×) was identified as allopolyploid throughout most of its distribution area. The use of a proof-reading DNA polymerase in PCR reduced the proportion of chimerical sequences in polyploids in comparison to Taq polymerase.

  18. Analysis of xylem formation in pine by cDNA sequencing

    Science.gov (United States)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  19. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.

  20. Sequences within both the 5' untranslated region and the Gag gene are important for efficient encapsidation of Mason-Pfizer monkey virus RNA

    International Nuclear Information System (INIS)

    Schmidt, Russell D.; Mustafa, Farah; Lew, Kathy A.; Browning, Mathew T.; Rizvi, Tahir A.

    2003-01-01

    It has previously been shown that the 5' untranslated leader region (UTR), including about 495 bp of the gag gene, is sufficient for the efficient encapsidation and propagation of Mason-Pfizer monkey virus (MPMV) based retroviral vectors. In addition, a deletion upstream of the major splice donor, SD, has been shown to adversely affect MPMV RNA packaging. However, the precise sequence requirement for the encapsidation of MPMV genomic RNA within the 5' UTR and gag remains largely unknown. In this study, we have used a systematic deletion analysis of the 5' UTR and gag gene to define the cis-acting sequences responsible for efficient MPMV RNA packaging. Using an in vivo packaging and transduction assay, our results reveal that the MPMV packaging signal is primarily found within the first 30 bp immediately downstream of the primer binding site. However, its function is dependent upon the presence of the last 23 bp of the 5' UTR and approximately the first 100 bp of the gag gene. Thus, sequences that affect MPMV RNA packaging seem to reside both upstream and downstream of the major splice donor with the downstream region responsible for the efficient functioning of the upstream primary packaging determinant

  1. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    Science.gov (United States)

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.
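    Identity ranges such as the 79.3% to 99.5% reported above are typically obtained by comparing every pair of aligned sequences. The following is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, columns with a gap ("-") in either sequence are skipped, and the function names percent_identity and identity_range are illustrative, not taken from the study.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored
    (an assumption; other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identity_range(seqs):
    """Lowest and highest pairwise identity over all pairs of sequences."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return min(values), max(values)


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AATT"]  # made-up data, not HGV sequences
    print(identity_range(toy_alignment))
```

    The same function applies to amino acid alignments, since it only compares characters position by position.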

  2. Automatic gallbladder segmentation using combined 2D and 3D shape features to perform volumetric analysis in native and secretin-enhanced MRCP sequences.

    Science.gov (United States)

    Gloger, Oliver; Bülow, Robin; Tönnies, Klaus; Völzke, Henry

    2017-11-24

    We aimed to develop the first fully automated 3D gallbladder segmentation approach to perform volumetric analysis in volume data of magnetic resonance (MR) cholangiopancreatography (MRCP) sequences. Volumetric gallbladder analysis is performed for non-contrast-enhanced and secretin-enhanced MRCP sequences. Native and secretin-enhanced MRCP volume data were produced with a 1.5-T MR system. Images of coronal maximum intensity projections (MIP) are used to automatically compute 2D characteristic shape features of the gallbladder in the MIP images. A gallbladder shape space is generated to derive 3D gallbladder shape features, which are then combined with 2D gallbladder shape features in a support vector machine approach to detect gallbladder regions in MRCP volume data. A region-based level set approach is used for fine segmentation. Volumetric analysis is performed for both sequences to calculate gallbladder volume differences between both sequences. The approach presented achieves segmentation results with mean Dice coefficients of 0.917 in non-contrast-enhanced sequences and 0.904 in secretin-enhanced sequences. This is the first approach developed to detect and segment gallbladders in MR-based volume data automatically in both sequences. It can be used to perform gallbladder volume determination in epidemiological studies and to detect abnormal gallbladder volumes or shapes. The positive volume differences between both sequences may indicate the quantity of the pancreatobiliary reflux.
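    The Dice coefficient used above to score the segmentations is defined as 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch follows, assuming the masks are given as flat sequences of 0/1 voxel labels of equal length; the function name dice and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice(pred, ref):
    """Dice coefficient between two binary masks given as 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    size_pred = sum(1 for p in pred if p)
    size_ref = sum(1 for r in ref if r)
    if size_pred + size_ref == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_pred + size_ref)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]  # toy masks, not MRCP data
    reference = [1, 0, 1, 0]
    print(dice(predicted, reference))
```

    A 3D volume can be passed by flattening it first, since the measure depends only on voxel-wise agreement.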

  3. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    Science.gov (United States)

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein.

  4. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to gain a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
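    The performance measures quoted above (accuracy, sensitivity, specificity and the Matthews correlation coefficient, MCC) are all derived from the four cells of a binary confusion matrix. A minimal sketch of the standard definitions follows; the counts in the example are hypothetical and are not the study's data, and the function name classification_metrics is illustrative.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

    MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes differ greatly in size.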

  5. Molecular evolution and diversification of snake toxin genes, revealed by analysis of intron sequences.

    Science.gov (United States)

    Fujimi, T J; Nakajyo, T; Nishimura, E; Ogura, E; Tsuchiya, T; Tamiya, T

    2003-08-14

    The genes encoding erabutoxin (short chain neurotoxin) isoforms (Ea, Eb, and Ec), LsIII (long chain neurotoxin) and a novel long chain neurotoxin pseudogene were cloned from a Laticauda semifasciata genomic library. Short and long chain neurotoxin genes were also cloned from the genome of Laticauda laticaudata, a closely related species of L. semifasciata, by PCR. A putative matrix attached region (MAR) sequence was found in the intron I of the LsIII gene. Comparative analysis of 11 structurally relevant snake toxin genes (three-finger-structure toxins) revealed the molecular evolution of these toxins. Three-finger-structure toxin genes diverged from a common ancestor through two types of evolutionary pathways (long and short types), early in the course of evolution. At a later stage of evolution in each gene, the accumulation of mutations in the exons, especially exon II, by accelerated evolution may have caused the increased diversification in their functions. It was also revealed that the putative MAR sequence found in the LsIII gene was integrated into the gene after the species-level divergence.

  6. Confirmation and Sequence analysis of N gene of PPRV in South Xinjiang, China

    Directory of Open Access Journals (Sweden)

    YongHong Liu

    Full Text Available ABSTRACT In China, peste des petits ruminants (PPR) was first officially reported in 2007. From 2010 until the outbreak of 2013, PPRV infection was not reported. In November 2013, PPRV re-emerged in Xinjiang and rapidly spread to 22 P/A/M (provinces, autonomous regions and municipalities) of China. In this study, suspected PPRV-infected sheep on a breeding farm in South Xinjiang in 2014 were diagnosed, and the characteristics of the complete sequence of the N protein gene of PPRV were analyzed. The sheep showed signs of PPRV infection, such as fever, increased oronasal secretions, dyspnea and diarrhea, with a morbidity of 60% and a fatality rate of 21.1%. Macroscopic lesions at autopsy and histopathological changes observed under the light microscope included stomatitis, broncho-interstitial pneumonia, catarrhal hemorrhagic enteritis and intracytoplasmic eosinophilic inclusions in multinucleated giant cells in the lung. The formalin-fixed mixed tissue samples were positive by nucleic acid extraction and RT-PCR detection. The nucleotide sequence of the N protein gene of the China/XJNJ/2014 strain showed extremely high homology with the China/XJYL/2013 strain and, among exotic strains, the highest homology with the PRADESH_95 strain from India. Phylogenetic analysis based on the complete sequence of the N protein gene of PPRV showed that the China/XJNJ/2014 strain, the other 2013-2014 strains in this study and the Tibetan strains all belonged to lineage Ⅳ, but the 2013-2014 PPRV strains in this study and the Tibetan strains were in different sub-branches.

  7. [Influence of PCR cycle number on microbial diversity analysis through next generation sequencing].

    Science.gov (United States)

    An, Yunhe; Gao, Lijuan; Li, Junbo; Tian, Yanjie; Wang, Jinlong; Zheng, Xuejuan; Wu, Huijuan

    2016-08-25

    Using high-throughput sequencing technology to study microbial diversity in complex samples has become one of the hottest topics in microbial diversity research. In this study, DNA was extracted from soil and sheep rumen chyme samples. Then 25 ng of total DNA was used to amplify the 16S rRNA V3 region with 20, 25 and 30 PCR cycles, and the final sequencing library was constructed by mixing equal amounts of purified PCR products. Finally, the operational taxonomic unit (OTU) amount, rarefaction curve, microbial number and species were compared through data analysis. It was found that, with the same amount of DNA template, a higher number of PCR cycles did not give the best proportion of community composition, although the number of species detected was higher. Overall, with 25 PCR cycles, the number of species and the proportion of the community composition were optimal in both soil and chyme samples.

  8. Sequence analysis of DBL2β domain of var gene of Indonesian Plasmodium falciparum

    Science.gov (United States)

    Sulistyaningsih, E.; Romadhon, B. D.; Palupi, I.; Hidayah, F.; Dewi, R.; Prasetyo, A.

    2018-03-01

    Malaria is a major health problem in tropical countries including Indonesia. The most deadly agent is Plasmodium falciparum. In P. falciparum infection, PfEMP1 is thought to play an important role in the pathogenesis of malaria. PfEMP1 is encoded by the var gene family; it is a polymorphic protein whose extracellular portion consists of three distinct binding domains: Duffy binding-like (DBL), cysteine-rich interdomain regions (CIDR) and C2. PfEMP1 varies in domain composition and binding specificity. The study explored the characteristics of Indonesian DBL2β-var genes and investigated their role in the malaria outcome. Twenty blood samples from patients with clinically mild to severe malaria in Jember, East Java were collected for DNA extraction. Diagnosis was confirmed by Giemsa-stained thick blood smear. PCR was conducted using specific primers targeting the full length of DBL2β and yielded a single band of approximately 1.7 kb in one sample. This band was observed only in a severe malaria sample. Sequence analysis directly from the PCR product showed 74-99% similarity with previous sequences in GenBank. In conclusion, the DBL2β domain of the var gene of Indonesian isolates was 1603 nucleotides in length, and there was a possible association between the existence of the DBL2β domain and the severity of the malaria outcome.

  9. Regional trade market analysis: resort marketing approaches

    Science.gov (United States)

    David C. Bojanic; Rodney B. Warnick

    1995-01-01

    This paper examines the value of geographic segmentation for a regional ski resort in New England. Customers from different user groups were surveyed along with a list of inquiries and a purchased list, and grouped according to their area of origin. An ANOVA was performed to determine if there were differences in attitudes and trip behaviors between the segments. It...

  10. Southeast Regional Clean Energy Policy Analysis (Revised)

    Energy Technology Data Exchange (ETDEWEB)

    McLaren, J.

    2011-04-01

    More than half of the electricity produced in the southeastern states is fuelled by coal. Although the region produces some coal, most of the states depend heavily on coal imports. Many of the region's aging coal power facilities are planned for retirement within the next 20 years. However, estimates indicate that a 20% increase in capacity is needed over that time to meet the rapidly growing demand. The most common incentives for energy efficiency in the Southeast are loans and rebates; however, total public spending on energy efficiency is limited. The most common state-level policies to support renewable energy development are personal and corporate tax incentives and loans. The region produced 1.8% of the electricity from renewable resources other than conventional hydroelectricity in 2009, half of the national average. There is significant potential for development of a biomass market in the region, as well as use of local wind, solar, methane-to-energy, small hydro, and combined heat and power resources. Options are offered for expanding and strengthening state-level policies such as decoupling, integrated resource planning, building codes, net metering, and interconnection standards to support further clean energy development. Benefits would include energy security, job creation, insurance against price fluctuations, increased value of marginal lands, and local and global environmental paybacks.

  11. Southeast Regional Clean Energy Policy Analysis

    Energy Technology Data Exchange (ETDEWEB)

    McLaren, Joyce [National Renewable Energy Lab. (NREL), Golden, CO (United States)

    2011-04-01

    More than half of the electricity produced in the southeastern states is fuelled by coal. Although the region produces some coal, most of the states depend heavily on coal imports. Many of the region's aging coal power facilities are planned for retirement within the next 20 years. However, estimates indicate that a 20% increase in capacity is needed over that time to meet the rapidly growing demand. The most common incentives for energy efficiency in the Southeast are loans and rebates; however, total public spending on energy efficiency is limited. The most common state-level policies to support renewable energy development are personal and corporate tax incentives and loans. The region produced 1.8% of the electricity from renewable resources other than conventional hydroelectricity in 2009, half of the national average. There is significant potential for development of a biomass market in the region, as well as use of local wind, solar, methane-to-energy, small hydro, and combined heat and power resources. Options are offered for expanding and strengthening state-level policies such as decoupling, integrated resource planning, building codes, net metering, and interconnection standards to support further clean energy development. Benefits would include energy security, job creation, insurance against price fluctuations, increased value of marginal lands, and local and global environmental paybacks.

  12. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the youth (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children with MODY diagnosed by clinical criteria. Studies of index cases were done with MISEQ-ILLUMINA, and family screenings and confirmation studies of mutations were done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.

  13. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~ 30× sequencing depths. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related to obesity among Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia.

  14. Accident sequence precursor analysis level 2/3 model development

    International Nuclear Information System (INIS)

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-01-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models

  15. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  16. Characterization of the genomic organization of the region bordering the centromere of chromosome V of Podospora anserina by direct sequencing.

    Science.gov (United States)

    Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean

    2003-08-01

    A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG, allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either side of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244 bp from either side of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8 kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.
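
    The density figure above can be checked with simple arithmetic. The sketch below is not part of the original study; it assumes that the density counts all 169 annotated genes (rRNA, tRNA and CDS together), which is the reading that reproduces 2.8 kb, and the implied genome size is derived here rather than stated in the abstract.

```python
# Figures quoted in the abstract
sequenced_bp = 322_195 + 156_244   # sequence from both sides of the centromere
genes = 1 + 5 + 163                # 5S rRNA + tRNA genes + putative CDS (assumed to all count)

kb_per_gene = sequenced_bp / genes / 1000
print(f"gene density: one gene every {kb_per_gene:.2f} kb")

# Genome size implied by ~11,000 genes at this density (derived, not from the abstract)
implied_genome_mb = 11_000 * kb_per_gene / 1000
print(f"implied genome size: about {implied_genome_mb:.0f} Mb")
```

    Counting only the 163 CDS would give roughly 2.9 kb per gene instead, so the exact convention used by the authors remains an assumption.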

  17. A Combined Linkage and Exome Sequencing Analysis for Electrocardiogram Parameters in the Erasmus Rucphen Family Study.

    Science.gov (United States)

    Silva, Claudia T; Zorkoltseva, Irina V; Amin, Najaf; Demirkan, Ayşe; van Leeuwen, Elisabeth M; Kors, Jan A; van den Berg, Marten; Stricker, Bruno H; Uitterlinden, André G; Kirichenko, Anatoly V; Witteman, Jacqueline C M; Willemsen, Rob; Oostra, Ben A; Axenovich, Tatiana I; van Duijn, Cornelia M; Isaacs, Aaron

    2016-01-01

    Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C > G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10⁻⁴, minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait's genetic variability in ERF (P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca²⁺ levels (P = 3.3 × 10⁻³) and AMPK stimulated fatty acid oxidation in muscle (P = 4.1 × 10⁻³). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.

  18. Phylogenetic and Comparative Sequence Analysis of Thermostable Alpha Amylases of kingdom Archea, Prokaryotes and Eukaryotes.

    Science.gov (United States)

    Huma, Tayyaba; Maryam, Arooma; Rehman, Shahid Ur; Qamar, Muhammad Tahir Ul; Shaheen, Tayyaba; Haque, Asma; Shaheen, Bushra

    2014-01-01

    The alpha amylase family is generally defined as a group of enzymes that can hydrolyse and transglycosylate α-(1, 4) or α-(1, 6) glycosidic bonds while preserving the anomeric configuration. For the comparative analysis of the alpha amylase family, nucleotide sequences of seven thermostable organisms were selected from NCBI: from Kingdom Archaea, Pyrococcus furiosus (100-105°C); from Kingdom Prokaryotes, Bacillus licheniformis (90-95°C), Geobacillus stearothermophilus (75°C), Bacillus amyloliquefaciens (72°C), Bacillus subtilis (70°C) and Bacillus KSM K38 (55°C); and from Eukaryotes, Aspergillus oryzae (60°C). Primary structure composition analysis and conserved sequence analysis were conducted with BioEdit tools. The BioEdit results showed only three conserved regions of base pairs and minimal similarity in the multiple sequence alignment (MSA) of the above-mentioned alpha amylases. The phylogeny of thermostable alpha amylases of Kingdom Archaea, Prokaryotes and Eukaryotes was inferred in Mega 5.1 with the Neighbor-Joining (NJ) algorithm. The Mega 5.1 phylogenetic results suggested that the alpha amylases of the more thermostable organisms, i.e. Pyrococcus furiosus (100-105°C), Bacillus licheniformis (90-95°C), Geobacillus stearothermophilus (75°C) and Bacillus amyloliquefaciens (72°C), are more distantly related than those of the less thermostable organisms. Bearing in mind the characteristics of the most thermostable alpha amylases, novel and improved features can be introduced into less thermostable alpha amylases so that they become more thermotolerant and productive for industry.

  19. A combined linkage and exome sequencing analysis for electrocardiogram parameters in the Erasmus Rucphen Family study

    Directory of Open Access Journals (Sweden)

    Claudia Tamar Silva

    2016-11-01

    Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies (GWAS) of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C>G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10⁻⁴, minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait’s genetic variability in ERF (P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca²⁺ levels (P = 3.3 × 10⁻³) and AMPK stimulated fatty acid oxidation in muscle (P = 4.1 × 10⁻³). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.

  20. Markov analysis of alpha-helical, beta-sheet and random coil regions of proteins

    International Nuclear Information System (INIS)

    Macchiato, M.; Tramontano, A.

    1983-01-01

    The rules up to now used to predict the spatial configuration of proteins from their primary structure are mostly based on the recurrence analysis of some doublets, triplets and so on of contiguous amino acids, but they do not take into account the correlation characteristics of the whole amino acid sequence. A statistical analysis of amino acid sequences for the alpha-helical, beta-sheet and random coil regions of about twenty proteins with known secondary structure by considering correlations effects has been carried out. The obtained results demonstrate that these sequences are at least a second-order Markov chain, i.e. they appear as if they were generated by a source that remembers at least the two aminoacids before the one being generated and that these two previous symbols influence the present choice
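
    To make the notion of a second-order Markov chain concrete, the following minimal sketch estimates the probability of each symbol given the two symbols before it by counting triplets. It is an illustration only, not the authors' statistical procedure; the function name and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

def second_order_transitions(seq):
    """Estimate P(next symbol | previous two symbols) by counting triplets."""
    counts = defaultdict(Counter)
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        counts[a + b][c] += 1          # context is the two preceding symbols
    return {
        ctx: {sym: n / sum(nxt.values()) for sym, n in nxt.items()}
        for ctx, nxt in counts.items()
    }

# Hypothetical toy "amino acid" string, not real protein data
toy = "AEAELAEAEL"
probs = second_order_transitions(toy)
for ctx, nxt in sorted(probs.items()):
    print(ctx, nxt)
```

    In a chain of order two or higher, these conditional probabilities differ from the ones obtained when only the single preceding symbol is used as context; testing that difference for significance is the kind of analysis the abstract describes.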

  1. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By considering a chaotic pseudo-random sequence as a symbolic sequence, authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the cases of Logistic map and one-way coupled map lattice to demonstrate how it works, and a comparison is made between it and the approximate entropy method. The results show that this method is applicable to distinguish the complexities of different chaotic pseudo-random sequences, and it is superior to the approximate entropy method
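
    As a rough illustration of treating a chaotic orbit as a symbolic sequence, the sketch below binarizes Logistic map iterates at 0.5 and counts the distinct words of a given length. This word count is a crude stand-in chosen for brevity; it is neither the complexity measure of the paper nor the approximate entropy method, and the function names, threshold and parameter values are assumptions.

```python
def logistic_symbols(mu, x0=0.3, n=2000, burn_in=100):
    """Iterate x -> mu*x*(1-x); emit '1' if x >= 0.5 else '0' after a burn-in."""
    x = x0
    symbols = []
    for i in range(burn_in + n):
        x = mu * x * (1.0 - x)
        if i >= burn_in:
            symbols.append("1" if x >= 0.5 else "0")
    return "".join(symbols)

def distinct_words(symbols, k):
    """Number of different length-k words occurring in the symbol string."""
    return len({symbols[i:i + k] for i in range(len(symbols) - k + 1)})

chaotic = distinct_words(logistic_symbols(4.0), 3)    # fully chaotic regime
periodic = distinct_words(logistic_symbols(3.2), 3)   # stable period-2 orbit
print(chaotic, periodic)
```

    At mu = 3.2 both points of the period-2 orbit lie above 0.5, so the symbol string is constant and only one word occurs, whereas the chaotic orbit produces many different words.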

  2. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of mainshock occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time window methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in the space-time-energy domain is considered. Results from cluster identification by the nearest-neighbor method turn out to be quite robust with respect to the time span of the input catalogue, as well as to the minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are consistent with those reported in earlier studies, which were aimed at detailed manual aftershock identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic

  3. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression, which is widely used for microarray measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, testing for the interaction of all possible pairs of genes and using all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  4. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    Science.gov (United States)

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  5. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    shock expression signals were identified upstream of the L. micdadei groEL gene. Further upstream, a poly-T region, also a feature of the sigma 32-regulated Escherichia coli groELS heat shock operon, was found. Despite the high degree of homology of the expression signals in E. coli and L. micdadei...

  6. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. Analysis of decision procedures for a sequence of inventory periods

    International Nuclear Information System (INIS)

    Avenhaus, R.

    1982-07-01

    Optimal test procedures for a sequence of inventory periods will be discussed. Starting with a game theoretical description of the conflict situation between the plant operator and the inspector, the objectives of the inspector as well as the general decision theoretical problem will be formulated. In the first part the objective of 'secure' detection will be emphasized which means that only at the end of the reference time a decision is taken by the inspector. In the second part the objective of 'timely' detection will be emphasized which will lead to sequential test procedures. At the end of the paper all procedures will be summarized, and in view of the multitude of procedures available at the moment some comments about future work will be given.

  8. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  9. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  10. Microbial analysis of bite marks by sequence comparison of streptococcal DNA.

    Directory of Open Access Journals (Sweden)

    Darnell M Kennedy

    Bite mark injuries often feature in violent crimes. Conventional morphometric methods for the forensic analysis of bite marks involve elements of subjective interpretation that threaten the credibility of this field. Human DNA recovered from bite marks has the highest evidentiary value; however, recovery can be compromised by salivary components. This study assessed the feasibility of matching bacterial DNA sequences amplified from experimental bite marks to those obtained from the teeth responsible, with the aim of evaluating the capability of three genomic regions of streptococcal DNA to discriminate between participant samples. Bite mark and teeth swabs were collected from 16 participants. Bacterial DNA was extracted to provide the template for PCR primers specific for the streptococcal 16S ribosomal RNA (16S rRNA) gene, 16S-23S intergenic spacer (ITS) and RNA polymerase beta subunit (rpoB). High throughput sequencing (GS FLX 454), followed by stringent quality filtering, generated reads from bite marks for comparison to those generated from teeth samples. For all three regions, the greatest overlaps of identical reads were between bite mark samples and the corresponding teeth samples. The average proportions of reads identical between bite mark and corresponding teeth samples were 0.31, 0.41 and 0.31, and for non-corresponding samples were 0.11, 0.20 and 0.016, for 16S rRNA, ITS and rpoB, respectively. The probabilities of correctly distinguishing matching and non-matching teeth samples were 0.92 for ITS, 0.99 for 16S rRNA and 1.0 for rpoB. These findings strongly support the tenet that bacterial DNA amplified from bite marks and teeth can provide corroborating information in the identification of assailants.
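
    The comparison described above rests on the proportion of reads shared between two samples. A minimal sketch of one way to compute such a proportion follows; the abstract does not give the exact definition used in the study, so the function, its normalization by the number of bite-mark reads, and the toy reads are all assumptions.

```python
def shared_read_proportion(bite_reads, teeth_reads):
    """Fraction of bite-mark reads whose exact sequence also occurs in the teeth sample."""
    if not bite_reads:
        return 0.0
    teeth = set(teeth_reads)           # set lookup keeps the comparison linear
    return sum(read in teeth for read in bite_reads) / len(bite_reads)

# Hypothetical toy reads, not data from the study
bite = ["ACGT", "ACGA", "TTGA", "ACGT"]
teeth_same_person = ["ACGT", "TTGA", "GGCC"]
teeth_other_person = ["GGCC", "ACGA"]

matching = shared_read_proportion(bite, teeth_same_person)
non_matching = shared_read_proportion(bite, teeth_other_person)
print(matching, non_matching)
```

    A higher proportion for the corresponding teeth sample than for any other sample is the pattern the abstract reports for all three genomic regions.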

  11. Genome analysis of environmental and clinical P. aeruginosa isolates from sequence type-1146.

    Directory of Open Access Journals (Sweden)

    David Sánchez

    The genomes of Pseudomonas aeruginosa isolates of the new sequence type ST-1146, three environmental (P37, P47 and P49) and one clinical (SD9), with differences in their antibiotic susceptibility profiles, have been sequenced and analysed. The genomes were mapped against P. aeruginosa PAO1-UW and UCBPP-PA14. The allelic profiles showed that the highest number of differences were in the "Related to phage, transposon or plasmid" and "Secreted factors" categories. The clinical isolate showed a greater number of exclusive alleles than the environmental isolates. The phage Pf1 region in isolate SD9 accumulated the highest number of nucleotide substitutions. The ORF analysis of the four genomes assembled de novo indicated that the number of isolate-specific genes was higher in isolate SD9 (132 genes) than in isolates P37 (24 genes), P47 (16 genes) and P49 (21 genes). CRISPR elements were found in all isolates and SD9 showed differences in the spacer region. Genes related to bacteriophages F116 and H66 were found only in isolate SD9. Genome comparisons indicated that the isolates of ST-1146 are closely related, and most genes implicated in pathogenicity are highly conserved, suggesting a genetic potential for infectivity in the environmental isolates similar to the clinical one. Phage-related genes are responsible for the main differences among the genomes of ST-1146 isolates. The role of bacteriophages has to be considered in the adaptation processes of isolates to the host and in microevolution studies.

  12. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task, but the previous version allows only a limited number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  13. Information-Theoretical Analysis of EEG Microstate Sequences in Python

    Directory of Open Access Journals (Sweden)

    Frederic von Wegner

    2018-06-01

    We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG) measures the electrical potential generated by the cerebral cortex, and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A–D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting state EEG recordings, we found new characteristics of microstate sequences such as periodicities related to EEG frequency bands. The algorithms used are here provided as an open-source package and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting state EEG recordings.
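
    The time-lagged mutual information mentioned above can be sketched for a symbolic sequence as follows. This is a plain plug-in estimate written for illustration; it does not reproduce the package's own functions, whose names and signatures are not given in the abstract, and it omits the surrogate-based confidence intervals. The function name and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def lagged_mutual_information(seq, lag):
    """Plug-in estimate of I(X_t; X_{t+lag}) in bits for a symbolic sequence."""
    pairs = list(zip(seq, seq[lag:]))          # (symbol at t, symbol at t + lag)
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

# Hypothetical microstate label sequences using the letters A-D
periodic = "ABCD" * 50      # each label fully determines the next one
constant = "A" * 10         # carries no information at any lag

for lag in (1, 2, 4):
    print(lag, round(lagged_mutual_information(periodic, lag), 3))
```

    For the strictly periodic toy sequence the estimate stays close to 2 bits at every lag, the entropy of four equally likely labels, while the constant sequence gives exactly zero.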

  14. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses. A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins) or amino acid metabolism (e.g., asparagine t-RNA synthetase). This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  15. Sequence based polymorphic (SBP) marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Background: Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic region. Here a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results: A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained by applying Illumina's sequencing by synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker-poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions: The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for
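    The step of comparing SNP-containing Col-0 and Nd-0 sequences for restriction site variations can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes two aligned sequences of equal length, a known 0-based SNP position, and a small hand-picked enzyme table; the function names and example sequences are hypothetical.

```python
# Recognition sequences of a few common restriction enzymes. All four are
# palindromic, so scanning one strand is sufficient for this sketch.
ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "MspI": "CCGG",
}


def site_positions(seq, site):
    """Return the set of 0-based start positions where `site` occurs in `seq`."""
    n = len(site)
    return {i for i in range(len(seq) - n + 1) if seq[i:i + n] == site}


def differential_sites(col0, nd0, snp_pos, enzymes=ENZYMES):
    """List enzyme sites present in only one allele and overlapping the SNP.

    Returns tuples (enzyme, site_start, allele_that_is_cut).
    """
    if len(col0) != len(nd0):
        raise ValueError("sequences must be aligned and of equal length")
    hits = []
    for name, site in enzymes.items():
        in_col0 = site_positions(col0, site)
        in_nd0 = site_positions(nd0, site)
        # Symmetric difference: sites found in exactly one of the two alleles
        for start in sorted(in_col0 ^ in_nd0):
            if start <= snp_pos < start + len(site):
                allele = "Col-0" if start in in_col0 else "Nd-0"
                hits.append((name, start, allele))
    return hits


# Toy alleles (invented for illustration) differing by a C/T SNP at index 9
col0_allele = "ACGTGAATTCACGT"
nd0_allele = "ACGTGAATTTACGT"
markers = differential_sites(col0_allele, nd0_allele, snp_pos=9)
```

    An enzyme reported here would cut the amplicon from one ecotype but not the other, which is what makes the band pattern on a gel co-dominant: heterozygotes show both the cut and uncut products.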

  16. Nucleotide sequence of the promoter region of the gene encoding chicken Calbindin D28K

    Energy Technology Data Exchange (ETDEWEB)

    Ferrari, S; Drusiani, E; Battini, R; Fregni, M

    1988-01-11

    Calbindin D28K (formerly Vitamin D-Dependent Calcium Binding Protein) is a protein induced by 1,25-dihydroxycholecalciferol in several chicken tissues. A chicken genomic DNA library was screened with a synthetic oligonucleotide representing the sequence of Calbindin D28K cDNA from nt 146 to nt 176. The positive clone CBAl extends the 5'-end of the first exon by 451 bp. The sequence of a BamHI-SacII restriction fragment with coordinates -451 to +50 is shown. The BamHI-SacII fragment was subcloned 5' to the CAT gene of pUCCAT. The result is shown of a CAT assay on mouse 3T6 fibroblasts transiently transfected with pUCCAT, with pUCCAT containing the BamHI-SacII fragment in the correct or opposite orientation, or with the SV40 promoter. ¹⁴C-chloramphenicol and its acetyl derivatives generated by purified CAT are also shown. The expression of CAT appears to be constitutive, since the enzyme activity is not influenced by the presence (+) or absence (-) of 1,25-dihydroxycholecalciferol in the culture medium.

  17. Reconstruction of phylogenetic relationships in dermatomycete genus Trichophyton Malmsten 1848 based on ribosomal internal transcribed spacer region, partial 28S rRNA and beta-tubulin genes sequences.

    Science.gov (United States)

    Pchelin, Ivan M; Zlatogursky, Vasily V; Rudneva, Mariya V; Chilina, Galina A; Rezaei-Matehkolaei, Ali; Lavnikevich, Dmitry M; Vasilyeva, Natalya V; Taraskina, Anastasia E

    2016-09-01

    Trichophyton spp. are important causative agents of superficial mycoses. The phylogeny of the genus and accurate strain identification, based on the ribosomal ITS region