WorldWideScience

Sample records for base sequence

  1. Classification of base sequences

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Base sequences BS(n+1,n) are quadruples of {1,-1}-sequences (A;B;C;D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a delta-function. The base sequence conjecture, asserting that BS(n+1,n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1,n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1,n) for n <= 30. Due to excessive size of the equivalence classes, the tables in the paper cover only the cases n <= 12.
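
    The defining property in this abstract can be checked directly. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and the "delta-function" condition is taken to mean that the four nonperiodic autocorrelations sum to zero at every nonzero shift.

```python
from typing import Sequence


def nonperiodic_autocorrelation(seq: Sequence[int], shift: int) -> int:
    """N_X(s) = sum of x_i * x_(i+s) over all i for which both terms exist."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d) -> bool:
    """Check whether (A;B;C;D) belongs to BS(n+1,n) (hypothetical helper)."""
    # A and B must have length n+1, C and D length n.
    if len(a) != len(b) or len(c) != len(d) or len(a) != len(c) + 1:
        return False
    # All entries must be +1 or -1.
    if any(x not in (1, -1) for s in (a, b, c, d) for x in s):
        return False
    # The summed autocorrelation must vanish at every nonzero shift.
    for shift in range(1, len(a)):
        total = sum(nonperiodic_autocorrelation(s, shift) for s in (a, b, c, d))
        if total != 0:
            return False
    return True
```

    For n = 1, the quadruple A = (1, 1), B = (1, -1), C = (1), D = (1) passes this check: the only nonzero shift is 1, where the contributions are 1 - 1 + 0 + 0 = 0.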

  2. Classification of Base Sequences BS(n+1,n)

    Directory of Open Access Journals (Sweden)

    Dragomir Ž. Ðoković

    2010-01-01

    Full Text Available Base sequences BS(n+1,n) are quadruples of {±1}-sequences (A;B;C;D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a δ-function. The base sequence conjecture, asserting that BS(n+1,n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1,n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1,n) for n ≤ 30. As the number of equivalence classes grows rapidly (but not monotonically) with n, the tables in the paper cover only the cases n ≤ 13.

  3. On the base sequence conjecture

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Let BS(m,n) denote the set of base sequences (A;B;C;D), with A and B of length m and C and D of length n. The base sequence conjecture (BSC) asserts that BS(n+1,n) exist (i.e., are non-empty) for all n. This is known to be true for n <= 36 and when n is a Golay number. We show that it is also true for n=37 and n=38. It is worth pointing out that BSC is stronger than the famous Hadamard matrix conjecture. In order to demonstrate the abundance of base sequences, we have previously attached to BS(n+1,n) a graph Gamma_n and computed the Gamma_n for n <= 27. We now extend these computations and determine the Gamma_n for n=28,...,35. We also propose a conjecture describing these graphs in general.

  4. NGS-based deep bisulfite sequencing.

    Science.gov (United States)

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries with a large number of PCR products that have been individually amplified from bisulfite-converted DNA. This approach also employs a bioinformatics strategy to sort the raw sequence reads generated from NGS platforms and subsequently to derive DNA methylation levels for individual loci. The results demonstrated that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen through other existing methods.
    • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products.
    • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads.
    • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.
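
    The step of deriving methylation levels and patterns from sorted reads can be illustrated as follows. This is a simplified sketch, not the authors' pipeline: it assumes the reads for one locus are already sorted, cover the amplicon from its first base without indels, and are on the same strand as the reference, so that at each CpG cytosine a C call means methylated and a T call means unmethylated. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cpg_positions(reference: str) -> List[int]:
    """Return the index of the C in every CG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Fraction of informative reads with an unconverted C at each CpG site."""
    levels = {}
    for pos in cpg_positions(reference):
        calls = [read[pos].upper() for read in reads if pos < len(read)]
        methylated = calls.count("C")      # C survived bisulfite conversion
        unmethylated = calls.count("T")    # C was converted and read as T
        informative = methylated + unmethylated
        if informative:
            levels[pos] = methylated / informative
    return levels


def methylation_patterns(reference: str, reads: List[str]) -> Counter:
    """Count per-read patterns: M = methylated, U = unmethylated, ? = other."""
    sites = cpg_positions(reference)
    symbol = {"C": "M", "T": "U"}
    patterns = Counter()
    for read in reads:
        calls = (read[p].upper() if p < len(read) else "?" for p in sites)
        patterns["".join(symbol.get(call, "?") for call in calls)] += 1
    return patterns
```

    The per-site levels summarize how often each CpG is methylated, while the pattern counts keep the information about which sites are methylated together on the same molecule.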

  5. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment...... methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number...... of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often...

  6. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Background: A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which may be a tedious exercise prone to mistakes and omissions. Results: Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion: A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation and efficiency of communication and knowledge dissemination among researchers.

  7. Next-Generation Sequencing Techniques for Eukaryotic Microorganisms: Sequencing-Based Solutions to Biological Problems

    OpenAIRE

    Nowrousian, Minou

    2010-01-01

    Over the past 5 years, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the costs per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and within the las...

  8. A repetitive sequence assembler based on next-generation sequencing.

    Science.gov (United States)

    Lian, S; Tu, Y; Wang, Y; Chen, X; Wang, L

    2016-01-01

    Repetitive sequences of variable length are common in almost all eukaryotic genomes, and most of them are presumed to have important biomedical functions and can cause genomic instability. Next-generation sequencing (NGS) technologies provide the possibility of identifying and capturing these repetitive sequences directly from the NGS data. In this study, we assessed the performance of leading assemblers, such as Velvet, SOAPdenovo, SGA, MSR-CA, Bambus2, ALLPATHS-LG, and ABySS, in identifying and capturing repeats using three real NGS datasets. Our results indicated that most of them performed poorly in capturing the repeats. Consequently, we proposed a repetitive sequence assembler, named NGSReper, for capturing repeats from NGS data. Simulated datasets were used to validate the feasibility of NGSReper. The results indicate that the completeness of repeat capture is up to 99%. Cross validation was performed in three real NGS datasets, and extensive comparisons indicate that NGSReper performed best in terms of completeness and accuracy in capturing repeats. In conclusion, NGSReper is an appropriate and suitable tool for capturing repeats directly from NGS data. PMID:27525861

  9. Stream cipher based on GSS sequences

    Institute of Scientific and Technical Information of China (English)

    HU Yupu; XIAO Guozhen

    2004-01-01

    Generalized self-shrinking sequences, simply named the GSS sequences, are novel periodic sequences that have many advantages in cryptography. In this paper, we give several results about GSS sequence's application to cryptography. First, we give a simple method for selecting those GSS sequences whose least periods reach the maximum. Second, we give a method for describing and computing the auto-correlation coefficients of GSS sequences. Finally, we point out that some GSS sequences, when used as stream ciphers, have a security weakness.

  10. SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY

    Directory of Open Access Journals (Sweden)

    Yoshihisa Udagawa

    2013-07-01

    Full Text Available Duplicate code adversely affects the quality of software systems and hence should be detected. We discuss an approach that improves source code retrieval using structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure that is defined by the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. The defined similarity measure is an extension of the set-based Sorensen-Dice similarity index. This research primarily contributes to the development of a similarity retrieval algorithm that derives meaningful search conditions from a given sequence, and then performs retrieval using all derived conditions. Experiments show that our retrieval model shows an improvement of up to 90.9% over other retrieval models relative to the number of retrieved methods.
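
    For orientation, the set-based Sorensen-Dice index that the abstract says its measure extends can be written in a few lines. This sketch shows only the standard index, 2|A∩B| / (|A| + |B|), and not the paper's sequence-based extension; the token names in the usage note are made up for illustration.

```python
from typing import Hashable, Iterable


def dice_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Standard set-based Sorensen-Dice index: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

    For two methods reduced to the hypothetical token sets {if, for, call:read, call:close} and {if, while, call:read}, the index is 2·2 / (4 + 3) = 4/7. Because sets discard order and repetition, a sequence-aware measure such as the one the abstract proposes is needed to distinguish methods that use the same statements in a different order.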

  11. Source Code Retrieval Using Sequence Based Similarity

    Directory of Open Access Journals (Sweden)

    Yoshihisa Udagawa

    2013-07-01

    Full Text Available Duplicate code adversely affects the quality of software systems and hence should be detected. We discuss an approach that improves source code retrieval using structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure that is defined by the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. The defined similarity measure is an extension of the set-based Sorensen-Dice similarity index. This research primarily contributes to the development of a similarity retrieval algorithm that derives meaningful search conditions from a given sequence, and then performs retrieval using all derived conditions. Experiments show that our retrieval model shows an improvement of up to 90.9% over other retrieval models relative to the number of retrieved methods.

  12. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  13. Quick Trickle Permutation Based on Quick Trickle Characteristic Sequence

    Institute of Scientific and Technical Information of China (English)

    Wang Li-na; Fei Ru-chun; Liu Zhu

    2003-01-01

    The concept of the quick trickle characteristic sequence is presented, its properties and count are studied, and the mapping between quick trickle characteristic sequences and quick trickle permutations is discussed. Finally, an efficient construction of quick trickle permutations based on the quick trickle characteristic sequence is given, by which a quick trickle permutation can be computed once its characteristic sequence has been constructed. Quick trickle permutations have good cryptographic properties.

  14. Simulation-Based Evaluation of Learning Sequences for Instructional Technologies

    Science.gov (United States)

    McEneaney, John E.

    2016-01-01

    Instructional technologies critically depend on systematic design, and learning hierarchies are a commonly advocated tool for designing instructional sequences. But hierarchies routinely allow numerous sequences and choosing an optimal sequence remains an unsolved problem. This study explores a simulation-based approach to modeling learning…

  15. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  16. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    Full Text Available The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  17. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  18. Identification of protein superfamily from structure-based sequence motif

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Using the evolutionarily distant protein tyrosine phosphatase (PTP) Ⅰ and Ⅱ superfamilies as an example, a structure-based sequence motif has been defined by structural comparison, structure-based sequence alignment, and analysis of residue substitution patterns in the commonly conserved sequence regions. With this structure-based PTP sequence motif, phosphatases Ⅰ and Ⅱ can be correctly identified together in the SWISS-PROT and TrEMBL databases. The results show that the correct identification rates are over 98%. This is the first time that PTP Ⅰ and Ⅱ have been identified together by this motif.

  19. Image-based temporal alignment of echocardiographic sequences

    Science.gov (United States)

    Danudibroto, Adriyana; Bersvendsen, Jørn; Mirea, Oana; Gerard, Olivier; D'hooge, Jan; Samset, Eigil

    2016-04-01

    Temporal alignment of echocardiographic sequences enables fair comparisons of multiple cardiac sequences by showing corresponding frames at given time points in the cardiac cycle. It is also essential for spatial registration of echo volumes where several acquisitions are combined for enhancement of image quality or for forming a larger field of view. In this study, three different image-based temporal alignment methods were investigated: first, a method based on dynamic time warping (DTW); second, a spline-based method that optimized the similarity between temporal characteristic curves of the cardiac cycle using 1D cubic B-spline interpolation; and third, a variant of the spline-based method with piecewise modification. These methods were tested on in-vivo data sets of 19 echo sequences. For each sequence, the mitral valve opening (MVO) time was manually annotated. The results showed that the average MVO timing error for all methods is well under the time resolution of the sequences.
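
    As background for the first method, classic dynamic time warping between two one-dimensional curves can be sketched as below. This is the textbook algorithm under the assumption that each echo frame has already been reduced to a single scalar feature; it is not the authors' implementation, and the function name is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Textbook DTW cost between two 1D curves using absolute difference."""
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances while y holds its frame
                cost[i][j - 1],      # y advances while x holds its frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

    A curve and a time-stretched copy of it, such as [0, 1, 2, 1, 0] and [0, 1, 1, 2, 1, 0], have a DTW cost of zero, which is what makes the technique suitable for matching cardiac cycles of different durations.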

  20. An Incremental Algorithm of Text Clustering Based on Semantic Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Zhonghui; SHEN Junyi; BAO Junpeng

    2006-01-01

    This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. A comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and that this is especially evident on sets of long documents.

  1. Feature-based Image Sequence Compression Coding

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A novel compressing method for video teleconference applications is presented. Semantic-based coding based on human image feature is realized, where human features are adopted as parameters. Model-based coding and the concept of vector coding are combined with the work on image feature extraction to obtain the result.

  2. An Ant-Based Model for Multiple Sequence Alignment

    CERN Document Server

    Guinand, Frédéric

    2008-01-01

    Multiple sequence alignment is a key process in today's biology, and finding a relevant alignment of several sequences is much more challenging than just optimizing some improbable evaluation functions. Our approach for addressing multiple sequence alignment focuses on the building of structures in a new graph model: the factor graph model. This model relies on block-based formulation of the original problem, formulation that seems to be one of the most suitable ways for capturing evolutionary aspects of alignment. The structures are implicitly built by a colony of ants laying down pheromones in the factor graphs, according to relations between blocks belonging to the different sequences.

  3. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which maps QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  4. Swarm-based Sequencing Recommendations in E-learning

    NARCIS (Netherlands)

    Van den Berg, Bert; Tattersall, Colin; Janssen, José; Brouns, Francis; Kurvers, Hub; Koper, Rob

    2005-01-01

    Van den Berg, B., Tattersall, C., Janssen, J., Brouns, F., Kurvers, H., & Koper, R. (2006). Swarm-based Sequencing Recommendations in E-learning. International Journal of Computer Science & Applications, III(III), 1-11.

  5. Sequence Context Specific Mutagenesis and Base Excision Repair

    OpenAIRE

    Donigan, Katherine; Sweasy, Joann B.

    2009-01-01

    Base excision repair is critical for the maintenance of genome stability because it repairs at least 20,000 endogenously generated DNA lesions per cell per day. Several enzymes within the base excision repair pathway exhibit sequence context dependency during the excision and DNA synthesis steps of repair. New evidence is emerging that germ line and tumor-associated variants of enzymes in this repair pathway exhibit sequence context dependence that is different from their ancestral counterpar...

  6. Immune and Genetic Algorithm Based Assembly Sequence Planning

    Institute of Scientific and Technical Information of China (English)

    YANG Jian-guo; LI Bei-zhi; YU Lei; JIN Yu-song

    2004-01-01

    In this paper, an assembly sequence planning model inspired by natural immune and genetic algorithms (ASPIG) and based on the part degrees of freedom matrix (PDFM) is proposed, and a proto system, DSFAS, based on the ASPIG is introduced to solve the assembly sequence problem. The concept and generation of PDFM and DSFAS are also discussed. DSFAS can prevent premature convergence, promote population diversity, and accelerate the learning and convergence speed in the behavior evolution problem.

  7. Novel Frequency Hopping Sequences Generator Based on AES Algorithm

    Institute of Scientific and Technical Information of China (English)

    李振荣; 庄奕琪; 张博; 张超

    2010-01-01

    A novel frequency hopping (FH) sequences generator based on advanced encryption standard (AES) iterated block cipher is proposed for FH communication systems. The analysis shows that the FH sequences based on AES algorithm have good performance in uniformity, correlation, complexity and security. A high-speed, low-power and low-cost ASIC of FH sequences generator is implemented by optimizing the structure of S-Box and MixColumns of AES algorithm, proposing a hierarchical power management strategy, and applying ...

  8. DNA sequence analysis with droplet-based microfluidics

    Science.gov (United States)

    Abate, Adam R.; Hung, Tony; Sperling, Ralph A.; Mary, Pascaline; Rotem, Assaf; Agresti, Jeremy J.; Weiner, Michael A.; Weitz, David A.

    2014-01-01

    Droplet-based microfluidic techniques can form and process micrometer scale droplets at thousands per second. Each droplet can house an individual biochemical reaction, allowing millions of reactions to be performed in minutes with small amounts of total reagent. This versatile approach has been used for engineering enzymes, quantifying concentrations of DNA in solution, and screening protein crystallization conditions. Here, we use it to read the sequences of DNA molecules with a FRET-based assay. Using probes of different sequences, we interrogate a target DNA molecule for polymorphisms. With a larger probe set, additional polymorphisms can be interrogated as well as targets of arbitrary sequence. PMID:24185402

  9. New chaos-based encryption scheme for digital sequence

    Institute of Scientific and Technical Information of China (English)

    Zhang Zhengwei; Fan Yangyu; Zeng Li

    2007-01-01

    To enhance the anti-breaking performance of privacy information, this article proposes a new encryption method utilizing the leaping peculiarity of the periodic orbits of chaos systems. This method maps the secret sequence to several chaos periodic orbits, and a short sequence obtained by evolving the system parameters of the periodic orbits in another nonlinear system will be the key to reconstruct these periodic orbits. At the decryption end, the shadowing method of chaos trajectory based on the modified Newton-Raphson algorithm is adopted to restore these system parameters. By deciding which orbit each coordinate pair falls on, the original digital sequence can be decrypted.

  10. Nanopore-based Fourth-generation DNA Sequencing Technology

    Institute of Scientific and Technical Information of China (English)

    Yanxiao Feng; Yuechuan Zhang; Cuifeng Ying; Deqiang Wang; Chunlei Du

    2015-01-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications.

  11. Markov chaotic sequences for correlation based watermarking schemes

    Energy Technology Data Exchange (ETDEWEB)

    Tefas, A.; Nikolaidis, A.; Nikolaidis, N.; Solachidis, V.; Tsekeridou, S.; Pitas, I.

    2003-07-01

    In this paper, statistical analysis of watermarking schemes based on correlation detection is presented. Statistical properties of watermark sequences generated by piecewise-linear Markov maps are exploited, resulting in superior watermark detection reliability. Correlation/spectral properties of such sequences are easily controllable, a fact that affects the watermarking system performance. A family of chaotic maps, namely the skew tent map family, is proposed for use in watermarking schemes.
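
    To make the idea concrete, the sketch below generates a sequence with the standard skew tent map and applies a plain correlation detector. It is an illustration under stated assumptions, not the scheme analysed in the paper: the additive embedding, the embedding strength, the centring by 0.5, and all function names are choices made here.

```python
from typing import List, Sequence


def skew_tent_sequence(x0: float, alpha: float, length: int) -> List[float]:
    """Iterate the skew tent map on [0, 1] with peak position alpha."""
    if not (0.0 < alpha < 1.0) or not (0.0 < x0 < 1.0):
        raise ValueError("x0 and alpha must lie strictly between 0 and 1")
    values = []
    x = x0
    for _ in range(length):
        # Rising branch below alpha, falling branch at or above alpha.
        x = x / alpha if x < alpha else (1.0 - x) / (1.0 - alpha)
        values.append(x)
    return values


def watermark(x0: float, alpha: float, length: int) -> List[float]:
    """Centre the chaotic sequence around zero (assumed convention)."""
    return [v - 0.5 for v in skew_tent_sequence(x0, alpha, length)]


def embed(host: Sequence[float], mark: Sequence[float], strength: float) -> List[float]:
    """Additive embedding: marked = host + strength * watermark."""
    return [h + strength * w for h, w in zip(host, mark)]


def correlate(signal: Sequence[float], mark: Sequence[float]) -> float:
    """Correlation detector output: mean of the elementwise product."""
    return sum(s * w for s, w in zip(signal, mark)) / len(mark)
```

    Here the pair (x0, alpha) plays the role of the key: a detector regenerates the watermark from it and compares the correlation output against a threshold. How alpha shapes the correlation and spectral properties of the sequence, and hence the detection reliability, is the subject of the paper's statistical analysis and is not modelled in this sketch.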

  12. Solexa sequencing based transcriptome analysis of Helicoverpa armigera larvae.

    Science.gov (United States)

    Li, Jigang; Li, Xiumin; Chen, Yongli; Yang, Zhongxiang; Guo, Sandui

    2012-12-01

    Helicoverpa armigera (Hübner) is a polyphagous lepidopteran pest which causes great economic losses in crop production worldwide. In contrast to its agricultural importance, advances in the molecular aspects of this insect are quite limited. In the present study, Illumina's SOLEXA sequencing was adopted to determine the transcriptome of young H. armigera larvae. About 7 gigabases of raw sequence data were generated and, after data preprocessing, assembled into 116,601 contigs with an average length of 389 base pairs. Of these contigs, 37,352 were annotated by searching against UniRef100 of the UniProt database. The annotated sequences were functionally classified into three groups: biological process (15,632 sequences), cellular component (9,562 sequences) and molecular function (19,258 sequences). KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis showed that 1,409 contigs predicted to encode enzymes with enzyme commission numbers were mapped into 220 KEGG pathways in total. Finally, contigs with simple sequence repeats were derived from this dataset. PMID:23065207

  13. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous-time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity on the basis of statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model provides more biologically meaningful results.

  14. DNA sequence analysis using hierarchical ART-based classification networks

    Energy Technology Data Exchange (ETDEWEB)

    LeBlanc, C.; Hruska, S.I. [Florida State Univ., Tallahassee, FL (United States)]; Katholi, C.R.; Unnasch, T.R. [Univ. of Alabama, Birmingham, AL (United States)]

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  15. Accuracy of structure-based sequence alignment of automatic methods

    Directory of Open Access Journals (Sweden)

    Lee Byungkook

    2007-09-01

    similarity is low, structure-based methods produce better sequence alignments than by using sequence similarities alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.

  16. Repeat Sequences and Base Correlations in Human Y Chromosome Palindromes

    Institute of Scientific and Technical Information of China (English)

    Neng-zhi Jin; Zi-xian Liu; Yan-jiao Qi; Wen-yuan Qiu

    2009-01-01

    On the basis of information theory and statistical methods, we use mutual information, n-tuple entropy and conditional entropy, combined with biological characteristics, to analyze the long-range and short-range correlations in human Y chromosome palindromes. The magnitude ordering of the long-range correlation, as reflected by the mutual information, is P5>P5a>P5b (P5a and P5b are the sequences obtained from human Y chromosome palindrome 5 by replacing only the Alu repeats and all interspersed repeats, respectively, with random uncorrelated sequences); the magnitude ordering of the short-range correlation, as reflected by the n-tuple entropy and the conditional entropy, is P5>P5a>P5b>random uncorrelated sequence. In other words, when the Alu repeats and then all interspersed repeats are replaced with random uncorrelated sequence, the long-range and short-range correlations decrease gradually, whereas the random uncorrelated sequence has no correlation. This research indicates that more repeat sequences result in stronger correlation between bases in the human Y chromosome. The analyses may help in understanding the special structures of human Y chromosome palindromes in depth.
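
    The two measures named in this abstract can be sketched as follows. This is a minimal illustration using the textbook definitions of mutual information and n-tuple (block) entropy; the function names and the choice of bits as the unit are assumptions made here, not details taken from the paper.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (bits) between bases separated by k positions."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence is too short for this distance")
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(count * n / (left[a] * right[b]))
    return mi


def ntuple_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping n-tuples in seq."""
    tuples = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(tuples.values())
    return -sum(c / total * math.log2(c / total) for c in tuples.values())


if __name__ == "__main__":
    toy = "ACGT" * 50  # hypothetical, perfectly periodic sequence
    for k in (1, 4, 8):
        print(k, round(mutual_information(toy, k), 3))
    print(round(ntuple_entropy(toy, 1), 3))
```

    Evaluating mutual_information over a range of distances k gives a correlation profile; a shuffled copy of the same sequence is a natural baseline for comparison.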

  17. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  18. Which Microbial Communities Are Present? Sequence-Based Metagenomics

    Science.gov (United States)

    Caffrey, Sean M.

    The use of metagenomic methods that directly sequence environmental samples has revealed the extraordinary microbial diversity missed by traditional culture-based methodologies. Therefore, to develop a complete and representative model of an environment's microbial community and activities, metagenomic analysis is an essential tool.

  19. Multiple Base Substitution Corrections in DNA Sequence Evolution

    Science.gov (United States)

    Kowalczuk, M.; Mackiewicz, P.; Szczepanik, D.; Nowicka, A.; Dudkiewicz, M.; Dudek, M. R.; Cebrat, S.

    We discuss the inability of the Jukes and Cantor one-parameter model and Kimura's two-parameter model to describe the evolution of asymmetric DNA molecules. The standard distance measure between two DNA sequences, which is the number of substitutions per site, should include the effect of multiple base substitutions separately for each type of base. Otherwise, the respective tables of substitutions cannot reconstruct the asymmetric DNA molecule with respect to the composition. Based on Kimura's neutral theory, we have derived a linear law for the correlation of the mean survival time of nucleotides under constant mutation pressure and their fraction in the genome. According to the law, the corrections to Kimura's theory have been discussed to describe evolution of genomes with asymmetric nucleotide composition. We consider the particular case of the strongly asymmetric Borrelia burgdorferi genome and we discuss in detail the corrections, which should be introduced into the distance measure between two DNA sequences to include multiple base substitutions.
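
    For orientation, the two standard multiple-substitution corrections that this abstract criticizes can be sketched as below. These are the textbook Jukes-Cantor and Kimura two-parameter distance formulas, not the per-base corrections the authors propose; the function names are illustrative and the input sequences are assumed to be aligned, gap-free and of equal length.

```python
import math

PURINES = {"A", "G"}


def substitution_fractions(s1, s2):
    """Return (P, Q): the fractions of sites with a transition or a transversion."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(s1)
    return transitions / n, transversions / n


def jukes_cantor(p):
    """One-parameter distance from the observed fraction p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P, Q):
    """Two-parameter distance from transition (P) and transversion (Q) fractions."""
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


if __name__ == "__main__":
    P, Q = substitution_fractions("AAAAAAAAAA", "AAAAAAAAGC")
    print(P, Q, jukes_cantor(P + Q), kimura_2p(P, Q))
```

    Both formulas treat the four bases symmetrically, which is the limitation the abstract points to for genomes with asymmetric nucleotide composition.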

  20. DNA sequence analysis with droplet-based microfluidics

    OpenAIRE

    Abate, Adam R.; Hung, Tony; Sperling, Ralph A.; Mary, Pascaline; Rotem, Assaf; Agresti, Jeremy J.; Weiner, Michael A.; Weitz, David A.

    2013-01-01

    Droplet-based microfluidic techniques can form and process micrometer scale droplets at thousands per second. Each droplet can house an individual biochemical reaction, allowing millions of reactions to be performed in minutes with small amounts of total reagent. This versatile approach has been used for engineering enzymes, quantifying concentrations of DNA in solution, and screening protein crystallization conditions. Here, we use it to read the sequences of DNA molecules with a FRET-based ...

  1. A New Images Hiding Scheme Based on Chaotic Sequences

    Institute of Scientific and Technical Information of China (English)

    LIU Nian-sheng; GUO Dong-hui; WU Bo-xi; Parr G

    2005-01-01

    We propose a data hiding technique for still images. This technique is based on chaotic sequences in the transform domain of the covert image. We use different chaotic random sequences multiplied by multiple sensitive images, respectively, to spread the spectrum of the sensitive images. Multiple sensitive images are hidden in a covert image as a form of noise. The results of theoretical analysis and computer simulation show that the new hiding technique has better security, imperceptibility and capacity for hidden information than conventional schemes such as LSB (Least Significant Bit).

  2. 3D Motion Parameters Determination Based on Binocular Sequence Images

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Exactly capturing three dimensional (3D) motion information of an object is an essential and important task in computer vision, and is also one of the most difficult problems. In this paper, a binocular vision system and a method for determining 3D motion parameters of an object from binocular sequence images are introduced. The main steps include camera calibration, the matching of motion and stereo images, 3D feature point correspondences and resolving the motion parameters. Finally, the experimental results of acquiring the motion parameters of the objects with uniform velocity and acceleration in the straight line based on the real binocular sequence images by the mentioned method are presented.

  3. Steganalytic method based on short and repeated sequence distance statistics

    Institute of Scientific and Technical Information of China (English)

    WANG GuoXin; PING XiJian; XU ManKun; ZHANG Tao; BAO XiRui

    2008-01-01

    According to the distribution characteristics of short and repeated sequence (SRS), a steganalytic method based on the correlation of image bit planes is proposed. Firstly, we introduce the concept of SRS distance statistics and deduce its statistical distribution. Because the SRS distance statistics can effectively reflect the correlation of the sequence, SRS has statistical features when the image bit plane sequence equals the image width. Using this characteristic, the steganalytic method is carried out by the distinct test of Poisson distribution. Experimental results show good performance in detecting the LSB matching steganographic method in still images. Moreover, the proposed method is not designed for specific steganographic algorithms and has good generality.

  4. Nanopore-based Fourth-generation DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Yanxiao Feng

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications.

  5. Spectroscopic investigation on the telomeric DNA base sequence repeat

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Telomeres are protein-DNA complexes at the terminals of linear chromosomes, which protect chromosomal integrity and maintain cellular replicative capacity. From single-cell organisms to advanced animals and plants, the structures and functions of telomeres are both highly conserved. In cells of humans and vertebrate animals, telomeric DNA base sequences all are (TTAGGG)n. In the present work, we have obtained absorption and fluorescence spectra measured from seven synthesized oligonucleotides to simulate the telomeric DNA system and calculated their relative fluorescence quantum yields on which not only telomeric DNA characteristics are predicted but also possibly the shortened telomeric sequences during cell division are imrelative fluorescence quantum yield and remarkable excitation energy innerconversion, which tallies with the telomeric sequence of (TTAGGG)n. This result shows that telomeric DNA has a strong non-radiative or innerconvertible capability.

  6. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

    Energy Technology Data Exchange (ETDEWEB)

    Li, Mingkun; Copeland, Alex; Han, James

    2011-03-21

    A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches given reference sequences. Matching is similar to alignment; indeed, many traditional analysis tasks such as contaminant removal use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make the matching task much faster than alignment. DUK is accurate, versatile, fast, and memory-efficient. It uses a Kmer hashing method to index reference sequences and a Poisson model to calculate the p-value. DUK is carefully implemented in C++ with an object-oriented design. The resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
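
    The idea of matching without alignment can be illustrated with a small k-mer index. This is a hedged sketch, not DUK's code: DUK is written in C++ and scores matches with a Poisson-model p-value, whereas this example substitutes a plain hit-fraction threshold. The function names and the min_fraction parameter are hypothetical.

```python
def build_kmer_index(references, k):
    """Collect every k-mer that occurs in any reference sequence."""
    index = set()
    for ref in references:
        for i in range(len(ref) - k + 1):
            index.add(ref[i:i + k])
    return index


def count_matching_kmers(query, index, k):
    """Return (hits, total): query k-mers found in the index, and all query k-mers."""
    total = max(len(query) - k + 1, 0)
    hits = sum(1 for i in range(total) if query[i:i + k] in index)
    return hits, total


def matches(query, index, k, min_fraction=0.5):
    """Report a match when enough query k-mers occur in the references.

    The fraction threshold stands in for DUK's Poisson-based p-value,
    which is not reproduced here.
    """
    hits, total = count_matching_kmers(query, index, k)
    return total > 0 and hits / total >= min_fraction


if __name__ == "__main__":
    index = build_kmer_index(["ACGTACGTGGCC"], k=4)
    print(matches("ACGTACGT", index, k=4))
    print(matches("TTTTTTTT", index, k=4))
```

    Because only set membership is tested, no positional information is computed, which is the property the abstract credits for the speed advantage over alignment.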

  7. An Uncompressed Image Encryption Algorithm Based on DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shima Ramesh Maniyath

    2011-07-01

    The rapid growth of the Internet and digitized content has made image and video distribution simpler. Hence the need for image and video data protection is on the rise. In this paper, we propose a secure and computationally feasible image and video encryption/decryption algorithm based on DNA sequences. The main purpose of this algorithm is to reduce the encryption time for large images. This algorithm is implemented by using natural DNA sequences as main keys. The first part is the process of pixel scrambling. The original image is confused according to the scrambling sequence, which is generated by the DNA sequence. The second part is the process of pixel replacement. The pixel gray values of the new image and one of the three encryption templates generated by the other DNA sequence are XORed bit-by-bit in turn. The main scope of this paper is to propose an extension of this algorithm to videos and to make it secure using modern biological technology. A security analysis for the proposed system is performed and presented.

  8. Solid-State Nanopore-Based DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Zewen Liu

    2016-01-01

    Solid-state nanopore-based DNA sequencing technology is becoming more and more attractive because of its promising future in the gene detection field. The challenges that need to be addressed are diverse: effective methods to detect base-specific signatures, control of the nanopore’s size and surface properties, and modulation of the translocation velocity and behavior of the DNA molecules. Among these challenges, the realization of high-quality nanopores with the help of modern micro/nanofabrication technologies is a crucial one. In this paper, typical technologies applied in the field of solid-state nanopore-based DNA sequencing are reviewed.

  9. Repeat-based Sequence Typing of Carnobacterium maltaromaticum.

    Science.gov (United States)

    Rahman, Abdur; El Kheir, Sara M; Back, Alexandre; Mangavel, Cécile; Revol-Junelles, Anne-Marie; Borges, Frédéric

    2016-06-01

    Carnobacterium maltaromaticum is a Lactic Acid Bacterium (LAB) of technological interest for the food industry, especially the dairy industry, as a bioprotective and ripening flora. The industrial use of this LAB requires accurate, high-resolution typing tools. A new typing method for C. maltaromaticum, inspired by MLVA analysis and called Repeat-based Sequence Typing (RST), is described. Rather than electrophoresis analysis, our RST method is based on sequence analysis of multiple loci containing Variable-Number Tandem-Repeats (VNTRs). The method described here for C. maltaromaticum relies on the analysis of three VNTR loci, and was applied to a collection of 24 strains. For each strain, a PCR product corresponding to the amplification of each VNTR locus was sequenced. Sequence analysis allowed delineating 11, 11, and 12 alleles for loci VNTR-A, VNTR-B, and VNTR-C, respectively. Considering the allele combination exhibited by each strain allowed defining 15 genotypes, resulting in a discriminatory index of 0.94. Comparison with MLST revealed that both methods were complementary for strain typing in C. maltaromaticum.
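
    A discriminatory index of this kind is commonly computed with the Hunter-Gaston formula (Simpson's index of diversity). The abstract does not say which formula was used, so the sketch below is an assumption, and the allele labels in the example are made up rather than taken from the paper.

```python
from collections import Counter


def discriminatory_index(genotypes):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    genotypes holds one hashable label per strain, for example a tuple of
    allele numbers at the three VNTR loci.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(genotypes).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical (VNTR-A, VNTR-B, VNTR-C) allele combinations for four strains.
    strains = [(1, 2, 3), (1, 2, 3), (4, 2, 1), (5, 6, 7)]
    print(round(discriminatory_index(strains), 3))
```

    The index is the probability that two strains drawn at random without replacement have different genotypes, so it reaches 1.0 only when every strain is unique.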

  10. Speeding disease gene discovery by sequence based candidate prioritization

    Directory of Open Access Journals (Sweden)

    Porteous David J

    2005-03-01

    Background: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results: We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion: PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

  11. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses is proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91% and 94% are proposed as the demarcation thresholds for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.
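
    How the two thresholds partition a pairwise comparison can be sketched as follows. This is an illustration only: it assumes the two genomes are already aligned, it skips gapped columns when computing identity (an assumption about the calculation, not a documented detail of Sequence Demarcation Tool), and it treats a value exactly on a threshold as belonging to the higher category, which the guidelines handle with separate conflict-resolution procedures.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        same += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * same / compared


def classify(identity, species_threshold=91.0, strain_threshold=94.0):
    """Map a genome-wide pairwise identity (%) to a demarcation category."""
    if identity < species_threshold:
        return "different species"
    if identity < strain_threshold:
        return "same species, different strain"
    return "same species, same strain"


if __name__ == "__main__":
    identity = pairwise_identity("ACGTACGTAC", "ACGTACGTAA")  # toy alignment
    print(identity, classify(identity))
```

    Keeping the thresholds as parameters makes it easy to apply the same function to other genera with different demarcation values.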

  12. A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation

    OpenAIRE

    SAHA, AMRITA; Khapra, Mitesh M.; Chandar, Sarath; Rajendran, Janarthanan; Cho, Kyunghyun

    2016-01-01

    Interlingua based Machine Translation (MT) aims to encode multiple languages into a common linguistic representation and then decode sentences in multiple target languages from this representation. In this work we explore this idea in the context of neural encoder decoder architectures, albeit on a smaller scale and without MT as the end goal. Specifically, we consider the case of three languages or modalities X, Z and Y wherein we are interested in generating sequences in Y starting from inf...

  13. Development in Rice Genome Research Based on Accurate Genome Sequence

    OpenAIRE

    2008-01-01

    Rice is one of the most important crops in the world. Although genetic improvement is a key technology for the acceleration of rice breeding, a lack of genome information had restricted efforts in molecular-based breeding until the completion of the high-quality rice genome sequence, which opened new opportunities for research in various areas of genomics. The syntenic relationship of the rice genome to other cereal genomes makes the rice genome invaluable for understanding how cereal genomes...

  14. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences.

    Directory of Open Access Journals (Sweden)

    Philip J Tully

    2016-05-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx). We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison.

  15. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences.

    Science.gov (United States)

    Tully, Philip J; Lindén, Henrik; Hennig, Matthias H; Lansner, Anders

    2016-05-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx). We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison. PMID:27213810

  16. Caption detection from video sequence based on fuzzy neural networks

    Science.gov (United States)

    Gao, Xinbo; Xin, Hong; Li, Jie

    2001-09-01

    Caption graphically superimposed in video frames can provide important indexing information. The automatic detection and recognition of video captions can be of great help in querying topics of interest in digital news library. To detect the caption from video sequence, we present algorithms based on fuzzy clustering neural networks. Since neural networks have the capabilities of learning and self-organizing and parallel computing mechanism, with the great increasing of digital images and video databases, neural networks based techniques become more efficient and popular tools for multimedia processing. Experimental results show that our caption detection scheme is effective and robust.

  17. Translating sanger-based routine DNA diagnostics into generic massive parallel ion semiconductor sequencing

    NARCIS (Netherlands)

    Diekstra, A.; Bosgoed, E.A.J.; Rikken, A.; Lier, B. van; Kamsteeg, E.J.; Tychon, M.W.J.; Derks, R.C.; Soest, R.A.; Mensenkamp, A.R.; Scheffer, H.; Neveling, K.; Nelen, M.R.

    2015-01-01

    BACKGROUND: Dideoxy-based chain termination sequencing developed by Sanger is the gold standard sequencing approach and allows clinical diagnostics of disorders with relatively low genetic heterogeneity. Recently, new next generation sequencing (NGS) technologies have found their way into diagnostic

  18. [Segmentation Method for Liver Organ Based on Image Sequence Context].

    Science.gov (United States)

    Zhang, Meiyun; Fang, Bin; Wang, Yi; Zhong, Nanchang

    2015-10-01

    In view of the problems of more artificial interventions and segmentation defects in existing two-dimensional segmentation methods and abnormal liver segmentation errors in three-dimensional segmentation methods, this paper presents a semi-automatic liver organ segmentation method based on the image sequence context. The method takes advantage of the existing similarity between the image sequence contexts of the prior knowledge of liver organs, and combines region growing and level set method to carry out semi-automatic segmentation of livers, along with the aid of a small amount of manual intervention to deal with liver mutation situations. The experiment results showed that the liver segmentation algorithm presented in this paper had a high precision, and a good segmentation effect on livers which have greater variability, and can meet clinical application demands quite well.

  19. Cladistic analysis of iridoviruses based on protein and DNA sequences.

    Science.gov (United States)

    Wang, J W; Deng, R Q; Wang, X Z; Huang, Y S; Xing, K; Feng, J H; He, J G; Long, Q X

    2003-11-01

    Cladograms of iridoviruses were inferred from bootstrap analysis of molecular data sets comprising all published protein and DNA sequences of the major capsid protein, ATPase and DNA polymerase genes of members of the Iridoviridae family Iridovirus. All data sets yielded cladograms supporting the separation of the Iridovirus, Ranavirus and Lymphocystivirus genera, and the cladogram based on data derived from major capsid proteins further divided both the Iridovirus and Ranavirus genera into two groups. Tests of alternative hypotheses of topological constraints were also performed to further investigate relationships between infectious spleen and kidney necrosis virus (ISKNV), an unclassified fish iridovirus for which the complete genome sequence data is available, and other iridoviruses. Cladograms inferred and results of Shimodaira-Hasegawa tests indicated that ISKNV is more closely related to the Ranavirus genus than it is to the other genera of the family.

  20. Entamoeba histolytica: observations on metabolism based on the genome sequence

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J.; Loftus, Brendan J.

    2005-07-01

    The sequencing of the genome of Entamoeba histolytica has allowed a reconstruction of its metabolic pathways, many of which are unusual for a eukaryote. Based on the genome sequence, it appears that amino acids may play a larger role than previously thought in energy metabolism, with roles in both ATP synthesis and NAD regeneration. Arginine decarboxylase may be involved in survival of E. histolytica during its passage through the stomach. The usual pyrimidine synthesis pathway is absent, but a partial pyrimidine degradation pathway could be part of a novel pyrimidine synthesis pathway. Ribonucleotide reductase was not found in the E. histolytica genome, but it was found in the close relatives Entamoeba invadens and Entamoeba moshkovskii, suggesting a recent loss from E. histolytica. The usual eukaryotic glucose transporters are not present, but members of a prokaryotic monosaccharide transporter family are present.

  1. Watermarking scheme of colour image based on chaotic sequences

    Institute of Scientific and Technical Information of China (English)

    LIU Nian-sheng; GUO Dong-hui

    2009-01-01

    The proposed perceptual mask is based on the singularity of the cover image and matches very well with the properties of the human visual system. The cover colour image is decomposed into several subbands by the wavelet transform. The watermark, composed of a chaotic sequence and the covert image, is embedded into the subband with the largest energy. The chaos system plays an important role in the security, invisibility and robustness of the proposed scheme. The parameter and initial state of the chaos system serve as a key and directly influence the generation of the watermark information. Moreover, the chaotic sequence gives the watermark information the property of a spread spectrum signal, which improves the invisibility and security of the watermarked image. Experimental results and comparisons with other watermarking techniques show that the proposed algorithm is effective and feasible, and improves the security, invisibility and robustness of the watermarking information.
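
    The role of the parameter and initial state as a key can be illustrated with a chaotic bipolar sequence. The abstract does not name the chaotic system, so the logistic map used here is an assumption, as are the function name, the burn-in length and the 0.5 binarization threshold.

```python
def logistic_sequence(x0, r, length, burn_in=100):
    """Generate a +1/-1 sequence from the logistic map x -> r * x * (1 - x).

    x0 (initial state, 0 < x0 < 1) and r (map parameter) act together as the key.
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1 - x)
    out = []
    for _ in range(length):
        x = r * x * (1 - x)
        out.append(1 if x >= 0.5 else -1)
    return out


if __name__ == "__main__":
    key = (0.3, 3.99)  # hypothetical key: initial state and parameter
    chips = logistic_sequence(*key, length=16)
    print(chips)
```

    The same key always regenerates the same sequence, so a receiver holding the key can rebuild it for extraction; multiplying watermark samples by such a sequence is one simple way to spread their spectrum.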

  2. Prediction of potential drug targets based on simple sequence properties

    Directory of Open Access Journals (Sweden)

    Lai Luhua

    2007-09-01

    Background: During the past decades, research and development in drug discovery have attracted much attention and efforts. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results: Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion: We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.

  3. Generalization of entropy based divergence measures for symbolic sequence analysis.

    Directory of Open Access Journals (Sweden)

    Miguel A Ré

    Full Text Available Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.

  4. Generalization of entropy based divergence measures for symbolic sequence analysis.

    Science.gov (United States)

    Ré, Miguel A; Azad, Rajeev K

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
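
    The weighted JSD for any number of distributions, as described in the abstract above, can be illustrated with a minimal Python sketch. It uses the standard Shannon-entropy form, JSD = H(sum_i w_i p_i) - sum_i w_i H(p_i), and does not attempt the Tsallis or Markovian generalizations. The function names are hypothetical and chosen only for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(distributions, weights=None):
    """Weighted JSD of several distributions over the same alphabet.

    With no weights given, every distribution gets weight 1/k.
    """
    k = len(distributions)
    if weights is None:
        weights = [1.0 / k] * k
    n = len(distributions[0])
    # Mixture distribution: weighted average of the inputs, symbol by symbol.
    mixture = [sum(w * d[i] for w, d in zip(weights, distributions))
               for i in range(n)]
    weighted_entropies = sum(w * shannon_entropy(d)
                             for w, d in zip(weights, distributions))
    return shannon_entropy(mixture) - weighted_entropies
```

    For sequence segmentation, each distribution would typically hold the nucleotide frequencies of one candidate segment, with weights proportional to segment length; that usage is an assumption here, not a detail stated in the abstract.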

  5. Next-Generation Sequencing-Based Molecular Diagnosis of Choroideremia

    Directory of Open Access Journals (Sweden)

    Kayo Shimizu

    2015-07-01

    Full Text Available We screened patients with choroideremia using next-generation sequencing (NGS) and identified a novel mutation and a known mutation in the CHM gene. One patient presented an atypical fundus appearance for choroideremia. Another patient presented macular hole retinal detachment in the left eye. The present case series shows the utility of NGS-based screening in patients with choroideremia. In addition, the presence of macular hole in 1 of the 2 patients, together with a previous report, indicated the susceptibility of patients with choroideremia to macular hole.

  6. Rapid sequencing of DNA based on single-molecule detection

    Science.gov (United States)

    Soper, Steven A.; Davis, Lloyd M.; Fairfield, Frederick R.; Hammond, Mark L.; Harger, Carol A.; Jett, James H.; Keller, Richard A.; Marrone, Babetta L.; Martin, John C.; Nutter, Harvey L.; Shera, E. Brooks; Simpson, Daniel J.

    1991-07-01

    Sequencing the human genome is a major undertaking considering the large number of nucleotides present in the genome and the slow methods currently available to perform the task. The authors have recently reported on a scheme to sequence DNA rapidly using a non-gel based technique. The concept is based upon the incorporation of fluorescently labeled nucleotides into a strand of DNA, isolation and manipulation of a labeled DNA fragment and the detection of single nucleotides using ultra-sensitive laser-induced fluorescence detection following their cleavage from the fragment. Detection of individual fluorophores in the liquid phase was accomplished with time-gated detection following pulsed-laser excitation. The photon bursts from individual rhodamine 6G (R6G) molecules travelling through a laser beam have been observed, as have bursts from single fluorescently modified nucleotides. Using two different biotinylated nucleotides as a model system for fluorescently labeled nucleotides, the authors have observed synthesis of the complementary copy of M13 bacteriophage. Work with fluorescently labeled nucleotides is underway. Individual molecules of DNA attached to a microbead have been observed and manipulated with an epifluorescence microscope.

  7. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
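
    The three-level data partitioning described in this abstract (discovery, training, validation) can be sketched as below. This is only an illustration and not the authors' pipeline: the function name and the split fractions are assumptions, and the motif finder and classifier are left out because the abstract does not name them.

```python
import random


def three_way_partition(items, fractions=(0.3, 0.4, 0.3), seed=0):
    """Split items into discovery, training, and validation partitions.

    The fractions are illustrative assumptions, not values from the abstract.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_discovery = round(n * fractions[0])
    n_training = round(n * fractions[1])
    discovery = shuffled[:n_discovery]
    training = shuffled[n_discovery:n_discovery + n_training]
    validation = shuffled[n_discovery + n_training:]  # whatever remains
    return discovery, training, validation
```

    Motifs would be found on the discovery partition only, the classifier fitted on the training partition, and performance reported on the validation partition, so that no sequence influences more than one stage.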

  8. Development of Sequence-Based Microsatellite Marker for Phalaenopsis Orchid

    Directory of Open Access Journals (Sweden)

    FATIMAH

    2011-06-01

    Full Text Available Phalaenopsis is one of the most interesting genera of orchids because its members are often used as parents to produce hybrids. The establishment and development of highly reliable and discriminatory methods for identifying species and cultivars has become increasingly important to plant breeders and members of the nursery industry. The aim of this research was to develop sequence-based microsatellite (eSSR) markers for the Phalaenopsis orchid designed from sequences in GenBank NCBI. Seventeen primers were designed and thirteen primer pairs could amplify the DNA giving the expected PCR product with polymorphism. A total of 51 alleles, with an average of 3 alleles per locus and polymorphism information content (PIC) values of 0.674, were detected at the 16 SSR loci. Therefore, these markers could be used for identification of the Phalaenopsis orchids used in this study. Genetic similarity and principal coordinate analysis identified five major groups of Phalaenopsis sp. The first group consisted of P. amabilis, P. fuscata, P. javanica, and P. zebrine. The second group consisted of P. amabilis, P. amboinensis, P. bellina, P. floresens, and P. mannii. The third group consisted of P. bellina, P. cornucervi, P. cornucervi, P. violaceae sumatra, P. modesta. The fourth group consisted of P. cornucervi and P. lueddemanniana, and the fifth group was P. amboinensis.
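
    The abstract reports PIC values without giving the formula. A minimal sketch, assuming the commonly used definition PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2 over the allele frequencies p_i at one locus, is shown below; whether the authors used exactly this definition is not stated, and the function name is hypothetical.

```python
from itertools import combinations


def polymorphism_information_content(allele_freqs):
    """PIC of one locus from its allele frequencies (which must sum to 1)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    pair_term = sum(2.0 * (p * p) * (q * q)
                    for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - pair_term
```

    A locus with two equally frequent alleles gives 0.375 under this definition, and a monomorphic locus gives 0.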

  9. New algorithm for iris recognition based on video sequences

    Science.gov (United States)

    Bourennane, Salah; Fossati, Caroline; Ketchantang, William

    2010-07-01

    Among existing biometrics, iris recognition systems are among the most accurate personal biometric identification systems. However, the acquisition of a workable iris image requires strict cooperation of the user; otherwise, the image will be rejected by a verification module because of its poor quality, inducing a high false reject rate (FRR). The FRR may also increase when iris localization fails or when the pupil is too dilated. To improve the existing methods, we propose to use video sequences acquired in real time by a camera. In order to keep the same computational load to identify the iris, we propose a new method to estimate the iris characteristics. First, we propose a new iris texture characterization based on Fourier-Mellin transform, which is less sensitive to pupil dilatations than previous methods. Then, we develop a new iris localization algorithm that is robust to variations of quality (partial occlusions due to eyelids and eyelashes, light reflects, etc.), and finally, we introduce a fast and new criterion of suitable image selection from an iris video sequence for an accurate recognition. The accuracy of each step of the algorithm in the whole proposed recognition process is tested and evaluated using our own iris video database and several public image databases, such as CASIA, UBIRIS, and BATH.

  10. Similarity Measurement of Web Sessions Based on Sequence Alignment

    Institute of Scientific and Technical Information of China (English)

    LI Chaofeng; LU Yansheng

    2007-01-01

    The task of clustering Web sessions is to group Web sessions based on similarity and consists of maximizing the intra-group similarity while minimizing the inter-group similarity. The first and foremost question needed to be considered in clustering Web sessions is how to measure the similarity between Web sessions. However, there are many shortcomings in traditional measurements. This paper introduces a new method for measuring similarities between Web pages that takes into account not only the URL but also the viewing time of the visited Web page. Then we give a new method to measure the similarity of Web sessions using sequence alignment and the similarity of Web page access in detail. Experiments have proved that our method is valid and efficient.
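
    One way to picture the idea in this abstract is the sketch below, which scores two sessions by a global (Needleman-Wunsch style) alignment whose match score is a page similarity combining URL and viewing time. The specific page-similarity formula, the gap penalty, and the normalisation are all assumptions made for illustration; the abstract does not give the authors' definitions.

```python
def url_similarity(a, b):
    """Fraction of leading path segments shared by two URL paths."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def page_similarity(p, q):
    """Combine URL similarity with viewing-time similarity (both in [0, 1])."""
    (url_p, t_p), (url_q, t_q) = p, q
    longest = max(t_p, t_q)
    time_sim = min(t_p, t_q) / longest if longest > 0 else 1.0
    return url_similarity(url_p, url_q) * time_sim


def session_similarity(s1, s2, gap=-0.5):
    """Global alignment score of two sessions, normalised by the longer one."""
    if not s1 or not s2:
        return 0.0
    n, m = len(s1), len(s2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + page_similarity(s1[i - 1], s2[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return max(score[n][m], 0.0) / max(n, m)
```

    Each session is a list of (URL path, viewing time in seconds) pairs, for example `[("/a/b", 10), ("/c", 5)]`. A session compared with itself scores 1.0.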

  11. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations.

    Science.gov (United States)

    Bahr, A; Thompson, J D; Thierry, J C; Poch, O

    2001-01-01

    BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.

  12. GPCODON ALIGNMENT: A GLOBAL PAIRWISE CODON BASED SEQUENCE ALIGNMENT APPROACH

    Directory of Open Access Journals (Sweden)

    Zeinab A. Fareed

    2016-02-01

    Full Text Available The alignment of two DNA sequences is a basic step in the analysis of biological data. Sequencing a long DNA sequence is one of the most interesting problems in bioinformatics. Several techniques have been developed to solve this sequence alignment problem, such as dynamic programming and heuristic algorithms. In this paper, we introduce GPCodon alignment, a pairwise DNA-DNA method for global sequence alignment that improves the accuracy of pairwise sequence alignment. We use a new scoring matrix, called the empirical codon substitution matrix, to produce the final alignment. Using this matrix in our technique enabled the discovery of new relationships between sequences that could not be discovered using traditional matrices. In addition, we present experimental results that show the performance of the proposed technique over eleven datasets with an average length of 2967 bps. We compared the efficiency and accuracy of our technique against a comparable tool called “Pairwise Align Codons” [1].
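
    A codon-level global alignment of the kind described above can be sketched with the standard Needleman-Wunsch recurrence applied to codons instead of single nucleotides. This is not the GPCodon implementation. In particular, `toy_codon_score` is a hypothetical placeholder for the empirical codon substitution matrix, whose values the abstract does not give, and the gap penalty is an arbitrary assumption.

```python
def split_codons(dna):
    """Split a DNA string into codons; the length must be a multiple of 3."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def toy_codon_score(c1, c2):
    """Placeholder score: 3 for identical codons, down to -3 for no shared base."""
    matches = sum(a == b for a, b in zip(c1, c2))
    return 2 * matches - 3


def align_codons(seq1, seq2, score=toy_codon_score, gap=-2):
    """Return (score, aligned1, aligned2) for a global codon-wise alignment."""
    a, b = split_codons(seq1), split_codons(seq2)
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner; a gap is written as "---".
    out1, out2, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out1.append(a[i - 1]); out2.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out1.append(a[i - 1]); out2.append("---"); i -= 1
        else:
            out1.append("---"); out2.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out1)), "".join(reversed(out2))
```

    Working at codon granularity keeps gaps in multiples of three, so the reading frame of a coding sequence is preserved in the alignment.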

  13. AbCD: arbitrary coverage design for sequencing-based genetic studies

    OpenAIRE

    Kang, Jian; Huang, Kuan-Chieh; Xu, Zheng; Wang, Yunfei; Abecasis, Gonçalo R.; Li, Yun

    2013-01-01

    Summary: Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbi...

  14. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    OpenAIRE

    Fei Chen; Yuan-Ting Zhang

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT) – the bionic wavelet transform (BWT) – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the s...

  15. SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCLEOTIDE BASES, FREQUENCY AND POSITION OF GROUP MUTATIONS

    Directory of Open Access Journals (Sweden)

    Fatima KABLI

    2016-01-01

    Full Text Available Approaches to DNA sequence similarity analysis have been based on the representation and frequency of sequence components; however, position within the sequence is also important information. Insufficient information in sequence representations is a major cause of poor similarity results. Based on three classifications of the DNA bases according to their chemical properties, the frequencies and average positions of group mutations are collected into two twelve-component vectors, and the Euclidean distances between these vectors are used to compare the coding sequences of the first exon of the beta-globin gene of 11 species.

  16. Java Implementation based Heterogeneous Video Sequence Automated Surveillance Monitoring

    Directory of Open Access Journals (Sweden)

    Sankari Muthukarupan

    2013-04-01

    Full Text Available Automated video-based surveillance monitoring is an essential and computationally challenging task for resolving security issues in secure-access locations. This paper deals with some of the issues encountered when integrating surveillance monitoring into real-life circumstances. We employ video frames extracted from heterogeneous video formats. Each video frame is examined to identify anomalous events occurring in the time-driven sequence. Background subtraction is performed using an optimal threshold and a reference frame. The remaining frames are subtracted from the reference image to obtain the foreground images. The coordinate of interest in each subtracted image is found by scanning the image horizontally until the first black pixel occurs. The obtained coordinate is paired with the corresponding coordinate in the primary image, and the paired coordinate in the primary image is treated as the active region of interest. Finally, the marked images are converted to a temporal video that tracks the moving silhouettes of human behavior against a static background. The proposed model is implemented in Java. Results and performance analysis are carried out in real-life environments.
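
    The background-subtraction and scanning steps described above can be sketched as follows. The paper's model is implemented in Java; this Python sketch is only an illustration under assumptions: frames are plain grayscale grids (lists of rows), foreground pixels are marked 1 rather than black, and the function names and threshold are hypothetical.

```python
def subtract_background(frame, reference, threshold):
    """Mark pixels whose absolute difference from the reference exceeds the threshold."""
    return [[1 if abs(p - r) > threshold else 0
             for p, r in zip(frame_row, ref_row)]
            for frame_row, ref_row in zip(frame, reference)]


def first_foreground_pixel(mask):
    """Scan row by row, left to right, and return (x, y) of the first marked pixel."""
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                return (x, y)
    return None  # no foreground detected
```

    For example, with a flat reference of value 10, a frame containing one pixel of value 90, and a threshold of 20, only that pixel is marked and its coordinate is returned as the starting point of the region of interest.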

  17. Multiple sequence alignment based on combining genetic algorithm with chaotic sequences.

    Science.gov (United States)

    Gao, C; Wang, B; Zhou, C J; Zhang, Q

    2016-06-24

    In bioinformatics, sequence alignment is one of the most common problems. Multiple sequence alignment is an NP (nondeterministic polynomial time) problem, which requires further study and exploration. The chaos optimization algorithm draws on chaos theory and can be combined with the genetic algorithm (GA) by exploiting the ergodicity and inherent randomness of chaotic iteration. It is an efficient way to address the premature convergence of the basic GA. Applying the Logistic map to the GA and using chaotic sequences to carry out the chaotic perturbation can improve the convergence of the basic GA. In addition, the random tournament selection and optimal preservation strategy are used in the GA. Experimental evidence indicates good results for this process.
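
    The chaotic perturbation mentioned in this abstract can be sketched with the Logistic map x_{k+1} = r x_k (1 - x_k). The sketch below assumes r = 4 (the fully chaotic regime) and a chromosome of real-valued genes in [low, high]; how the authors encode an alignment as a chromosome is not stated, so the encoding, the function names, and the `strength` parameter are assumptions.

```python
def logistic_sequence(x0, length, r=4.0):
    """Iterate the Logistic map x -> r * x * (1 - x), starting from x0 in (0, 1)."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    values, x = [], x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_perturb(genes, chaos, strength=0.1, low=0.0, high=1.0):
    """Shift each gene by a chaos-driven offset and clip the result to [low, high]."""
    perturbed = []
    for gene, c in zip(genes, chaos):
        # Map the chaotic value from [0, 1] to [-1, 1] before scaling.
        offset = strength * (2.0 * c - 1.0) * (high - low)
        perturbed.append(min(high, max(low, gene + offset)))
    return perturbed
```

    In a GA loop, such a perturbation would typically be applied to selected individuals after crossover and mutation, keeping the perturbed individual only if its fitness improves.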

  18. Complete chloroplast genome sequence of Fritillaria unibracteata var. wabuensis based on SMRT Sequencing Technology.

    Science.gov (United States)

    Li, Ying; Li, Qiushi; Li, Xiwen; Song, Jingyuan; Sun, Chao

    2016-09-01

    Fritillaria unibracteata var. wabuensis is an important medicinal plant used for the treatment of cough symptoms related to the respiratory system. The chloroplast genome of F. unibracteata var. wabuensis (GenBank accession no. KF769142) was assembled using the PacBio RS platform (Pacific Biosciences, Beverly, MA) as a circle sequence with 151 009 bp. The assembled genome contains 133 genes, including 88 protein-coding, 37 tRNA, and eight rRNA genes. This genome sequence will provide an important resource for further studies on the evolution of the Fritillaria genus and molecular identification of Fritillaria herbs and their adulterants. This work suggests that PacBio RS is a powerful tool to sequence and assemble chloroplast genomes. PMID:26370383

  19. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  20. Protein sequence for clustering DNA based on Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Gamal. F. Elhadi

    2012-01-01

    Full Text Available DNA is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. Clustering is a process that groups a set of objects into clusters so that the similarity among objects in the same cluster is high, while that among the objects in different clusters is low. In this paper, we propose an approach for clustering DNA sequences using the Self-Organizing Map (SOM) algorithm and protein sequences. The main objective is to analyze biological data and to group DNA into clusters more easily and efficiently. We use the proposed approach to analyze both large and small numbers of input DNA sequences. The results show that the similarity of the sequences does not depend on the number of input sequences. Our approach evaluates the degree of DNA sequence similarity using a hierarchical dendrogram representation. Representing a large amount of data as a hierarchical tree makes it possible to compare large sequences efficiently.

  1. Phylogenetic relationships of Salmonella based on rRNA sequences

    DEFF Research Database (Denmark)

    Christensen, H.; Nordentoft, Steen; Olsen, J.E.

    1998-01-01

    To establish the phylogenetic relationships between the subspecies of Salmonella enterica (official name Salmonella choleraesuis), Salmonella bongori and related members of Enterobacteriaceae, sequence comparison of rRNA was performed by maximum-likelihood analysis. The two Salmonella species were...

  2. PHARMACOGENETIC TESTING OPPORTUNITIES IN CARDIOLOGY BASED ON EXOME SEQUENCING

    Directory of Open Access Journals (Sweden)

    N. V. Shcherbakova

    2014-01-01

    Full Text Available Aim. To study which cardiac drugs currently carry biomarker annotations and what information can be obtained by pharmacogenetic testing using exome sequencing data in patients with cardiac diseases. Material and methods. Exome sequencing of a random participant of the ATEROGEN IVANOVO study and bioinformatics analysis of the data were performed. Point mutations were annotated using the ANNOVAR program, and comparison with a number of specialized databases was carried out on the basis of user protocols. Results. 11 cardiac drugs and 7 genes whose variants can influence cardiac drug metabolism were analyzed. According to the exome sequencing of the participant, we did not reveal allelic variants that require dose regimen correction and careful efficacy control. Conclusion. The application of exome sequencing is the next step towards a wide range of personalized therapy. Future opportunities for improving the risk-benefit ratio in each patient are the main purpose of the collection and analysis of pharmacogenetic data.

  3. Antibiotic Selection Pressure Determination through Sequence-Based Metagenomics.

    Science.gov (United States)

    Willmann, Matthias; El-Hadidi, Mohamed; Huson, Daniel H; Schütz, Monika; Weidenmaier, Christopher; Autenrieth, Ingo B; Peter, Silke

    2015-12-01

    The human gut forms a dynamic reservoir of antibiotic resistance genes (ARGs). Treatment with antimicrobial agents has a significant impact on the intestinal resistome and leads to enhanced horizontal transfer and selection of resistance. We have monitored the development of intestinal ARGs over a 6-day course of ciprofloxacin (Cp) treatment in two healthy individuals by using sequence-based metagenomics and different ARG quantification methods. Fixed- and random-effect models were applied to determine the change in ARG abundance per defined daily dose of Cp as an expression of the respective selection pressure. Among various shifts in the composition of the intestinal resistome, we found in one individual a strong positive selection for class D beta-lactamases which were partly located on a mobile genetic element. Furthermore, a trend to a negative selection has been observed with class A beta-lactamases (-2.66 hits per million sample reads/defined daily dose; P = 0.06). By 4 weeks after the end of treatment, the composition of ARGs returned toward their initial state but to a different degree in both subjects. We present here a novel analysis algorithm for the determination of antibiotic selection pressure which can be applied in clinical settings to compare therapeutic regimens regarding their effect on the intestinal resistome. This information is of critical importance for clinicians to choose antimicrobial agents with a low selective force on their patients' intestinal ARGs, likely resulting in a diminished spread of resistance and a reduced burden of hospital-acquired infections with multidrug-resistant pathogens. PMID:26369961

  5. Autonomously generating operations sequences for a Mars Rover using AI-based planning

    Science.gov (United States)

    Sherwood, Rob; Mishkin, Andrew; Estlin, Tara; Chien, Steve; Backes, Paul; Cooper, Brian; Maxwell, Scott; Rabideau, Gregg

    2001-01-01

    This paper discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This Artificial Intelligence (AI) based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules.

  6. The Research of Chaos-based M-ary Spreading Sequences

    Directory of Open Access Journals (Sweden)

    YANG Hongye

    2012-12-01

    Full Text Available This paper is devoted to the generation and evaluation of chaos-based M-ary spreading sequences for communication systems. Sequences obtained by repeating a truncated and multi-ary quantized chaotic series are compared with classical m-sequences by means of their autocorrelation and cross-correlation properties and power-spectral features. The anti-noise performance of binary sequences and chaos-based M-ary spreading sequences is compared under the same single-frequency interference. The studies show that the spectral features and anti-noise performance of chaos-based M-ary spreading sequences are better than those of binary sequences, which gives them considerable research value.
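
    The construction named in this abstract, repeating a truncated and M-ary quantized chaotic series, can be sketched as below. The abstract does not say which chaotic map or quantizer is used, so the Logistic map with r = 4, uniform quantization into M levels, and the mapping of symbols to M-th roots of unity for the autocorrelation are all assumptions; the function names are hypothetical.

```python
import cmath


def chaotic_mary_sequence(x0, length, m, period=None):
    """M-ary symbols from a Logistic-map orbit.

    If `period` is given, only that many chaotic values are generated and the
    truncated segment is repeated to fill `length` symbols.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    base_len = period if period is not None else length
    x, base = x0, []
    for _ in range(base_len):
        x = 4.0 * x * (1.0 - x)
        base.append(min(int(x * m), m - 1))  # quantise [0, 1] into m levels
    return [base[i % base_len] for i in range(length)]


def periodic_autocorrelation(symbols, m, shift):
    """Normalised periodic autocorrelation of the symbols mapped to roots of unity."""
    n = len(symbols)
    z = [cmath.exp(2j * cmath.pi * s / m) for s in symbols]
    return sum(z[i] * z[(i + shift) % n].conjugate() for i in range(n)) / n
```

    The value at shift 0 is 1 by construction; a sequence suited to spreading would show small magnitudes at the other shifts within one period.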

  7. Novel Sequence Number Based Secure Authentication Scheme for Wireless LANs

    Institute of Scientific and Technical Information of China (English)

    Rajeev Singh; Teek Parval Sharma

    2015-01-01

    Authentication per frame is an implicit necessity for security in wireless local area networks (WLANs). We propose a novel per frame secure authentication scheme which provides authentication to data frames in WLANs. The scheme involves no cryptographic overheads for authentication of frames. It utilizes the sequence number of the frame along with the authentication stream generators for authentication. Hence, it requires no extra bits or messages for the authentication purpose and also no change in the existing frame format is required. The scheme provides authentication by modifying the sequence number of the frame at the sender, and the modification is verified at the receiver. The modified sequence number is protected by using the XOR operation with a random number selected from the random stream. The authentication is lightweight due to the fact that it requires only trivial arithmetic operations like the subtraction and XOR operation.
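
    The core idea above, protecting the frame sequence number by XOR with a value drawn from a shared random stream, can be illustrated with the sketch below. It is not the authors' protocol: the class and function names are hypothetical, the way the two ends agree on the stream is assumed (a shared seed), and Python's `random.Random` stands in for the authentication stream generator even though it is not cryptographically secure.

```python
import random

SEQ_BITS = 12                      # the 802.11 sequence number field is 12 bits wide
SEQ_MASK = (1 << SEQ_BITS) - 1


class AuthStream:
    """Hypothetical shared pseudo-random stream; both ends seed it identically."""

    def __init__(self, seed):
        self._rng = random.Random(seed)  # illustration only, not secure

    def next_value(self):
        return self._rng.getrandbits(SEQ_BITS)


def mask_sequence_number(seq, stream):
    """Sender side: XOR the sequence number with the next stream value."""
    return (seq ^ stream.next_value()) & SEQ_MASK


def verify_frame(masked_seq, expected_seq, stream):
    """Receiver side: undo the XOR and compare with the expected sequence number."""
    recovered = (masked_seq ^ stream.next_value()) & SEQ_MASK
    return recovered == expected_seq
```

    Because the masked value occupies the same 12-bit field, no extra bits or frame-format changes are needed, which matches the property claimed in the abstract.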

  8. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung

    2011-04-30

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  9. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
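
    The separation of a symbolic DNA sequence into four indicator channels, followed by a weighted symbol-to-number mapping, can be sketched as below. This is only an illustration under stated assumptions: the function names and the example weights are hypothetical, and the paper's adaptive, structure-derived choice of weights is not reproduced here.

```python
from typing import Dict, List


def indicator_sequences(dna: str) -> Dict[str, List[int]]:
    """Split a symbolic DNA string into four binary indicator channels.

    Channel `b` holds 1 at every position where the sequence has base `b`
    and 0 elsewhere.
    """
    dna = dna.upper()
    return {base: [1 if s == base else 0 for s in dna] for base in "ACGT"}


def weighted_signal(dna: str, weights: Dict[str, float]) -> List[float]:
    """Combine the four channels into one numeric signal.

    `weights` is a hypothetical per-channel weighting; in the abstract the
    weights are adapted to the DNA structure, which this sketch does not do.
    """
    channels = indicator_sequences(dna)
    return [
        sum(weights[b] * channels[b][i] for b in "ACGT")
        for i in range(len(dna))
    ]


if __name__ == "__main__":
    example_weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # assumed values
    print(indicator_sequences("ACGTA"))
    print(weighted_signal("ACGTA", example_weights))
```

    The resulting numeric signal is what a wavelet transform would then be applied to; changing the weights changes how much each channel contributes.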

  10. Robin Sequence: The road to evidence based personalized treatment

    NARCIS (Netherlands)

    H. Basart

    2016-01-01

    Robin Sequence (RS) is characterized by micrognathia and upper airway obstruction (UAO) caused by glossoptosis resulting in respiratory and feeding problems of varying severity. According to the original RS definition a cleft palate is associated with RS, but not part of the definition. Reported inc

  11. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby;

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  12. Phylogeny of Vibrio cholerae Based on recA Sequence

    OpenAIRE

    Stine, O. Colin; Sozhamannan, Shanmuga; Gou, Qing; Zheng, Siqen; Morris, J. Glenn; Johnson, Judith A.

    2000-01-01

    We sequenced a 705-bp fragment of the recA gene from 113 Vibrio cholerae strains and closely related species. One hundred eighty-seven nucleotides were phylogenetically informative, 55 were phylogenetically uninformative, and 463 were invariant. Not unexpectedly, Vibrio parahaemolyticus and Vibrio vulnificus strains formed out-groups; we also identified isolates which resembled V. cholerae biochemically but which did not cluster with V. cholerae. In many instances, V. cholerae serogroup desig...

  13. Evolutionary insights from suffix array-based genome sequence analysis

    Indian Academy of Sciences (India)

    Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need to analyse whole genome sequences more efficiently and thoroughly has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures that are well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of suffix array construction. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicates that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a ‘meaning’ for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv have been found to contain a repeating sequence of 300 amino acids.
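
    To make the data structure concrete, the following is a minimal sketch of a suffix array with pattern counting and n-gram counting. It uses a naive sort-based construction for clarity, not the improved construction the abstract reports, and all function names are assumptions introduced for illustration.

```python
from collections import Counter
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction (sorting suffix slices); fine for short strings,
    far too slow and memory-hungry for whole genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: List[int], pattern: str) -> int:
    """Count occurrences of `pattern` using two binary searches on `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def ngram_counts(text: str, n: int) -> Counter:
    """Count every overlapping n-gram (unigrams for n=1, bi-grams for n=2, ...)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    seq = "banana"
    sa = suffix_array(seq)
    print(sa)
    print(count_occurrences(seq, sa, "ana"))
    print(ngram_counts(seq, 2))
```

    Because all suffixes sharing a prefix sit next to each other in the sorted order, repeats and n-gram frequencies can be read off contiguous ranges of the array.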

  14. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

    OpenAIRE

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Reddy, Rachamalla Maheedhar; Reddy, Chennareddy Venkata Siva Kumar; Singh, Nitin Kumar; Sharmila S Mande

    2011-01-01

    Background Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types, similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups having modest computational resources. In this study, we present ...

  15. DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

    CERN Document Server

    Afify, Heba; Wahed, Manal Abdel

    2011-01-01

    Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information theory are often perceived as being of interest for data communication and storage. In recent years, a substantial effort has been made to apply textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm that is based on the production of difference sequences according to an op-code table in order to optimize the compression of homologous sequences in a dataset. Therefore, the stored data are composed of the reference sequence, the set of differences, and the difference locations, instead of storing each sequence individually. This algorithm does not require a priori knowledge about the statistics of the sequence set. The...
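
    The idea of storing a reference sequence plus differences and their locations can be illustrated with the sketch below. It is a deliberately simplified assumption: it handles substitutions between equal-length sequences only, and it does not reproduce the paper's op-code table. The function names are hypothetical.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (position, replacement base)


def encode_diff(reference: str, target: str) -> List[Diff]:
    """Record only the positions where `target` differs from `reference`.

    Simplification: substitutions only, so both sequences must have the
    same length. Insertions and deletions would need extra operation codes.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only supports equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_diff(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the target sequence losslessly from the reference and the diffs."""
    seq = list(reference)
    for position, base in diffs:
        seq[position] = base
    return "".join(seq)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    homolog = "ACGAACGTTC"
    diffs = encode_diff(ref, homolog)
    print(diffs)
    print(decode_diff(ref, diffs) == homolog)
```

    For closely related sequences the difference list is much shorter than the sequence itself, which is where the storage saving comes from.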

  16. Comparison of sequence-based and structure-based phylogenetic trees of homologous proteins: Inferences on protein evolution

    Indian Academy of Sciences (India)

    S Balaji; N Srinivasan

    2007-01-01

    Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity could adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in 3-D structure of proteins rather than the amino acid sequence similarities in modelling evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling evolution of homologous proteins. Using a dataset of 108 protein domain families of known structures with at least 10 members per family we present a comparison of the extent of structural and sequence dissimilarities among pairs of proteins which are inputs into the construction of phylogenetic trees. We find that correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation coefficient between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with most similar biochemical functional properties better than the sequence-similarity based dendrogram. In multi-domain protein families and disulphide-rich protein families the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor though the sequence identity could be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.

  17. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  18. Combined sequence and sequence-structure based methods for analyzing FGF23, CYP24A1 and VDR genes.

    Science.gov (United States)

    Nagamani, Selvaraman; Singh, Kh Dhanachandra; Muthusamy, Karthikeyan

    2016-09-01

    FGF23, CYP24A1 and VDR altogether play a significant role in genetic susceptibility to chronic kidney disease (CKD). Identification of possible causative mutations may serve as therapeutic targets and diagnostic markers for CKD. Thus, we adopted both sequence and sequence-structure based SNP analysis algorithms in order to overcome the limitations of both methods. We explore the functional significance towards the prediction of risky SNPs associated with CKD. We assessed the performance of four widely used pathogenicity prediction methods. The performance of the programs, measured by the Matthews correlation coefficient (MCC), ranged from poor (MCC = 0.39) to reasonably good (MCC = 0.42). However, we got the best results for the combined sequence and structure based analysis method (MCC = 0.45). 4 SNPs from FGF23 gene, 8 SNPs from VDR gene and 13 SNPs from CYP24A1 gene were predicted to be the causative agents for human diseases. This study will be helpful in selecting potential SNPs for experimental study from the SNP pool and also will reduce the cost for identification of potential SNPs as a genetic marker. PMID:27114920
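
    For readers unfamiliar with the metric, the Matthews correlation coefficient is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for one predictor: deleterious SNPs are "positive".
    print(matthews_corrcoef(tp=30, tn=25, fp=10, fn=15))
```

    Unlike plain accuracy, MCC uses all four cells, so it stays informative when pathogenic and neutral variants are unevenly represented.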

  19. H.264 MOTION ESTIMATION ALGORITHM BASED ON VIDEO SEQUENCES ACTIVITY

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Motion estimation is an important part of the H.264/AVC encoding process and has high computational complexity. Therefore, a fast motion estimation algorithm is needed for real-time applications. The algorithm proposed in this letter first judges the activity degree of the macroblocks, then classifies the video sequences and applies different search strategies according to the result. Experiments show that this method obtains almost the same video quality as the Full Search (FS) algorithm while reducing the computation cost by more than 95%.

  20. A sequence-based variation map of zebrafish.

    Science.gov (United States)

    Patowary, Ashok; Purkanti, Ramya; Singh, Meghna; Chauhan, Rajendra; Singh, Angom Ramcharan; Swarnkar, Mohit; Singh, Naresh; Pandey, Vikas; Torroja, Carlos; Clark, Matthew D; Kocher, Jean-Pierre; Clark, Karl J; Stemple, Derek L; Klee, Eric W; Ekker, Stephen C; Scaria, Vinod; Sivasubbu, Sridhar

    2013-03-01

    Zebrafish (Danio rerio) is a popular vertebrate model organism largely deployed using outbred laboratory animals. The nonisogenic nature of the zebrafish as a model system offers the opportunity to understand natural variations and their effect in modulating phenotype. In an effort to better characterize the range of natural variation in this model system and to complement the zebrafish reference genome project, the whole genome sequence of a wild zebrafish at 39-fold genome coverage was determined. Comparative analysis with the zebrafish reference genome revealed approximately 5.2 million single nucleotide variations and over 1.6 million insertion-deletion variations. This dataset thus represents a new catalog of genetic variations in the zebrafish genome. Further analysis revealed selective enrichment for variations in genes involved in immune function and response to the environment, suggesting genome-level adaptations to environmental niches. We also show that human disease gene orthologs in the sequenced wild zebrafish genome show a lower ratio of nonsynonymous to synonymous single nucleotide variations.

  1. DNA sequence-based analysis of the Pseudomonas species.

    Science.gov (United States)

    Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2010-06-01

    Partial sequences of four core 'housekeeping' genes (16S rRNA, gyrB, rpoB and rpoD) of the type strains of 107 Pseudomonas species were analysed in order to obtain a comprehensive view regarding the phylogenetic relationships within the Pseudomonas genus. Gene trees allowed the discrimination of two lineages or intrageneric groups (IG), called IG P. aeruginosa and IG P. fluorescens. The first, IG P. aeruginosa, was divided into three main groups, represented by the species P. aeruginosa, P. stutzeri and P. oleovorans. The second IG was divided into six groups, represented by the species P. fluorescens, P. syringae, P. lutea, P. putida, P. anguilliseptica and P. straminea. The P. fluorescens group was the most complex and included nine subgroups, represented by the species P. fluorescens, P. gessardi, P. fragi, P. mandelii, P. jesseni, P. koreensis, P. corrugata, P. chlororaphis and P. asplenii. Pseudomonas rhizospherae was affiliated with the P. fluorescens IG in the phylogenetic analysis but was independent of any group. Some species were located on phylogenetic branches that were distant from defined clusters, such as those represented by the P. oryzihabitans group and the type strains P. pachastrellae, P. pertucinogena and P. luteola. Additionally, 17 strains of P. aeruginosa, 'P. entomophila', P. fluorescens, P. putida, P. syringae and P. stutzeri, for which genome sequences have been determined, have been included to compare the results obtained in the analysis of four housekeeping genes with those obtained from whole genome analyses.

  2. LookSeq: A browser-based viewer for deep sequencing data

    OpenAIRE

    Manske, Heinrich Magnus; Dominic P Kwiatkowski

    2009-01-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an ov...

  3. Neural network predicts sequence of TP53 gene based on DNA chip

    DEFF Research Database (Denmark)

    Spicker, J.S.; Wikman, F.; Lu, M.L.;

    2002-01-01

    We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero...... and four errors in the predicted 1300 bp sequence when tested on wild-type TP53 sequence....

  4. Parallel divide and conquer bio-sequence comparison based on Smith-Waterman algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHANG Fa; QIAO Xiangzhen; LIU Zhiyong

    2004-01-01

    Tools for pair-wise bio-sequence alignment have long played a central role in computational biology. Several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs a large memory space, and this disadvantage limits the size of the sequences that can be handled. As the volume of biological sequence data expands rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. To solve this problem, we develop a new parallel bio-sequence alignment algorithm using the divide-and-conquer strategy, named the PSW-DC algorithm. In our algorithm, we first partition the query sequence into several subsequences and distribute them to the processors, then compare each subsequence with the whole subject sequence in parallel using the Smith-Waterman algorithm to get an interim result, and finally obtain the optimal alignment between the query sequence and the subject sequence through a special combination and extension method. The memory space required by our algorithm is reduced significantly in comparison with existing ones. We also develop a key technique of combination and extension, named the C&E method, to manipulate the interim results and obtain the final sequence alignment. We implement the new parallel bio-sequence alignment algorithm, PSW-DC, on a cluster parallel system.
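
    As background for the abstract, the sketch below shows the plain serial Smith-Waterman local alignment with a linear gap penalty. It is not the PSW-DC algorithm or its C&E step, and the scoring values (match, mismatch, gap) are assumed for illustration. Note that it stores the full scoring matrix, which is exactly the memory cost the abstract sets out to reduce.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # full O(len(a) * len(b)) matrix
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at 0 so alignments can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

    A divide-and-conquer variant would run this kind of comparison on pieces of the query and then combine the interim results; that combination logic is specific to the paper and is not attempted here.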

  5. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus

    OpenAIRE

    Chen, Chunxian; Gmitter Jr, Fred G

    2013-01-01

    Background Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for...

  6. Sequence-Length Requirement of Distance-Based Phylogeny Reconstruction: Breaking the Polynomial Barrier

    CERN Document Server

    Roch, Sebastien

    2009-01-01

    We introduce a new distance-based phylogeny reconstruction technique which provably achieves, at sufficiently short branch lengths, a polylogarithmic sequence-length requirement, improving significantly over previous polynomial bounds for distance-based methods. The technique is based on an averaging procedure that implicitly reconstructs ancestral sequences. By the same token, we extend previous results on phase transitions in phylogeny reconstruction to general time-reversible models. More precisely, we show that in the so-called Kesten-Stigum zone (roughly, a region of the parameter space where ancestral sequences are well approximated by "linear combinations" of the observed sequences) sequences of length $\mathrm{poly}(\log n)$ suffice for reconstruction when branch lengths are discretized. Here $n$ is the number of extant species. Our results challenge, to some extent, the conventional wisdom that estimates of evolutionary distances alone carry significantly less information about phylogenies than full sequ...

  7. Phylogeny of Pelargonium (Geraniaceae) based on DNA sequences from three genomes

    NARCIS (Netherlands)

    Bakker, F.T.; Culham, A.; Hettiarachi, P.; Touloumendidou, T.; Gibby, M.

    2004-01-01

    Phylogenetic hypotheses for the largely South African genus Pelargonium L'Hér. (Geraniaceae) were derived based on DNA sequence data from nuclear, chloroplast and mitochondrial encoded regions. The datasets were unequally represented and comprised cpDNA trnL-F sequences for 152 taxa, nrDNA ITS seque

  8. Genotyping of Histomonas meleagridis isolates based on Internal Transcribed Spacer-1 sequences

    NARCIS (Netherlands)

    H.M.J.F. van der Heijden; W.J.M. Landman; S. Greve; R. Peek

    2006-01-01

    C-profiling is a novel genotyping method for protozoan pathogens, based on polymerase chain reaction and sequencing of AT-rich Internal Transcribed Spacer-1 sequences. It was applied to various Histomonas meleagridis isolates originating from outbreaks of histomoniasis in six Dutch turkey and chicke

  9. THE CONSTRUCTIONS OF ALMOST BINARY SEQUENCE PAIRS WITH THREE-LEVEL CORRELATION BASED ON CYCLOTOMY

    Institute of Scientific and Technical Information of China (English)

    Peng Xiuping; Xu Chengqian

    2012-01-01

    In this paper, a new class of almost binary sequence pairs with a single zero element is presented. The almost binary sequence pairs with three-level correlation are constructed based on cyclotomic numbers of order 2, 4, and 6. Most of them have good correlation and balance properties: their maximum nontrivial correlation magnitude is 2 and the difference between the numbers of occurrences of +1's and -1's is 0 or 1. In addition, the corresponding binary sequence pairs are investigated as well, and we can also get some kinds of binary sequence pairs with optimum balance and good correlation.

  10. A method for amplification of unknown flanking sequences based on touchdown PCR and suppression-PCR.

    Science.gov (United States)

    Gao, Song; He, Dan; Li, Guangquan; Zhang, Yanhua; Lv, Huiying; Wang, Li

    2016-09-15

    Thermal asymmetric staggered PCR is the most widely used technique to obtain flanking sequences. However, it has some limitations, including a low positive rate and complex operation. In this study, an improved version of the method was developed based on suppression-PCR and touchdown PCR. The PCR fragment obtained by the amplification was used directly for sequencing after gel purification. Using this improved method, the positive rate of amplified flanking sequences of the ATMT mutants reached 99%. In addition, the time from DNA extraction to flanking sequence analysis was shortened to 2 days, at a cost of about 6 dollars per sample. PMID:27393656

  12. An Approach to Assembly Sequence Planning Based on Hierarchical Strategy and Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    Niu Xinwen; Ding Han; Xiong Youlun

    2001-01-01

    Using group and subassembly cluster methods, the hierarchical structure of a product is generated automatically, which largely reduces the complexity of planning. Based on a genetic algorithm, the optimal assembly sequence of each structure level can be obtained by sequence-by-sequence search. As a result, a better assembly sequence of the product can be generated by combining the assembly sequences of all hierarchical structures, which provides more parallelism and flexibility for assembly operations. An industrial example is solved by this new approach.

  13. CAPS satellite spread spectrum communication blind multi-user detecting system based on chaotic sequences

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Multiple Path Interference (MPI) and Multiple Access Interference (MAI) are important factors that affect the performance of Chinese Area Positioning System (CAPS). These problems can be solved by using spreading sequences with ideal properties and multi-user detectors. Chaotic sequences based on Chebyshev map are studied and the satellite communication system model is set up to investigate the application of chaotic sequences for CAPS in this paper. Simulation results show that chaotic sequences have desirable correlation properties and it is easy to generate a large number of chaotic sequences with good security. It has great practical value to apply chaotic sequences to CAPS together with multi-user detecting technology and the system performance can be improved greatly.
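
    A chaotic spreading sequence based on the Chebyshev map can be generated as in the sketch below. The map order, the initial values, the sign-based quantisation into ±1 chips, and the function names are all assumptions made for illustration; the abstract does not state which parameters or quantisation the authors used.

```python
import math
from typing import List


def chebyshev_sequence(x0: float, length: int, order: int = 4) -> List[float]:
    """Iterate the Chebyshev map x_{n+1} = cos(order * arccos(x_n)) on [-1, 1]."""
    if not -1.0 <= x0 <= 1.0:
        raise ValueError("x0 must lie in [-1, 1]")
    values, x = [], x0
    for _ in range(length):
        x = math.cos(order * math.acos(x))
        x = max(-1.0, min(1.0, x))  # guard against floating-point drift
        values.append(x)
    return values


def to_chips(values: List[float]) -> List[int]:
    """Quantise real-valued chaotic samples into a +1/-1 spreading sequence."""
    return [1 if v >= 0 else -1 for v in values]


def correlation(a: List[int], b: List[int]) -> float:
    """Normalised zero-lag correlation between two equal-length chip sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Two users with slightly different (assumed) initial values.
    user1 = to_chips(chebyshev_sequence(0.3000, 127))
    user2 = to_chips(chebyshev_sequence(0.3001, 127))
    print(correlation(user1, user1))  # autocorrelation at zero lag
    print(correlation(user1, user2))  # cross-correlation between users
```

    Because the map is sensitive to its initial value, each distinct starting point yields a different sequence, which is why a large family of sequences is easy to produce.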

  14. High-throughput-sequencing-based identification of a grapevine fanleaf virus satellite RNA in Vitis vinifera.

    Science.gov (United States)

    Chiumenti, Michela; Mohorianu, Irina; Roseti, Vincenzo; Saldarelli, Pasquale; Dalmay, Tamas; Minafra, Angelantonio

    2016-05-01

    A new satellite RNA (satRNA) of grapevine fanleaf virus (GFLV) was identified by high-throughput sequencing of high-definition (HD) adapter libraries from grapevine plants of the cultivar Panse precoce (PPE) affected by enation disease. The complete nucleotide sequence was obtained by automatic sequencing using primers designed based on next-generation sequencing (NGS) data. The full-length sequence, named satGFLV-PPE, consisted of 1119 nucleotides with a single open reading frame from position 15 to 1034. This satRNA showed maximum nucleotide sequence identity of 87 % to satArMV-86 and satGFLV-R6. Symptomatic grapevines were surveyed for the presence of the satRNA, and no correlation was found between detection of the satRNA and enation symptom expression. PMID:26873812

  15. CAPS satellite spread spectrum communication blind multi-user detecting system based on chaotic sequences

    Institute of Scientific and Technical Information of China (English)

    LEI LiHua; SHI HuLi; MA GuanYi

    2009-01-01

    Multiple Path Interference (MPI) and Multiple Access Interference (MAI) are important factors that affect the performance of Chinese Area Positioning System (CAPS). These problems can be solved by using spreading sequences with ideal properties and multi-user detectors. Chaotic sequences based on Chebyshev map are studied and the satellite communication system model is set up to investigate the application of chaotic sequences for CAPS in this paper. Simulation results show that chaotic sequences have desirable correlation properties and it is easy to generate a large number of chaotic sequences with good security. It has great practical value to apply chaotic sequences to CAPS together with multi-user detecting technology and the system performance can be improved greatly.

  16. A comparison of single molecule and amplification based sequencing of cancer transcriptomes.

    Directory of Open Access Journals (Sweden)

    Lee T Sam

    Full Text Available The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS), carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS) methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower-abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform specific biases in high throughput transcriptome sequencing applications will be critical in cross platform meta-analytic studies.

  17. Control allocation and management of redundant control effectors based on bases sequenced optimal method

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    For an advanced aircraft, the number of effectors is much greater than for a traditional one, the functions of the effectors are more complex and the coupling between them is more severe. Based on current control allocation research, this paper puts forward the concept and framework of the control allocation and management system for aircraft with redundant control effectors. A new optimal control allocation method, the bases sequenced optimal (BSO) method, is then presented. By analyzing the physical meaning of the allocation process of the BSO method, four types of management strategies are adopted by the system, which act on the control allocation process under different flight conditions, mission requirements and effector working conditions. Simulation results show that the functions of the control allocation system are extended and that the system's adaptability to flight status, mission requirements and effector failure conditions is improved.

  18. Genomic Variance Estimation Based on Genotyping-by-Sequencing with Different Coverage in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Fé, Dario; Jensen, Just;

    2014-01-01

    Advancement in next generation sequencing (NGS) technologies has significantly decreased the cost of DNA sequencing enabling increased use of genotyping by sequencing (GBS) in several plant species. In contrast to array-based genotyping GBS also allows for easy estimation of allele frequencies...... at each SNP in family pools or polyploids. There are, however, several statistical challenges associated with this method, including low sequencing depth and missing values. Low sequencing depth results in inaccuracies in estimates of allele frequencies for each SNP. In this work we have focused...... on optimizing methods and models utilizing F2 family phenotype records and NGS information from F2 family pools in perennial ryegrass. Genomic variance was estimated using genomic relationship matrices based on different coverage depths to verify effects of coverage depth. Example traits were seed yield, rust...

  19. On Properties of Update Sequences Based on Causal Rejection

    OpenAIRE

    Eiter, T.; Fink, M; Sabbatini, G; Tompits, H.

    2001-01-01

    We consider an approach to update nonmonotonic knowledge bases represented as extended logic programs under answer set semantics. New information is incorporated into the current knowledge base subject to a causal rejection principle enforcing that, in case of conflicts, more recent rules are preferred and older rules are overridden. Such a rejection principle is also exploited in other approaches to update logic programs, e.g., in dynamic logic programming by Alferes et al. We give a thoroug...

  20. Evaluation of Hybridization Capture Versus Amplicon‐Based Methods for Whole‐Exome Sequencing

    Science.gov (United States)

    Samorodnitsky, Eric; Jewell, Benjamin M.; Hagopian, Raffi; Miya, Jharna; Wing, Michele R.; Lyon, Ezra; Damodaran, Senthilkumar; Bhatt, Darshna; Reeser, Julie W.; Datta, Jharna

    2015-01-01

    Next‐generation sequencing has aided characterization of genomic variation. While whole‐genome sequencing may capture all possible mutations, whole‐exome sequencing remains cost‐effective and captures most phenotype‐altering mutations. Initial strategies for exome enrichment utilized a hybridization‐based capture approach. Recently, amplicon‐based methods were designed to simplify preparation and utilize smaller DNA inputs. We evaluated two hybridization capture‐based and two amplicon‐based whole‐exome sequencing approaches, utilizing both Illumina and Ion Torrent sequencers, comparing on‐target alignment, uniformity, and variant calling. While the amplicon methods had higher on‐target rates, the hybridization capture‐based approaches demonstrated better uniformity. All methods identified many of the same single‐nucleotide variants, but each amplicon‐based method missed variants detected by the other three methods and reported additional variants discordant with all three other technologies. Many of these potential false positives or negatives appear to result from limited coverage, low variant frequency, vicinity to read starts/ends, or the need for platform‐specific variant calling algorithms. All methods demonstrated effective copy‐number variant calling when evaluated against a single‐nucleotide polymorphism array. This study illustrates some differences between whole‐exome sequencing approaches, highlights the need for selecting appropriate variant calling based on capture method, and will aid laboratories in selecting their preferred approach. PMID:26110913

  1. Evaluation of Hybridization Capture Versus Amplicon-Based Methods for Whole-Exome Sequencing.

    Science.gov (United States)

    Samorodnitsky, Eric; Jewell, Benjamin M; Hagopian, Raffi; Miya, Jharna; Wing, Michele R; Lyon, Ezra; Damodaran, Senthilkumar; Bhatt, Darshna; Reeser, Julie W; Datta, Jharna; Roychowdhury, Sameek

    2015-09-01

    Next-generation sequencing has aided characterization of genomic variation. While whole-genome sequencing may capture all possible mutations, whole-exome sequencing remains cost-effective and captures most phenotype-altering mutations. Initial strategies for exome enrichment utilized a hybridization-based capture approach. Recently, amplicon-based methods were designed to simplify preparation and utilize smaller DNA inputs. We evaluated two hybridization capture-based and two amplicon-based whole-exome sequencing approaches, utilizing both Illumina and Ion Torrent sequencers, comparing on-target alignment, uniformity, and variant calling. While the amplicon methods had higher on-target rates, the hybridization capture-based approaches demonstrated better uniformity. All methods identified many of the same single-nucleotide variants, but each amplicon-based method missed variants detected by the other three methods and reported additional variants discordant with all three other technologies. Many of these potential false positives or negatives appear to result from limited coverage, low variant frequency, vicinity to read starts/ends, or the need for platform-specific variant calling algorithms. All methods demonstrated effective copy-number variant calling when evaluated against a single-nucleotide polymorphism array. This study illustrates some differences between whole-exome sequencing approaches, highlights the need for selecting appropriate variant calling based on capture method, and will aid laboratories in selecting their preferred approach. PMID:26110913

  2. Base J glucosyltransferase does not regulate the sequence specificity of J synthesis in trypanosomatid telomeric DNA.

    Science.gov (United States)

    Bullard, Whitney; Cliffe, Laura; Wang, Pengcheng; Wang, Yinsheng; Sabatini, Robert

    2015-12-01

    Telomeric DNA of trypanosomatids possesses a modified thymine base, called base J, that is synthesized in a two-step process; the base is hydroxylated by a thymidine hydroxylase forming hydroxymethyluracil (hmU) and a glucose moiety is then attached by the J-associated glucosyltransferase (JGT). To examine the importance of JGT in modifying specific thymines in DNA, we used a Leishmania episome system to demonstrate that the telomeric repeat (GGGTTA) stimulates J synthesis in vivo while mutant telomeric sequences (GGGTTT, GGGATT, and GGGAAA) do not. Utilizing an in vitro GT assay we find that JGT can glycosylate hmU within any sequence with no significant change in Km or kcat, even mutant telomeric sequences that are unable to be J-modified in vivo. The data suggest that JGT possesses no DNA sequence specificity in vitro, lending support to the hypothesis that the specificity of base J synthesis is not at the level of the JGT reaction.

  3. Group Graded Associated Ideals with Flat Base Change of Rings and Short Exact Sequences

    Indian Academy of Sciences (India)

    Srinivas Behara; Shiv Datt Kumar

    2011-05-01

    This paper deals with the study of the behaviour of G-associated ideals and strong Krull G-associated ideals with flat base change of rings and the behaviour of G-associated ideals with short exact sequences over rings graded by a finitely generated abelian group G.

  4. Design and Evaluation of a Research-Based Teaching Sequence: The Superposition of Electric Field.

    Science.gov (United States)

    Viennot, L.; Rainson, S.

    1999-01-01

    Illustrates an approach to research-based teaching strategies and their evaluation. Addresses a teaching sequence on the superposition of electric fields implemented at the college level in an institutional framework subject to severe constraints. Contains 28 references. (DDR)

  5. Base J glucosyltransferase does not regulate the sequence specificity of J synthesis in trypanosomatid telomeric DNA.

    Science.gov (United States)

    Bullard, Whitney; Cliffe, Laura; Wang, Pengcheng; Wang, Yinsheng; Sabatini, Robert

    2015-12-01

    Telomeric DNA of trypanosomatids possesses a modified thymine base, called base J, that is synthesized in a two-step process; the base is hydroxylated by a thymidine hydroxylase forming hydroxymethyluracil (hmU) and a glucose moiety is then attached by the J-associated glucosyltransferase (JGT). To examine the importance of JGT in modifying specific thymines in DNA, we used a Leishmania episome system to demonstrate that the telomeric repeat (GGGTTA) stimulates J synthesis in vivo while mutant telomeric sequences (GGGTTT, GGGATT, and GGGAAA) do not. Utilizing an in vitro GT assay we find that JGT can glycosylate hmU within any sequence with no significant change in Km or kcat, even mutant telomeric sequences that are unable to be J-modified in vivo. The data suggest that JGT possesses no DNA sequence specificity in vitro, lending support to the hypothesis that the specificity of base J synthesis is not at the level of the JGT reaction. PMID:26815240

  6. Multifunctional hybrid networks based on self assembling peptide sequences

    Science.gov (United States)

    Sathaye, Sameer

    The overall aim of this dissertation is to achieve a comprehensive correlation between the molecular level changes in primary amino acid sequences of amphiphilic beta-hairpin peptides and their consequent solution-assembly properties and bulk network hydrogel behavior. This has been accomplished using two broad approaches. In the first approach, amino acid substitutions were made to peptide sequence MAX1 such that the hydrophobic surfaces of the folded beta-hairpins from the peptides demonstrate shape specificity in hydrophobic interactions with other beta-hairpins during the assembly process, thereby causing changes to the peptide nanostructure and bulk rheological properties of hydrogels formed from the peptides. Steric lock and key complementary hydrophobic interactions were designed to occur between two beta-hairpin molecules of a single peptide, LNK1, during its beta-sheet fibrillar assembly. Experimental results from circular dichroism, transmission electron microscopy and oscillatory rheology collectively indicate that the molecular design of the LNK1 peptide can be identified as the cause of the drastically different behavior of the networks relative to MAX1. The results indicate elimination or significant reduction of fibrillar branching due to steric complementarity in LNK1 that does not exist in MAX1, thus supporting the original hypothesis. As an extension of the designed steric lock and key complementarity between two beta-hairpin molecules of the same peptide, LNK1, three new pairs of peptide molecules, LP1-KP1, LP2-KP2 and LP3-KP3, that resemble complementary 'wedge' and 'trough' shapes when folded into beta-hairpins were designed and studied. All six peptides, individually and when blended with their corresponding shape complement, formed fibrillar nanostructures with non-uniform thickness values. SANS analysis showed loose packing in the assembled structures of all the new peptides, compared with the uniform tight packing in MAX1.
This

  7. Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns

    OpenAIRE

    Comin, Matteo; Schimd, Michele

    2014-01-01

    Background: With the advent of Next-Generation Sequencing (NGS) technologies, a large amount of short read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cannot be mapped onto a reference genome, alignment-based methods are not applicable. However, it is still possible to study the evolutionary relationship of unassembled genomes based on NGS data. Results...

  8. A new approach based on PSO algorithm to find good computational encoding sequences

    Institute of Scientific and Technical Information of China (English)

    Cui Guangzhao; Niu Yunyun; Wang Yanfeng; Zhang Xuncai; Pan Linqiang

    2007-01-01

    Computational encoding DNA sequence design is one of the most important steps in molecular computation. Much research has been done on designing reliable sequence libraries. A revised method based on the support system developed by Tanaka et al. is proposed here, with different criteria used to construct the fitness function. We then adapt the particle swarm optimization (PSO) algorithm to our encoding problem. Using the new algorithm, a set of good-quality sequences is generated. The results also show that our PSO-based approach converges rapidly to the minimum level for an output of the simulation model. The speed of the algorithm fits our requirements.

  9. Rapid Conversion of Traditional Introductory Physics Sequences to an Activity-Based Format

    Science.gov (United States)

    Yoder, Garett; Cook, Jerry

    2014-01-01

    The Department of Physics at EKU [Eastern Kentucky University], with support from the National Science Foundation's Course Curriculum and Laboratory Improvement Program, has successfully converted its entire introductory physics sequence, both algebra-based and calculus-based courses, to an activity-based format where laboratory activities,…

  10. GrabCut-Based Human Segmentation in Video Sequences

    Directory of Open Access Journals (Sweden)

    Sergio Escalera

    2012-11-01

    Full Text Available In this paper, we present a fully-automatic Spatio-Temporal GrabCut human segmentation methodology that combines tracking and segmentation. GrabCut initialization is performed by a HOG-based subject detection, face detection, and skin color model. Spatial information is included by Mean Shift clustering whereas temporal coherence is considered by the historical of Gaussian Mixture Models. Moreover, full face and pose recovery is obtained by combining human segmentation with Active Appearance Models and Conditional Random Fields. Results over public datasets and in a new Human Limb dataset show a robust segmentation and recovery of both face and pose using the presented methodology.

  11. MRI-Based Thermometry for Tumor Thermal Ablation: A Comparison of Different MR Sequences

    Directory of Open Access Journals (Sweden)

    T. J. Vogl

    2010-05-01

    Full Text Available Background/Objective: To evaluate T1 and PRF thermometry methods utilizing fast MR sequences and a fluoroptic thermometer. Materials and Methods: MR-guided LITT (Laser-Induced Interstitial Thermotherapy) with a laser wavelength/power of 1064 nm/30 W was applied to pig liver and a gel phantom. During the ablation process, the temperature was measured using a fluoroptic thermometer and MR imaging was performed applying a 1.5-Tesla tomograph with an EPI (Echo Planar Imaging) sequence for the PRF (Proton Resonance Frequency) method and FLASH, IRTF, SRTF and TRUFI sequences for the T1 method. Plotting MR signal intensity against measured temperature determined the temperature constant for each of the T1 sequences. To determine the PRF temperature constant, phase values were recorded from phase images and then plotted against temperature. The PRF temperature constant was verified by comparing the MR temperature with the measured one obtained from a second LITT experiment on the gel phantom. Results: The experiments determining the temperature constant for the T1 method showed that the IRTF and FLASH sequences have the highest temperature sensitivity and the most linear relationship between MR signal intensity and measured temperature. The SRTF sequence presented relatively good linearity but inferior temperature sensitivity compared to the IRTF and FLASH sequences. Conversely, the TRUFI sequence exhibited the lowest temperature sensitivity and linearity of data points. Concerning the PRF method, the measured and the MR-based temperatures agreed up to approximately 70 °C. Conclusion: To demonstrate and control temperature in target tissue during the LITT process, the PRF method with an EPI sequence is preferred for temperatures below 70 °C due to its acceptable accuracy. Among the T1 sequences, FLASH is preferable as the most robust, though not the most accurate, T1 sequence.

  12. Park-based and zero sequence-based relaying techniques with application to transformers protection

    Energy Technology Data Exchange (ETDEWEB)

    Diaz, G.; Arboleya, P.; Gomez-Aleixandre, J. [University of Oviedo (Spain). Dept. of Electrical Engineering

    2004-09-01

    Two relaying techniques for protecting power transformers are presented and discussed. Very often, differential relaying is used for this purpose. A comparison between the two proposed techniques and conventional differential relaying is thus presented. The first technique, based on the measurements of zero sequence current within a delta winding, performs best in multiwinding transformers, since only measurement of the coil currents is needed. Thus, great simplicity is achieved. The second one is based on the differential procedure, but its analysis of asymmetries in the plot in Park's plane avoids problems related to spectral analysis in conventional differential relaying. The technique is justified from the analysis of symmetrical components. Misoperation in conventional differential relaying has been observed in some cases as a function of switching instant and fault location. This issue is discussed in the paper, and a statistical analysis of a large number of laboratory tests, in which both factors were controlled, is presented. As a conclusion, both relaying techniques proposed succeed in protecting the transformer. Additionally, the Park-based relay exhibits three characteristics of most importance: fastest performance, robustness and simplicity in its formulation. (author)
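
    The two quantities named in this abstract can be illustrated with a minimal sketch. It uses the textbook definitions of the zero-sequence component and of Park's vector (power-invariant form); the function names and the test currents are hypothetical, and this is not the relay logic of the paper itself. For balanced three-phase currents the zero-sequence component vanishes and Park's vector traces a circle, so any spread in its modulus signals an asymmetry.

```python
import math

def zero_sequence(ia, ib, ic):
    """Zero-sequence component of three instantaneous phase currents."""
    return (ia + ib + ic) / 3.0

def park_vector(ia, ib, ic):
    """(d, q) components of Park's vector, power-invariant scaling."""
    d = math.sqrt(2.0 / 3.0) * ia - (ib + ic) / math.sqrt(6.0)
    q = (ib - ic) / math.sqrt(2.0)
    return d, q

def balanced_currents(amplitude, theta):
    """Ideal balanced three-phase currents at electrical angle theta (rad)."""
    shift = 2.0 * math.pi / 3.0
    return (amplitude * math.cos(theta),
            amplitude * math.cos(theta - shift),
            amplitude * math.cos(theta + shift))

def park_modulus_spread(samples):
    """Max minus min of |Park vector| over a set of (ia, ib, ic) samples.

    A value near zero means the locus in Park's plane is a circle.
    """
    moduli = [math.hypot(*park_vector(*s)) for s in samples]
    return max(moduli) - min(moduli)

# Hypothetical one-cycle sample set, 12 points per cycle.
healthy = [balanced_currents(1.0, 2.0 * math.pi * k / 12) for k in range(12)]
# Illustrative asymmetry only: phase c current scaled to half.
asymmetric = [(a, b, 0.5 * c) for a, b, c in healthy]
```

    Comparing `park_modulus_spread(healthy)` with `park_modulus_spread(asymmetric)` shows the circular locus being distorted by the asymmetry, which is the kind of feature a Park-plane analysis looks at without any spectral analysis.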

  13. [Identification of plantaginis semen based on ITS2 and psbA-trnH sequences].

    Science.gov (United States)

    Song, Ming; Zhang, Ya-Qin; Lin, Yun-Han; Tu, Yuan; Ma, Xiao-Xi; Sun, Wei; Xiang, Li; Jiao, Wen-Jing; Liu, Xia

    2014-06-01

    In order to evaluate the efficiency of ITS2 and psbA-trnH sequences used as DNA barcodes to distinguish Plantaginis Semen from its adulterants, we collected 71 samples of Plantaginis Semen and its adulterants. The ITS2 and psbA-trnH sequences were aligned with Clustal W, the genetic distances were calculated with the Kimura 2-parameter (K2P) model, and Neighbor-Joining (NJ) phylogenetic trees were constructed using MEGA 5.1. The results indicated that the ITS2 sequence lengths of Plantago asiatica and P. depressa were 199 bp and 200 bp, respectively; the maximum intra-specific K2P distances were lower than the minimum inter-specific K2P distance; the NJ tree based on the ITS2 sequence indicated that Plantaginis Semen and its adulterants could be distinguished clearly. The psbA-trnH sequence lengths of both P. asiatica and P. depressa were 340 bp; the maximum intra-specific K2P distances were lower than the minimum inter-specific K2P distance; the NJ tree based on the psbA-trnH sequence showed that Plantaginis Semen can be distinguished clearly from its adulterants except for P. major. Therefore, ITS2 sequences can be used as an ideal DNA barcode to distinguish Plantaginis Semen from its adulterants. PMID:25244750
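
    As a rough illustration of the K2P distance mentioned in this abstract, the sketch below applies the standard Kimura 2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name and the choice to skip any column containing a gap or ambiguous base are assumptions made for this example; MEGA offers several gap-handling options and its exact behaviour is not reproduced here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns where either sequence has a character other than A, C, G, T
    (gaps, ambiguity codes) are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (non-positive argument).
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

    A barcoding gap of the kind reported above would then correspond to the largest within-species value of `k2p_distance` being smaller than the smallest between-species value.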

  14. Phylogeny of the Zygomycota based on nuclear ribosomal sequence data.

    Science.gov (United States)

    White, Merlin M; James, Timothy Y; O'Donnell, Kerry; Cafaro, Matías J; Tanabe, Yuuhiko; Sugiyama, Junta

    2006-01-01

    The Zygomycota is an ecologically heterogenous assemblage of nonzoosporic fungi comprising two classes, Zygomycetes and Trichomycetes. Phylogenetic analyses have suggested that the phylum is polyphyletic; two of four orders of Trichomycetes are related to the Mesomycetozoa (protists) that diverged near the fungal/animal split. Current circumscription of the Zygomycota includes only orders with representatives that produce zygospores. We present a molecular-based phylogeny including recognized representatives of the Zygomycetes and Trichomycetes with a combined dataset for nuclear rRNA 18S (SSU), 5.8S and 28S (LSU) genes. Tree reconstruction by Bayesian analyses suggests the Zygomycota is paraphyletic. Although 12 clades were identified only some of these correspond to the nine orders of Zygomycota currently recognized. A large superordinal clade, comprising the Dimargaritales, Harpellales, Kickxellales and Zoopagales, grouping together many symbiotic fungi, also is identified in part by a unique septal structure. Although Harpellales and Kickxellales are not monophyletic, these lineages are distinct from the Mucorales, Endogonales and Mortierellales, which appear more closely related to the Ascomycota + Basidiomycota + Glomeromycota. The final major group, the insect-associated Entomophthorales, appears to be polyphyletic. In the present analyses Basidiobolus and Neozygites group within Zygomycota but not with the Entomophthorales. Clades are discussed with special reference to traditional classifications, mapping morphological characters and ecology, where possible, as a snapshot of our current phylogenetic perspective of the Zygomycota.

  15. Efficient Simulation of Quantum States Based on Classical Fields Modulated with Pseudorandom Phase Sequences

    CERN Document Server

    Fu, Jian

    2010-01-01

    We demonstrate that a tensor product structure can be obtained by introducing pseudorandom phase sequences into classical fields with two orthogonal modes. Using classical fields modulated with pseudorandom phase sequences, we discuss efficient simulation of several typical quantum states, including the product state, Bell states, GHZ state, and W state. By performing a quadrature demodulation scheme, we can obtain the mode status matrix of the simulating classical fields, based on which we propose a sequence permutation mechanism to reconstruct the simulated quantum states. Research on classical simulation of quantum states is important, for it not only enables potential practical applications in quantum computation, but also provides useful insights into fundamental concepts of quantum mechanics.

  16. Predicting effects of noncoding variants with deep learning–based sequence model

    Science.gov (United States)

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  17. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Directory of Open Access Journals (Sweden)

    Martin Mascher

    Full Text Available The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data: the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, a paucity in the number of SNPs that were jointly discovered by the different pipelines, indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley.
Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new

  18. MOST: a modified MLST typing tool based on short read sequencing.

    Science.gov (United States)

    Tewolde, Rediat; Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping- and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping- and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS-based methods, the mapping-based approach was the most sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches. PMID:27602279
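
    The percentages quoted in this abstract follow directly from the stated counts, as the short check below shows. The dictionary layout and the helper name are illustrative only; the counts and totals (323 genomes, 55 mixed and low coverage data sets) are the ones given in the abstract.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)

# Full MLST profiles obtained out of 323 genomes, per method.
full_profiles_323 = {"conventional": 300, "assembly": 315, "mapping": 322}
# Full MLST profiles obtained out of 55 mixed / low coverage data sets.
full_profiles_55 = {"mapping": 49, "assembly": 37}

rates_323 = {m: percent(n, 323) for m, n in full_profiles_323.items()}
rates_55 = {m: percent(n, 55) for m, n in full_profiles_55.items()}
```

    Here `rates_323` and `rates_55` reproduce the sensitivities reported for each method on the two sample sets.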

  19. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing

    OpenAIRE

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid; Herman, Philippe; De Loose, Marc; Ruttink, Tom; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-01-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove...

  20. CloudMap: A Cloud-Based Pipeline for Analysis of Mutant Genome Sequences

    OpenAIRE

    Minevich, Gregory; Park, Danny S.; Blankenberg, Daniel; Richard J Poole; Hobert, Oliver

    2012-01-01

    Whole genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second generation sequencing platforms. We describe here a novel, easily accessible and cloud-based pipeline, called CloudMap, which greatly simplifies the ...

  1. [Clinical Application of Extraction and Analysis of the Key Frames Based on IVUS Sequences].

    Science.gov (United States)

    Mao, Haiqun; Yang, Feng; Huang, Zheng; Cui, Kai; Wang, Xinxin

    2015-08-01

    In this paper, we propose an image-based key frame gating method to reduce motion artifacts in intravascular ultrasound (IVUS) longitudinal cuts. The artifacts are mainly caused by the periodic relative displacement between blood vessels and the IVUS catheter due to cardiac motion. The method consists of four steps. Firstly, we converted IVUS image sequences to polar coordinates to cut down the amount of calculation. Secondly, we extracted a one-dimensional signal cluster reflecting cardiac motion by spectral analysis and filtering techniques. Thirdly, we designed a Butterworth band-pass filter for filtering the one-dimensional signal clusters. Fourthly, we retrieved the extremes of the filtered signal clusters to find key frames, which compose the key-frame gated sequences. Experimental results showed that our algorithm was fast, with an average frame processing time of 17 ms. In the longitudinal views, the gated sequences showed a similar trend to the original ones, with less of a sawtooth shape and good continuity. We selected 12 groups of clinical IVUS sequences [images (876 +/- 65 frames), coronary segments length (14.61 +/- 1.08 mm)] to calculate the vessel volume, lumen volume and mean plaque burden of the original and gated sequences. Statistical results showed that, on one hand, both vessel volume and lumen volume measured from the gated sequences were significantly smaller than those of the original ones, and there was no significant difference in mean plaque burden between original and gated sequences, which met the need of clinical diagnosis and treatment. On the other hand, variances of vessel area and lumen area of the gated sequences were significantly smaller than those of the original sequences, indicating that the gated sequences would be more stable than the original ones.

  2. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Directory of Open Access Journals (Sweden)

    Brejnev Muhizi Muhire

    Full Text Available The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
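
    The effect of gap handling on pairwise identity scores, which this abstract raises, can be seen in a small sketch. The function below takes two sequences that have already been aligned to each other and computes the fraction of identical positions, either ignoring columns with a gap in one sequence or counting them as mismatches. The function name, the flag and the example sequences are hypothetical, and this is not presented as SDT's actual implementation.

```python
def pairwise_identity(aln_a, aln_b, gaps_as_mismatch=False, gap="-"):
    """Fraction of identical positions between two aligned sequences.

    Columns where both sequences have a gap are always ignored.
    Columns where only one sequence has a gap are ignored by default,
    or counted as mismatches when gaps_as_mismatch is True.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = compared = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            if gaps_as_mismatch:
                compared += 1
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared

# Same alignment, two gap conventions, two different scores.
ignoring_gaps = pairwise_identity("ACGT-A", "ACGTTA")
counting_gaps = pairwise_identity("ACGT-A", "ACGTTA", gaps_as_mismatch=True)
```

    With a demarcation threshold lying between the two scores, the same pair of sequences would be classified differently depending on the convention, which is the inconsistency the abstract describes.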

  3. High Interlaboratory Reproducibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de;

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without...

  4. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different couples of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides of which 29,109,688 with Q20 (87.43%) in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. Error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation about the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low level of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.

  5. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures.

    Science.gov (United States)

    Bertolini, Francesca; Ghionda, Marco Ciro; D'Alessandro, Enrico; Geraci, Claudia; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different couples of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides of which 29,109,688 with Q20 (87.43%) in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. Error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation about the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low level of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.

  6. JiffyNet: a web-based instant protein network modeler for newly sequenced species.

    Science.gov (United States)

    Kim, Eiru; Kim, Hanhae; Lee, Insuk

    2013-07-01

    Revolutionary DNA sequencing technology has enabled affordable genome sequencing for numerous species. Thousands of species already have completely decoded genomes, and tens of thousands more are in progress. Naturally, parallel expansion of the functional parts list library is anticipated, yet genome-level understanding of function also requires maps of functional relationships, such as functional protein networks. Such networks have been constructed for many sequenced species including common model organisms. Nevertheless, the majority of species with sequenced genomes still have no protein network models available. Moreover, biologists might want to obtain protein networks for their species of interest on completion of the genome projects. Therefore, there is high demand for accessible means to automatically construct genome-scale protein networks based on sequence information from genome projects only. Here, we present a public web server, JiffyNet, specifically designed to instantly construct genome-scale protein networks based on associalogs (functional associations transferred from a template network by orthology) for a query species with only protein sequences provided. Assessment of the networks by JiffyNet demonstrated generally high predictive ability for pathway annotations. Furthermore, JiffyNet provides network visualization and analysis pages for a wide variety of molecular concepts to facilitate network-guided hypothesis generation. JiffyNet is freely accessible at http://www.jiffynet.org.

  7. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    Science.gov (United States)

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-01-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic. PMID:26830453

  8. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  9. ParticleCall: A particle filter for base calling in next-generation sequencing systems

    Directory of Open Access Journals (Sweden)

    Shen Xiaohu

    2012-07-01

    Full Text Available Background: Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. Accuracy and lengths of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results: In this paper, we consider Illumina’s sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina’s Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best performing unsupervised method currently available, while achieving the same accuracy. Conclusions: The proposed ParticleCall provides more accurate calls than Illumina’s base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://sourceforge.net/projects/particlecall.

  10. An Optimal Sorting of Pulse Amplitude Sequence Based on the Phased Array Radar Beam Tasks

    Institute of Scientific and Technical Information of China (English)

    Chuan Sheng; Yongshun Zhang; Wenlong Lu

    2016-01-01

    The study of phased array radar (PAR) pulse amplitude sequence characteristics is key to understanding the radar’s working state and its beam’s scanning manner. According to the principle of antenna pattern formation and the searching and tracking modes of beams, this paper analyzes the characteristics and differences of the pulse amplitude sequence when the radar beams work in searching and tracking modes respectively. Then an optimal sorting model of the pulse amplitude sequence is established based on least-squares and curve-fitting methods. This method is helpful for acquiring the current working state of the radar and recognizing its instantaneous beam pointing by sorting the pulse amplitude sequence without the necessity to estimate the antenna pattern.

  11. Fast interactive segmentation algorithm of image sequences based on relative fuzzy connectedness

    Institute of Scientific and Technical Information of China (English)

    Tian Chunna; Gao Xinbo

    2005-01-01

    A fast interactive segmentation algorithm for image sequences based on relative fuzzy connectedness is presented. Compared with the original algorithm, the proposed one achieves the same accuracy while speeding up segmentation threefold for a single image. Meanwhile, this fast segmentation algorithm is extended from a single object to multiple objects and from a single image to image sequences. Thus the segmentation of multiple objects from a complex background and batch segmentation of image sequences can be achieved. In addition, a post-processing scheme is incorporated in this algorithm, which extracts a smooth, one-pixel-wide edge for each segmented object. The experimental results illustrate that the proposed algorithm can obtain the object regions of interest from medical images or image sequences as well as man-made images quickly and reliably with only a little interaction.

  12. Weather data analysis based on typical weather sequence analysis. Application: energy building simulation

    CERN Document Server

    David, Mathieu; Garde, Francois; Boyer, Harry

    2014-01-01

    In building studies dealing with energy efficiency and comfort, simulation software needs relevant weather files with optimal time steps. Few tools generate extreme and mean values of simultaneous hourly data including correlation between the climatic parameters. This paper presents the C++ Runeole software based on typical weather sequence analysis. It runs an analysis process of a stochastic continuous multivariable phenomenon with frequency properties applied to a climatic database. The database analysis associates basic statistics, PCA (Principal Component Analysis) and automatic classifications. Different ways of applying these methods will be presented. All the results are stored in the Runeole internal database that allows an easy selection of weather sequences. The extreme sequences are used for system and building sizing and the mean sequences are used for the determination of the annual cooling loads as proposed by Audrier-Cros (Audrier-Cros, 1984). This weather analysis was tested with the datab...

  13. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.

    Science.gov (United States)

    Thompson, Julie D; Koehl, Patrice; Ripp, Raymond; Poch, Olivier

    2005-10-01

    Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.

  14. Genetic characterization of three novel chicken parvovirus strains based on analysis of their coding sequences.

    Science.gov (United States)

    Koo, Bon-Sang; Lee, Hae-Rim; Jeon, Eun-Ok; Han, Moo-Sung; Min, Kyeong-Cheol; Lee, Seung-Baek; Bae, Yeon-Ji; Cho, Sun-Hyung; Mo, Jong-Suk; Kwon, Hyuk Moo; Sung, Haan Woo; Kim, Jong-Nyeo; Mo, In-Pil

    2015-01-01

    Chicken parvovirus (ChPV) is one of the causative agents of viral enteritis. Recently, the genome of the ABU-P1 strain of ChPV was fully sequenced and determined to have a distinct genomic composition compared with that of vertebrate parvoviruses. However, no comparative sequence analysis of coding regions of ChPVs was possible because of the lack of other sequence information. In this study, we obtained the nucleotide sequences of all genomic coding regions of three ChPVs by polymerase chain reaction using 13 primer sets, and deduced the amino acid sequences from the nucleotide sequences. The non-structural protein 1 (NS1) gene of the three ChPVs showed 95.0 to 95.5% nucleotide sequence identity and 96.5 to 98.1% amino acid sequence identity to those of NS1 from the ABU-P1 strain, respectively, and even higher nucleotide and amino acid similarities to one another. The viral proteins (VP) gene was more divergent between the three ChPV Korean strains and ABU-P1, with 88.1 to 88.3% nucleotide identity and 93.0% amino acid identity. Analysis of the putative tertiary structure of the ChPV VP2 protein showed that variable regions with less than 80% nucleotide similarity between the three Korean strains and ABU-P1 occurred in large loops of the VP2 protein believed to be involved in antigenicity, pathogenicity, and tissue tropism in other parvoviruses. Based on our analysis of full-length coding sequences, we discovered greater variation in ChPV strains than reported previously, especially in partial regions of the VP2 protein.

  15. Enhancing Students Motivation towards School Science with an Inquiry - Based Site Visit Teaching Sequence: A Design - Based Research Approach

    OpenAIRE

    Anni Loukomies

    2013-01-01

    An inquiry-based site visit teaching sequence for school science was designed in co-operation with researchers and science teachers, according to the principles of Design Based Research (DBR). Out-of-school industry site visits were central in the design. Theory-based conjectures arising from the literature on motivation, interest and inquiry-based science teaching (IBST) were embodied in the design solution, and these embodied conjectures were studied in order to uncover the aspects of the d...

  16. A Model for Protein Sequence Evolution Based on Selective Pressure for Protein Stability: Application to Hemoglobins

    OpenAIRE

    Lorraine Marsh

    2009-01-01

    Negative selection against protein instability is a central influence on evolution of proteins. Protein stability is maintained over evolution despite changes in underlying sequences. An empirical all-site stability-based model of evolution was developed to focus on the selection of residues arising from their contributions to protein stability. In this model, site rates could vary. A structure-based method was used to predict stationary frequencies of hemoglobin residues based on their prope...

  17. Local Sequence Information-based Support Vector Machine to Classify Voltage-gated Potassium Channels

    Institute of Scientific and Technical Information of China (English)

    Li-Xia LIU; Meng-Long LI; Fu-Yuan TAN; Min-Chun LU; Ke-Long WANG; Yan-Zhi GUO; Zhi-Ning WEN; Lin JIANG

    2006-01-01

    In our previous work, we developed a computational tool, PreK-ClassK-ClassKv, to predict and classify potassium (K+) channels. For K+ channel prediction (PreK) and classification at family level (ClassK), this method performs well. However, it does not perform as well in classifying voltage-gated potassium (Kv) channels (ClassKv). In this paper, a new method based on the local sequence information of Kv channels is introduced to classify Kv channels. Six transmembrane domains of a Kv channel protein are used to define a protein, and the dipeptide composition technique is used to transform an amino acid sequence to a numerical sequence. A Kv channel protein is represented by a vector with 2000 elements, and a support vector machine algorithm is applied to classify Kv channels. This method shows good performance with averages of total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 98.0%, 89.9%, 100%, 0.95 and 0.94 respectively. The results indicate that the local sequence information-based method is better than the global sequence information-based method to classify Kv channels.
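
    The dipeptide composition step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the 20 standard amino acids and concatenates one 400-element composition per segment; the abstract does not say how the authors arrive at 2000 elements, so the function names and the segment-wise concatenation are illustrative assumptions, not the published method.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(segment):
    """Return the 400 dipeptide frequencies of one amino acid segment."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(segment) - 1):
        pair = segment[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def encode_segments(segments):
    """Concatenate per-segment compositions (hypothetical encoding)."""
    features = []
    for segment in segments:
        features.extend(dipeptide_composition(segment))
    return features
```

    A vector built this way could then be passed to any support vector machine implementation; the choice of classifier library is left open here.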

  18. A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples

    Science.gov (United States)

    Wu, Yu-Wei; Ye, Yuzhen

    Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify all (or most) of the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., the tetramer frequencies) of various genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because of the substantial variation of DNA composition patterns within a single genome. We developed a novel approach (AbundanceBin) for metagenomics binning by utilizing the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, which is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised, clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimations of species abundances, as well as their genome sizes - two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the sequence lengths are very short (e.g. 75 bp) or have sequencing errors.

  19. DNA Sequence Optimization Based on Continuous Particle Swarm Optimization for Reliable DNA Computing and DNA Nanotechnology

    Directory of Open Access Journals (Sweden)

    N. K. Khalid

    2008-01-01

    Full Text Available Problem statement: In DNA-based computation and DNA nanotechnology, the design of good DNA sequences has turned out to be an essential problem and one of the most practical and important research topics. Basically, the DNA sequence design problem is a multi-objective problem, and it can be evaluated using four objective functions, namely H-measure, similarity, continuity and hairpin. Approach: There are several ways to solve a multi-objective problem; however, in order to evaluate the correctness of the PSO algorithm in DNA sequence design, this problem is converted into a single-objective problem. Particle Swarm Optimization (PSO) is proposed to minimize the objective in the problem, subject to two constraints: melting temperature and GC content. A model is developed to present the DNA sequence design based on PSO computation. Results: Based on the experiments and research done, 20 particles are used in the implementation of the optimization process, where the average values and the standard deviation for 100 runs are shown along with a comparison to other existing methods. Conclusion: The results verified that PSO can suitably solve the DNA sequence design problem using the proposed method and model, comparatively better than other approaches.
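
    Two of the quantities named in this abstract are easy to make concrete. The sketch below computes GC content as a percentage and a simple run-length penalty that stands in for the continuity objective; the abstract does not give the paper's exact definitions, so the penalty formula and the `threshold` parameter are assumptions for illustration only.

```python
def gc_content(seq):
    """Percentage of bases in seq that are G or C."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


def run_penalty(seq, threshold=2):
    """Sum of squared lengths of same-base runs longer than threshold.

    Assumed stand-in for a continuity measure, not the paper's formula.
    """
    penalty = 0
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run += 1
        else:
            if run > threshold:
                penalty += run * run
            run = 1
    # Account for a long run that reaches the end of the sequence.
    if seq and run > threshold:
        penalty += run * run
    return penalty
```

    In an optimizer such as PSO, a candidate sequence would typically be scored on measures like these and rejected or penalized when a constraint such as GC content falls outside a target range.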

  20. Sequencing-based typing reveals new insight in HLA-DPA1 polymorphism.

    Science.gov (United States)

    Rozemuller, E H; Bouwens, A G; van Oort, E; Versluis, L F; Marsh, S G; Bodmer, J G; Tilanus, M G

    1995-01-01

    An HLA-DPA1 sequencing-based typing (SBT) system has been developed to identify DPA1 alleles. Up to now eight DPA1 alleles have been defined. Six can be discriminated based upon exon 2 polymorphism. The three subtypes of DPA1*01: DPA1*0101, DPA1*0102 and DPA1*0103, have identical exon 2 sequences but show differences in exon 4. Exon 4 sequences were known for only the three DPA1*01 subtypes and for DPA1*0201. We now present additional sequence information for exon 4 and the unknown segments at the 3' end of exon 2. Additionally with the use of this sequencing technique it is also possible to identify previously unidentified polymorphism. We have studied the exon 2 and exon 4 polymorphism of DPA1 in 40 samples which include all known DPA1 alleles. A new allele, DPA1*01 new, was identified which differs by one nucleotide in exon 2 from DPA1*0103, resulting in an aspartic acid at codon 28. The DPA1*01 subtypes DPA1*0101 and DPA1*0102 could not be confirmed in samples which previously were used to define these subtypes, and consequently they do not exist. The exon 4 sequence of DPA1*0201 is corrected based on sequence data of DAUDI, the cell line in which DPA1*0202 was originally defined. The exon 4 regions of the remaining four alleles were resolved: the exon 4 regions of the alleles DPA1*02021 and DPA1*02022 were found to be identical to the--corrected--DPA1*0201 whereas the exon 4 region of DPA1*0301 differs by one nucleotide compared to DPA1*0103. The DPA1*0401 exon 4 region differs by one nucleotide compared to the corrected DPA1*0201.(ABSTRACT TRUNCATED AT 250 WORDS)

  1. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.;

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  2. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  3. Genotyping-in-Thousands by sequencing (GT-seq): A cost effective SNP genotyping method based on custom amplicon sequencing.

    Science.gov (United States)

    Campbell, Nathan R; Harmon, Stephanie A; Narum, Shawn R

    2015-07-01

    Genotyping-in-Thousands by sequencing (GT-seq) is a method that uses next-generation sequencing of multiplexed PCR products to generate genotypes from relatively small panels (50-500) of targeted single-nucleotide polymorphisms (SNPs) for thousands of individuals in a single Illumina HiSeq lane. This method uses only unlabelled oligos and PCR master mix in two thermal cycling steps for amplification of targeted SNP loci. During this process, sequencing adapters and dual barcode sequence tags are incorporated into the amplicons enabling thousands of individuals to be pooled into a single sequencing library. Post sequencing, reads from individual samples are split into individual files using their unique combination of barcode sequences. Genotyping is performed with a simple perl script which counts amplicon-specific sequences for each allele, and allele ratios are used to determine the genotypes. We demonstrate this technique by genotyping 2068 individual steelhead trout (Oncorhynchus mykiss) samples with a set of 192 SNP markers in a single library sequenced in a single Illumina HiSeq lane. Genotype data were 99.9% concordant to previously collected TaqMan™ genotypes at the same 192 loci, but call rates were slightly lower with GT-seq (96.4%) relative to TaqMan (99.0%). Of the 192 SNPs, 187 were genotyped in ≥90% of the individual samples and only 3 SNPs were genotyped in <70% of samples. This study demonstrates amplicon sequencing with GT-seq greatly reduces the cost of genotyping hundreds of targeted SNPs relative to existing methods by utilizing a simple library preparation method and massive efficiency of scale.
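
    The idea of calling genotypes from allele ratios can be sketched as follows. The authors describe a perl script; this Python version is only an illustration, and the minimum read depth and the ratio cut-offs are hypothetical values, not thresholds taken from the paper.

```python
def call_genotype(count_a1, count_a2, min_reads=10, het_low=0.2, het_high=0.8):
    """Call a biallelic SNP genotype from allele-specific read counts.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    total = count_a1 + count_a2
    if total < min_reads:
        return None  # too few reads to make a call
    fraction_a1 = count_a1 / total
    if fraction_a1 >= het_high:
        return "A1/A1"
    if fraction_a1 <= het_low:
        return "A2/A2"
    return "A1/A2"


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)
```

    With these assumed cut-offs, a locus with 25 reads for each allele would be called heterozygous, while a locus with only five reads in total would be left uncalled and would lower the call rate.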

  4. Mitochondrial DNA sequence-based phylogenetic relationship among flesh flies of the genus Sarcophaga (Sarcophagidae: Diptera)

    Indian Academy of Sciences (India)

    Neelam Bajpai; Raghav Ram Tewari

    2010-04-01

    The phylogenetic relationships among flesh flies of the family Sarcophagidae have been based mainly on the morphology of male genitalia. However, the male genitalic character-based relationships are far from satisfactory. Therefore, in the present study mitochondrial DNA has been used as a marker to unravel genetic relatedness and to construct a phylogeny among five sympatric species of the genus Sarcophaga. Two mitochondrial genes, viz. cytochrome oxidase subunit 1 (COI) and NAD dehydrogenase subunit 5 (ND5), were sequenced and genetic distance values were calculated on the basis of sequence differences in both mitochondrial genes. The data revealed very few genetic differences among the five species for the COI and ND5 gene sequences.

  5. DNA LOSSLESS DIFFERENTIAL COMPRESSION ALGORITHM BASED ON SIMILARITY OF GENOMIC SEQUENCE DATABASE

    Directory of Open Access Journals (Sweden)

    Heba Afify

    2011-09-01

    Full Text Available Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information theory are often perceived as being of interest for data communication and storage. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm that is based on production of difference sequences according to op-code table in order to optimize the compression of homologous sequences in dataset. Therefore, the stored data are composed of reference sequence, the set of differences, and differences locations, instead of storing each sequence individually. This algorithm does not require a priori knowledge about the statistics of the sequence set. The algorithm was applied to three different datasets of genomic sequences, it achieved up to 195-fold compression rate corresponding to 99.4% space saving.
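
    The storage scheme described here (a reference sequence plus the differences and their locations) can be illustrated with a deliberately simplified sketch. It handles only substitutions between equal-length sequences and does not reproduce the paper's op-code table; the function names are hypothetical.

```python
def encode_diff(reference, target):
    """Record (position, base) pairs where target differs from reference."""
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_diff(reference, diffs):
    """Rebuild the target sequence from the reference and its differences."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)
```

    For closely related sequences the list of differences is much shorter than the sequence itself, which is where the space saving comes from; a fuller scheme would also need operations for insertions and deletions.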

  6. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions. In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
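
    The distance computation outlined in this abstract can be sketched in a few lines of Python. This version counts L-words with a plain `Counter` rather than the generalized suffix tree the authors use, and it normalizes counts to relative frequencies; both choices are assumptions made to keep the example short, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, length):
    """Relative frequency of every L-word over the alphabet ACGT in seq."""
    counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
    words = ["".join(p) for p in product("ACGT", repeat=length)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def distance_matrix(sequences, length):
    """Symmetric matrix of Euclidean distances between L-word profiles."""
    profiles = [lword_profile(s, length) for s in sequences]
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sqrt(sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

    The resulting matrix is the kind of input that a neighbor joining or multidimensional scaling routine would take; direct counting like this is simpler than a suffix tree but does not share work across different word lengths.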

  7. Mapping Protein-DNA Interactions Using ChIP-exo and Illumina-Based Sequencing.

    Science.gov (United States)

    Barfeld, Stefan J; Mills, Ian G

    2016-01-01

    Chromatin immunoprecipitation (ChIP) provides a means of enriching DNA associated with transcription factors, histone modifications, and indeed any other proteins for which suitably characterized antibodies are available. Over the years, sequence detection has progressed from quantitative real-time PCR and Southern blotting to microarrays (ChIP-chip) and now high-throughput sequencing (ChIP-seq). This progression has vastly increased the sequence coverage and data volumes generated. This in turn has enabled informaticians to predict the identity of multi-protein complexes on DNA based on the overrepresentation of sequence motifs in DNA enriched by ChIP with a single antibody against a single protein. In the course of the development of high-throughput sequencing, little has changed in the ChIP methodology until recently. In the last three years, a number of modifications have been made to the ChIP protocol with the goal of enhancing the sensitivity of the method and further reducing the levels of nonspecific background sequences in ChIPped samples. In this chapter, we provide a brief commentary on these methodological changes and describe a detailed ChIP-exo method able to generate narrower peaks and greater peak coverage from ChIPped material.

  8. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

    Science.gov (United States)

    Singh, Vinod Kumar; Krishnamachari, Annangarachari

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within the nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy, between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme. PMID:27508123

  9. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within the nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy, between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  10. Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing

    Directory of Open Access Journals (Sweden)

    Manabu Watanabe

    2014-09-01

    Full Text Available Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^31–35. For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.

  11. Molecular phylogeny of western Atlantic Farfantepenaeus and Litopenaeus shrimp based on mitochondrial 16S partial sequences.

    Science.gov (United States)

    Maggioni, R; Rogers, A D; Maclean, N; D'Incao, F

    2001-01-01

    Partial sequences for the 16S rRNA mitochondrial gene were obtained from 10 penaeid shrimp species: Farfantepenaeus paulensis, F. brasiliensis, F. subtilis, F. duorarum, F. aztecus, Litopenaeus schmitti, L. setiferus, and Xiphopenaeus kroyeri from the western Atlantic and L. vannamei and L. stylirostris from the eastern Pacific. Sequences were also obtained from an undescribed morphotype of pink shrimp (morphotype II) usually identified as F. subtilis. The phylogeny resulting from the 16S partial sequences showed that these species form two well-supported monophyletic clades consistent with the two genera proposed in a recent systematic review of the suborder Dendrobranchiata. This contrasted with conclusions drawn from recent molecular phylogenetic work on penaeid shrimps based on partial sequences of the mitochondrial COI region that failed to support recent revisions of the Dendrobranchiata based on morphological analysis. Consistent differences observed in the sequences for morphotype II, coupled with previous allozyme data, support the conclusion that this is a previously undescribed species of Farfantepenaeus. PMID:11161743

  12. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    Science.gov (United States)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alteration of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
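
    The noisy channel view is conventionally written as a Bayes decision rule. The notation below is an assumption added for illustration (the abstract gives none): S denotes the observed "noisy" spontaneous phoneme sequence and W a candidate "clean" word sequence.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid S)
        \;=\; \arg\max_{W} \frac{P(S \mid W)\,P(W)}{P(S)}
        \;=\; \arg\max_{W} P(S \mid W)\,P(W)
```

    Here P(S | W) plays the role of the channel model (how a word sequence is distorted into the observed phonemes) and P(W) the prior over word sequences; P(S) can be dropped because it does not depend on W.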

  13. Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

    Science.gov (United States)

    Braguglia, Kay H.; Jackson, Kanata A.

    2012-01-01

    This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…

  14. Model-Based Requirements Analysis for Reactive Systems with UML Sequence Diagrams and Coloured Petri Nets

    DEFF Research Database (Denmark)

    Tjell, Simon; Lassen, Kristian Bisgaard

    2008-01-01

    of a derivative of UML 2.0 high-level Sequence Diagrams. The automated requirement checking is part of a bigger tool framework in which VDM++ is applied to automatically generate initial CPN models based on Problem Diagrams. These models are manually enhanced to provide behavioral descriptions of the environment...

  15. Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control

    NARCIS (Netherlands)

    Dijk, van A.D.J.; Braak, ter C.J.F.; Immink, G.H.; Angenent, G.C.; Ham, van R.C.H.J.

    2008-01-01

    Motivation: Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on their sequence alone is difficult since they are often part of families of fact

  16. Magnetism Teaching Sequences Based on an Inductive Approach for First-Year Thai University Science Students

    Science.gov (United States)

    Narjaikaew, Pattawan; Emarat, Narumon; Arayathanitkul, Kwan; Cowie, Bronwen

    2010-01-01

    The study investigated the impact on student motivation and understanding of magnetism of teaching sequences based on an inductive approach. The study was conducted in large lecture classes. A pre- and post-Conceptual Survey of Electricity and Magnetism was conducted with just fewer than 700 Thai undergraduate science students, before and after…

  17. Nucleic acid sequence-based amplification with oligochromatography for detection of Trypanosoma brucei in clinical samples

    NARCIS (Netherlands)

    C.M. Mugasa; T. Laurent; G.J. Schoone; P.A. Kager; G.W. Lubega; H.D.F.H. Schallig

    2009-01-01

    Molecular tools, such as real-time nucleic acid sequence-based amplification (NASBA) and PCR, have been developed to detect Trypanosoma brucei parasites in blood for the diagnosis of human African trypanosomiasis (HAT). Despite good sensitivity, these techniques are not implemented in HAT control pr

  18. Molecular phylogeny of Edraianthus (Grassy Bells; Campanulaceae) based on non-coding plastid DNA sequences

    DEFF Research Database (Denmark)

    Stefanovic, Sasa; Lakusic, Dmitar; Kuzmina, Maria;

    2008-01-01

    divided into three sections: E. sect. Edraianthus, E. sect. Uniflori, and E. sect. Spathulati. We present here the first phylogenetic study of Edraianthus based on multiple plastid DNA sequences (trnL-F region and rbcL-atpB spacer) derived from a wide taxonomic sampling and geographic range. While...

  19. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Background: Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description: CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the

  20. Evolution of EF-hand calcium-modulated proteins. III. Exon sequences confirm most dendrograms based on protein sequences: calmodulin dendrograms show significant lack of parallelism

    Science.gov (United States)

    Nakayama, S.; Kretsinger, R. H.

    1993-01-01

    In the first report in this series we presented dendrograms based on 152 individual proteins of the EF-hand family. In the second we used sequences from 228 proteins, containing 835 domains, and showed that eight of the 29 subfamilies are congruent and that the EF-hand domains of the remaining 21 subfamilies have diverse evolutionary histories. In this study we have computed dendrograms within and among the EF-hand subfamilies using the encoding DNA sequences. In most instances the dendrograms based on protein and on DNA sequences are very similar. Significant differences between protein and DNA trees for calmodulin remain unexplained. In our fourth report we evaluate the sequences and the distribution of introns within the EF-hand family and conclude that exon shuffling did not play a significant role in its evolution.

  1. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. An association method based on family-based breeding systems was developed; genomic heritabilities, genomic prediction accuracies and effects of some key factors were... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  2. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes

    OpenAIRE

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-01-01

    Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determi...

  3. The effects of diffusion on an exonuclease/nanopore-based DNA sequencing engine

    OpenAIRE

    Reiner, Joseph E.; Balijepalli, Arvind; Robertson, Joseph W. F.; Drown, Bryon S.; Burden, Daniel L.; Kasianowicz, John J.

    2012-01-01

    Over 15 years ago, the ability to electrically detect and characterize individual polynucleotides as they are driven through a single protein ion channel was suggested as a potential method for rapidly sequencing DNA, base-by-base, in a ticker tape-like fashion. More recently, a variation of this method was proposed in which a nanopore would instead detect single nucleotides cleaved sequentially by an exonuclease enzyme in close proximity to one pore entrance. We analyze the exonuclease/nanop...

  4. Security Analysis of a Block Encryption Algorithm Based on Dynamic Sequences of Multiple Chaotic Systems

    Institute of Scientific and Technical Information of China (English)

    DU Mao-Kang; HE Bo; WANG Yong

    2011-01-01

    Recently, cryptosystems based on chaos have attracted much attention. Wang and Yu (Commun. Nonlin. Sci. Numer. Simulat. 14(2009)574) proposed a block encryption algorithm based on dynamic sequences of multiple chaotic systems. We analyze the potential flaws in the algorithm. Then, a chosen-plaintext attack is presented. Some remedial measures are suggested to avoid the flaws effectively. Furthermore, an improved encryption algorithm is proposed to resist the attacks and to keep all the merits of the original cryptosystem.

  5. A new RF tagging pulse based on the Frank poly-phase perfect sequence

    DEFF Research Database (Denmark)

    Laustsen, Christoffer; Greferath, Marcus; Ringgaard, Steffen;

    2014-01-01

    Radio frequency (RF) spectrally selective multiband pulses, or tagging pulses, are applicable in a broad range of magnetic resonance methods. We demonstrate through simulations and experiments a new phase-modulation-only RF pulse for RF tagging based on the Frank poly-phase perfect sequence. In addition, we introduce an extended version with a WURST modulation (Frank-WURST). The new pulses exhibit interesting and flexible spin tagging properties and are easily implemented in existing MR sequences, where they can substitute slice-selective pulses with no additional alterations.
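
    For readers unfamiliar with the Frank sequence, the sketch below generates one and checks the "perfect" property, namely that its periodic autocorrelation vanishes at every non-zero shift. It uses the standard textbook definition of a length N² Frank sequence; the function names are hypothetical and nothing here reproduces the authors' pulse design.

```python
import cmath


def frank_sequence(n):
    """Frank polyphase sequence of length n*n: element (j, k) is exp(2*pi*i*j*k/n)."""
    return [cmath.exp(2j * cmath.pi * j * k / n) for j in range(n) for k in range(n)]


def periodic_autocorrelation(seq, shift):
    """Cyclic autocorrelation of a complex sequence at the given shift."""
    m = len(seq)
    return sum(seq[t] * seq[(t + shift) % m].conjugate() for t in range(m))


sequence = frank_sequence(4)  # 16 unit-modulus phase steps
peak = abs(periodic_autocorrelation(sequence, 0))
sidelobes = [abs(periodic_autocorrelation(sequence, s)) for s in range(1, len(sequence))]
```

    Every element has unit modulus, so only the phase changes from step to step, which is consistent with the "phase-modulation-only" description in the abstract.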

  6. Use of polyphase continuous excitation based on the Frank sequence in EPR.

    Science.gov (United States)

    Tseitlin, Mark; Quine, Richard W; Eaton, Sandra S; Eaton, Gareth R

    2011-08-01

    Polyphase continuous excitation based on the Frank sequence is suggested as an alternative to single pulse excitation in EPR. The method allows reduction of the source power, while preserving the excitation bandwidth of a single pulse. For practical EPR implementation the use of a cross-loop resonator is essential to provide isolation between the spin system and the resonator responses to the excitation. Provided that a line broadening of about 5% is acceptable, the cumulative turning angle of the magnetization vector generated by the excitation sequence can be quite large and can produce signal amplitudes that are comparable to that achieved with a higher power 90° pulse. PMID:21737326

  7. Autonomously Generating Operations Sequences for a Mars Rover Using Artificial Intelligence-Based Planning

    Science.gov (United States)

    Sherwood, R.; Mutz, D.; Estlin, T.; Chien, S.; Backes, P.; Norris, J.; Tran, D.; Cooper, B.; Rabideau, G.; Mishkin, A.; Maxwell, S.

    2001-07-01

    This article discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This artificial intelligence (AI)-based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules. An automated planning and scheduling system encodes rover design knowledge and uses search and reasoning techniques to automatically generate low-level command sequences while respecting rover operability constraints, science and engineering preferences, environmental predictions, and also adhering to hard temporal constraints. This prototype planning system has been field-tested using the Rocky 7 rover at JPL and will be field-tested on more complex rovers to prove its effectiveness before transferring the technology to flight operations for an upcoming NASA mission. Enabling goal-driven commanding of planetary rovers greatly reduces the requirements for highly skilled rover engineering personnel. This in turn greatly reduces mission operations costs. In addition, goal-driven commanding permits a faster response to changes in rover state (e.g., faults) or science discoveries by removing the time-consuming manual sequence validation process, allowing rapid "what-if" analyses, and thus reducing overall cycle times.

  8. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang LUE; Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry each stated that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution, in this paper a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in the DNA sequence. In addition, the program can also be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl and implemented on the Linux operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
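
    A deviation from PR2 is commonly quantified as GC skew, (G − C)/(G + C), and the extremes of the cumulative skew are often used as rough estimates of replication boundaries. The sketch below illustrates that general idea in Python; it is not the DNAskew Perl code, and whether DNAskew uses exactly this measure is an assumption. All function names are hypothetical.

```python
def gc_skew(window):
    """GC skew (G - C) / (G + C) of a sequence window; 0.0 if it has no G or C."""
    window = window.upper()
    g, c = window.count("G"), window.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_gc_skew(seq):
    """Running total of +1 per G and -1 per C; entry i covers the first i bases."""
    total = 0
    cumulative = [0]
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        cumulative.append(total)
    return cumulative


def skew_extremes(seq):
    """Positions of the minimum and maximum of the cumulative skew curve."""
    cumulative = cumulative_gc_skew(seq)
    return cumulative.index(min(cumulative)), cumulative.index(max(cumulative))


# Synthetic example: a C-rich half followed by a G-rich half.
toy_genome = "C" * 50 + "G" * 50
minimum_at, maximum_at = skew_extremes(toy_genome)
```

    In many bacterial genomes the minimum of the cumulative curve lies near the replication origin and the maximum near the terminus, but this is a heuristic and real genomes are far noisier than the toy input.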

  9. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing.

    Science.gov (United States)

    Urich, Mark A; Nery, Joseph R; Lister, Ryan; Schmitz, Robert J; Ecker, Joseph R

    2015-03-01

    Current high-throughput DNA sequencing technologies enable acquisition of billions of data points through which myriad biological processes can be interrogated, including genetic variation, chromatin structure, gene expression patterns, small RNAs and protein-DNA interactions. Here we describe the MethylC-sequencing (MethylC-seq) library preparation method, a 2-d protocol that enables the genome-wide identification of cytosine DNA methylation states at single-base resolution. The technique involves fragmentation of genomic DNA followed by adapter ligation, bisulfite conversion and limited amplification using adapter-specific PCR primers in preparation for sequencing. To date, this protocol has been successfully applied to genomic DNA isolated from primary cell culture, sorted cells and fresh tissue from over a thousand plant and animal samples.

  10. Identification of Chlorophyceae based on 18S rDNA sequences from Persian Gulf.

    Directory of Open Access Journals (Sweden)

    Raheem Haddad

    2014-12-01

    Full Text Available Chlorophyceae are important constituents of marine phytoplankton. The taxonomy of Chlorophyceae was traditionally based solely on morphological characteristics. In the present research project, genetic diversity was investigated to analyze five species of Chlorophyceae from waters of the Persian Gulf. A clone library of the ribosomal small subunit RNA gene (18S rDNA) in the nuclear genome was constructed by PCR, and then, after examining the clones, selected clones were sequenced. The determined clone sequences were analyzed by a similarity search of the NCBI GenBank database using BLAST. Eleven sequences were identified correctly and used for phylogenetic analysis. We identified species of Chlorophyta (Chlorella sorokiniana, Chlamydomonas sp., Neochloris aquatica, Picochlorum sp. and Nannochloris atomus) without the need to conduct extensive colony isolation techniques. Therefore, this improved molecular method can be used to generate a robust database describing the species diversity of environmental samples.

  11. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Directory of Open Access Journals (Sweden)

    Matthias Steglich

    Full Text Available We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  12. Analyzing Plasmodium falciparum erythrocyte membrane protein 1 gene expression by a next generation sequencing based method

    DEFF Research Database (Denmark)

    Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine;

    2013-01-01

    Plasmodium falciparum is responsible for most cases of severe malaria and causes >1 million deaths every year. The particular virulence of this Plasmodium species is highly associated with the expression of certain members of the Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family, encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR...

  13. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    RNA structure-probing data can improve the prediction of RNA secondary and tertiary structure and allow structural changes to be identified and investigated. In recent years, massive parallel sequencing has dramatically improved the throughput of RNA structure probing experiments, but at the same time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data, we have made a collection of tools, which allow raw sequencing reads to be converted to normalized probing values using different published strategies. In addition, we also provide tools for visualization of the probing data in the UCSC Genome Browser and for converting RNA coordinates to genomic...

  14. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  15. Extracting flat-field images from scene-based image sequences using phase correlation

    Science.gov (United States)

    Caron, James N.; Montes, Marcos J.; Obermark, Jerome L.

    2016-06-01

    Flat-field image processing is an essential step in producing high-quality and radiometrically calibrated images. Flat-fielding corrects for variations in the gain of focal plane array electronics and unequal illumination from the system optics. Typically, a flat-field image is captured by imaging a radiometrically uniform surface. The flat-field image is then normalized and removed from the images. There are circumstances, such as in remote sensing, where a flat-field image cannot be acquired in this manner. For these cases, we developed a phase-correlation method that allows the extraction of an effective flat-field image from a sequence of scene-based displaced images. The method uses sub-pixel phase-correlation image registration to align the sequence and estimate the static scene. The scene is removed from the sequence, producing a sequence of misaligned flat-field images. An average flat-field image is derived from the realigned flat-field sequence.
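
    The registration step described above can be illustrated with a minimal sketch. It assumes NumPy and recovers only whole-pixel, circular shifts; the sub-pixel refinement used by the authors is not reproduced here, and the function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np

def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalised cross-power spectrum: keep the phase, discard the magnitude.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    # The inverse transform peaks at the displacement between the two images.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around).
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, correlation.shape)]
    return tuple(int(s) for s in shift)

# Synthetic test scene (an assumption for illustration, not data from the paper).
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
shifted = np.roll(scene, shift=(5, -3), axis=(0, 1))
print(phase_correlation_shift(scene, shifted))  # (5, -3)
```

    Once every frame's shift is known, the frames can be realigned and averaged to estimate the static scene, which is the quantity the abstract says is removed to expose the flat-field component.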

  16. New Restructure of Transmitted Sequences for CP-based LS Channel Estimation Method in OFDM System

    Directory of Open Access Journals (Sweden)

    Wang-Xing Zhao

    2012-05-01

    Full Text Available This study investigates signal processing for the channel estimation problem of OFDM systems in wireless communication. We present an optimized restructuring of the total transmitted sequences that aims to improve the cyclic-prefix least squares (CPLS) channel estimation method proposed in this paper. In contrast to conventional training sequence (TS) methods, especially frequency-domain TS methods such as sub-carrier TS, which directly occupy sub-carrier data sequences, the CPLS method makes greater use of the cyclic prefix and thus saves system resources. In detail, we first derive and compare the mean square error (MSE) of the CPLS method and the equally spaced training sequences (ESTS) method, both based on the LS principle. Following this derivation, we restructure the total transmitted sequences using an optimization tool, the Lagrange method, after first establishing a constrained model. The restructuring substantially lowers the channel estimation MSE of CPLS and also improves the system BER. Finally, simulations confirm the correctness of our restructuring theory.
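
    For orientation, the LS principle that both CPLS and ESTS build on can be sketched as follows. This is a generic per-subcarrier least-squares estimate with known pilots, not the cyclic-prefix scheme of the paper; the pilot pattern, noise level, and all names are assumptions chosen for illustration.

```python
import numpy as np

def ls_channel_estimate(received, pilots):
    """Per-subcarrier least-squares estimate: H_hat[k] = Y[k] / X[k]."""
    return received / pilots

rng = np.random.default_rng(1)
n_subcarriers = 1024
# Hypothetical BPSK pilots on every subcarrier.
pilots = rng.choice(np.array([1 + 0j, -1 + 0j]), size=n_subcarriers)
channel = rng.normal(size=n_subcarriers) + 1j * rng.normal(size=n_subcarriers)
# Complex Gaussian noise with variance 0.01 per real/imaginary component.
noise = 0.1 * (rng.normal(size=n_subcarriers) + 1j * rng.normal(size=n_subcarriers))

received = channel * pilots + noise
estimate = ls_channel_estimate(received, pilots)
mse = np.mean(np.abs(estimate - channel) ** 2)
print(f"channel estimation MSE: {mse:.4f}")
```

    With unit-magnitude pilots the estimation error equals the noise divided by the pilot, so the MSE should sit near the total noise variance (0.02 in this setup).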

  17. Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

    Science.gov (United States)

    Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

    2016-06-01

    Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publically available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine. PMID:27181682

  19. Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    Science.gov (United States)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and an as-yet-unidentified organism referred to as "Black", since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran polymerase chain reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the sequence of the 16S rRNA gene can be used to identify different bacterial species. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference and for classification of other Dichothrix sp.

  20. Sequencing-based variant detection in the polyploid crop oilseed rape

    Science.gov (United States)

    2013-01-01

    Background The detection and exploitation of genetic variation underpins crop improvement. However, the polyploid nature of the genomes of many of our most important crops represents a barrier, particularly for the analysis of variation within genes. To overcome this, we aimed to develop methodologies based on amplicon sequencing that involve the incorporation of barcoded amplification tags (BATs) into PCR products. Results A protocol was developed to tag PCR products with 5’ 6-base oligonucleotide barcode extensions before pooling for sequencing library production using standard Illumina adapters. A computational method was developed for the de-convolution of products and the robust detection and scoring of sequence variants. Using this methodology, amplicons targeted to gene sequences were screened across a B. napus mapping population and the resulting allele scoring strings for 24 markers linkage mapped to the expected regions of the genome. Furthermore, using one-dimensional 8-fold pooling, 4608 lines of a B. napus mutation population were screened for induced mutations in a locus-specific amplicon (an orthologue of GL2.b) and mixed product of three co-amplified loci (orthologues of FAD2), identifying 10 and 41 mutants respectively. Conclusions The utilisation of barcode tags to de-convolute pooled PCR products in multiplexed, variation screening via Illumina sequencing provides a cost effective method for SNP genotyping and mutation detection and, potentially, markers for causative changes, even in polyploid species. Combining this approach with existing Illumina multiplexing workflows allows the analysis of thousands of lines cheaply and efficiently in a single sequencing run with minimal library production costs. PMID:23915099
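
    The de-convolution of pooled products by barcode can be pictured with a small sketch. It assumes exact matching of a 6-base barcode at the 5' end of each read; the barcode sequences, sample names, and the `demultiplex` helper are hypothetical, and the authors' actual computational method is not described in enough detail here to reproduce.

```python
from collections import defaultdict

BARCODE_LENGTH = 6  # the abstract describes 6-base 5' barcode extensions

def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode from each read."""
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        # Reads whose barcode is not in the table are kept apart, not discarded.
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "line_001", "TGCATG": "line_002"}
reads = ["ACGTACGGATCCA", "TGCATGGGATCTA", "ACGTACGGATCCA", "NNNNNNGGATCCA"]
result = demultiplex(reads, barcodes)
for sample, inserts in sorted(result.items()):
    print(sample, inserts)
```

    A production pipeline would usually tolerate one or two barcode mismatches and use base qualities; exact matching is used here only to keep the idea visible.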

  1. Quantitative sequence-function relationships in proteins based on gene ontology

    Directory of Open Access Journals (Sweden)

    Lesk Arthur M

    2007-08-01

    Full Text Available Abstract Background The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins – for instance, orthologs in different mammals – to very distantly-related proteins at the limit of reliable recognition of homology. Results We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. Conclusion Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.
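
    As a reference point for the 50% threshold, percent residue identity can be computed from a pairwise alignment as sketched below. The abstract does not state which denominator the authors used; this sketch assumes identity is counted over columns where neither sequence has a gap, which is only one of several conventions. The sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Hypothetical aligned fragments: 8 gap-free columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MKTQAYLAR")
print(identity)  # 75.0
```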

  2. Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations

    Indian Academy of Sciences (India)

    Surjit B Dixit; Mihaly Mezei; David L Beveridge

    2012-07-01

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization were observed in these simulations. The results were compared to essentially all known experimental data on the subject. Proximity analysis was employed to highlight the sequence-dependent differences in solvation and ion localization properties in the grooves of DNA. Comparison of the MD-calculated DNA structure with canonical A- and B-forms supports the idea that the G/C-rich sequences are closer to canonical A- than B-form structures, while the reverse is true for the poly A sequences, with the exception of the alternating ATAT sequence. Analysis of hydration density maps reveals that the flexibility of the solute molecule has a significant effect on the nature of observed hydration. Energetic analysis of solute–solvent interactions based on proximity analysis of solvent reveals that the GC or CG base pairs interact more strongly with water molecules in the minor groove of DNA than the AT or TA base pairs, while the interactions of the AT or TA pairs in the major groove are stronger than those of the GC or CG pairs. Computation of solvent-accessible surface area of the nucleotide units in the simulated trajectories reveals that the similarity with results derived from analysis of a database of crystallographic structures is excellent. The MD trajectories tend to follow Manning’s counterion condensation theory, presenting a region of condensed counterions within a radius of about 17 Å from the DNA surface independent of sequence. The GC and CG pairs tend to associate with cations in the major groove of the DNA structure to a greater extent than the AT and TA pairs. Cation association is more frequent in the minor groove of AT than the GC pairs. In general

  3. Shotgun metagenomic sequencing based microbial diversity assessment of Lasundra hot spring, India

    Directory of Open Access Journals (Sweden)

    Amit V. Mangrola

    2015-06-01

    Full Text Available This is the first report on the metagenomic approach for unveiling the microbial diversity of Lasundra hot spring, Gujarat State, India. High-throughput sequencing of community DNA was performed on an Ion Torrent PGM platform. The metagenome consisted of 606,867 sequences representing 98,567,305 bp, with an average length of 162 bp and 46% G + C content. Metagenome sequence information is available at EBI under EBI Metagenomic database with accession no. ERP009313. MG-RAST-assisted community analysis revealed that 99.21% of the sequences were of bacterial origin, 0.43% were assigned to eukaryotes and 0.11% belonged to archaea. A total of 29 bacterial, 20 eukaryotic and 4 archaeal phyla were detected. Abundant genera were Bacillus (86.7%), Geobacillus (2.4%), Paenibacillus (1.0%), Clostridium (0.7%) and Listeria (0.5%), which together represent 91.52% of the metagenome. In functional analysis, Clusters of Orthologous Groups (COG)-based annotation revealed that 45.4% was metabolism connected and 19.6% fell in the poorly characterized group. Subsystem-based annotation suggests that 14.0% was carbohydrates, 7.0% was protein metabolism and 3.0% genes for various stress responses, together with the versatile presence of commercially useful traits.
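
    The summary figures quoted above can be cross-checked with a few lines of arithmetic. The numbers are taken from the abstract; the variable names are illustrative, and the interpretation of any residual as rounding is an assumption.

```python
total_bases = 98_567_305
total_reads = 606_867

# Mean read length implied by the two totals (about 162.4 bp).
mean_read_length = total_bases / total_reads

domain_percent = {"bacteria": 99.21, "eukaryotes": 0.43, "archaea": 0.11}
genus_percent = {
    "Bacillus": 86.7,
    "Geobacillus": 2.4,
    "Paenibacillus": 1.0,
    "Clostridium": 0.7,
    "Listeria": 0.5,
}

# Share of sequences not attributed to any of the three domains.
unassigned_percent = 100.0 - sum(domain_percent.values())
# Sum of the rounded per-genus values for the five abundant genera.
top_genera_percent = sum(genus_percent.values())

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"not assigned to a domain: {unassigned_percent:.2f}%")
print(f"five abundant genera (rounded values): {top_genera_percent:.1f}%")
```

    The mean length agrees with the reported 162 bp. The rounded genus percentages sum to 91.3%, slightly below the reported 91.52%, which presumably was computed from unrounded values.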

  4. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    Directory of Open Access Journals (Sweden)

    Francisco Alexandre P

    2012-05-01

    Full Text Available Abstract Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

  5. incaRNAfbinv: a web server for the fragment-based design of RNA sequences.

    Science.gov (United States)

    Drory Retwitzer, Matan; Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme; Barash, Danny

    2016-07-01

    In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv. PMID:27185893

  6. Multiplex amplicon sequencing for microbe identification in community-based culture collections.

    Science.gov (United States)

    Armanhi, Jaderson Silveira Leite; de Souza, Rafael Soares Correa; de Araújo, Laura Migliorini; Okura, Vagner Katsumi; Mieczkowski, Piotr; Imperial, Juan; Arruda, Paulo

    2016-01-01

    Microbiome analysis using metagenomic sequencing has revealed a vast microbial diversity associated with plants. Identifying the molecular functions associated with microbiome-plant interaction is a significant challenge concerning the development of microbiome-derived technologies applied to agriculture. An alternative to accelerate the discovery of the microbiome benefits to plants is to construct microbial culture collections concomitant with accessing microbial community structure and abundance. However, traditional methods of isolation, cultivation, and identification of microbes are time-consuming and expensive. Here we describe a method for identification of microbes in culture collections constructed by picking colonies from primary platings that may contain single or multiple microorganisms, which we named community-based culture collections (CBC). Multiplexed 16S rRNA gene amplicon sequencing based on two-step PCR amplifications with tagged primers for plates, rows, and columns allowed the identification of the microbial composition regardless of whether a well contains single or multiple microorganisms. The multiplexing system enables pooling amplicons into a single tube. Sequencing on the PacBio platform led to the recovery of near-full-length 16S rRNA gene sequences, allowing accurate identification of the microorganism composition in each plate well. Cross-referencing with plant microbiome structure and abundance allowed the estimation of diversity and abundance representation of microorganisms in the CBC. PMID:27404280

  7. CLUSS: Clustering of protein sequences based on a new similarity measure

    Directory of Open Access Journals (Sweden)

    Brzezinski Ryszard

    2007-08-01

    Full Text Available Abstract Background The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. 
    The results of these experiments show the superiority of CLUSS in yielding clusters of proteins

  8. Pigs in Sequence Space: A 0.66X Coverage Pig Genome Survey based on Shotgun Sequencing

    DEFF Research Database (Denmark)

    Wernersson, R; Schierup, Mikkel Heide; Jørgensen, Frank Grønlund;

    2005-01-01

    Background Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results We have generated ~ 3.84 million shotgun sequences (0.6...... as the human branch, and the joint alignment of the shot-gun sequences to the human-mouse alignment offers a rapid way for the investigator to define specific regions for analysis and resequencing....

  9. Detection of methylation in promoter sequences by melting curve analysis-based semiquantitative real time PCR

    Directory of Open Access Journals (Sweden)

    Lázcoz Paula

    2008-02-01

    Full Text Available Abstract Background We present two melting curve analysis (MCA)-based semiquantitative real time PCR techniques to detect the promoter methylation status of genes. The first, MCA-MSP, follows the same principle as standard MSP but is performed in a real time thermal cycler, with results being visualized in a melting curve. The second, MCA-Meth, uses a single pair of primers designed with no CpGs in its sequence. These primers amplify both unmethylated and methylated sequences. In clinical applications the MSP technique has revolutionized methylation detection by simplifying the analysis to a PCR-based protocol. MCA-based techniques may be able to further improve and simplify methylation analyses by reducing starting DNA amounts, by introducing an all-in-one tube reaction and by eliminating a final gel stage for visualization of the result. The current study aimed at investigating the feasibility of both MCA-MSP and MCA-Meth in the analysis of promoter methylation, and at defining potential advantages and shortcomings in comparison to currently implemented techniques, i.e. bisulfite sequencing and standard MSP. Methods The promoters of the RASSF1A (3p21.3), BLU (3p21.3) and MGMT (10q26) genes were analyzed by MCA-MSP and MCA-Meth in 13 astrocytoma samples, 6 high grade glioma cell lines and 4 neuroblastoma cell lines. The data were compared with standard MSP and validated by bisulfite sequencing. Results Both MCA-MSP and MCA-Meth successfully determined promoter methylation. MCA-MSP provided information similar to standard MSP analyses. However, the analysis was possible in a single tube and avoided the gel stage. MCA-Meth proved to be useful in samples with intermediate methylation status, reflected by a shift in melting curve position that depends on the extent of methylation. Conclusion We propose MCA-MSP and MCA-Meth as alternative or supplementary techniques to MSP or bisulfite sequencing.

  10. Pseudorandom Bit Sequence Generator for Stream Cipher Based on Elliptic Curves

    Directory of Open Access Journals (Sweden)

    Jilna Payingat

    2015-01-01

    Full Text Available This paper proposes a pseudorandom sequence generator for stream ciphers based on elliptic curves (EC). A detailed analysis of various EC-based random number generators available in the literature is carried out, and a new method is proposed that addresses the drawbacks of these schemes. Statistical analysis of the proposed method is carried out using the NIST (National Institute of Standards and Technology) test suite, and the sequence is seen to exhibit good randomness properties. The linear complexity analysis shows that the system has a linear complexity equal to the period of the sequence, which is highly desirable. The statistical complexity and security against known-plaintext attack are also analysed. A comparison of the proposed method with other EC-based schemes is made in terms of throughput, periodicity, and security, and the proposed method outperforms the methods in the literature. For resource-constrained applications where a highly secure key exchange is essential, the proposed method provides a good option for encryption by time sharing the point multiplication unit for EC-based key exchange. The algorithm and architecture for implementation are developed in such a way that the hardware consumed in addition to the point multiplication unit is much less.
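
    Linear complexity, the property highlighted above, is the length of the shortest linear feedback shift register that reproduces a sequence. A minimal sketch of the standard Berlekamp-Massey algorithm over GF(2) follows; it is a generic measurement tool, not the authors' generator, and the two test sequences are illustrative assumptions.

```python
def linear_complexity(bits):
    """Linear complexity of a binary sequence via Berlekamp-Massey over GF(2)."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial C(x)
    b = [0] * (n + 1)  # C(x) as it was before the last length change
    c[0] = b[0] = 1
    length, last_change = 0, -1
    for i in range(n):
        # Discrepancy between bits[i] and the current register's prediction.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - last_change
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                last_change = i
                b = previous
    return length

# Two periods of the period-7 sequence generated by s[i] = s[i-2] XOR s[i-3].
m_sequence = [1, 0, 0, 1, 0, 1, 1] * 2
# Two periods of a period-4 sequence whose linear complexity equals its period.
sparse = [0, 0, 0, 1] * 2

print(linear_complexity(m_sequence))  # 3
print(linear_complexity(sparse))      # 4
```

    The first sequence has period 7 but linear complexity only 3, which is why plain LFSR output is easy to predict; the second shows the situation the abstract calls desirable, where the complexity reaches the period.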

  11. Base-level Change and Sequence Stratigraphy of Lishu Fault Lacustrine Basin

    Institute of Scientific and Technical Information of China (English)

    Wang Simin; Liu Zhaojun; Liu Kui

    2000-01-01

    Base level is a surface that controls sedimentation and erosion, so it is base-level change that controls the formation and internal structure of a sequence. A single cycle of base-level change can generate four different stacking patterns: two of aggradation, one of progradation and one of retrogradation, which shape the internal structure of a sequence. The Lishu fault subsidence of the Songliao basin is a typical half-graben lacustrine basin. Comprehensive base-level change analysis indicates that six base-level cycles and their six related sequences can be recognized between the T4 and T5 seismic reflection surfaces. The contemporaneous fault is the main controlling factor of the fault lacustrine basin. The composition of the sedimentary systems and of all systems tracts differs markedly between the steep slope (the side where the basin-controlling fault lies) and the flat slope. On the steep-slope side, the highstand systems tract is composed of a fan delta-lacustrine system, whereas the lowstand, transgressive and regressive systems tracts are all made up of fan delta-underwater fan-lacustrine sedimentary systems.

  12. Dual mechanisms of DNA sequencing based on tunnelling between nitrogen-doped carbon nanotube electrodes

    Science.gov (United States)

    Kim, Han; Kim, Yong-Hoon

    2013-03-01

    The DNA sequencing approach based on the combination of nanopores and electron tunnelling has seen considerable advances in recent years, and particularly carbon nanomaterials have emerged as promising candidates to replace metal electrodes. Carrying out extensive first-principles calculations, we here show that two distinct DNA sequencing mechanisms can be achieved with different configurations of a single-type nitrogen-doped capped carbon nanotube (CNT) that has significantly enhanced transmission and chemical sensitivity over its pristine counterpart. With a small CNT-CNT gap size that induces face-on nucleobase configurations, we obtain a typical conductance ordering where the largest signal is induced from guanine due to its highest occupied molecular orbital energetic position higher than those of other bases. On the other hand, for a large CNT-CNT gap size that accommodates edge-on nucleobase configurations, we extract a completely different conductance ordering in which thymine results in the largest signal. We find that the latter novel nucleobase sensing mechanism originates from the nature of chemical connectivity between nitrogen-doped CNT caps and nucleobase functional groups that include the thymine methyl group. This work thus demonstrates the feasibility of a tunnelling-based dual-mode approach toward whole genome sequencing applications, detection of DNA base modifications, and single-molecule sensing in general.

  13. Phylogenetic relationships of South China Sea snappers (genus Lutjanus; family Lutjanidae) based on mitochondrial DNA sequences.

    Science.gov (United States)

    Guo, Yusong; Wang, Zhongduo; Liu, Chuwu; Liu, Li; Liu, Yun

    2007-01-01

    Intra- and interspecific phylogenetic relationships were elucidated based on complete cytochrome b (cyt b) and cytochrome c oxidase subunit II (COII) gene sequences from 12 recognized species of genus Lutjanus Bloch in the South China Sea (SCS). Using the combined data set of consensus cyt b and COII gene sequences, interspecific relationships for all 12 recognized species in SCS were consistent with Allen's morphology-based identifications, with strong correlation between the molecular and morphological characteristics. Monophyly of eight species (L. malabaricus, L. russellii, L. stellatus, L. bohar, L. johnii, L. sebae, L. fulvus, and L. fulviflamma) was strongly supported; however, the pairs L. vitta/L. ophuysenii and L. erythropterus/L. argentimaculatus were more similar than expected. We inferred that L. malabaricus exists in SCS, and the introgression caused by hybridization is the reason for the unexpectedly high homogeneity.

  14. State of the art and challenges in sequence based T-cell epitope prediction

    DEFF Research Database (Denmark)

    Lundegaard, Claus; Hoof, Ilka; Lund, Ole;

    2010-01-01

    Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover, so-called pan-specific methods have been developed, which allow for prediction of peptide binding to MHC alleles characterized by limited or no peptide binding data. Most of the developed methods are publicly available, and have proven to be very useful as a shortcut in epitope discovery. Here, we will go through some of the history of sequence-based predictions of helper as well as cytotoxic T cell...

  15. Electronic band gaps and transport in aperiodic graphene-based superlattices of Thue-Morse sequence

    Science.gov (United States)

    Wang, Ligang; Ma, Tianxing

    2014-03-01

    We investigate electronic band structure and transport properties in aperiodic graphene-based superlattices of Thue-Morse (TM) sequence. The robust properties of the zero-k gap are demonstrated in both mono-layer and bi-layer graphene TM sequences. Extra Dirac points may emerge at ky ≠ 0, and electronic transport behaviors such as the conductance and the Fano factor are discussed in detail. Our results provide a flexible and effective way to control the transport properties in graphene-based superlattices. This work is supported by NSFCs (Nos. 11274275, 11104014 and 61078021), Research Fund for the Doctoral Program of Higher Education 20110003120007, SRF for ROCS (SEM), and the National Basic Research Program of China (Nos. 2011CBA00108 and 2012CB921602).
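
    As a rough illustration of the layer ordering such a superlattice follows, the sketch below generates the Thue-Morse sequence and maps it onto two layer types. The labels "A" and "B" and the function names are hypothetical; the abstract does not specify how the two building blocks are defined.

```python
def thue_morse(n_terms):
    """Return the first n_terms of the Thue-Morse sequence as 0/1 values."""
    # The k-th term is the parity of the number of 1 bits in the binary form of k.
    return [bin(k).count("1") % 2 for k in range(n_terms)]


def layer_stack(generation, a="A", b="B"):
    """Map the first 2**generation terms onto two layer labels.

    The labels are placeholders for the two superlattice building blocks
    (an assumption made for illustration only).
    """
    return "".join(a if t == 0 else b for t in thue_morse(2 ** generation))


print(layer_stack(3))  # ABBABAAB
```

    Each generation doubles the stack by appending the complement of the previous one, which is why the ordering is deterministic yet never periodic.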

  16. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  17. Capturing Human Motion based on Modified Hidden Markov Model in Multi-View Image Sequences

    OpenAIRE

    Yanan Liu; Lian Kun Jia; Wen Yu Yu

    2014-01-01

    Human motion capturing is of great importance in video information retrieval. In this paper, we propose a novel approach to effectively capturing human motions from multi-view image sequences based on a modified hidden Markov model. Firstly, the structure of the human skeleton model is illustrated, which is extended from skeleton root and spine root, and this skeleton consists of right leg, left leg and spine. Secondly, our proposed human motion capturing system is made up of data traini...

  18. Performance of Correspondence Algorithms in Vision-Based Driver Assistance Using an Online Image Sequence Database

    DEFF Research Database (Denmark)

    Klette, Reinhard; Krüger, Norbert; Vaudrey, Tobi;

    2011-01-01

    ) for demonstrating ideas, difficulties, and possible ways in this future field of extensive performance tests in vision-based driver assistance, particularly for cases where the ground truth is not available. This paper shows that the complexity of real-world data does not support the identification of general rankings of correspondence techniques on sets of basic sequences that show different situations. It is suggested that correspondence techniques should adaptively be chosen in real time using some type of statistical situation classifiers.

  19. Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences

    OpenAIRE

    Ogilvie, Lesley A.; Bowler, Lucas D.; Caplin, Jonathan; Dedi, Cinzia; Diston, David; Cheek, Elizabeth; Taylor, Huw; Ebdon, James E.; Jones, Brian V.

    2013-01-01

    Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes an...

  20. Evaluation of Repetitive Element Sequence-Based PCR as a Molecular Typing Method for Clostridium difficile

    OpenAIRE

    Spigaglia, Patrizia; Mastrantonio, Paola

    2003-01-01

    Repetitive element sequence-based PCR (rep-PCR) is a typing method that enables the generation of DNA fingerprinting that discriminates bacterial strains. In this study, we evaluated the applicability of rep-PCR in typing Clostridium difficile clinical isolates. The results obtained by rep-PCR were compared with those obtained by pulsed-field gel electrophoresis (PFGE) and PCR ribotyping. A high correspondence between pattern differentiations produced by rep-PCR and PFGE was observed, whereas...

  1. Use of polyphase continuous excitation based on the Frank sequence in EPR.

    OpenAIRE

    Tseitlin, Mark; Quine, Richard W.; Eaton, Sandra S.; Eaton, Gareth R.

    2011-01-01

    Polyphase continuous excitation based on the Frank sequence is suggested as an alternative to single pulse excitation in EPR. The method allows reduction of the source power, while preserving the excitation bandwidth of a single pulse. For practical EPR implementation the use of a cross-loop resonator is essential to provide isolation between the spin system and the resonator responses to the excitation. Provided that a line broadening of about 5% is acceptable, the cumulative turning angle o...
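
    The property that makes a Frank sequence attractive for continuous excitation can be checked numerically. The sketch below assumes the standard textbook definition of a Frank sequence of length N², with phases 2πpq/N, which the abstract itself does not spell out; the function names are illustrative only.

```python
import cmath


def frank_sequence(n):
    """Frank polyphase sequence of length n*n (standard definition, assumed here)."""
    return [cmath.exp(2j * cmath.pi * p * q / n) for p in range(n) for q in range(n)]


def periodic_autocorrelation(seq, shift):
    """Periodic (cyclic) autocorrelation of a complex sequence at a given shift."""
    length = len(seq)
    return sum(seq[k] * seq[(k + shift) % length].conjugate() for k in range(length))


seq = frank_sequence(4)
peak = abs(periodic_autocorrelation(seq, 0))
off_peak = max(abs(periodic_autocorrelation(seq, s)) for s in range(1, len(seq)))
# peak equals the sequence length; off_peak vanishes up to rounding error.
print(peak, off_peak)
```

    Every element has unit magnitude, so the power is spread evenly in time, while the vanishing off-peak autocorrelation is what lets the response be recovered as if a single pulse had been applied.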

  2. A reactive navigation method based on an incremental learning of tasks sequences

    OpenAIRE

    Davesne, Frédéric; Barret, Claude

    1999-01-01

    Within the context of learning sequences of basic tasks to build a complex behavior, a method is proposed to coordinate a hierarchical set of tasks. Each one possesses a set of sub-tasks lower in the hierarchy, which must be coordinated to respect a binary perceptive constraint. For each task, the coordination is achieved by a reinforcement learning inspired algorithm based on a heuristic that does not need internal parameters. A validation of the method is given, usin...

  3. Sequence-based characterization of five SLA loci in Asian wild boars.

    Science.gov (United States)

    Jung, W Y; Choi, N R; Seo, D W; Lim, H T; Ho, C S; Lee, J H

    2014-10-01

    Two swine leucocyte antigen (SLA) class I (SLA-1 and SLA-2) and three class II (DRB1, DQB1 and DQA) genes were investigated for their diversity in Asian wild boars using a sequence-based typing method. A total of 15 alleles were detected at these loci, with eleven being novel. The findings provide one of the first glimpses of the SLA allelic diversity and architecture in the wild boar populations.

  4. Sequence-based characterization of the eight SLA loci in Korean native pigs.

    Science.gov (United States)

    Lee, Y J; Cho, K H; Kim, M J; Smith, D M; Ho, C S; Jung, K C; Jin, D I; Park, C S; Jeon, J T; Lee, J H

    2008-08-01

    Alleles of eight swine leucocyte antigen (SLA) genes (SLA-1, SLA-2, SLA-3, SLA-6, DRA, DRB1, DQA, DQB1) were identified using a sequence-based typing method in three Korean native pigs used for breeding at the National Institute of Animal Science in Korea. Six new alleles in class I genes and three new alleles in class II genes have been identified in this breed and can give valuable information for xenotransplantation and disease resistance.

  5. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    OpenAIRE

    Michelle Klein Sercundes; Samantha Yuri Oshiro Branco Valadas; Lara Borges Keid; Tricia Maria Ferreira Souza Oliveira; Helena Lage Ferreira; Ricardo Wagner Almeida Vitor; Fábio Gregori; Rodrigo Martins Soares

    2016-01-01

    Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula ...

  6. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond time-scale. IMMS has become a powerful tool for analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics. Results: In this paper, a model is presented based on the least squares support vector regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptides. Four descriptors were extracted from the peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model. Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  7. A method to prioritize quantitative traits and individuals for sequencing in family-based studies.

    Directory of Open Access Journals (Sweden)

    Kaanan P Shah

    Owing to recent advances in DNA sequencing, it is now technically feasible to evaluate the contribution of rare variation to complex traits and diseases. However, it is still cost prohibitive to sequence the whole genome (or exome) of all individuals in each study. For quantitative traits, one strategy to reduce cost is to sequence individuals in the tails of the trait distribution. However, the next challenge becomes how to prioritize traits and individuals for sequencing since individuals are often characterized for dozens of medically relevant traits. In this article, we describe a new method, the Rare Variant Kinship Test (RVKT), which leverages relationship information in family-based studies to identify quantitative traits that are likely influenced by rare variants. Conditional on nuclear families and extended pedigrees, we evaluate the power of the RVKT via simulation. Not unexpectedly, the power of our method depends strongly on effect size, and to a lesser extent, on the frequency of the rare variant and the number and type of relationships in the sample. As an illustration, we also apply our method to data from two genetic studies in the Old Order Amish, a founder population with extensive genealogical records. Remarkably, we implicate the presence of a rare variant that lowers fasting triglyceride levels in the Heredity and Phenotype Intervention (HAPI) Heart study (p = 0.044), consistent with the presence of a previously identified null mutation in the APOC3 gene that lowers fasting triglyceride levels in HAPI Heart study participants.
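
    The cost-saving strategy mentioned above, sequencing only individuals in the tails of a trait distribution, can be sketched in a few lines. This is not the RVKT itself, whose details the abstract does not give; the function name, the 20% cut-off and the trait values are all hypothetical.

```python
def select_tails(trait_by_id, fraction=0.1):
    """Pick the individuals in the lower and upper tails of a quantitative trait.

    trait_by_id maps an individual ID to a trait value; fraction is the share
    of the sample taken from each tail (a hypothetical tuning parameter).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(trait_by_id, key=trait_by_id.get)  # IDs ordered by trait value
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Made-up trait values for ten individuals.
traits = {"a": 1.2, "b": 0.4, "c": 2.9, "d": 1.8, "e": 0.9,
          "f": 3.5, "g": 1.1, "h": 2.2, "i": 0.6, "j": 1.5}
low, high = select_tails(traits, fraction=0.2)
print(low, high)  # ['b', 'i'] ['c', 'f']
```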

  8. DNA/RNA transverse current sequencing: Intrinsic structural noise from neighboring bases

    Directory of Open Access Journals (Sweden)

    Jose Alvarez

    2015-06-01

    Nanopore DNA sequencing via transverse current has emerged as a promising candidate for third-generation sequencing technology. It produces long read lengths which could alleviate problems with assembly errors inherent in current technologies. However, the high error rates of nanopore sequencing have to be addressed. A very important source of the error is the intrinsic noise in the current arising from carrier dispersion along the chain of the molecule, i.e. from the influence of neighboring bases. In this work we perform calculations of the transverse current within an effective multi-orbital tight-binding model derived from first-principles calculations of the DNA/RNA molecules, to study the effect of this structural noise on the error rates in DNA/RNA sequencing via transverse current in nanopores. We demonstrate that a statistical technique, utilizing not only the currents through the nucleotides but also the correlations in the currents, can in principle reduce the error rate below any desired precision.

  9. Phylogenetic analyses of some genera in Oedipodidae (Orthoptera: Acridoidea) based on 16S mitochondrial partial gene sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Chu Yin; Xin-Jiang Li; Wen-Qiang Wang; Hong Yin; Cheng-Quan Cao; Bao-Hua Ye; Zhan Yin

    2008-01-01

    Based on the 16S mitochondrial partial gene sequences of 29 genera, containing 26 from Oedipodidae and one each from Tanaoceridae, Pyrgomorphidae and Tetrigidae (as outgroups), the homologous sequences were compared and phylogenetic analyses were performed. A phylogenetic tree was inferred by neighbor-joining (NJ). The sequence comparison shows that: (i) in a total of 574 bp of Oedipodidae, the number of substituted nucleotides was 265 bp and the average percentages of T, C, A and G were 38.3%, 11.4%, 31.8% and 18.5%, respectively, and the content of A+T (70.1%) was distinctly richer than that of C+G (29.9%); and (ii) the average nucleotide divergence of 16S rDNA sequences was 9.0% among genera of Oedipodidae, 17.0% among families of Acridoidea, and 23.9% between superfamilies (Tetrigoidea and Acridoidea). The phylogenetic tree indicated: (i) the Oedipodidae was a monophyletic group, which suggested that the taxonomic status of this family was confirmed; (ii) the genus Heteropternis, which is the type-genus of subfamily Heteropterninae, separated from the other Oedipodids first and had another unique sound-producing structure in morphology; and (iii) the relative intergeneric relationship within the same continent was closer than that of different continents, and that between the Eurasian genera and the African genera was closer than that between Eurasians and Americans.
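
    The base-composition figures quoted above are simple per-base percentages. A minimal sketch of that calculation follows; the short fragment is invented for illustration and is not taken from the study, and the function names are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of T, C, A and G in a nucleotide sequence (other symbols ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "TCAG"}


def at_content(seq):
    """Combined A+T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


fragment = "AATTTCGA"  # hypothetical fragment, for illustration only
print(base_composition(fragment))
print(at_content(fragment))  # 75.0
```

    Applied to the reported averages, the same sum gives 38.3% + 31.8% = 70.1% for A+T and 11.4% + 18.5% = 29.9% for C+G, matching the figures in the abstract.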

  10. A web-based search engine for triplex-forming oligonucleotide target sequences.

    Science.gov (United States)

    Gaddis, Sara S; Wu, Qi; Thames, Howard D; DiGiovanni, John; Walborg, Earl F; MacLeod, Michael C; Vasquez, Karen M

    2006-01-01

    Triplex technology offers a useful approach for site-specific modification of gene structure and function both in vitro and in vivo. Triplex-forming oligonucleotides (TFOs) bind to their target sites in duplex DNA, thereby forming triple-helical DNA structures via Hoogsteen hydrogen bonding. TFO binding has been demonstrated to site-specifically inhibit gene expression, enhance homologous recombination, induce mutation, inhibit protein binding, and direct DNA damage, thus providing a tool for gene-specific manipulation of DNA. We have developed a flexible web-based search engine to find and annotate TFO target sequences within the human and mouse genomes. Descriptive information about each site, including sequence context and gene region (intron, exon, or promoter), is provided. The engine assists the user in finding highly specific TFO target sequences by eliminating or flagging known repeat sequences and flagging overlapping genes. A convenient way to check for the uniqueness of a potential TFO binding site is provided via NCBI BLAST. The search engine may be accessed at spi.mdanderson.org/tfo. PMID:16764543

  11. Implementing amplicon-based next generation sequencing in the diagnosis of small cell lung carcinoma metastases.

    Science.gov (United States)

    Meder, Lydia; König, Katharina; Fassunke, Jana; Ozretić, Luka; Wolf, Jürgen; Merkelbach-Bruse, Sabine; Heukamp, Lukas C; Buettner, Reinhard

    2015-12-01

    Small cell lung carcinoma (SCLC) is the most aggressive entity of lung cancer. Rapid cancer progression and early formation of systemic metastases drive the deadly outcome of SCLC. Recent advances in identifying oncogenes by cancer whole genome sequencing improved the understanding of SCLC carcinogenesis. However, tumor material is often limited in the clinic. Thus, it is imperative to improve SCLC diagnostics by combining established immunohistochemistry and next generation sequencing. We implemented amplicon-based next generation deep sequencing in our routine diagnostics pipeline to analyze RB1, TP53, EP300 and CREBBP, frequently mutated in SCLC. Thereby, our pipeline combined routine SCLC histology and identification of somatic mutations. We comprehensively analyzed fifty randomly collected SCLC metastases isolated from trachea and lymph nodes in comparison to specimens derived from primary SCLC. SCLC lymph node metastases showed enhanced proliferation and frequently a collapsed keratin cytoskeleton compared to SCLC metastases isolated from trachea. We identified characteristic synchronous mutations in RB1 and TP53 and non-synchronous CREBBP and EP300 mutations. Our data showed the benefit of implementing deep sequencing into routine diagnostics. We here identify oncogenic drivers and simultaneously gain further insights into SCLC tumor biology.

  12. A phylogenetic analysis of the ubiquitin superfamily based on sequence and structural information.

    Science.gov (United States)

    Yang, Zhen; Chen, Haikui; Yang, Xiaobo; Wan, Xueshuai; He, Lian; Miao, Ruoyu; Yang, Huayu; Zhong, Yang; Wang, Li; Zhao, Haitao

    2014-09-01

    Ubiquitin belongs to an important class of protein modifier and gene expression regulator proteins that participates in various cellular processes. A large number of ubiquitin-related proteins have been identified during the last two decades. However, the evolutionary history of this ancient gene family remains largely unknown. We analyzed the members of the superfamily using both sequence- and structure-based methodology to better understand the evolution of ubiquitin-related proteins. As a part of these analyses we used the MEME algorithm to extract common sequence motifs across the superfamily, and we inferred the phylogeny and distribution of the superfamily members across multiple species. A total of 23 families were identified in the gene family. Several common sequence motifs were revealed and evaluated. We also found that the number of genes for ubiquitin-related proteins encoded within a specific genome correlates with the biological complexity of that particular species. This analysis should provide valuable insight into the sequence/function relationships and evolutionary history of ubiquitin and ubiquitin-related proteins. PMID:24997693

  13. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.
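
    The N50 statistic invoked above is a common measure of assembly contiguity: the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch under that usual definition, with made-up contig lengths:

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    A more fragmented assembly of the same genome has a smaller N50, which is consistent with the abstract's explanation for why fewer reads aligned to the draft donkey genome.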

  14. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Background: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way of defining specific regions for analysis and resequencing.

  15. Comparison of two multilocus sequence based genotyping schemes for Leptospira species.

    Directory of Open Access Journals (Sweden)

    Ahmed Ahmed

    2011-11-01

    BACKGROUND: Several sequence based genotyping schemes have been developed for Leptospira spp. The objective of this study was to genotype a collection of clinical and reference isolates using the two most commonly used schemes and compare and contrast the results. METHODS AND FINDINGS: A total of 48 isolates consisting of L. interrogans (n = 40) and L. kirschneri (n = 8) were typed by the 7 locus MLST scheme described by Thaipadungpanit et al., and the 6 locus genotyping scheme described by Ahmed et al. (termed 7L and 6L, respectively). Two L. interrogans isolates were not typed using 6L because of a deletion of three nucleotides in lipL32. The remaining 46 isolates were resolved into 21 sequence types (STs) by 7L, and 30 genotypes by 6L. Overall nucleotide diversity (based on concatenated sequence) was 3.6% and 2.3% for 7L and 6L, respectively. The D values (discriminatory ability) of 7L and 6L were comparable, i.e. 92.0 (95% CI 87.5-96.5) vs. 93.5 (95% CI 88.6-98.4). The dN/dS ratios calculated for each locus indicated that none were under positive selection. Neighbor joining trees were reconstructed based on the concatenated sequences for each scheme. Both trees showed two distinct groups corresponding to L. interrogans and L. kirschneri, and both identified two clones containing 10 and 7 clinical isolates, respectively. There were six instances in which 6L split single STs as defined by 7L into closely related clusters. We noted two discrepancies between the trees, in which two pairs of strains were more closely related by 7L than by 6L. CONCLUSIONS: This genetic analysis indicates that the two schemes are comparable. We discuss their practical advantages and disadvantages.
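
    A discriminatory-ability value of this kind is often computed as Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total. The abstract does not state which formula the authors used, so the sketch below is an assumption, and the type labels in the example are invented.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Hunter-Gaston form of Simpson's index of diversity (assumed formula).

    type_labels holds one type assignment per isolate, e.g. a sequence type.
    Returns a value between 0 (all isolates identical) and 1 (all distinct).
    """
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(n * (n - 1) for n in Counter(type_labels).values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical typing result for four isolates.
print(discriminatory_index(["ST1", "ST1", "ST2", "ST3"]))  # 5/6, about 0.833
```

    Multiplying the result by 100 gives a percentage on the same scale as the 92.0 and 93.5 quoted in the abstract.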

  16. A new proof for the convergence of an individual based model to the Trait substitution sequence

    CERN Document Server

    Gupta, Ankit; Tran, Viet Chi

    2012-01-01

    We consider a stochastic individual based model for a population structured by a vector trait and with logistic interactions. We consider its limit in a context from adaptive dynamics: the population is large, the mutations are rare and we view the process in the timescale of mutations. Using averaging techniques due to Kurtz (1992), we give a new proof of the convergence of the individual based model to the Trait substitution sequence introduced by Metz et al. (1992) and rigorously established by Champagnat (2006): assuming that "invasion implies fixation", we obtain in the limit a process that jumps from one population equilibrium to another when mutations occur and invade the population.

  17. Sequence and single-base polymorphisms of the bovine alpha-lactalbumin 5'-flanking region.

    Science.gov (United States)

    Bleck, G T; Bremel, R D

    1993-04-30

    The alpha-lactalbumin (alpha LA)-encoding gene is a potential quantitative trait locus in dairy animals. In cattle, the production of alpha LA is tightly coupled to the onset of lactation and it serves as a regulatory subunit of the enzyme responsible for lactose synthesis. Lactose is the major osmole controlling water movement in the mammary gland. To better understand the control of bovine alpha LA expression, the 5'-flanking region of a Holstein alpha LA gene was cloned and sequenced. The sequenced clone contains 1952 bp of 5'-flanking region and 66-bp of the protein-coding region. Three single-bp polymorphisms were identified within this region. These polymorphisms occur at positions +15, +21 and +54 relative to the mRNA transcription start point (tsp). The +15 and +21 variations occur in the region encoding the 5'-untranslated region of the mRNA-coding sequence. The +54 polymorphism is a silent mutation in the SP-coding region of the gene. A polymerase chain reaction (PCR, Cetus)-based screening method has been employed to analyze the genotype of cattle at the +15 position. A total of 501 randomly selected cattle from seven breeds were screened for this allele. Of these animals, only the Holstein breed of cattle was found to contain the +15 variation and it occurs at a gene frequency of 32%. Sequence comparisons were conducted between the 5'-flanking regions of the bovine-milk-protein encoding genes, alpha LA, beta-casein and alpha S1-casein, which are coordinately expressed. Regions of similarity extending to 350 bp in length were observed between these sequences.

  18. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Sweet potatoes (Ipomoea batatas L.) are grown extensively in tropical and temperate regions, and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV), Sweet potato virus C (SPVC), Sweet potato virus G (SPVG), Sweet potato virus 2 (SPV2), and Sweet potato latent virus (SPLV), have been detected in sweet potato fields at a high (~95%) incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly-A tails. Their genomic organizations were typical of the Potyvirus genus, including one target open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88%) nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates and these regions were identified as hotspots for recombination in the sweet potato potyviruses.

  19. Molecular genotyping of human Ureaplasma species based on multiple-banded antigen (MBA) gene sequences.

    Science.gov (United States)

    Kong, F; Ma, Z; James, G; Gordon, S; Gilbert, G L

    2000-09-01

    Ureaplasma urealyticum has been divided into 14 serovars. Recently, subdivision of U. urealyticum into two species has been proposed: U. parvum (previously U. urealyticum parvo biovar), comprising four serovars (1, 3, 6, 14) and U. urealyticum (previously U. urealyticum T-960 biovar), 10 serovars (2, 4, 5, 7-13). The multiple-banded antigen (MBA) genes of these species contain both species and serovar/subtype specific sequences. Based on whole sequences of the 5'-ends of MBA genes of U. parvum serovars and partial sequences of the 5'-ends of MBA genes of U. urealyticum serovars, we previously divided each of these species into three MBA genotypes. To further elucidate the relationships between serovars, we sequenced the whole 5'-ends of MBA genes of all 10 U. urealyticum serovars and partial repetitive regions of these genes from all serovars of U. parvum and U. urealyticum. For the first time, all four serovars of U. parvum were clearly differentiated from each other. In addition, the 10 serovars of U. urealyticum were divided into five MBA genotypes, as follows: MBA genotype A comprises serovars 2, 5, 8; MBA genotype B, serovar 10 only; MBA genotype C, serovars 4, 12, 13; MBA genotype D, serovar 9 only; and MBA genotype E comprises serovars 7 and 11. There were no sequence differences between members within each MBA genotype. Further work is required to identify other genes or other regions of the MBA genes that may be used to differentiate U. urealyticum serovars within MBA genotypes A, C and E. A better understanding of the molecular basis of serotype differentiation will help to improve subtyping methods for use in studies of the pathogenesis and epidemiology of these organisms.

  20. SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations.

    Science.gov (United States)

    Zhan, Xiaowei; Liu, Dajiang J

    2015-12-01

    Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortia studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translations. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. In order to enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure where novel methods can be distributed and applied to analyzing sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project. We show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.

  1. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes.

    Science.gov (United States)

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-09-01

    Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data that are crucial, especially for rapid responses during transsectorial outbreaks. PMID:26135865
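
    The ≤10-allele clustering criterion mentioned in this abstract can be illustrated with a minimal sketch. The isolate names, locus names, allele numbers, and the choice of single-linkage grouping are all hypothetical assumptions made for illustration; they are not the authors' pipeline or data.

```python
from itertools import combinations

def allele_distance(p, q):
    """Count shared loci where both profiles have a call and the alleles differ."""
    return sum(
        1
        for locus in p.keys() & q.keys()
        if p[locus] is not None and q[locus] is not None and p[locus] != q[locus]
    )

def cluster(profiles, threshold=10):
    """Single-linkage grouping (an assumption): isolates within `threshold` alleles are joined."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy profiles over 20 loci (real cgMLST schemes use far more targets).
outbreak_1 = {f"locus{i}": 1 for i in range(20)}
outbreak_2 = dict(outbreak_1, locus0=2, locus1=2)   # differs at 2 loci
outgroup = {f"locus{i}": 5 for i in range(20)}      # differs at all 20 loci

profiles = {"outbreak_1": outbreak_1, "outbreak_2": outbreak_2, "outgroup": outgroup}
clusters = cluster(profiles, threshold=10)
print(clusters)
```

    In this toy data the two outbreak isolates fall into one group and the outgroup isolate stays separate.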

  2. Priority-sequence of mineral resources’ development and utilization based on grey relational analysis method

    Institute of Scientific and Technical Information of China (English)

    Wang Ying; Zhang Chang; Jiang Gaopeng

    2016-01-01

    Generally, the sequence decision of the development and utilization of Chinese mineral resources is based on national and provincial overall plans of the mineral resources. Such plans usually cannot reflect the relative size of the suitability of the development and utilization of mineral resources. To solve the problem, the paper has selected the gift condition, the market condition, the technological condition, the socio-economic condition and the environmental condition as the starting points to analyze the influential factors of the priority-sequence of mineral resources’ development and utilization. The above 5 conditions are further specified into 9 evaluative indicators to establish an evaluation indicator system. We then propose a decision model of the priority sequence based on the grey relational analysis method, and figure out the observation objects by the suitability index of development. Finally, the mineral resources of a certain province in China were analyzed as an example. The calculation results indicate that silver (2.0057), coal (1.9955), zinc (1.9442), cement limestone (1.9077), solvent limestone (1.5624) and other minerals in the province are suitable for development and utilization.
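
    As a rough illustration of grey relational analysis in general, the sketch below computes the standard grey relational grade (distinguishing coefficient 0.5) against an ideal reference series. The mineral names, indicator values, and the all-ones reference are hypothetical, and the abstract does not specify how the paper's suitability index is derived, so this is not the authors' model.

```python
def grey_relational_grades(reference, alternatives, rho=0.5):
    """Return the mean grey relational coefficient of each alternative vs. the reference.

    reference    -- list of ideal (normalized) indicator values
    alternatives -- dict mapping a name to its list of normalized indicator values
    rho          -- distinguishing coefficient, conventionally 0.5
    """
    # Absolute deviation of every indicator from the reference series.
    deltas = {
        name: [abs(r - x) for r, x in zip(reference, values)]
        for name, values in alternatives.items()
    }
    all_deltas = [d for row in deltas.values() for d in row]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every alternative coincides with the reference.
        return {name: 1.0 for name in alternatives}

    grades = {}
    for name, row in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades

# Hypothetical normalized scores on three indicators (1.0 = ideal).
reference = [1.0, 1.0, 1.0]
alternatives = {
    "mineral_x": [0.9, 0.8, 1.0],
    "mineral_y": [0.4, 0.6, 0.5],
}
grades = grey_relational_grades(reference, alternatives)
ranking = sorted(grades, key=grades.get, reverse=True)
print(ranking, grades)
```

    A higher grade means the alternative's indicator profile lies closer to the reference, which is what yields a priority ordering.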

  3. Small RNA Sequencing Based Identification of MiRNAs in Daphnia magna.

    Directory of Open Access Journals (Sweden)

    Ercan Selçuk Ünlü

    Small RNA molecules are short, non-coding RNAs identified for their crucial role in post-transcriptional regulation. A well-studied example includes miRNAs (microRNAs), which have been identified in several model organisms including the freshwater flea and planktonic crustacean Daphnia. Although Daphnia is a model for epigenetic-based studies with an available genome database, the identification of miRNAs and their potential role in regulating Daphnia gene expression has only recently garnered interest. Computational-based work using Daphnia pulex has indicated the existence of 45 miRNAs, 14 of which have been experimentally verified. To extend this study, we took a sequencing approach towards identifying miRNAs present in a small RNA library isolated from Daphnia magna. Using Perl codes designed for comparative genomic analysis, 815,699 reads were obtained from 4 million raw reads and run against a database file of known miRNA sequences. Using this approach, we have identified 205 putative mature miRNA sequences belonging to 188 distinct miRNA families. Data from this study provide critical information necessary to begin an investigation into a role for these transcripts in the epigenetic regulation of Daphnia magna.

  4. A Chaos-Based Secure Direct-Sequence/Spread-Spectrum Communication System

    Directory of Open Access Journals (Sweden)

    Nguyen Xuan Quyen

    2013-01-01

    This paper proposes a chaos-based secure direct-sequence/spread-spectrum (DS/SS) communication system which is based on a novel combination of the conventional DS/SS and chaos techniques. In the proposed system, bit duration is varied according to a chaotic behavior but is always equal to a multiple of the fixed chip duration in the communication process. Data bits with variable duration are spectrum-spread by multiplying directly with a pseudonoise (PN) sequence and then modulated onto a sinusoidal carrier by means of binary phase-shift keying (BPSK). To recover exactly the data bits, the receiver needs an identical regeneration of not only the PN sequence but also the chaotic behavior, and hence data security is improved significantly. Structure and operation of the proposed system are analyzed in detail. Theoretical evaluation of bit-error rate (BER) performance in presence of additive white Gaussian noise (AWGN) is provided. Parameter choice for different cases of simulation is also considered. Simulation and theoretical results are shown to verify the reliability and feasibility of the proposed system. Security of the proposed system is also discussed.
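
    The idea of chaotically varied bit durations combined with PN spreading can be sketched at baseband as follows. Everything specific here is an assumption for illustration: the logistic map as the chaotic source, its parameters, the 4 to 8 chip duration range, the 5-bit shift register used as a PN generator, and the omission of the sinusoidal carrier and of noise. None of it is taken from the paper.

```python
def chaotic_durations(n_bits, x0=0.37, r=3.99, n_min=4, n_max=8):
    """Bit durations in chips, driven by a logistic map (hypothetical choice)."""
    x, durations = x0, []
    for _ in range(n_bits):
        x = r * x * (1 - x)                       # logistic map iteration, stays in (0, 1)
        durations.append(n_min + int(x * (n_max - n_min + 1)))
    return durations

def pn_chips(n, state=0b10101):
    """Generate n chips (+1/-1) from a 5-bit shift register; taps are illustrative."""
    chips = []
    for _ in range(n):
        feedback = ((state >> 4) ^ (state >> 2)) & 1
        chips.append(1 if state & 1 else -1)
        state = ((state << 1) | feedback) & 0b11111
    return chips

def transmit(bits, x0=0.37):
    """Spread each bit over its chaotic number of chips; bit 0 -> +1, bit 1 -> -1."""
    durations = chaotic_durations(len(bits), x0)
    pn = pn_chips(sum(durations))
    signal, pos = [], 0
    for bit, dur in zip(bits, durations):
        symbol = 1 - 2 * bit
        signal.extend(symbol * c for c in pn[pos:pos + dur])
        pos += dur
    return signal

def receive(signal, n_bits, x0=0.37):
    """Regenerate durations and PN chips, then correlate over each bit interval."""
    durations = chaotic_durations(n_bits, x0)
    pn = pn_chips(sum(durations))
    bits, pos = [], 0
    for dur in durations:
        corr = sum(s * c for s, c in zip(signal[pos:pos + dur], pn[pos:pos + dur]))
        bits.append(0 if corr > 0 else 1)
        pos += dur
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0]
signal = transmit(data)
recovered = receive(signal, len(data))
print(recovered == data)
```

    The receiver succeeds here only because it regenerates both the chip sequence and the bit boundaries from the same initial condition `x0`, which is the property the abstract credits for the improved security.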

  5. Main-Sequence Effective Temperatures from a Revised Mass-Luminosity Relation Based on Accurate Properties

    CERN Document Server

    Eker, Z; Soydugan, E; Bilir, S; Gokce, E Yaz; Steer, I; Tuysuz, M; Senyuz, T; Demircan, O

    2015-01-01

    The mass-luminosity (M-L), mass-radius (M-R) and mass-effective temperature ($M-T_{eff}$) diagrams for a subset of galactic nearby main-sequence stars with masses and radii accurate to $\leq 3\%$ and luminosities accurate to $\leq 30\%$ (268 stars) have led to a putative discovery. Four distinct mass domains have been identified, which we have tentatively associated with low, intermediate, high, and very high mass main-sequence stars, but which nevertheless are clearly separated by three distinct break points at 1.05, 2.4, and 7$M_{\odot}$ within the mass range studied of $0.38-32M_{\odot}$. Further, a revised mass-luminosity relation (MLR) is found based on linear fits for each of the mass domains identified. The revised, mass-domain based MLRs, which are classical ($L \propto M^{\alpha}$), are shown to be preferable to a single linear, quadratic or cubic equation representing as an alternative MLR. Stellar radius evolution within the main-sequence for stars with $M>1M_{\odot}$ is clearly evident on the M-R d...

  6. A sequence-based genetic linkage map as a reference for Brassica rapa pseudochromosome assembly

    Directory of Open Access Journals (Sweden)

    Cheng Feng

    2011-05-01

    Background: Brassica rapa is an economically important crop and a model plant for studies concerning polyploidization and the evolution of extreme morphology. The multinational B. rapa Genome Sequencing Project (BrGSP) was launched in 2003. In 2008, next generation sequencing technology was used to sequence the B. rapa genome. Several maps concerning B. rapa pseudochromosome assembly have been published but their coverage of the genome is incomplete, anchoring approximately 73.6% of the scaffolds on to chromosomes. Therefore, a new genetic map to aid pseudochromosome assembly is required. Results: This study concerns the construction of a reference genetic linkage map for Brassica rapa, forming the backbone for anchoring sequence scaffolds of the B. rapa genome resulting from recent sequencing efforts. One hundred and nineteen doubled haploid (DH) lines derived from microspore cultures of an F1 cross between a Chinese cabbage (B. rapa ssp. pekinensis) DH line (Z16) and a rapid cycling inbred line (L144) were used to construct the linkage map. PCR-based insertion/deletion (InDel) markers were developed by re-sequencing the two parental lines. The map comprises a total of 507 markers including 415 InDels and 92 SSRs. Alignment and orientation using SSR markers in common with existing B. rapa linkage maps allowed ten linkage groups to be identified, designated A01-A10. The total length of the linkage map was 1234.2 cM, with an average distance of 2.43 cM between adjacent marker loci. The lengths of linkage groups ranged from 71.5 cM to 188.5 cM for A08 and A09, respectively. Using the developed linkage map, 152 scaffolds were anchored on to the chromosomes, encompassing more than 82.9% of the B. rapa genome. Taken together with the previously available linkage maps, 183 scaffolds were anchored on to the chromosomes and the total coverage of the genome was 88.9%. Conclusions: The development of this linkage map is vital for the integration of genome

  7. Development of polymorphic microsatellite markers based on expressed sequence tags in Populus cathayana (Salicaceae).

    Science.gov (United States)

    Tian, Z Z; Zhang, F Q; Cai, Z Y; Chen, S L

    2016-01-01

    Populus cathayana occupies a large area within the northern, central, and southwestern regions of China, and is considered to be an important reforestation species in western China. In order to investigate the population genetic structure of this species, 10 polymorphic microsatellite loci were identified based on expressed sequence tags from de novo sequencing on the Illumina HiSeq 2000 platform. All microsatellite primers were tested on 48 P. cathayana individuals from four locations on the Qinghai-Tibet Plateau. The observed heterozygosity ranged from 0.000 to 1.000, and the null-allele frequency ranged from 0.000 to 0.904. These microsatellite markers may be a useful tool in genetic studies on P. cathayana and closely related species. PMID:27525845

  8. Discussion on validity of Rana maoershanensis based on partial sequence of 16S rRNA gene

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Rana maoershanensis, found on Mt. Maoershan in Guangxi, China, was reported as a new species in 2007, but there were no molecular data for this frog. The partial sequences (543 bp) of the 16S rRNA gene from 12 specimens of 3 brown frog species (Rana hanluica, R. maoershanensis and R. chensinensis) were analyzed with 17 specimens of 9 species from GenBank. The nucleotide sequence divergence between R. maoershanensis and the other brown frog species was 4.5%-6.5%, with 22-30 nucleotide substitutions at this locus. The phylogenetic relationships based on MP, ML, and Bayesian inference indicate that the brown frogs from southern China diverged into three groups (clades A, B and C). R. maoershanensis clustered in a well-supported subclade (B-l). It is suggested that R. maoershanensis is a valid species.

  9. Systematic position of Myrtama Ovcz. & Kinz. based on morphological and nrDNA ITS sequence evidence

    Institute of Scientific and Technical Information of China (English)

    ZHANG Daoyuan; ZHANG Yuan; GASKIN J. F.; CHEN Zhiduan

    2006-01-01

    Myrtama is a genus named from Myricaria elegans Royle in the 1970's in terms of its morphological peculiarities. The establishment of this genus and its systematic position have been disputed since its inception. ITS sequences from 10 species of Tamaricaceae are reported, and analyzed by PAUP 4.0b8 and Bayesian Inference to reconstruct the phylogenies. A single ITS tree is generated from maximum parsimony and MrBayes analyses, respectively. The molecular data set shows strong support for Tamarix and Myricaria as monophyletic genera, and Myrtama as a sister group to the genus Myricaria. Based on morphological differences, a single morphological tree is also generated, in which two major lineages existed but Myrtama is a sister group to Tamarix, rather than Myricaria. The evidence from DNA sequences and morphological characters supports that Myricaria elegans should be put into neither Myricaria nor Tamarix, but kept in its own monotypic genus.

  10. Towards Engineered Processes for Sequencing-Based Analysis of Single Circulating Tumor Cells.

    Science.gov (United States)

    Adalsteinsson, Viktor A; Love, J Christopher

    2014-05-01

    Sequencing-based analysis of single circulating tumor cells (CTCs) has the potential to revolutionize our understanding of metastatic cancer and improve clinical care. Technologies exist to enrich, identify, recover, and sequence single cells, but to enable systematic routine analysis of single CTCs from a range of cancer patients, there is a need to establish processes that efficiently integrate these specific operations. Such engineered processes should address challenges associated with the yield and viability of enriched CTCs, the robust identification of candidate single CTCs with minimal degradation of DNA, the bias in whole-genome amplification, and the efficient handling of candidate single CTCs or their amplified DNA products. Advances in methods for single-cell analysis and nanoscale technologies suggest opportunities to overcome these challenges, and could create integrated platforms that perform several of the unit operations together. Ultimately, technologies should be selected or adapted for optimal performance and compatibility in an integrated process. PMID:24839591

  11. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    Science.gov (United States)

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops have significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. PMID:26304412

  14. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Directory of Open Access Journals (Sweden)

    Luo Ming-Cheng

    2011-01-01

    Background: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results: An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA

  15. Sequence-structure based phylogeny of GPCR Class A Rhodopsin receptors.

    Science.gov (United States)

    Kakarala, Kavita Kumari; Jamil, Kaiser

    2014-05-01

    Current methods of G protein coupled receptors (GPCRs) phylogenetic classification are sequence based and therefore inappropriate for highly divergent sequences, sharing low sequence identity. In this study, sequence structure profile based alignment generated by PROMALS3D was used to understand the GPCR Class A Rhodopsin superfamily evolution using the MEGA 5 software. Phylogenetic analysis included a combination of Neighbor-Joining method and Maximum Likelihood method, with 1000 bootstrap replicates. Our study was able to identify potential ligand association for Class A Orphans and putative/unclassified Class A receptors with no cognate ligand information: GPR21 and GPR52 with fatty acids; GPR75 with Neuropeptide Y; GPR82, GPR18, GPR141 with N-arachidonylglycine; GPR176 with Free fatty acids, GPR10 with Tachykinin & Neuropeptide Y; GPR85 with ATP, ADP & UDP glucose; GPR151 with Galanin; GPR153 and GPR162 with Adrenalin, Noradrenalin; GPR146, GPR139, GPR142 with Neuromedin, Ghrelin, Neuromedin U-25 & Thyrotropin-releasing hormone; GPR171 with ATP, ADP & UDP Glucose; GPR88, GPR135, GPR161, GPR101 with 11-cis-retinal; GPR83 with Tachykinin; GPR148 with Prostanoids, GPR109b, GPR81, GPR31 with ATP & UTP and GPR150 with GnRH I & GnRH II. Furthermore, we suggest that this study would prove useful in re-classification of receptors, selecting templates for homology modeling and identifying ligands which may show cross reactivity with other GPCRs as signaling via multiple ligands plays a significant role in disease modulation. PMID:24503482

  16. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Among the key mechanisms of transcriptional control are the specific connections between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  17. The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment.

    Science.gov (United States)

    Eisenhaber, Birgit; Kuchibhatla, Durga; Sherman, Westley; Sirota, Fernanda L; Berezovsky, Igor N; Wong, Wing-Cheong; Eisenhaber, Frank

    2016-01-01

    As biomolecular sequencing is becoming the main technique in life sciences, functional interpretation of sequences in terms of biomolecular mechanisms with in silico approaches is getting increasingly significant. Function prediction tools are most powerful for protein-coding sequences; yet, the concepts and technologies used for this purpose are not well reflected in bioinformatics textbooks. Notably, protein sequences typically consist of globular domains and non-globular segments. The two types of regions require cardinally different approaches for function prediction. Whereas the former are classic targets for homology-inspired function transfer based on remnant, yet statistically significant sequence similarity to other, characterized sequences, the latter type of regions are characterized by compositional bias or simple, repetitive patterns and require lexical analysis and/or empirical sequence pattern-function correlations. The recipe for function prediction recommends first to find all types of non-globular segments and, then, to subject the remaining query sequence to sequence similarity searches. We provide an updated description of the ANNOTATOR software environment as an advanced example of a software platform that facilitates protein sequence-based function prediction. PMID:27115649

  19. Influence of Single Base Change in Shine-Dalgarno Sequence on the Stability of B. subtilis Plasmid PSM604

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    B. subtilis expression plasmids generally require a stringent Shine-Dalgarno sequence (SDS). Site-directed mutagenesis was used to change the Shine-Dalgarno sequence from AAAAATGGGG (mutant type) to AAAAAGGGGG (wild type) in recombinant plasmid PSM604. The single base substitution made the plasmid with the wild-type SDS unstable in structure and segregation. The interaction of the SDS with the subtilisin leader sequence of PSM604 might be responsible for the instability of the plasmid.
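
    The single-base change between the two sequences quoted in this abstract can be located with a short comparison. The function and variable names are illustrative only; the two sequences are copied from the abstract.

```python
def substitutions(before, after):
    """Return (1-based position, old base, new base) for each mismatch."""
    if len(before) != len(after):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, old, new)
        for i, (old, new) in enumerate(zip(before, after))
        if old != new
    ]

mutant_sds = "AAAAATGGGG"     # labeled "mutant type" in the abstract
wild_type_sds = "AAAAAGGGGG"  # labeled "wild type" in the abstract

changes = substitutions(mutant_sds, wild_type_sds)
print(changes)  # [(6, 'T', 'G')]
```

    The comparison confirms a single T-to-G substitution at position 6 of the ten-base stretch.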

  20. Identifying and calling insertions, deletions, and single-base mutations efficiently from sequence data

    Science.gov (United States)

    Whole genome sequencing studies can directly identify causative mutations for subsequent use in genomic evaluations, but sequence variant identification is a lengthy and sometimes inaccurate process. The speed and accuracy of identifying small insertions and deletions of sequence, collectively terme...

  1. Brain Bases of Working Memory for Time Intervals in Rhythmic Sequences.

    Science.gov (United States)

    Teki, Sundeep; Griffiths, Timothy D

    2016-01-01

    Perception of auditory time intervals is critical for accurate comprehension of natural sounds like speech and music. However, the neural substrates and mechanisms underlying the representation of time intervals in working memory are poorly understood. In this study, we investigate the brain bases of working memory for time intervals in rhythmic sequences using functional magnetic resonance imaging. We used a novel behavioral paradigm to investigate time-interval representation in working memory as a function of the temporal jitter and memory load of the sequences containing those time intervals. Human participants were presented with a sequence of intervals and required to reproduce the duration of a particular probed interval. We found that perceptual timing areas including the cerebellum and the striatum were more or less active as a function of increasing and decreasing jitter of the intervals held in working memory respectively whilst the activity of the inferior parietal cortex is modulated as a function of memory load. Additionally, we also analyzed structural correlations between gray and white matter density and behavior and found significant correlations in the cerebellum and the striatum, mirroring the functional results. Our data demonstrate neural substrates of working memory for time intervals and suggest that the cerebellum and the striatum represent core areas for representing temporal information in working memory. PMID:27313506

  2. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 Kb in size (8.57 Kb), with a GC content of 30.8 %. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. protothecoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  3. The phylogenetic status of Paxillosida (Asteroidea) based on complete mitochondrial DNA sequences.

    Science.gov (United States)

    Matsubara, Mioko; Komatsu, Miéko; Araki, Takeyoshi; Asakawa, Shuichi; Yokobori, Shin-ichi; Watanabe, Kimitsuna; Wada, Hiroshi

    2005-09-01

    One of the most important issues in asteroid phylogeny is the phylogenetic status of Paxillosida. This group lacks an anus and suckers on the tube feet in adults and does not develop the brachiolaria stage in early development. Two controversial hypotheses have been proposed for the phylogenetic status of Paxillosida, i.e., Paxillosida is primitive or rather specialized in asteroids. In this study, we determined the complete mitochondrial DNA nucleotide sequences from two paxillosidans (Astropecten polyacanthus and Luidia quinaria) and one forcipulatidan (Asterias amurensis). The mitochondrial genomes of the three asteroids were identical with respect to gene order and transcription direction, and were identical to the previously reported mitochondrial genomes of Asterina pectinifera (Valvatida) and Pisaster ochraceus (Forcipulatida) in this respect. Therefore, the comparison of genome structures was uninformative for the purposes of asteroid phylogeny. However, molecular phylogenetic analyses based on the amino acid sequences and the nucleotide sequences from the five asteroids supported the monophyly of the clade that included the two paxillosidans and Asterina. This suggests that the paxillosidan characters are secondarily derived ones.

  4. Genotyping of B. licheniformis based on a novel multi-locus sequence typing (MLST) scheme

    Directory of Open Access Journals (Sweden)

    Madslien Elisabeth H

    2012-10-01

    Full Text Available Abstract. Background: Bacillus licheniformis has for many years been used in the industrial production of enzymes, antibiotics and detergents. However, as a producer of dormant heat-resistant endospores B. licheniformis might contaminate semi-preserved foods. The aim of this study was to establish a robust and novel genotyping scheme for B. licheniformis in order to reveal the evolutionary history of 53 strains of this species. Furthermore, the genotyping scheme was also investigated for its use to detect food-contaminating strains. Results: A multi-locus sequence typing (MLST) scheme, based on the sequence of six house-keeping genes (adk, ccpA, recF, rpoB, spo0A and sucC) of 53 B. licheniformis strains from different sources, was established. The result of the MLST analysis supported previous findings of two different subgroups (lineages) within this species, named “A” and “B”. Statistical analysis of the MLST data indicated a higher rate of recombination within group “A”. Food isolates were widely dispersed in the MLST tree and could not be distinguished from the other strains. However, the food-contaminating strain B. licheniformis NVH1032, represented by a unique sequence type (ST8), was distantly related to all other strains. Conclusions: In this study, a novel and robust genotyping scheme for B. licheniformis was established, separating the species into two subgroups. This scheme could be used for further studies of evolution and population genetics in B. licheniformis.

  5. Brain bases of working memory for time intervals in rhythmic sequences

    Directory of Open Access Journals (Sweden)

    Sundeep Teki

    2016-06-01

    Full Text Available Perception of auditory time intervals is critical for accurate comprehension of natural sounds like speech and music. However, the neural substrates and mechanisms underlying the representation of time intervals in working memory are poorly understood. In this study, we investigate the brain bases of working memory for time intervals in rhythmic sequences using functional magnetic resonance imaging. We used a novel behavioral paradigm to investigate time-interval representation in working memory as a function of the temporal jitter and memory load of the sequences containing those time intervals. Human participants were presented with a sequence of intervals and required to reproduce the duration of a particular probed interval. We found that perceptual timing areas including the cerebellum and the striatum were more or less active as a function of increasing and decreasing jitter of the intervals held in working memory respectively whilst the activity of the inferior parietal cortex is modulated as a function of memory load. Additionally, we also analyzed structural correlations between grey and white matter density and behavior and found significant correlations in the cerebellum and the striatum, mirroring the functional results. Our data demonstrate neural substrates of working memory for time intervals and suggest that the cerebellum and the striatum represent core areas for representing temporal information in working memory.

  6. Innovative molecular diagnosis of Trichinella species based on β-carbonic anhydrase genomic sequence.

    Science.gov (United States)

    Zolfaghari Emameh, Reza; Kuuslahti, Marianne; Näreaho, Anu; Sukura, Antti; Parkkila, Seppo

    2016-03-01

    Trichinellosis is a helminthic infection caused by different species of Trichinella nematodes. Several molecular assays have been designed to aid diagnostics of trichinellosis. These assays are mostly complex and expensive. The genomes of Trichinella species contain certain parasite-specific genes, which can be detected by polymerase chain reaction (PCR) methods. We selected the β-carbonic anhydrase (β-CA) gene as a target, because it is present in many parasite genomes but absent in vertebrates. We developed a novel β-CA gene-based method for detection of Trichinella larvae in biological samples. We first identified a β-CA protein sequence from Trichinella spiralis by bioinformatic tools using β-CAs from Caenorhabditis elegans and Drosophila melanogaster. Thereafter, 16 sets of designed primers were tested to detect β-CA genomic sequences from three species of Trichinella, including T. spiralis, Trichinella pseudospiralis and Trichinella nativa. Among all 16 sets of designed primers, primer set No. 2 efficiently amplified β-CA genomic sequences from T. spiralis, T. pseudospiralis and T. nativa without any false-positive amplicons from other parasite samples including Toxoplasma gondii, Toxocara cati and Parascaris equorum. This robust and straightforward method could be useful for meat inspection in slaughterhouses, quality control by food authorities and medical laboratories. PMID:26639312

  7. A Flexible Approach to Modelling Adaptive Course Sequencing based on Graphs implemented using XLink

    Directory of Open Access Journals (Sweden)

    Rachid ELOUAHBI

    2012-02-01

    Full Text Available A major challenge in developing systems of distance learning is the ability to adapt learning to individual users. This adaptation requires a flexible scheme for sequencing the material to teach diverse learners. This is where we intend to contribute to model the personalized learning paths to be followed by the learner to achieve his/her determined educational objective. Our modelling approach of sequencing is based on the pedagogical graph which is called SMARTGraph. This graph allows expressing the totality of the pedagogic constraints under which the learner is submitted in order to achieve his/her pedagogic objective. SMARTGraph is a graph in which the nodes are the learning units and the arcs are the pedagogic constraints between learning units. We shall see how it is possible to organize the learning units and the learning paths to answer the expectations within the framework of individual courses according to the learner profile or within the framework of group courses. To implement our approach we exploit the strength of XLink (XML Linking Language to define the sequencing graph.
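    The idea of a pedagogical graph whose nodes are learning units and whose arcs are constraints between units can be sketched with a plain prerequisite graph and a topological sort. This is an illustrative assumption only: the abstract describes an XLink-based implementation, whereas the sketch below uses Python's standard `graphlib`, and the unit names and the `learning_path` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical learning units: each unit maps to the set of units that
# must be completed first (the "pedagogic constraints" of the graph).
prerequisites = {
    "intro": set(),
    "variables": {"intro"},
    "loops": {"variables"},
    "functions": {"variables"},
    "project": {"loops", "functions"},
}


def learning_path(prereqs, already_mastered=frozenset()):
    """Return one valid ordering of the units a learner still has to take.

    Units in `already_mastered` (a stand-in for the learner profile) are
    dropped from the graph, so the path adapts to the individual learner.
    """
    remaining = {
        unit: {dep for dep in deps if dep not in already_mastered}
        for unit, deps in prereqs.items()
        if unit not in already_mastered
    }
    return list(TopologicalSorter(remaining).static_order())
```

    Calling `learning_path(prerequisites, {"intro", "variables"})` would yield a shorter path for a learner who already knows the basics; a cyclic set of constraints raises `graphlib.CycleError`.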

  8. Molecular phylogenetic relationships of China Seas groupers based on cytochrome b gene fragment sequences

    Institute of Scientific and Technical Information of China (English)

    DING Shaoxiong; ZHUANG Xuan; GUO Feng; WANG Jun; SU Yongquan; ZHANG Qiyong; LI Qifu

    2006-01-01

    Classification and evolutionary relationships are important issues in the study of groupers. A cytochrome b gene fragment from twenty-eight grouper species in six genera of the subfamily Epinephelinae was amplified using PCR, and the sequences were analyzed to derive the phylogenetic relationships of groupers from the China Seas. Genetic information indexes, including Kimura 2-parameter genetic distances and Ts/Tv ratios, were generated using a variety of biological software packages. With Niphon spinosus, Pagrus major and Pagrus auriga as the designated outgroups, phylogenetic trees, which included additional homologous sequences of other Epinephelus fishes from GenBank, were constructed using the neighbor-joining (NJ), maximum-parsimony (MP), maximum-likelihood (ML) and minimum-evolution (ME) methods. Several conclusions were drawn from the DNA sequence analysis: (1) genus Plectropomus, which diverged early, is the most primitive group in the subfamily Epinephelinae; (2) genus Variola is more closely related to genus Cephalopholis than the other four genera; (3) genus Cephalopholis is a monophyletic group and more primitive than genus Epinephelus; (4) Promicrops lanceolatus and Cromileptes altivelis should be included in genus Epinephelus; (5) there exist two sister groups in genus Epinephelus.
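    For readers unfamiliar with the Kimura 2-parameter distance mentioned above, the standard formula is reproduced here. The symbols P and Q (the observed proportions of transitional and transversional differences between two aligned sequences) and the numbers in the worked example are illustrative and are not taken from the study.

```latex
% Kimura 2-parameter distance between two aligned sequences
d = -\frac{1}{2}\,\ln\!\left[(1 - 2P - Q)\,\sqrt{1 - 2Q}\right]

% Worked example with assumed values P = 0.10, Q = 0.02:
% (1 - 0.20 - 0.02)\sqrt{1 - 0.04} = 0.78 \times 0.9798 \approx 0.7642
% d \approx -\tfrac{1}{2}\ln(0.7642) \approx 0.134
```

    The Ts/Tv ratio reported alongside the distance is the ratio of transitional to transversional differences, which for a single pair of sequences corresponds to P/Q in this notation.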

  9. Origin and relationships of Saintpaulia (Gesneriaceae) based on ribosomal DNA internal transcribed spacer (ITS) sequences.

    Science.gov (United States)

    Moller, M; Cronk, Q

    1997-07-01

    Phylogenetic relationships of eight species of Saintpaulia H. Wendl., 19 species of Streptocarpus Lindl. (representing all major growth forms within the genus), and two outgroups (Haberlea rhodopensis Friv., Chirita spadiciformis W. T. Wang) were examined using comparative nucleotide sequences from the two internal transcribed spacers (ITS) of nuclear ribosomal DNA. The length of the ITS 1 region ranged from 228 to 249 base pairs (bp) and the ITS 2 region from 196 to 245 bp. Pairwise sequence divergence across both spacers for ingroup and outgroup species ranged from 0 to 29%. Streptocarpus is not monophyletic, and Saintpaulia is nested within Streptocarpus subgenus Streptocarpella. Streptocarpus subgenus Streptocarpus is monophyletic. The ITS sequence data demonstrate that the unifoliate Streptocarpus species form a clade, and are also characterized by a unique 47-bp deletion in ITS 2. The results strongly support the monophyly of (1) Saintpaulia, and (2) Saintpaulia plus the African members of the subgenus Streptocarpella of Streptocarpus. The data suggest the evolution of Saintpaulia from Streptocarpus subgenus Streptocarpella. The differences in flower and vegetative characters are probably due to ecological adaptation leading to a relatively rapid radiation of Saintpaulia. PMID:21708650

  10. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    Science.gov (United States)

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.

  11. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies.

    Directory of Open Access Journals (Sweden)

    Patrick D Schloss

    Full Text Available The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparison of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially
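    The recommendation to compare communities using an equal number of sequences is commonly implemented by random subsampling (rarefaction) to a shared depth. The following is a minimal sketch of that idea and not the authors' pipeline; the `rarefy` helper, the OTU labels, and the counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts: dict, depth: int, seed: int = 0) -> Counter:
    """Randomly subsample `depth` reads without replacement from a sample
    described as {OTU label: read count}."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return Counter(rng.sample(reads, depth))


# Hypothetical samples sequenced to very different depths.
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 200}
sample_b = {"OTU1": 50, "OTU2": 30, "OTU4": 20}

# Subsample both to the depth of the shallowest sample before comparing.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
```

    After subsampling, both samples contain the same number of reads, so differences in observed OTU counts are less likely to reflect sequencing effort alone.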

  12. Molecular phylogenetic relationship of Epinephelus based on sequences of mtDNA Cyt b

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The mtDNA Cyt b gene was partially sequenced for Variola louti (Serranidae, Epinephelinae) and seven endemic species of groupers in China: Epinephelus awoara, E. brunneus, E. coioides, E. longispinis, E. sexfasciatus, E. spilotoceps and E. tauvina. The seven endemic species and seven other foreign species of groupers from GenBank (E. aeneus, E. caninus, E. drummondhayi, E. haifensis, E. labriformis, E. marginatus and E. multinotatus) were combined and analysed as the ingroup, while Variola louti was used as the outgroup. We compared the 420 bp sequences of Cyt b among the 15 species and constructed two types of molecular phylogenetic trees with the maximum parsimony method (MP) and the neighbor-joining method (NJ), respectively. The results were as follows: (1) As to the base composition of the mtDNA Cyt b sequence (402 bp) of the 14 species of Epinephelus, the content of (A + T) was 53.6%, higher than that of (G + C) (46.4%). The transition/transversion ratio was 4.78 with no mutation saturation. (2) The cluster relationships between E. awoara and E. sexfasciatus, E. coioides and E. tauvina, and E. longispinis and E. spilotoceps were consistent with phenotypes in taxonomy. (3) In the phylogenetic tree, the species in the Atlantic Ocean were associated closely with those in the Pacific Ocean, which suggested that the Cyt b sequences of Epinephelus were highly conserved. This may be attributed to coordinate evolution. (4) In well-bred mating or heredity management, mating Epinephelus of the same branch should be avoided. Mating the species of the Atlantic Ocean with those of the Pacific Ocean is likely to be an effective way to improve the inherited traits of the species.
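    The two summary statistics quoted above, (A + T) content and the transition/transversion ratio, are straightforward to compute from aligned sequences. The sketch below shows one way to do so for a single sequence and a single pair; the function names and the toy sequences are assumptions, and the study's values were derived from the full 15-species alignment rather than from one pair.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def at_content(seq: str) -> float:
    """Fraction of unambiguous bases that are A or T."""
    called = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in called) / len(called)


def ts_tv_ratio(seq1: str, seq2: str) -> float:
    """Transition/transversion ratio between two aligned sequences.

    Sites with gaps or ambiguity codes are skipped; returns infinity when
    no transversions are observed.
    """
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if frozenset((a, b)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

    A ratio well above 1, such as the 4.78 reported here, indicates that transitions dominate the observed differences.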

  13. Aviram-Ratner rectifying mechanism for DNA base-pair sequencing through graphene nanogaps

    Science.gov (United States)

    Agapito, Luis A.; Gayles, Jacob; Wolowiec, Christian; Kioussis, Nicholas

    2012-04-01

    We demonstrate that biological molecules such as Watson-Crick DNA base pairs can behave as biological Aviram-Ratner electrical rectifiers because of the spatial separation and weak hydrogen bonding between the nucleobases. We have performed a parallel computational implementation of the ab initio non-equilibrium Green’s function (NEGF) theory to determine the electrical response of graphene—base-pair—graphene junctions. The results show an asymmetric (rectifying) current-voltage response for the cytosine-guanine base pair adsorbed on a graphene nanogap. In sharp contrast we find a symmetric response for the thymine-adenine case. We propose applying the asymmetry of the current-voltage response as a sensing criterion to the technological challenge of rapid DNA sequencing via graphene nanogaps.

  14. Automated family-based naming of small RNAs for next generation sequencing data using a modified MD5-digest algorithm

    OpenAIRE

    Liu, Guodong; Li, Zhihua; Lin, Yuefeng; John, Bino

    2012-01-01

    We developed NameMyGene, a web tool and a stand alone program to easily generate putative family-based names for small RNA sequences so that laboratories can easily organize, analyze, and observe patterns from, the massive amount of data generated by next-generation sequencers. NameMyGene, also applicable to other emerging methods such as RNA-Seq, and Chip-Seq, solely uses the input small RNA sequence and does not require any additional data such as other sequence data sets. The web server an...
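    The general idea of deriving a stable name from the sequence alone can be sketched with a plain MD5 digest. This is not the modified algorithm used by NameMyGene, whose details are not given in the truncated abstract; the `sequence_name` helper, the `sRNA` prefix, and the 8-character truncation are all illustrative assumptions.

```python
import hashlib


def sequence_name(seq: str, prefix: str = "sRNA", digest_chars: int = 8) -> str:
    """Derive a deterministic identifier from a small RNA sequence.

    The sequence is normalised (upper case, U -> T) so that RNA and DNA
    spellings of the same read receive the same name.
    """
    normalized = seq.strip().upper().replace("U", "T")
    digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
    # Truncating the digest shortens names but raises the collision risk.
    return f"{prefix}-{digest[:digest_chars]}"
```

    Because the name depends only on the input sequence, two laboratories hashing the same read independently would arrive at the same identifier without sharing any other data set.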

  15. Spatiotemporal Super-Resolution Reconstruction Based on Robust Optical Flow and Zernike Moment for Video Sequences

    Directory of Open Access Journals (Sweden)

    Meiyu Liang

    2013-01-01

    Full Text Available In order to improve the spatiotemporal resolution of video sequences, a novel spatiotemporal super-resolution reconstruction model (STSR) based on robust optical flow and Zernike moment is proposed in this paper, which integrates the spatial resolution reconstruction and temporal resolution reconstruction into a unified framework. The model does not rely on accurate estimation of subpixel motion and is robust to noise and rotation. Moreover, it can effectively overcome the problems of hole and block artifacts. First we propose an efficient robust optical flow motion estimation model based on motion details preserving, then we introduce the biweighted fusion strategy to implement the spatiotemporal motion compensation. Next, combining the self-adaptive region correlation judgment strategy, we construct a fast fuzzy registration scheme based on Zernike moment for better STSR with higher efficiency, and then the final video sequences with high spatiotemporal resolution can be obtained by fusion of the complementary and redundant information with nonlocal self-similarity between the adjacent video frames. Experimental results demonstrate that the proposed method outperforms the existing methods in terms of both subjective visual and objective quantitative evaluations.

  16. Archaeorhizomyces borealis sp. nov. and a sequence-based classification of related soil fungal species.

    Science.gov (United States)

    Menkis, Audrius; Urbina, Hector; James, Timothy Y; Rosling, Anna

    2014-12-01

    The class Archaeorhizomycetes (Taphrinomycotina, Ascomycota) was introduced to accommodate an ancient lineage of soil-inhabiting fungi found in association with plant roots. Based on environmental sequencing data Archaeorhizomycetes may comprise a significant proportion of the total fungal community in soils. Yet the only species described and cultivated in this class is Archaeorhizomyces finlayi. In this paper, we describe a second species from a pure culture, Archaeorhizomyces borealis NS99-600(T) (=CBS138755(ExT)) based on morphological, physiological, and multi-locus molecular characterization. Archaeorhizomyces borealis was isolated from a root tip of a Pinus sylvestris seedling grown in a forest nursery in Lithuania. Analysis of Archaeorhizomycete species from environmental samples shows that it has a Eurasian distribution and is the most commonly observed species. Archaeorhizomyces borealis shows slow growth in culture and forms yellowish creamy colonies, characteristics that distinguish A. borealis from its closest relative A. finlayi. Here we also propose a sequence-based taxonomic classification of Archaeorhizomycetes and predict that approximately 500 species in this class remain to be isolated and described. PMID:25457942

  17. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    Directory of Open Access Journals (Sweden)

    Michelle Klein Sercundes

    2016-03-01

    Full Text Available Abstract Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG), Besnoitia akodoni, Hammondia hammondi and two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organellar genes confirm that the genus Hammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystis and organisms of the subfamily Toxoplasmatinae as well.

  18. Parallax Effect Free Mosaicing of Underwater Video Sequence Based on Texture Features

    Directory of Open Access Journals (Sweden)

    Nagaraja S

    2014-10-01

    Full Text Available In this paper, we present a feature-based technique for constructing a mosaic image from an underwater video sequence, which suffers from parallax distortion due to the propagation properties of light in the underwater environment. Most of the available mosaic tools and underwater image mosaicing techniques yield a final result with artifacts such as blurring, ghosting and seams due to the presence of parallax in the input images. Removing parallax from the input images may not reduce its effects; instead, it must be corrected in successive steps of mosaicing. Thus, our approach minimizes the parallax effects by adopting an efficient local alignment technique after global registration. We extract texture features using the Centre Symmetric Local Binary Pattern (CS-LBP) descriptor in order to find feature correspondences, which are then used to estimate the homography through RANSAC. In order to increase the accuracy of global registration, we perform preprocessing such as colour alignment between two selected frames based on colour distribution adjustment. Because consecutive frames of underwater video overlap by nearly 100%, we select frames with minimum overlap based on mutual offset in order to reduce the computation cost during mosaicing. Our approach considerably minimizes the parallax effects in the final mosaic constructed using our own underwater video sequences.
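    The CS-LBP texture code mentioned above compares the four centre-symmetric pairs among a pixel's eight neighbours, giving a 4-bit code per pixel. The sketch below is a minimal pure-Python illustration under assumed settings (radius 1, 8 neighbours, a simple intensity threshold); it is not the authors' implementation, and the function names are hypothetical.

```python
def cs_lbp(image, threshold=0):
    """Compute 4-bit CS-LBP codes for the interior pixels of a 2-D
    grayscale image given as a list of rows."""
    # Eight neighbours as (dy, dx) in circular order; entry i and entry
    # i + 4 lie on opposite sides of the centre pixel.
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1),
               (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    height, width = len(image), len(image[0])
    codes = [[0] * (width - 2) for _ in range(height - 2)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            code = 0
            for i in range(4):
                dy1, dx1 = offsets[i]
                dy2, dx2 = offsets[i + 4]
                diff = image[y + dy1][x + dx1] - image[y + dy2][x + dx2]
                if diff > threshold:
                    code |= 1 << i  # set bit i for this symmetric pair
            codes[y - 1][x - 1] = code
    return codes


def cs_lbp_histogram(codes):
    """16-bin histogram of CS-LBP codes, usable as a texture descriptor."""
    hist = [0] * 16
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist
```

    In a matching pipeline, histograms computed over patches around keypoints would be compared to find correspondences, which then feed a robust estimator such as RANSAC.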

  19. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences.

    Science.gov (United States)

    Sercundes, Michelle Klein; Valadas, Samantha Yuri Oshiro Branco; Keid, Lara Borges; Oliveira, Tricia Maria Ferreira Souza; Ferreira, Helena Lage; Vitor, Ricardo Wagner de Almeida; Gregori, Fábio; Soares, Rodrigo Martins

    2016-01-01

    Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG), Besnoitia akodoni, Hammondia hammondi and two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organellar genes confirm that the genus Hammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystis and organisms of the subfamily Toxoplasmatinae as well. PMID:27007245

  20. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  1. Cryptanalysis of a novel image encryption scheme based on improved hyperchaotic sequences

    Science.gov (United States)

    Özkaynak, Fatih; Özer, Ahmet Bedri; Yavuz, Sırma

    2012-11-01

    Chaotic cryptography is a new field that has seen a significant amount of research activity during the last 20 years. Despite the many proposals that use various methods in the design of encryption algorithms, there is a definite need for a mathematically rigorous cryptanalysis of these designs. In this study, we analyze the security weaknesses of the "C. Zhu, A novel image encryption scheme based on improved hyperchaotic sequences, Optics Communications 285 (2012) 29-37". By applying chosen plaintext attacks, we show that all the secret parameters can be revealed.

  2. Genetic diversity of Mycoplasma arginini isolates based on multilocus sequence typing.

    Science.gov (United States)

    Olaogun, Olusola M; Kanci, Anna; Barber, Stuart R; Tivendale, Kelly A; Markham, Philip F; Marenda, Marc S; Browning, Glenn F

    2015-10-22

    The contribution of Mycoplasma arginini to mycoplasmosis in small ruminants remains unclear because it is recovered from both healthy and diseased animals. In order to gain a better understanding of any relationships between isolates from different sites and different geographical locations, we developed a method for genotyping M. arginini using multilocus sequence typing (MLST). An MLST scheme based on five housekeeping genes was used to characterize M. arginini isolates from flocks of sheep and goats. A high level of genetic variability was detected between strains and within herds. PMID:26264760

  3. Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

    Science.gov (United States)

    Damiati, E; Borsani, G; Giacopuzzi, Edoardo

    2016-05-01

    The Ion Proton platform allows to perform whole exome sequencing (WES) at low cost, providing rapid turnaround time and great flexibility. Products for WES on Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from GIAB consortium to assess the performances in variants identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variants detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low, medium or high-stringency filters reduced the amount of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNP, respectively. Amplicon-based WES on Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variants identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
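
    The sensitivity, FDR and F1 figures quoted above can be cross-checked with a short calculation. The sketch below assumes the usual definitions, which the abstract does not spell out: precision = 1 - FDR, recall = sensitivity, and F1 as their harmonic mean. The function name is illustrative only.

```python
def f1_from_sensitivity_and_fdr(sensitivity: float, fdr: float) -> float:
    """Harmonic mean of precision and recall, assuming precision = 1 - FDR."""
    precision = 1.0 - fdr
    recall = sensitivity
    return 2.0 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (indels first, then SNPs).
indel_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.71, fdr=0.37)
snp_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.975, fdr=0.02)

print(f"indels: F1 ~ {indel_f1:.3f}")
print(f"SNPs:   F1 ~ {snp_f1:.3f}")
```

    Because the inputs are the rounded values from the abstract, the results should land close to, though not necessarily exactly on, the reported ~0.66 and ~0.98.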

  4. Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

    Science.gov (United States)

    Damiati, E; Borsani, G; Giacopuzzi, Edoardo

    2016-05-01

    The Ion Proton platform allows to perform whole exome sequencing (WES) at low cost, providing rapid turnaround time and great flexibility. Products for WES on Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from GIAB consortium to assess the performances in variants identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variants detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low, medium or high-stringency filters reduced the amount of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNP, respectively. Amplicon-based WES on Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variants identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants. PMID:27003585

  5. A new trilocus sequence-based multiplex-PCR to detect major Acinetobacter baumannii clones.

    Science.gov (United States)

    Martins, Natacha; Picão, Renata Cristina; Cerqueira-Alves, Morgana; Uehara, Aline; Barbosa, Lívia Carvalho; Riley, Lee W; Moreira, Beatriz Meurer

    2016-08-01

    A collection of 163 Acinetobacter baumannii isolates detected in a large Brazilian hospital was potentially related to the dissemination of four clonal complexes (CC): 113/79, 103/15, 109/1 and 110/25, defined by University of Oxford/Institut Pasteur multilocus sequence typing (MLST) schemes. The need for a simple multiplex-PCR scheme to specify these clones motivated the present study. The established trilocus sequence-based typing (3LST, for ompA, csuE and blaOXA-51-like genes) multiplex-PCR rapidly identifies international clones I (CC109/1), II (CC118/2) and III (CC187/3). Thus, the system detects only one (CC109/1) out of four main CC in Brazil. We aimed to develop an alternative multiplex-PCR scheme to detect these clones, known to be present additionally in Africa, Asia, Europe, USA and South America. MLST, performed in the present study to complement typing our whole collection of isolates, confirmed that all isolates belonged to the same four CC detected previously. When typed by 3LST-based multiplex-PCR, only 12% of the 163 isolates were classified into groups. By comparative sequence analysis of ompA, csuE and blaOXA-51-like genes, a set of eight primers was designed for an alternative multiplex-PCR to distinguish the five CC 113/79, 103/15, 109/1, 110/25 and 118/2. Study isolates and one CC118/2 isolate were blind-tested with the new alternative PCR scheme; all were correctly clustered in groups of the corresponding CC. The new multiplex-PCR, with the advantage of fitting in a single reaction, detects five leading A. baumannii clones and could help prevent their spread in healthcare settings.

  6. Identification of forensic samples by using an infrared-based automatic DNA sequencer.

    Science.gov (United States)

    Ricci, Ugo; Sani, Ilaria; Klintschar, Michael; Cerri, Nicoletta; De Ferrari, Francesco; Giovannucci Uzielli, Maria Luisa

    2003-06-01

    We have recently introduced a new protocol for analyzing all core loci of the Federal Bureau of Investigation's (FBI) Combined DNA Index System (CODIS) with an infrared (IR) automatic DNA sequencer (LI-COR 4200). The amplicons were labeled with forward oligonucleotide primers, covalently linked to a new infrared fluorescent molecule (IRDye 800). The alleles were displayed as familiar autoradiogram-like images with real-time detection. This protocol was employed for paternity testing, population studies, and identification of degraded forensic samples. We extensively analyzed some simulated forensic samples and mixed stains (blood, semen, saliva, bones, and fixed archival embedded tissues), comparing the results with donor samples. Sensitivity studies were also performed for the four multiplex systems. Our results show the efficiency, reliability, and accuracy of the IR system for the analysis of forensic samples. We also compared the efficiency of the multiplex protocol with ultraviolet (UV) technology. Paternity tests, undegraded DNA samples, and real forensic samples were analyzed with this approach based on IR technology and with UV-based automatic sequencers in combination with commercially-available kits. The comparability of the results with the widespread UV methods suggests that it is possible to exchange data between laboratories using the same core group of markers but different primer sets and detection methods.

  7. MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.

    Science.gov (United States)

    Alic, Andy S; Blanquer, Ignacio

    2016-09-01

    Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymers analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mild intensive tasks encountered in bioinformatics. PMID:27606794
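
    The kinds of statistics listed above (average length, base distribution, k-mer histogram, homopolymers) can be illustrated in a few lines. This is a minimal Python sketch over in-memory read strings, not MuffinInfo's own HTML5 code; `read_stats`, its parameters and its return keys are hypothetical names chosen for the example.

```python
from collections import Counter
from itertools import groupby


def read_stats(reads, k=3):
    """Summary statistics for a list of read strings (hypothetical helper)."""
    lengths = [len(r) for r in reads]
    return {
        # Mean read length; 0.0 when there are no reads.
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
        # How often each base symbol occurs across all reads.
        "base_counts": Counter(base for r in reads for base in r),
        # Histogram of overlapping k-mers.
        "kmer_counts": Counter(
            r[i:i + k] for r in reads for i in range(len(r) - k + 1)
        ),
        # Length of the longest run of one repeated base.
        "longest_homopolymer": max(
            (len(list(run)) for r in reads for _, run in groupby(r)),
            default=0,
        ),
    }


stats = read_stats(["ACGTAAAA", "GGGTAC"], k=3)
print(stats["average_length"], stats["longest_homopolymer"])
print(stats["base_counts"].most_common())
```

    Parsing FastQ/Fasta/SAM input and quality scores is left out on purpose; the sketch only shows the counting step.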

  8. Application of Sequence-based Methods in Human Microbial Ecology

    Energy Technology Data Exchange (ETDEWEB)

    Weng, Li; Rubin, Edward M.; Bristow, James

    2005-08-29

    Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for many years, and the development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail. Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Because DNA-based techniques for defining uncultured microbes allow not only cataloging of microbial diversity, but also insight into microbial functions, investigators are beginning to apply these tools to the microbial communities that abound on and within us, in what has aptly been called the second Human Genome Project. In this review we discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis of known infectious diseases, and advance understanding of our relationship with microbial communities that normally reside in and on the human body.

  9. MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.

    Science.gov (United States)

    Alic, Andy S; Blanquer, Ignacio

    2016-09-01

    Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymers analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mild intensive tasks encountered in bioinformatics.

  10. Evaluation of the Terminal Sequencing and Spacing System for Performance Based Navigation Arrivals

    Science.gov (United States)

    Thipphavong, Jane; Jung, Jaewoo; Swenson, Harry N.; Martin, Lynne; Lin, Melody; Nguyen, Jimmy

    2013-01-01

    NASA has developed the Terminal Sequencing and Spacing (TSS) system, a suite of advanced arrival management technologies combining time-based scheduling and controller precision spacing tools. TSS is a ground-based controller automation tool that facilitates sequencing and merging arrivals that have both current standard ATC routes and terminal Performance-Based Navigation (PBN) routes, especially during highly congested demand periods. In collaboration with the FAA and MITRE's Center for Advanced Aviation System Development (CAASD), TSS system performance was evaluated in human-in-the-loop (HITL) simulations with currently active controllers as participants. Traffic scenarios had mixed Area Navigation (RNAV) and Required Navigation Performance (RNP) equipage, where the more advanced RNP-equipped aircraft had preferential treatment with a shorter approach option. Simulation results indicate the TSS system achieved benefits by enabling PBN, while maintaining high throughput rates, 10% above baseline demand levels. Flight path predictability improved, where path deviation was reduced by 2 NM on average and variance in the downwind leg length was 75% less. Arrivals flew more fuel-efficient descents for longer, spending an average of 39 seconds less in step-down level altitude segments. Self-reported controller workload was reduced, with statistically significant differences at the p < 0.01 level. The RNP-equipped arrivals were also able to more frequently capitalize on the benefits of being "Best-Equipped, Best-Served" (BEBS), where less vectoring was needed and nearly all RNP approaches were conducted without interruption.

  11. Changes in DNA base sequences in the mutant of Arabidopsis thaliana induced by low-energy N+ implantation

    Institute of Scientific and Technical Information of China (English)

    常凤启; 刘选明; 李银心; 贾庚祥; 马晶晶; 刘公社; 朱至清

    2003-01-01

    To reveal the mutation effect of low-energy ion implantation on Arabidopsis thaliana in vivo, T80II, a stable dwarf mutant derived from seeds irradiated by 30 keV N+ at a dose of 80×10¹⁵ ions/cm², was used for Random Amplified Polymorphic DNA (RAPD) and base sequence analysis. The results indicated that among a total of 397 RAPD bands observed, 52 bands in T80II were different from those of the wild type, showing a variation frequency of 13.1%. In comparison with the sequences of A. thaliana in GenBank, the RAPD fragments in T80II were changed greatly in base sequences, with an average rate of one base change per 16.8 bases. The types of base changes included base transition, transversion, deletion and insertion. Among the 275 base changes detected, single base substitutions (97.09%) occurred more frequently than base deletions and insertions (2.91%), and the frequency of base transitions (66.55%) was higher than that of base transversions (30.55%). Adenine, thymine, guanine or cytosine could be replaced by any of the other three bases in cloned DNA fragments in T80II. It seems that thymine was more sensitive to the irradiation than the other bases. The flanking sequences of the base changes in RAPD fragments in T80II were analyzed and the mutational "hotspot" induced by low-energy ion implantation was discussed.
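
    The percentages in this abstract are mutually consistent, which a few lines of arithmetic make visible. The sketch below only back-calculates whole-number counts from the reported figures; the variable names are illustrative and no data beyond the abstract is used.

```python
total_bands, changed_bands = 397, 52
total_changes = 275

# Percentages of the 275 base changes, as reported in the abstract.
reported_pct = {
    "substitution": 97.09,
    "indel": 2.91,
    "transition": 66.55,
    "transversion": 30.55,
}

# Share of RAPD bands that differ from the wild type, in percent.
variation_frequency = round(100 * changed_bands / total_bands, 1)

# Nearest whole-number count behind each reported percentage.
counts = {
    name: round(total_changes * pct / 100)
    for name, pct in reported_pct.items()
}

print(variation_frequency)
print(counts)
```

    Under this reading, transitions and transversions add up to the substitution count, and substitutions plus insertions/deletions add up to all 275 changes.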

  12. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Full Text Available Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC
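
    The abstract does not give formulas for its three approximations. As an illustration of the compression-based idea, the sketch below uses the widely cited normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib standing in for the compressor C. Whether this matches the paper's NCD definition, and which compressors it used, are assumptions here; `random_dna` is a toy helper, not real data.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, seed: int) -> bytes:
    """Reproducible random DNA string, used only as toy input."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT") for _ in range(length))


x = random_dna(2000, seed=1)
# y is x with every 200th base replaced by "N": a close relative of x.
y = bytes(ord("N") if i % 200 == 0 else b for i, b in enumerate(x))
z = random_dna(2000, seed=2)  # unrelated sequence

print("NCD(x, y) =", round(ncd(x, y), 3))
print("NCD(x, z) =", round(ncd(x, z), 3))
```

    A related pair should score clearly lower than an unrelated one; the exact values depend on the compressor, which is why the abstract stresses testing many of them.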

  13. Channels Reallocation In Cognitive Radio Networks Based On DNA Sequence Alignment

    CERN Document Server

    Singh, Santosh Kumar; Pathak, Vibhakar; 10.5121/ijngn.2010.2203

    2010-01-01

    It has been shown that spectrum scarcity has increased due to the tremendous growth of new players in wireless-based systems brought about by the evolution of radio communication. Recent surveys found that many areas of the radio spectrum that are occupied by authorized users/primary users (PU) are not fully utilized. Cognitive radio (CR) is a next-generation wireless communication system proposed as a way to reuse this under-utilised spectrum on an opportunistic and non-interfering basis. A CR is a self-directed entity in a wireless communications environment that senses its environment, tracks changes, reacts upon its findings and frequently exchanges information with the networks for the secondary user (SU). However, CR faces a collision problem when tracking changes, i.e. when reallocating other empty channels to the SU as a PU arrives. In this paper, a channel reallocation technique for CR networks based on a DNA sequence alignment algorithm is proposed.

  14. Repetitive sequence based polymerase chain reaction to differentiate close bacteria strains in acidic sites

    Institute of Scientific and Technical Information of China (English)

    XIE Ming; YIN Hua-qun; LIU Yi; LIU Jie; LIU Xue-duan

    2008-01-01

    To study the diversity of bacteria strains newly isolated from several acid mine drainage (AMD) sites in China, repetitive sequence based polymerase chain reaction (rep-PCR), a well established technology for diversity analysis of closely related bacteria strains, was conducted on 30 strains of bacteria Leptospirillum ferriphilum, 8 strains of bacteria Acidithiobacillus ferrooxidans, as well as the Acidithiobacillus ferrooxidans type strain ATCC (American Type Culture Collection) 23270. The results showed that, using ERIC and BOX primer sets, rep-PCR produced highly discriminatory banding patterns. Phylogenetic analysis based on ERIC-PCR banding types was made and the results indicated that rep-PCR could be used as a rapid and highly discriminatory screening technique in studying bacterial diversity, especially in differentiating bacteria within one species in AMD.

  15. Environment map building and localization for robot navigation based on image sequences

    Institute of Scientific and Technical Information of China (English)

    Ye-hu SHEN; Ji-lin LIU; Xin DU

    2008-01-01

    SLAM is one of the most important components in robot navigation. A SLAM algorithm based on image sequences captured by a single digital camera is proposed in this paper. By this algorithm, SIFT feature points are selected and matched between image pairs sequentially. After three images have been captured, the environment's 3D map and the camera's positions are initialized based on matched feature points and intrinsic parameters of the camera. A robust method is applied to estimate the position and orientation of the camera in the forthcoming images. Finally, a robust adaptive bundle adjustment algorithm is adopted to optimize the environment's 3D map and the camera's positions simultaneously. Results of quantitative and qualitative experiments show that our algorithm can reconstruct the environment and localize the camera accurately and efficiently.

  16. Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks

    Science.gov (United States)

    Zhang, Chongfu; Qiu, Kun; Ma, Chunli

    2009-11-01

    In this paper, we use a new method, based on the independent case of multiple optical orthogonal codes, to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare some characteristics of systems employing single optical orthogonal code or multiple optical orthogonal code sequence-based optical labels. The performance of the system is also calculated, and our results verify that the method is effective. Additionally, we find that the performance of MOOCS-OPS networks is worse than that of single optical orthogonal code-based optical labels for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly enlarge the scalability of optical packet switching networks.

  17. Flag-based detection of weak gas signatures in long-wave infrared hyperspectral image sequences

    Science.gov (United States)

    Marrinan, Timothy; Beveridge, J. Ross; Draper, Bruce; Kirby, Michael; Peterson, Chris

    2016-05-01

    We present a flag manifold based method for detecting chemical plumes in long-wave infrared hyperspectral movies. The method encodes temporal and spatial information related to a hyperspectral pixel into a flag, or nested sequence of linear subspaces. The technique used to create the flags pushes information about the background clutter, ambient conditions, and potential chemical agents into the leading elements of the flags. Exploiting this temporal information allows for a detection algorithm that is sensitive to the presence of weak signals. This method is compared to existing techniques qualitatively on real data and quantitatively on synthetic data to show that the flag-based algorithm consistently performs better on data when the SINRdB is low, and beats the ACE and MF algorithms in probability of detection for low probabilities of false alarm even when the SINRdB is high.

  18. Performance of microarray and liquid based capture methods for target enrichment for massively parallel sequencing and SNP discovery.

    Directory of Open Access Journals (Sweden)

    Anna Kiialainen

    Full Text Available Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of the enrichment method to use can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. 74-75% of the sequence reads originated from the targeted region in the SureSelect libraries and 41-67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.

  19. A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences

    Directory of Open Access Journals (Sweden)

    Benson Andrew K

    2010-12-01

    Full Text Available Background: We propose a sequence clustering algorithm and compare the partition quality and execution time of the proposed algorithm with those of a popular existing algorithm. The proposed clustering algorithm uses a grammar-based distance metric to determine partitioning for a set of biological sequences. The algorithm performs clustering in which new sequences are compared with cluster-representative sequences to determine membership. If comparison fails to identify a suitable cluster, a new cluster is created. Results: The performance of the proposed algorithm is validated via comparison to the popular DNA/RNA sequence clustering approach, CD-HIT-EST, and to the recently developed algorithm, UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The proposed algorithm maintains a comparable CPU execution time with that of CD-HIT-EST which is much slower than UCLUST, and has successfully generated clusters with higher statistical accuracy than both CD-HIT-EST and UCLUST. The validation results are especially striking for large datasets. Conclusions: We introduce a fast and accurate clustering algorithm that relies on a grammar-based sequence distance. Its statistical clustering quality is validated by clustering large datasets containing 16S rDNA sequences.
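
    The representative-based clustering loop described above (compare each new sequence with cluster representatives, open a new cluster when none is close enough) can be sketched as follows. The paper's grammar-based distance is not specified in the abstract, so a k-mer Jaccard distance is swapped in purely as a stand-in; the function names, `k` and `threshold` are all assumptions made for the example.

```python
def kmers(seq, k=4):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=4):
    """Stand-in distance (NOT the grammar-based metric): 1 - Jaccard index."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both sequences shorter than k
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def greedy_cluster(sequences, threshold=0.5):
    """Assign each sequence to the nearest representative within threshold."""
    clusters = []  # clusters[i][0] is the representative of cluster i
    for seq in sequences:
        best, best_d = None, threshold
        for cluster in clusters:
            d = kmer_distance(seq, cluster[0])
            if d <= best_d:
                best, best_d = cluster, d
        if best is None:
            clusters.append([seq])  # no suitable cluster: start a new one
        else:
            best.append(seq)
    return clusters


seqs = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTGGGGCCCC", "TTTTGGGGCCCA"]
for cluster in greedy_cluster(seqs):
    print(cluster)
```

    Because each sequence is compared only with representatives rather than with every earlier sequence, the number of distance evaluations grows with the number of clusters, not with the square of the input size.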

  20. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2013-08-01

    Full Text Available Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: • Solve the health care workers rostering problem using a shift sequence-based method. • Measure its efficiency when the number of shifts increases. Design/methodology/approach: Rostering problems are usually highly constrained. Constraints are classified into soft and hard constraints. Soft and hard constraints of the problem are additionally classified into sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence-based approach consists of two stages: • shift sequences construction, • the construction of schedules. In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence-based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering problem
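
    The greedy local search described above (swap shifts between two workers only when hard constraints still hold and the roster penalty drops) can be sketched in Python. The abstract does not state its constraints or penalty function, so both are invented here for illustration: the hard rule "no early shift right after a night shift", the `avoid` preference sets, and the restriction to same-day swaps are all assumptions.

```python
from itertools import combinations


def satisfies_hard(shifts):
    """Hypothetical hard constraint: night 'N' is never followed by early 'E'."""
    return all(not (a == "N" and b == "E") for a, b in zip(shifts, shifts[1:]))


def roster_penalty(roster, avoid):
    """Hypothetical soft penalty: one point per shift a worker wants to avoid."""
    return sum(s in avoid[w] for w, shifts in roster.items() for s in shifts)


def greedy_swap_improve(roster, avoid):
    """Repeat same-day swaps between worker pairs while the penalty decreases."""
    roster = {w: list(shifts) for w, shifts in roster.items()}
    improved = True
    while improved:
        improved = False
        for w1, w2 in combinations(roster, 2):
            for day in range(min(len(roster[w1]), len(roster[w2]))):
                if roster[w1][day] == roster[w2][day]:
                    continue
                before = roster_penalty(roster, avoid)
                roster[w1][day], roster[w2][day] = roster[w2][day], roster[w1][day]
                feasible = satisfies_hard(roster[w1]) and satisfies_hard(roster[w2])
                if feasible and roster_penalty(roster, avoid) < before:
                    improved = True  # keep the swap
                else:
                    # Undo a swap that is infeasible or does not help.
                    roster[w1][day], roster[w2][day] = roster[w2][day], roster[w1][day]
    return roster


start = {"ann": ["D", "N", "D"], "bob": ["N", "D", "D"]}
avoid = {"ann": {"N"}, "bob": set()}
result = greedy_swap_improve(start, avoid)
print(result, roster_penalty(result, avoid))
```

    The loop terminates because every accepted swap strictly lowers an integer penalty that cannot go below zero.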

  1. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2011-08-01

    Full Text Available Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: • Solve the health care workers rostering problem using a shift sequence-based method. • Measure its efficiency when the number of shifts increases. Design/methodology/approach: Rostering problems are usually highly constrained. Constraints are classified into soft and hard constraints. Soft and hard constraints of the problem are additionally classified into sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence-based approach consists of two stages: • shift sequences construction, • the construction of schedules. In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence-based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering

  2. Origin and phylogenetic analysis of Tibetan Mastiff based on the mitochondrial DNA sequence

    Institute of Scientific and Technical Information of China (English)

    Qifa Li; Zhuang Xie; Zhenshan Liu; Yinxia Li; Xingbo Zhao; Liyan Dong; Zengxiang Pan; Yuanrong Sun; Ning Li; Yinxue Xu

    2008-01-01

    At present, the Tibetan Mastiff is the oldest and most ferocious dog in the world. However, the origin of the Tibetan Mastiff and its phylogenetic relationship with other large breed dogs such as the Saint Bernard are unclear. In this study, primers were designed according to the mitochondrial genome sequence of the domestic dog, and a 2,525 bp mitochondrial sequence of the Tibetan Mastiff, containing the whole sequence of Cytochrome b, tRNA-Thr, tRNA-Pro, and the control region, was obtained. Using grey wolves and coyotes as outgroups, the Tibetan Mastiff and 12 breeds of domestic dogs were subjected to phylogenetic analysis. The Tibetan Mastiff, domestic dog breeds, and grey wolves were clustered into one group and coyotes were clustered in a separate group. This indicated that the Tibetan Mastiff and the other domestic dogs originated from the grey wolf, and that the Tibetan Mastiff belongs to Carnivora, Canidae, Canis, Canis lupus, Canis lupus familiaris in animal taxonomy. Among domestic dogs, the middle and small breed dogs were clustered first; German Sheepdog, Swedish Elkhound, and Black Russian Terrier were clustered into one group, and the Tibetan Mastiff, Old English Sheepdog, Leonberger, and Saint Bernard were clustered in another group. Based on the molecular data, this confirmed the viewpoint that many of the famous large breed dogs worldwide, such as the Saint Bernard, possibly had the blood lineage of the Tibetan Mastiff. According to the substitution rate, we concluded that the approximate divergence time between the Tibetan Mastiff and the grey wolf was 58,000 years before the present (YBP), and the approximate divergence time between other domestic dogs and the grey wolf was 42,000 YBP, demonstrating that the time of origin of the Tibetan Mastiff was earlier than that of the other domestic dogs.

  3. Gene Sequence Based Clustering Assists in Dereplication of Pseudoalteromonas luteoviolacea Strains with Identical Inhibitory Activity and Antibiotic Production

    DEFF Research Database (Denmark)

    Vynne, Nikolaj Grønnegaard; Månsson, Maria; Gram, Lone

    2012-01-01

    Some microbial species are chemically homogenous, and the same secondary metabolites are found in all strains. In contrast, we previously found that five strains of P. luteoviolacea were closely related by 16S rRNA gene sequence but produced two different antibiotic profiles. The purpose...... antibacterial profiles based on inhibition assays against Vibrio anguillarum and Staphylococcus aureus. To determine whether chemotype and inhibition profile are reflected by phylogenetic clustering we sequenced 16S rRNA, gyrB and recA genes. Clustering based on 16S rRNA gene sequences alone showed little...... correlation to chemotypes and inhibition profiles, while clustering based on concatenated 16S rRNA, gyrB, and recA gene sequences resulted in three clusters, two of which uniformly consisted of strains of identical chemotype and inhibition profile. A major time sink in natural products discovery is the effort...

  4. A combined sequence-based and fragment-based characterization of microbial eukaryote assemblages provides taxonomic context for the Terminal Restriction Fragment Length Polymorphism (T-RFLP) method.

    Science.gov (United States)

    Kim, Diane Y; Countway, Peter D; Yamashita, Warren; Caron, David A

    2012-12-01

    Microbial eukaryotes in seawater samples collected from two depths (5 m and 500 m) at the USC Microbial Observatory off the coast of Southern California, USA, were characterized by cloning and sequencing of 18S rRNA genes, as well as DNA fragment analysis of these genes. The sequenced genes were assigned to operational taxonomic units (OTUs), and taxonomic information for the sequence-based OTUs was obtained by comparison to public sequence databases. The sequences were then subjected to in silico digestion to predict fragment sizes, and that information was compared to the results of the T-RFLP method applied to the same samples in order to provide taxonomic context for the environmental T-RFLP fragments. A total of 663 and 678 sequences were analyzed for the 5 m and 500 m samples, respectively, which clustered into 157 OTUs and 183 OTUs. The sequences yielded substantially fewer taxonomic units as in silico fragment lengths (i.e., following in silico digestion), and the environmental T-RFLP resulted in the fewest unique OTUs (unique fragments). Bray-Curtis similarity of protistan assemblages was greater using the T-RFLP dataset than using the sequence-based OTU dataset, presumably due to the inability of the fragment method to differentiate some taxa and an inability to detect many rare taxa relative to the sequence-based approach. Nonetheless, fragments in our analysis generally represented the dominant sequence-based OTUs, and putative identifications could be assigned to a majority of the fragments in the environmental T-RFLP results. Our empirical examination of the T-RFLP method identified limitations relative to sequence-based community analysis, but the relative ease and low cost of fragment analysis make this method a useful approach for characterizing the dominant taxa within complex assemblages of microbial eukaryotes in large datasets.
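
    The Bray-Curtis comparison mentioned in this abstract can be illustrated with a minimal sketch. The function name and the abundance counts below are hypothetical and are not taken from the study; the example only shows how lumping taxa that a coarser method cannot tell apart tends to raise the computed similarity between two samples.

```python
def bray_curtis_similarity(counts_a, counts_b):
    """Bray-Curtis similarity (1 - dissimilarity) of two abundance vectors."""
    if len(counts_a) != len(counts_b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    # Shared abundance: the smaller count of each taxon across the two samples.
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 2.0 * shared / total


# Hypothetical counts for four taxa in a shallow and a deep sample.
shallow = [10, 5, 0, 1]
deep = [8, 0, 6, 2]
fine = bray_curtis_similarity(shallow, deep)

# Same data after merging taxa 2 and 3, as if they were indistinguishable.
shallow_lumped = [10, 5 + 0, 1]
deep_lumped = [8, 0 + 6, 2]
coarse = bray_curtis_similarity(shallow_lumped, deep_lumped)

print(fine, coarse)
```

    In this toy case the lumped data give the higher similarity, which is consistent in direction with the explanation the authors offer, although the real effect depends on the actual community composition.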

  5. Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method

    Directory of Open Access Journals (Sweden)

    Weifeng Qian

    2013-10-01

    To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited, which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half.

  6. Additional data for a new Theileria sp. from China based on the sequences of ribosomal RNA internal transcribed spacers.

    Science.gov (United States)

    Liu, Junlong; Guan, Guiquan; Liu, Zhijie; Liu, Aihong; Ma, Miling; Bai, Qi; Yin, Hong; Luo, Jianxun

    2013-02-01

    Theileria sinensis was recently isolated and named as an independent Theileria species that infects cattle in China. To date, this parasite has been described based on its morphology, transmission and molecular studies, indicating that it should be classified as a distinct species. To test the validity of this taxon, the two internal transcribed spacers (ITS1 and ITS2) and the 5.8S rRNA gene were cloned and sequenced from three T. sinensis isolates. The complete ITS sequences were compared with those of other Theileria sp. available in GenBank. Phylogenetic analyses based on sequence data for the complete ITS sequences indicate that T. sinensis lies in a distinct clade that is separate from that of T. buffeli/orientalis and T. annulata. Sequence comparisons indicate that different T. sinensis isolates possess unique sizes of ITS1 and ITS2 as well as species-specific nucleotide sequences. This analysis provides new molecular data to support the classification of T. sinensis as a distinct species from other known Theileria spp. based on ITS sequences.

  7. Micro-motion Recognition of Spatial Cone Target Based on ISAR Image Sequences

    Directory of Open Access Journals (Sweden)

    Changyong Shu

    2016-04-01

    Accurate micro-motion recognition of a spatial cone target is the foundation of characteristic parameter acquisition. For this reason, a micro-motion recognition method based on the distinguishing characteristics extracted from Inverse Synthetic Aperture Radar (ISAR) sequences is proposed in this paper. The projection trajectory formulas of the cone node strong scattering source and the cone bottom slip-type strong scattering sources, which are located on the spatial cone target, are derived under three micro-motion types including nutation, precession, and spinning, and their correctness is verified by electromagnetic simulation. By comparison, differences are found among the projections of the scattering sources with different micro-motions; the coordinate information of the scattering sources in the ISAR sequences is extracted by the CLEAN algorithm, and spinning is recognized by setting a Doppler threshold value. The double observation points Interacting Multiple Model Kalman Filter is used to separate the scattering source projections of the nutation target or precession target, and the number of cross points of each scattering source's projection track is used to classify nutation or precession. Finally, electromagnetic simulation data are used to verify the effectiveness of the micro-motion recognition method.

  8. Molecular Description of Macroorchis spinulosus (Digenea: Nanophyetidae) Based on ITS1 Sequences

    Science.gov (United States)

    Won, Eun Jeong; Kim, Deok-Gyu; Cho, Jaeeun; Jung, Bong-Kwang; Kim, Min-Jae; Yun, Yong Woon; Chai, Jong-Yil; Ryang, Dong Wook

    2016-01-01

    We performed a molecular genetic study on the sequences of 18S ribosomal RNA (ITS1 region) gene in 4-day-old adult worms of Macroorchis spinulosus recovered in mice experimentally infected with metacercariae from crayfish in Jeollanam-do Province, Korea. The metacercariae were round, 180 μm in average diameter, encysted with 2 layers of thick walls, but the stylet on the oral sucker was not clearly seen. The adult flukes were oval in shape, 760-820 μm long and 320-450 μm wide, with 2 large testes located anterolaterally. The phylogenetic tree based on ITS1 sequences of 6 M. spinulosus samples showed their distinct position from other trematode species in GenBank. The group they most closely resembled was Paragonimus spp., which also take crayfish or crabs as the second intermediate host. The present study is the first molecular characterization of M. spinulosus and provides a basis for further phylogenetic studies to compare with other trematode fauna in Korea. PMID:26951989

  9. Molecular phylogenetic analysis of Indonesia Solanaceae based on DNA sequences of internal transcribed spacer region

    Science.gov (United States)

    Hidayat, Topik; Priyandoko, Didik; Islami, Dina Karina; Wardiny, Putri Yunitha

    2016-02-01

    Solanaceae is one of the largest families of angiosperms and is highly diverse in morphological characters. In Indonesia, this group of plants is very popular because of its usefulness as food, ornamental and medicinal plants. However, the phylogenetic relationships among members of this family in Indonesia have received little attention. The purpose of this study was to evaluate the phylogenetic relationships of the family, especially of species distributed in Indonesia. DNA sequences of the Internal Transcribed Spacer (ITS) region of 19 species of Solanaceae and three outgroup species, belonging to the families Convolvulaceae, Apocynaceae, and Plantaginaceae, were isolated, amplified, and sequenced. Phylogenetic tree analysis based on the parsimony method was conducted using data derived from ITS-1, 5.8S, and ITS-2 separately and from the combination of all three. The results indicated that the phylogenetic tree derived from the combined data established a better pattern of relationships than the separate data. Three major groups were revealed. Group 1 consists of the tribes Datureae, Cestreae, and Petunieae, whereas group 2 comprises members of the tribe Physaleae. Group 3 belongs to the tribe Solaneae. The use of the ITS region as a molecular marker, in general, supports the global Solanaceae relationships previously reported.

  10. Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences

    Science.gov (United States)

    Ogilvie, Lesley A.; Bowler, Lucas D.; Caplin, Jonathan; Dedi, Cinzia; Diston, David; Cheek, Elizabeth; Taylor, Huw; Ebdon, James E.; Jones, Brian V.

    2013-01-01

    Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access subliminal, phylogenetically targeted, phage sequences present within. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut-specific Bacteroidales-like phage, poorly represented in existing virus like particle-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative ‘viral-enterotypes’ among this fraction of the human gut virome. PMID:24036533

  11. Molecular phylogeny and evolution of Scomber (Teleostei: Scombridae) based on mitochondrial and nuclear DNA sequences

    Institute of Scientific and Technical Information of China (English)

    CHENG Jiao; GAO Tianxiang; MIAO Zhenqing; YANAGIMOTO Takashi

    2011-01-01

    A molecular phylogenetic analysis of the genus Scomber was conducted based on mitochondrial (COI, Cyt b and control region) and nuclear (5S rDNA) DNA sequence data from a multigene perspective. A variety of phylogenetic analytic methods were used to clarify the current taxonomic classification and to assess phylogenetic relationships and the evolutionary history of this genus. The present study produced a well-resolved phylogeny that strongly supported the monophyly of Scomber. We confirmed that S. japonicus and S. colias were genetically distinct. Although morphologically and ecologically similar to S. colias, the molecular data showed that S. japonicus has a greater molecular affinity with S. australasicus, which conflicts with the traditional taxonomy. This phylogenetic pattern was corroborated by the mtDNA data, but incompletely by the nuclear DNA data. Phylogenetic concordance between the mitochondrial and nuclear DNA regions for the basal nodes supports an Atlantic origin for Scomber. The present-day geographic ranges of the species were compared with the resultant molecular phylogeny derived from partition Bayesian analyses of the combined data sets to evaluate possible dispersal routes of the genus. The present-day geographic distribution of Scomber species might be best ascribed to multiple dispersal events. In addition, our results suggest that phylogenies derived from multiple genes and long sequences exhibited improved phylogenetic resolution, from which we conclude that the phylogenetic reconstruction is a reliable representation of the evolutionary history of Scomber.

  12. Sequencing-based approach identified three new susceptibility loci for psoriasis.

    Science.gov (United States)

    Sheng, Yujun; Jin, Xin; Xu, Jinhua; Gao, Jinping; Du, Xiaoqing; Duan, Dawei; Li, Bing; Zhao, Jinhua; Zhan, Wenying; Tang, Huayang; Tang, Xianfa; Li, Yang; Cheng, Hui; Zuo, Xianbo; Mei, Junpu; Zhou, Fusheng; Liang, Bo; Chen, Gang; Shen, Changbing; Cui, Hongzhou; Zhang, Xiaoguang; Zhang, Change; Wang, Wenjun; Zheng, Xiaodong; Fan, Xing; Wang, Zaixing; Xiao, Fengli; Cui, Yong; Li, Yingrui; Wang, Jun; Yang, Sen; Xu, Lei; Sun, Liangdan; Zhang, Xuejun

    2014-01-01

    In a previous large-scale exome sequencing analysis for psoriasis, we discovered seven common and low-frequency missense variants within six genes with genome-wide significance. Here we describe an in-depth analysis of noncoding variants based on sequencing data (10,727 cases and 10,582 controls) with replication in an independent cohort of Han Chinese individuals consisting of 4,480 cases and 6,521 controls to identify additional psoriasis susceptibility loci. We confirmed four known psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114; 2.30 × 10^-20 ≤ P ≤ 2.41 × 10^-7) and identified three new susceptibility loci: 4q24 (NFKB1) at rs1020760 (P = 2.19 × 10^-8), 12p13.3 (CD27-LAG3) at rs758739 (P = 4.08 × 10^-8) and 17q12 (IKZF3) at rs10852936 (P = 1.96 × 10^-8). Two suggestive loci, 3p21.31 and 17q25, are also identified with P < 1.00 × 10^-6. The results of this study increase the number of confirmed psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis. PMID:25006012

  13. Prevalence and Sequence-Based Identity of Rumen Fluke in Cattle and Deer in New Caledonia.

    Directory of Open Access Journals (Sweden)

    Laura Cauquil

    An abattoir survey was performed in the French Melanesian archipelago of New Caledonia to determine the prevalence of paramphistomes in cattle and deer and to generate material for molecular typing at species and subspecies level. Prevalence in adult cattle was high at animal level (70% of 387 adult cattle) and batch level (81%). Prevalence was lower in calves at both levels (33% of 484 calves, 51% at batch level). Animals from 2 of 7 deer farms were positive for rumen fluke, with animal-level prevalence of 41.4% (29/70) and 47.1% (33/70), respectively. Using ITS-2 sequencing, 3 species of paramphistomes were identified, i.e. Calicophoron calicophorum, Fischoederius elongatus and Orthocoelium streptocoelium. All three species were detected in cattle as well as deer, suggesting the possibility of rumen fluke transmission between the two host species. Based on heterogeneity in ITS-2 sequences, the C. calicophorum population comprises two clades, both of which occur in cattle as well as deer. The results suggest two distinct routes of rumen fluke introduction into this area. This approach has wider applicability for investigations of the origin of rumen fluke infections and for the possibility of parasite transmission at the livestock-wildlife interface.

  14. Prevalence and Sequence-Based Identity of Rumen Fluke in Cattle and Deer in New Caledonia.

    Science.gov (United States)

    Cauquil, Laura; Hüe, Thomas; Hurlin, Jean-Claude; Mitchell, Gillian; Searle, Kate; Skuce, Philip; Zadoks, Ruth

    2016-01-01

    An abattoir survey was performed in the French Melanesian archipelago of New Caledonia to determine the prevalence of paramphistomes in cattle and deer and to generate material for molecular typing at species and subspecies level. Prevalence in adult cattle was high at animal level (70% of 387 adult cattle) and batch level (81%). Prevalence was lower in calves at both levels (33% of 484 calves, 51% at batch level). Animals from 2 of 7 deer farms were positive for rumen fluke, with animal-level prevalence of 41.4% (29/70) and 47.1% (33/70), respectively. Using ITS-2 sequencing, 3 species of paramphistomes were identified, i.e. Calicophoron calicophorum, Fischoederius elongatus and Orthocoelium streptocoelium. All three species were detected in cattle as well as deer, suggesting the possibility of rumen fluke transmission between the two host species. Based on heterogeneity in ITS-2 sequences, the C. calicophorum population comprises two clades, both of which occur in cattle as well as deer. The results suggest two distinct routes of rumen fluke introduction into this area. This approach has wider applicability for investigations of the origin of rumen fluke infections and for the possibility of parasite transmission at the livestock-wildlife interface.
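
    The animal-level prevalence figures quoted for the two positive deer farms follow directly from the counts given in the abstract. As a small worked check (the helper name is an assumption made for illustration):

```python
def prevalence_percent(positive, examined):
    """Prevalence as a percentage, rounded to one decimal place."""
    if examined <= 0:
        raise ValueError("number examined must be positive")
    return round(100.0 * positive / examined, 1)


# Counts reported in the abstract for the two positive deer farms.
print(prevalence_percent(29, 70))  # 41.4
print(prevalence_percent(33, 70))  # 47.1
```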

  15. Systematic positions of Lamiophlomis and Paraphlomis (Lamiaceae) based on nuclear and chloroplast sequences

    Institute of Scientific and Technical Information of China (English)

    Yue-Zhi PAN; Li-Qin FANG; Gang HAO; Jie CAI; Xun GONG

    2009-01-01

    Genera Lamiophlomis and Paraphlomis were originally separated from genus Phlomis s.l. on the basis of particular morphological characteristics. However, their relationship was highly contentious, as evidenced by the literature. In the present paper, the systematic positions of Lamiophlomis, Paraphlomis, and their related genera were assessed based on nuclear internal transcribed spacer (ITS) and chloroplast rpl16 and trnL-F sequence data using maximum parsimony (MP) and Bayesian methods. In total, 24 species representing six genera of the ingroup and outgroup were sampled. Analyses of both separate and combined sequence data were conducted to resolve the systematic relationships of these genera. The results reveal that Lamiophlomis is nested within Phlomis sect. Phlomoides and its generic status is not supported. With the inclusion of Lamiophlomis rotata in sect. Phlomoides, sections Phlomis and Phlomoides of Phlomis were resolved as monophyletic. Paraphlomis was supported as an independent genus. However, the resolution of its monophyly conflicted between MP and Bayesian analyses, suggesting the need for expanded sampling and further evidence.

  16. Wind Farm Dynamic Equivalence Based on the Wind Turbine Output Active Power Sequence Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Ge

    2016-01-01

    To reduce the complexity of simulation models containing wind farms while preserving accuracy, this paper puts forward a dynamic equivalence method that aims to keep the output characteristic at the wind farm connection point consistent. Based on the output power sequence of the wind turbines, a geometric template matching algorithm is used to obtain the characteristics of that power sequence, and an Attribute Threshold Clustering Algorithm is then used to classify the wind turbines. Within each cluster, the wind turbine parameters are made equivalent according to the principle of constant power output characteristics and are then identified using AMPSO. Finally, a practical wind farm is taken as an example, and a system-side fault and a variation in wind speed are each simulated to compare the output characteristics of the detailed model and the equivalent model. The results show that the output characteristic at the wind farm connection point remains consistent after equivalencing and that the clustering algorithm can reflect the operating characteristics of the wind turbines throughout any time period. They also show that the equivalence method is reasonable and effective and has value in engineering applications.
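
    The idea of grouping turbines by the similarity of their output power sequences can be sketched as follows. This is a generic greedy threshold ("leader") clustering used as a stand-in; it is not the Attribute Threshold Clustering Algorithm or the geometric template matching step of the paper, and the turbine names, power values and threshold are hypothetical.

```python
def mean_abs_diff(seq_a, seq_b):
    """Mean absolute difference between two equally long power sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("power sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(seq_a, seq_b)) / len(seq_a)


def threshold_cluster(sequences, threshold):
    """Assign each turbine to the first cluster whose leader is close enough."""
    leaders = []   # representative sequence of each cluster
    clusters = []  # turbine names in each cluster
    for name, seq in sequences.items():
        for i, leader in enumerate(leaders):
            if mean_abs_diff(seq, leader) <= threshold:
                clusters[i].append(name)
                break
        else:
            # No existing cluster is close enough: start a new one.
            leaders.append(seq)
            clusters.append([name])
    return clusters


# Hypothetical per-unit active power sequences for four turbines.
power = {
    "WT1": [0.50, 0.60, 0.70, 0.65],
    "WT2": [0.52, 0.61, 0.69, 0.66],
    "WT3": [0.20, 0.25, 0.30, 0.28],
    "WT4": [0.21, 0.24, 0.31, 0.27],
}
print(threshold_cluster(power, threshold=0.05))
```

    Each resulting cluster would then be replaced by a single equivalent machine in a reduced model; how the equivalent parameters are obtained is outside the scope of this sketch.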

  17. Phylogenetic relationships of fig wasps pollinating functionally dioecious Ficus based on mitochondrial DNA sequences and morphology.

    Science.gov (United States)

    Weiblen, G D

    2001-04-01

    The obligate mutualism between pollinating fig wasps in the family Agaonidae (Hymenoptera: Chalcidoidea) and Ficus species (Moraceae) is often regarded as an example of co-evolution but little is known about the history of the interaction, and understanding the origin of functionally dioecious fig pollination has been especially difficult. The phylogenetic relationships of fig wasps pollinating functionally dioecious Ficus were inferred from mitochondrial cytochrome oxidase gene sequences (mtDNA) and morphology. Separate and combined analyses indicated that the pollinators of functionally dioecious figs are not monophyletic. However, pollinator relationships were generally congruent with host phylogeny and support a revised classification of Ficus. Ancestral changes in pollinator ovipositor length also correlated with changes in fig breeding systems. In particular, the relative elongation of the ovipositor was associated with the repeated loss of functionally dioecious pollination. The concerted evolution of interacting morphologies may bias estimates of phylogeny based on female head characters, but homoplasy is not so strong in other morphological traits. The lesser phylogenetic utility of morphology than of mtDNA is not due to rampant convergence in morphology but rather to the greater number of potentially informative characters in DNA sequence data; patterns of nucleotide substitution also limit the utility of mtDNA findings. Nonetheless, inferring the ancestral associations of fig pollinators from the best-supported phylogeny provided strong evidence of host conservatism in this highly specialized mutualism.

  18. Safety assessment of Bifidobacterium longum JDM301 based on complete genome sequences

    Institute of Scientific and Technical Information of China (English)

    Yan-Xia Wei; Zhuo-Yang Zhang; Chang Liu; Xiao-Kui Guo; Pradeep K Malakar

    2012-01-01

    AIM: To assess the safety of Bifidobacterium longum (B. longum) JDM301 based on complete genome sequences. METHODS: The complete genome sequences of JDM301 were determined using the GS 20 system. Putative virulence factors, putative antibiotic resistance genes and genes encoding enzymes responsible for harmful metabolites were identified by BLAST against a virulence factors database, an antibiotic resistance genes database and genes associated with harmful metabolites in previous reports. Minimum inhibitory concentration of 16 common antimicrobial agents was evaluated by E-test. RESULTS: JDM301 was shown to contain 36 genes associated with antibiotic resistance, 5 enzymes related to harmful metabolites and 162 nonspecific virulence factors mainly associated with transcriptional regulation, adhesion, sugar and amino acid transport. B. longum JDM301 was intrinsically resistant to ciprofloxacin, amikacin, gentamicin and streptomycin and susceptible to vancomycin, amoxicillin, cephalothin, chloramphenicol, erythromycin, ampicillin, cefotaxime, rifampicin, imipenem and trimethoprim-sulphamethoxazol. JDM301 was moderately resistant to bacitracin, while an earlier study showed that bifidobacteria were susceptible to this antibiotic. A tetracycline resistance gene with the risk of transfer was found in JDM301, which needs to be experimentally validated. CONCLUSION: The safety assessment of JDM301 using information derived from the complete bacterial genome will contribute to a wider and deeper insight into the safety of probiotic bacteria.

  19. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism species Data detail... Data name Cluster based on sequence comparison of homologous proteins of 95 organism species Description of...

  20. The Utility of Specific Markers Based on ITS2 Sequences for Molecular Identification and Detection of Trichogramma spp.

    Institute of Scientific and Technical Information of China (English)

    LI Zheng-xi; SHEN Zuo-rui

    2002-01-01

    A technology based on specific PCR amplification of the internal transcribed spacer 2 (ITS2) of nuclear ribosomal DNA for molecular identification and detection of Trichogramma species was studied. First, the ITS2 regions of six Trichogramma species were cloned and sequenced, and the interspecific sequence variation was analyzed. Second, the ITS2 regions of six geographical populations of T. dendrolimi were cloned and sequenced, and the intraspecific sequence identity was analyzed. The results show that the interspecific variation and intraspecific similarity of ITS2 sequences are well suited to the design of specific primers at the species level. Screening of specific primers for T. dendrolimi led to sensitive and stable diagnostic primers. This system allows non-specialists to rapidly and accurately identify not only adults (males and females) but also eggs in parasitized hosts, which is impossible by conventional methods. Further development of this protocol could create a complete set of specific primers for the different species of the genus Trichogramma.

  1. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  2. An Efficient Genotyping Method in Chicken Based on Genome Reducing and Sequencing

    OpenAIRE

    Liao, Rongrong; Zhen WANG; Chen, Qiang; Tu, Yingying; Chen, Zhenliang; Wang, Qishan; Yang, Changsuo; Zhang, Xiangzhe; Pan, Yuchun

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are essential for identifying the genetic mechanisms of complex traits. In the present study, we applied genotyping by genome reducing and sequencing (GGRS) method to construct a 252-plex sequencing library for SNP discovery and genotyping in chicken. The library was successfully sequenced on an Illumina HiSeq 2500 sequencer with a paired-end pattern; approximately 400 million raw reads were generated, and an average of approximately 1.4 million good rea...

  3. Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads

    OpenAIRE

    Wu Yufeng; Hernández Yözen; Dinakar Sanjiv; Kennedy Justin; Duitama Jorge; Măndoiu Ion I

    2011-01-01

    Background: Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping. Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference populati...

  4. Direct-sequence codes based blind synchronization algorithm for UWB systems

    Institute of Scientific and Technical Information of China (English)

    QIAO Yong-wei; LÜ Tie-jun; REN Zhi-yuan

    2009-01-01

    A new blind (non-data-aided) synchronization algorithm based on direct-sequence (DS) codes is proposed for ultra-wideband (UWB) systems. The proposed approach fully exploits prior knowledge of the DS codes and bypasses channel estimation. Real-time acquisition is achieved using an integrate-and-dump (I&D) operation and a DS-code matched filter. Because of the pseudo-randomness and periodicity of DS codes, both the speed and the accuracy of synchronization are improved significantly. A lower bound on the acquisition probability of the proposed approach is also derived. Simulations confirm the performance improvement of the proposed algorithm relative to existing alternatives in terms of acquisition probability, normalized mean square error (NMSE), and bit error rate (BER).
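
    The role that a periodic pseudo-random code plays in acquisition can be shown with a minimal sketch. It is a generic correlation-based code-phase search, not the authors' algorithm: it ignores the UWB multipath channel, uses an assumed length-7 code, and all function names, the noise level and the number of periods are illustrative assumptions.

```python
import random

# Assumed length-7 +/-1 spreading code with a sharp periodic autocorrelation.
CODE = [1, 1, 1, -1, -1, 1, -1]


def received_signal(code, offset, periods, noise_std, rng):
    """Repeat the code with an unknown cyclic offset and add Gaussian noise."""
    n = len(code)
    return [code[(k - offset) % n] + rng.gauss(0.0, noise_std)
            for k in range(periods * n)]


def estimate_offset(samples, code):
    """Fold samples over one code period, then correlate with every shift."""
    n = len(code)
    folded = [0.0] * n
    for k, x in enumerate(samples):
        folded[k % n] += x  # accumulate (integrate) over repeated periods
    scores = [sum(folded[k] * code[(k - shift) % n] for k in range(n))
              for shift in range(n)]
    # The shift with the largest correlation is taken as the timing estimate.
    return max(range(n), key=scores.__getitem__)


rng = random.Random(0)
samples = received_signal(CODE, offset=3, periods=50, noise_std=0.5, rng=rng)
print(estimate_offset(samples, CODE))
```

    Accumulating over many code periods before correlating averages out the noise, so the correlation peak at the true offset stands out more clearly as the number of periods grows.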

  5. Phylogenetic relationship of Podocopida (Ostracoda: Podocopa) based on 18S ribosomal DNA sequences

    Institute of Scientific and Technical Information of China (English)

    YU Na; ZHAO Meiying; CHEN Liqiao; YANG Pin

    2006-01-01

    Nucleotide sequences from 18S rDNA of 11 ostracodes, which represent four suborders and six superfamilies of podocopidan, were determined. The phylogenetic relationships were analyzed using three methods (maximum-likelihood, maximum-parsimony, and neighbor-joining), and the three topologies obtained were basically similar. The results showed that (1) a monophyletic Podocopida was strongly supported; (2) the phylogenetic relationships of the four suborders were (Darwinulocopina plus (Bairdiocopina plus (Cytherocopina plus Cypridocopina))), which indicated a close relationship between Cytherocopina and Cypridocopina and that Darwinulocopina had separated early from the main podocopidan group; (3) Cypridocopina formed a monophyletic group, among which the phylogenetic relationship of the three superfamilies was (Cypridoidea plus (Macrocypridoidea plus Pontocypridoidea)).
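
    The nested "plus" notation used for the suborder relationships maps directly onto a tree data structure. The sketch below encodes that topology as nested tuples, writes it in Newick format, and checks which groups form a clade; the function names are assumptions made for illustration, and no branch lengths or support values are implied.

```python
# Suborder topology as stated in the abstract, encoded as nested tuples.
TREE = ("Darwinulocopina",
        ("Bairdiocopina",
         ("Cytherocopina", "Cypridocopina")))


def leaves(node):
    """Return the set of taxon names below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every internal node of the tree."""
    if isinstance(node, str):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def is_clade(tree, taxa):
    """True if the given taxa are exactly the leaves under some internal node."""
    return frozenset(taxa) in set(clades(tree))


def to_newick(node):
    """Serialize the nested-tuple tree as a Newick string without the ';'."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


print(to_newick(TREE) + ";")
print(is_clade(TREE, {"Cytherocopina", "Cypridocopina"}))
print(is_clade(TREE, {"Darwinulocopina", "Bairdiocopina"}))
```

    The first check is true because Cytherocopina and Cypridocopina are sister groups in the stated topology; the second is false because Darwinulocopina branches off before Bairdiocopina.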

  6. VLSI Floorplanning with Boundary Constraints Based on Single-Sequence Representation

    Science.gov (United States)

    Li, Kang; Yu, Juebang; Li, Jian

    In modern VLSI physical design, huge integration scale necessitates hierarchical design and IP reuse to cope with design complexity. Besides, interconnect delay becomes dominant to overall circuit performance. These critical factors require some modules to be placed along designated boundaries to effectively facilitate hierarchical design and interconnection optimization related problems. In this paper, boundary constraints of general floorplan are solved smoothly based on the novel representation Single-Sequence (SS). Necessary and sufficient conditions of rooms along specified boundaries of a floorplan are proposed and proved. By assigning constrained modules to proper boundary rooms, our proposed algorithm always guarantees a feasible SS code with appropriate boundary constraints in each perturbation. Time complexity of the proposed algorithm is O(n). Experimental results on MCNC benchmarks show effectiveness and efficiency of the proposed method.

  7. A modeling method for virtual scene based on multi-view image sequence

    Institute of Scientific and Technical Information of China (English)

    WANG Jia-sheng; TANG Hao-xuan; YANG Tie-dong

    2009-01-01

    After analyzing and comparing the strengths and weaknesses of existing object-modeling technologies for 3D applications, we propose a new virtual-scene modeling method based on multi-view image sequences that models irregular objects efficiently. In a 3D scene, the method achieves a better visual effect by tracking the viewer's real-time perspective position and dynamically projecting photos taken from different perspectives. The design philosophy, the development steps, and other relevant topics are discussed in detail, and the validity of the algorithm is analyzed. Applying the method to the modeling of a virtual museum shows that it is better suited to simulating irregular objects.

  8. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis

    Science.gov (United States)

    Kato, Hirotomo; Gomez, Eduardo A.; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-01-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas. PMID:27410039

  9. Terabit Nyquist PDM-32QAM signal transmission with training sequence based time domain channel estimation.

    Science.gov (United States)

    Zhang, Fan; Wang, Dan; Ding, Rui; Chen, Zhangyuan

    2014-09-22

    We propose a time domain structure of channel estimation for coherent optical communication systems, which employs a training sequence based equalizer and is transparent to arbitrary quadrature amplitude modulation (QAM) formats. Enabled with this methodology, a 1.02 Tb/s polarization division multiplexed 32 QAM Nyquist pulse shaping signal with a net spectral efficiency of 7.46 b/s/Hz is transmitted over a standard single-mode fiber link with Erbium-doped fiber amplifier only amplification. After 1190 km transmission, the average bit-error rate is lower than the 20% hard-decision forward error correction threshold of 1.5 × 10^-2. The transmission distance can be extended to 1428 km by employing intra-subchannel nonlinear compensation with the digital back-propagation method.

  10. Xylariaceae diversity in Thailand and Philippines, based on rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Natarajan Velmurugan

    2013-07-01

    Twenty-three different Xylariaceae Tul. & C. Tul. were isolated from samples collected from forest zones of Thailand and the Philippines. The fungal samples were characterized based on morphological characteristics and nuclear ITS1-5.8S rDNA-ITS2 region sequences. Ten species of Xylaria, two species of Hypoxylon, Biscogniauxia, Rosellinia and one species of Annulohypoxylon and Entonaema were found. Entonaema, a distinctive genus of Xylariaceae isolated in this study from the Thailand samples, showed a close relationship with Xylaria in the phylogenetic tree. Xylariaceous species identified at the molecular level showed significant similarity in morphological characters, such as stromal structure, ascal apex and the germ slit of ascospores. In addition, three species of Arthrinium and two species of Pestalotiopsis were also isolated and characterized in the study. A phylogenetic affinity of Pestalotiopsis with Xylariaceae was found.

  11. The contrasting structures of mismatched DNA sequences containing looped-out bases (bulges) and multiple mismatches (bubbles).

    Science.gov (United States)

    Bhattacharyya, A; Lilley, D M

    1989-09-12

    We have studied the structure and reactivities of two kinds of mismatched DNA sequences--unopposed bases, or bulges, and multiple mismatched pairs of bases. These were generated in a constant sequence environment, in relatively long DNA fragments, using a technique based on heteroduplex formation between sequences cloned into single-stranded M13 phage. The mismatched sequences were studied from two points of view, viz 1. The mobility of the fragments on gel electrophoresis in polyacrylamide was studied in order to examine possible bending of the DNA due to the presence of the mismatch defect. Such bending would constitute a global effect on the conformation of the molecule. 2. Sequences in and around the mismatches were studied using enzyme and chemical probes of DNA structure. This would reveal more local structural effects of the mismatched sequences. We observed that the structures of the bulges and the multiple mismatches appear to be fundamentally different. The bulged sequences exhibited a large gel retardation, consistent with a significant bending of the DNA at the bulge, and whose magnitude depends on the number of mismatched bases. The larger bulges were sensitive to cleavage by single-strand specific nucleases, and modified by diethyl pyrocarbonate (adenines) or osmium tetroxide (thymines) in a non-uniform way, suggesting that the bulges have a precise structure that leads to exposure of some, but not all, of the bases. In contrast the multiple mismatches ('bubbles') cause very much less bending of the DNA fragment in which they occur, and uniform patterns of chemical reactivity along the length of the mismatched sequences, suggesting a less well defined, and possibly flexible, structure. The precise structure of the bulges suggests that such features may be especially significant for recognition by proteins.

  12. PCR-based study of the presence of Y-chromosome sequences in patients with Ullrich-Turner syndrome

    Energy Technology Data Exchange (ETDEWEB)

    Coto, E.; Menendez, M.J.; Lopez-Larrea, C. [Universidad Complutense, Madrid (Spain)] [and others]

    1995-07-03

    The presence of Y chromosome sequences in Ullrich-Turner syndrome (UTS) patients has been suggested in previous work. Karyotype analysis estimated at about 60% of patients with a 45,X constitution and molecular analysis (Southern blot analysis with several Y chromosome probes and PCR of specific sequences) identified the presence of Y chromosome material in about 40% of 45,X patients. We have developed a very sensitive, PCR-based method to detect Y specific sequences in DNA from UTS patients. This protocol permits the detection of a single cell carrying a Y sequence among 10^5 Y-negative cells. We studied 18 UTS patients with 4 Y-specific sequences. In 11 patients we detected a positive amplification for at least one Y sequence. The existence of a simple and sensitive method for the detection of Y sequences has important implications for UTS patients, in view of the risk for some of the females carrying Y chromosome material of developing gonadoblastoma and virilization. Additionally, some of the UTS-associated phenotypes, such as renal anomalies, could be correlated with the presence of Y chromosome-specific sequences. 27 refs., 2 figs., 1 tab.

  13. Field-based assessment of landslide hazards resulting from the 2015 Gorkha, Nepal earthquake sequence

    Science.gov (United States)

    Collins, B. D.; Jibson, R.

    2015-12-01

    The M7.8 2015 Gorkha, Nepal earthquake sequence caused thousands of fatalities, destroyed entire villages, and displaced millions of residents. The earthquake sequence also triggered thousands of landslides in the steep Himalayan topography of Nepal and China; these landslides were responsible for hundreds of fatalities and blocked vital roads, trails, and rivers. With the support of USAID's Office of Foreign Disaster Assistance, the U.S. Geological Survey responded to this crisis by providing landslide-hazard expertise to Nepalese agencies and affected villages. Assessments of landslide hazards following earthquakes are essential to identify vulnerable populations and infrastructure, and inform government agencies working on rebuilding and mitigation efforts. However, assessing landslide hazards over an entire earthquake-affected region (in Nepal, estimated to be ~30,000 km²), and in exceedingly steep, inaccessible topography presents a number of logistical challenges. We focused the scope of our assessment by conducting helicopter- and ground-based landslide assessments in 12 priority areas in central Nepal identified a priori from satellite photo interpretation performed in conjunction with an international consortium of remote sensing experts. Our reconnaissance covered 3,200 km of helicopter flight path, extending over an approximate area of 8,000 km². During our field work, we made 17 site-specific assessments and provided landslide hazard information to both villages and in-country agencies. Upon returning from the field, we compiled our observations and further identified and assessed 74 river-blocking landslide dams, 12% of which formed impoundments larger than 1,000 m² in surface area. These assessments, along with more than 11 hours of helicopter-based video, and an overview of hazards expected during the 2015 summer monsoon have been publicly released (http://dx.doi.org/10.3133/ofr20151142) for use by in-country and international agencies.

  15. A 502-Base Free-Solution Electrophoretic DNA Sequencing Method Using End-Attached Wormlike Micelles.

    Science.gov (United States)

    Istivan, Stephen B; Bishop, Daniel K; Jones, Angela L; Grosser, Shane T; Schneider, James W

    2015-11-17

    We demonstrate that the use of wormlike nonionic micelles as drag-tags in end-labeled free-solution electrophoresis ("micelle-ELFSE") provides single-base resolution of Sanger sequencing products up to 502 bases in length, a nearly 2-fold improvement over reported ELFSE separations. "CiEj" running buffers containing 48 mM C12E5, 6 mM C10E5, and 3 M urea (32.5 °C) form wormlike micelles that provide a drag equivalent to an uncharged DNA fragment with a length (α) of 509 bases (effective Rh = 27 nm). Runtime in a 40 cm capillary (30 kV) was 35 min for elution of all products down to the 26-base primer. We also show that smaller Triton X-100 micelles give a read length of 103 bases in a 4 min run, so that a combined analysis of the Sanger products using the two buffers in separate capillaries could be completed in 14 min for the full range of lengths. A van Deemter analysis shows that resolution is limited by diffusion-based peak broadening and wall adsorption. Effects of drag-tag polydispersity are not observed, despite the inherent polydispersity of the wormlike micelles. We ascribe this to a stochastic size-sampling process that occurs as micelle size fluctuates rapidly during the runtime. A theoretical model of the process suggests that fluctuations occur with a time scale less than 10 ms, consistent with the monomer exchange process in nonionic micelles. The CiEj buffer has a low viscosity (2.7 cP) and appears to be semidilute in micelle concentration. The large drag-tag size of the CiEj buffers leads to steric segregation of the DNA and tag for short fragments and attendant mobility shifts. PMID:26455271

  16. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Background: A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results: In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion: Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  17. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  19. Online Detection Approach for Rectangle Ceramic Tile Based on Sequenced Scenery Image

    Directory of Open Access Journals (Sweden)

    Yang Lei

    2013-06-01

    Image-based ceramic tile detection is a way to reduce manual labor in ceramic tile production. The tiles considered in this study are rectangular and of different sizes. Much existing research assumes that only one tile passes along a dedicated rail at a time, so that the image from the CCD sensor holds at most one tile. In practice, at most factories multiple tiles of the same size run in a row simultaneously along the rails, and the CCD sensor captures a 'scenery' image. Image processing methods designed for close-up images are inadequate in such cases. To detect different rectangular ceramic tiles online from a sequence of scenery images, this study provides a vector corner method to identify rectangular tiles of known size, and a valley detection method based on a key-image-frame strategy to distinguish the first row in the images. Finally, our Online Approach for Rectangle Tile Detection (OARTD) was embedded into a detection system and deployed in a factory; test results validated its good performance. Such an automatic system for shape classification in a tile plant has good prospects.

  20. Blind Demodulation of Chaotic Direct Sequence Spread Spectrum Signals Based on Particle Filters

    Directory of Open Access Journals (Sweden)

    Yimeng Zhang

    2013-09-01

    Applying the particle filter (PF) technique, this paper proposes a PF-based algorithm to blindly demodulate chaotic direct sequence spread spectrum (CDS-SS) signals under colored or non-Gaussian noise. To implement this algorithm, the PFs are modified in two ways: (i) the colored or non-Gaussian noise is formulated by autoregressive moving average (ARMA) models, and the parameters that model the noise are included in the state vector; (ii) a range-differentiating factor is introduced into the intruder's chaotic system equation. Because the range-differentiating factor turns the inevitable chaos fitting error of the chaos fitting method to advantage, the CDS-SS signals can be demodulated according to the range of the estimated message. Simulations show that the proposed PF-based algorithm achieves good bit-error-rate performance when extracting the original binary message from the CDS-SS signals without any knowledge of the transmitter's chaotic map or initial value, even in the presence of colored or non-Gaussian noise.

  1. Example-Based Sequence Diagrams to Colored Petri Nets Transformation Using Heuristic Search

    Science.gov (United States)

    Kessentini, Marouane; Bouchoucha, Arbi; Sahraoui, Houari; Boukadoum, Mounir

    Dynamic UML models like sequence diagrams (SD) lack sufficient formal semantics, making it difficult to build automated tools for their analysis, simulation and validation. A common approach to circumvent the problem is to map these models to more formal representations. In this context, many works propose a rule-based approach to automatically translate SD into colored Petri nets (CPN). However, finding the rules for such SD-to-CPN transformations may be difficult, as the transformation rules are sometimes difficult to define and the produced CPN may be subject to state explosion. We propose a solution that starts from the hypothesis that examples of good transformation traces of SD-to-CPN can be useful to generate the target model. To this end, we describe an automated SD-to-CPN transformation method which finds the combination of transformation fragments that best covers the SD model, using heuristic search in a base of examples. To achieve our goal, we combine two algorithms for global and local search, namely Particle Swarm Optimization (PSO) and Simulated Annealing (SA). Our empirical results show that the new approach allows deriving the sought CPNs with at least equal performance, in terms of size and correctness, to that obtained by a transformation rule-based tool.

  2. Face Recognition from Still Images to Video Sequences: A Local-Feature-Based Framework

    Directory of Open Access Journals (Sweden)

    Chen Shaokang

    2011-01-01

    Although automatic face recognition has shown success for high-quality images under controlled conditions, it is hard to attain similar levels of performance for video-based recognition. We describe recent advances in a project being undertaken to trial and develop advanced surveillance systems for public safety. We propose a local facial feature based framework for both still-image and video-based face recognition. The evaluation is performed on a still-image dataset, LFW, and a video sequence dataset, MOBIO, to compare 4 methods for operation on feature: feature averaging (Avg-Feature), Mutual Subspace Method (MSM), Manifold to Manifold Distance (MMD), and Affine Hull Method (AHM), and 4 methods for operation on distance, on 3 different features. The experimental results show that the Multi-region Histogram (MRH) feature is more discriminative for face recognition than Local Binary Patterns (LBP) and raw pixel intensity. When only a small number of images is available per person, feature averaging is more reliable than MSM, MMD, and AHM and is much faster. Thus, our proposed framework, averaging the MRH feature, is more suitable for CCTV surveillance systems with constraints on the number of images and the speed of processing.
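
    Feature averaging, the approach this abstract finds most reliable when few images are available, can be sketched as follows. This is only an illustration under stated assumptions: the three-element vectors stand in for per-frame features such as MRH histograms, the L1 distance and the names `average_feature` and `identify` are hypothetical choices, and none of this is taken from the paper's implementation.

```python
def average_feature(features):
    """Element-wise mean of equal-length feature vectors (one per frame)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def identify(probe_frames, gallery):
    """Return the gallery identity whose averaged feature is nearest the probe's."""
    probe = average_feature(probe_frames)
    return min(
        gallery,
        key=lambda name: l1_distance(probe, average_feature(gallery[name])),
    )


# Toy data: two enrolled people and one probe video of two frames.
gallery = {
    "person_a": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]],
    "person_b": [[0.1, 0.2, 0.7], [0.1, 0.4, 0.5]],
}
probe_frames = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
match = identify(probe_frames, gallery)
print(match)  # prints person_a
```

    Averaging reduces each image set to a single vector, so matching costs one distance computation per enrolled person, which is consistent with the speed advantage the abstract reports.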

  3. Effects of coal ash pollution on the genetic diversity of Brachionus calyciflorus based on rDNA ITS sequences

    OpenAIRE

    Xinli Wen; Xianling Xiang; Xin Hu; Yinghao Xue; Yilong Xi; Gen Zhang

    2010-01-01

    In this study, rDNA ITS sequences were analyzed to compare the genetic diversity of Brachionus calyciflorus from a coal-ash-contaminated lake (Lake Hui) and two uncontaminated lakes (Lake Tingtang and Lake Fengming). The results showed that two sibling species in the Brachionus calyciflorus species complex were identified in both Lake Tingtang and Lake Fengming, but only one sibling species was found in Lake Hui. The coal ash pollution decreased the number of sibling species. Based on the sequences of ...

  4. PlutoF—a Web Based Workbench for Ecological and Taxonomic Research, with an Online Implementation for Fungal ITS Sequences

    OpenAIRE

    Kessy Abarenkov; Leho Tedersoo; R Henrik Nilsson; Kai Vellak; Irja Saar; Vilmar Veldre; Erast Parmasto; Marko Prous; Anne Aan; Margus Ots; Olavi Kurina; Ivika Ostonen; Janno Jõgeva; Siim Halapuu; Kadri Põldmaa

    2010-01-01

    DNA sequences accumulating in the International Nucleotide Sequence Databases (INSD) form a rich source of information for taxonomic and ecological meta-analyses. However, these databases include many erroneous entries, and the data itself is poorly annotated with metadata, making it difficult to target and extract entries of interest with any degree of precision. Here we describe the web-based workbench PlutoF, which is designed to bridge the gap between the needs of contemporary research in...

  5. Prosystemin identification in Amaranthus cruentus and A. hypochondriacus x hybridus based on data mining and sequence alignment

    OpenAIRE

    Žiarovska Jana; Zahorsky Michal; Hricova Andrea

    2016-01-01

    Bioinformatic tools have become an indispensable part of molecular genetic research in many applications. In the present study, an in silico approach was used to find a conserved region of currently known prosystemin gene sequences, and its PCR identification was performed in Amaranthus cruentus and Amaranthus hypochondriacus x hybridus. Identification results were verified by direct sequencing of the obtained amplicons. For both analysed species, the pros...

  6. Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification

    Directory of Open Access Journals (Sweden)

    Larsen Paula

    2004-09-01

    Sequence-based typing (SBT) is one of the most comprehensive methods utilized for HLA typing. However, one of the inherent problems with this typing method is the interpretation of ambiguous allele combinations, which occur when two or more different allele combinations produce identical sequences. The purpose of this study is to investigate the probability of this occurrence. We performed HLA-A,-B SBT for Exons 2 and 3 on 676 donors. Samples were analyzed with a capillary sequencer. The racial distribution of the donors was as follows: 615 Caucasian, 13 Asian, 23 African American, 17 Hispanic and 8 Unknown. 672 donors were analyzed for HLA-A locus ambiguities and 666 donors were analyzed for HLA-B locus ambiguities. At the HLA-A locus a total of 548 ambiguous allele combinations were identified (548/1344 = 41%). Most (278/548 = 51%) of these ambiguities were due to the fact that Exon 4 analysis was not performed. At the HLA-B locus 322 total ambiguous allele combinations were found (322/1332 = 24%). The HLA-B*07/08/15/27/35/44 antigens, common in Caucasians, produced a large portion of the ambiguities (279/322 = 87%). A large portion of HLA-A and B ambiguous allele combinations can be addressed by utilizing a group-specific primary amplification approach to produce an unambiguous homozygous sequence. Therefore, although the prevalence of ambiguous allele combinations is high, if the resolution of these ambiguities is clinically warranted, methods exist to compensate for this problem.
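
    The percentages quoted in this abstract can be rechecked with a few lines of arithmetic. The sketch below assumes that the denominators 1344 and 1332 count two alleles per donor analyzed (2 × 672 and 2 × 666); the abstract does not state this, although the numbers agree with it.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


donors_total = 615 + 13 + 23 + 17 + 8   # racial distribution from the abstract
hla_a_alleles = 2 * 672                 # assumption: two alleles per donor
hla_b_alleles = 2 * 666                 # assumption: two alleles per donor

checks = {
    "HLA-A ambiguous combinations": percent(548, hla_a_alleles),
    "HLA-A ambiguities from missing Exon 4": percent(278, 548),
    "HLA-B ambiguous combinations": percent(322, hla_b_alleles),
    "HLA-B ambiguities from common antigens": percent(279, 322),
}
for label, value in checks.items():
    print(f"{label}: {value}%")
```

    Under that assumption the script reproduces the reported 41%, 51%, 24% and 87%, and the donor subtotals add up to the stated 676.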

  7. Transcriptome walking: a laboratory-oriented GUI-based approach to mRNA identification from deep-sequenced data

    Directory of Open Access Journals (Sweden)

    French Andrew S

    2012-12-01

    Background: Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short, estimates of base identities in DNA molecules. Application of this technology to mRNA samples allows rapid examination of the molecular genetic environment in individual cells or tissues, the transcriptome. However, assembly of such short sequences into complete mRNA creates a challenge that limits the usefulness of the technology, particularly when no, or limited, genomic data is available. Several approaches to this problem have been developed, but there is still no general method to rapidly obtain an mRNA sequence from deep sequence data when a specific molecule, or family of molecules, is of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and with readily available hardware. Findings: An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer-assisted, Graphical User Interface (GUI)-guided user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence by repeated passes through the complete raw data in a series of steps, and is hence termed ‘transcriptome walking’. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop or laptop computer, but takes advantage of multi-core processors when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already

  8. GntR family of regulators in Mycobacterium smegmatis: a sequence and structure based characterization

    Directory of Open Access Journals (Sweden)

    Ranjan Akash

    2007-08-01

    Background: Mycobacterium smegmatis is a fast-growing, non-pathogenic mycobacterium. This organism has been widely used as a model organism to study the biology of other virulent and extremely slow growing species like Mycobacterium tuberculosis. Based on the homology of the N-terminal DNA binding domain, the recently sequenced genome of M. smegmatis has been shown to possess several putative GntR regulators. A striking characteristic feature of this family of regulators is that they possess a conserved N-terminal DNA binding domain and a diverse C-terminal domain involved in the effector binding and/or oligomerization. Since the physiological role of these regulators is critically dependent upon effector binding and operator sites, we have analysed and classified these regulators into their specific subfamilies and identified their potential binding sites. Results: The sequence analysis of M. smegmatis putative GntRs has revealed that FadR, HutC, MocR and the YtrA-like regulators are encoded by 45, 8, 8 and 1 genes respectively. Further, out of 45 FadR-like regulators, 19 were classified into the FadR group and 26 into the VanR group. All these proteins showed similar secondary structural elements specific to their respective subfamilies except MSMEG_3959, which showed additional secondary structural elements. Using reciprocal BLAST searches, we further identified the orthologs of these regulators in Bacillus subtilis and other mycobacteria. Since the expression of many regulators is auto-regulatory, we have identified potential operator sites for a number of these GntR regulators by analyzing the upstream sequences. Conclusion: This study helps in extending the annotation of M. smegmatis GntR proteins. It identifies the GntR regulators of M. smegmatis that could serve as a model for studying orthologous regulators from virulent as well as other saprophytic mycobacteria. This study also sheds some light on the nucleotide preferences in the

  9. Analysis of genetic relationship among Indonesian native chicken breeds based on 335 D-loop sequences

    Directory of Open Access Journals (Sweden)

    Sri Sulandari

    2008-12-01

    Full Text Available The mitochondrial DNA (mtDNA) D-loop segment was PCR amplified and subsequently sequenced for a total of 335 individuals from Indonesian native chicken. The individuals were drawn from sixteen populations of native chicken and three individuals of green jungle fowl (Gallus varius). Indonesian native chicken populations were: Pelung Sembawa, PL (n = 18), Pelung Cianjur, PLC (n = 29), Arab Silver, ARS (n = 30), Cemani, CM (n = 32), Gaok, GA (n = 7), Kedu Hitam, KDH (n = 11), Wareng, T & TW (n = 10), Cemani, CMP (n = 2), Kedu, KD (n = 26), Kedu Putih, KDP (n = 15), Sentul Jatiwangi, STJ (n = 27), Ayam Kate, KT (n = 29), Ayam Sentul, STC (n = 15), Arab Golden, ARG (n = 26), Ayam Merawang, MR (n = 28), Kedu Putih Jatiwangi, KDPJ (n = 6) and Kapas, KPS (n = 21). Green jungle fowls were: two individuals from Flores island (FL5 and FL57) and one individual (BD42) from Sumbawa island. The sequences of the first 530 nucleotides were used for analysis. Eighty-two haplotypes were identified from 78 polymorphic sites for the 335 individuals. Seventy-nine haplotypes were identified in native chicken from 57 polymorphic sites, while three were of jungle fowls. Phylogenetic analysis indicates that Indonesian native chicken can be grouped into five clades (Clade I, II, IIIc, IIId and IV) of the previously identified seven clades (Clade I, II, IIIa, IIIb, IIIc, IIId and IV) in Asian domestic chicken. Haplotypes CM10 and CM32 fall into a different category, while STC12 is also on its own. Interestingly, STC12 clusters together with Gallus gallus gallus (GenBank accession No. AB007720). When CM10 (same as CM14), CM32 and STC12 were removed, 77 haplotypes of domestic chicken were identified from 53 polymorphic sites. All the green jungle fowls are clustered in one clade of their own. The clades of domestic chicken are: Clade I which has three haplotypes, Clade II has 52

  10. Knowledge-based decision support for Space Station assembly sequence planning

    Science.gov (United States)

    1991-04-01

    A complete Personal Analysis Assistant (PAA) for Space Station Freedom (SSF) assembly sequence planning consists of three software components: the system infrastructure, intra-flight value added, and inter-flight value added. The system infrastructure is the substrate on which software elements providing inter-flight and intra-flight value-added functionality are built. It provides the capability for building representations of assembly sequence plans and specification of constraints and analysis options. Intra-flight value-added provides functionality that will, given the manifest for each flight, define cargo elements, place them in the National Space Transportation System (NSTS) cargo bay, compute performance measure values, and identify violated constraints. Inter-flight value-added provides functionality that will, given major milestone dates and capability requirements, determine the number and dates of required flights and develop a manifest for each flight. The current project is Phase 1 of a projected two phase program and delivers the system infrastructure. Intra- and inter-flight value-added were to be developed in Phase 2, which has not been funded. Based on experience derived from hundreds of projects conducted over the past seven years, ISX developed an Intelligent Systems Engineering (ISE) methodology that combines the methods of systems engineering and knowledge engineering to meet the special systems development requirements posed by intelligent systems, systems that blend artificial intelligence and other advanced technologies with more conventional computing technologies. The ISE methodology defines a phased program process that begins with an application assessment designed to provide a preliminary determination of the relative technical risks and payoffs associated with a potential application, and then moves through requirements analysis, system design, and development.

  11. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders.

    Science.gov (United States)

    Sun, Cheng; Mueller, Rachel Lockridge

    2014-07-01

    Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes. PMID:25115007

  12. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
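
    As an illustration of one of the features named above, the following is a minimal sketch of dinucleotide relative abundance, assuming the common single-strand definition rho_xy = f_xy / (f_x * f_y). The function name and the handling of ambiguous bases are assumptions made for this example; the abstract does not describe the authors' implementation.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for all 16 dinucleotides.

    Assumed definition: single-strand odds ratio of the observed
    dinucleotide frequency to the product of its base frequencies.
    """
    seq = seq.upper()
    # Single-base counts, ignoring ambiguous symbols such as N.
    mono = Counter(b for b in seq if b in BASES)
    total_mono = sum(mono.values())
    # Overlapping dinucleotide counts, skipping pairs with ambiguous bases.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(p for p in pairs if p[0] in BASES and p[1] in BASES)
    total_di = sum(di.values())

    rho = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / total_mono if total_mono else 0.0
        fy = mono[y] / total_mono if total_mono else 0.0
        fxy = di[x + y] / total_di if total_di else 0.0
        # Report 0.0 when a base is absent, to avoid division by zero.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho
```

    The 16 values form one feature vector per region, which could then be passed to a principal component or discriminant analysis of the kind the abstract mentions.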

  13. Application of genotyping-by-sequencing on semiconductor sequencing platforms: A comparison of genetic and reference-based marker ordering in barley

    Science.gov (United States)

    The rapid development of next generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach fo...

  14. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Directory of Open Access Journals (Sweden)

    Chunsheng Gao

    Full Text Available Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  15. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Science.gov (United States)

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  16. Unveiling distribution patterns of freshwater phytoplankton by a next generation sequencing based approach.

    Directory of Open Access Journals (Sweden)

    Alexander Eiler

    Full Text Available The recognition and discrimination of phytoplankton species is one of the foundations of freshwater biodiversity research and environmental monitoring. This step is frequently a bottleneck in the analytical chain from sampling to data analysis and subsequent environmental status evaluation. Here we present phytoplankton diversity data from 49 lakes, including three seasonal surveys, assessed by next generation sequencing (NGS) of 16S ribosomal RNA chloroplast and cyanobacterial gene amplicons, and also compare part of these datasets with identification based on morphology. Direct comparison of NGS to microscopic data from three time-series showed that NGS was able to capture the seasonality in phytoplankton succession as observed by microscopy. Still, the PCR-based approach was only semi-quantitative, and detailed NGS and microscopy taxa lists had only low taxonomic correspondence. This is probably due to both methodological constraints and current discrepancies in taxonomic frameworks. Discrepancies included Euglenophyta and Heterokonta, which were scarce in the NGS data but frequently detected by microscopy, and Cyanobacteria, which were in general more abundant and classified with high resolution by NGS. A deep-branching, taxonomically unclassified cluster was frequently detected by NGS but could not be linked to any group identified by microscopy. NGS-derived phytoplankton composition differed significantly among lakes with different trophic status, showing that our approach can resolve phytoplankton communities at a level relevant for ecosystem management. The high reproducibility and potential for standardization and parallelization make our NGS approach an excellent candidate for simultaneous monitoring of prokaryotic and eukaryotic phytoplankton in inland waters.

  17. Molecular Detection of Verticillium albo-atrum by PCR Based on Its Sequences

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    We developed a species-specific PCR assay for rapid and accurate detection of the pathogenic fungus Verticillium albo-atrum in diseased plant tissues and soil. Based on differences in internal transcribed spacer (ITS) sequences of Verticillium spp., a pair of species-specific primers, Vaa1/Vaa2, was synthesized. In a screen of 17 isolates of V. albo-atrum and 121 isolates from the Ascomycota, Basidiomycota, Deuteromycota, and Oomycota, the Vaa1/Vaa2 primers amplified a single PCR band of approximately 330 bp only from V. albo-atrum. The detection sensitivity with primers Vaa1/Vaa2 was 10 fg of genomic DNA. Using ITS1/ITS4 as the first-round primers, combined with Vaa1/Vaa2, a nested PCR procedure was developed, and the detection sensitivity increased 1,000-fold to 10 ag. The detection sensitivity for the pathogen in soil was 100 conidia per gram of soil. The PCR-based methods developed here could simplify both plant disease diagnosis and pathogen monitoring as well as guide plant disease management.

  18. Eliciting neutralizing antibodies with gp120 outer domain constructs based on M-group consensus sequence.

    Science.gov (United States)

    Qin, Yali; Banasik, Marisa; Kim, SoonJeung; Penn-Nicholson, Adam; Habte, Habtom H; LaBranche, Celia; Montefiori, David C; Wang, Chong; Cho, Michael W

    2014-08-01

    One strategy being evaluated for HIV-1 vaccine development is focusing immune responses towards neutralizing epitopes on the gp120 outer domain (OD) by removing the immunodominant, but non-neutralizing, inner domain. Previous OD constructs have not elicited strong neutralizing antibodies (nAbs). We constructed two immunogens, a monomeric gp120-OD and a trimeric gp120-OD×3, based on an M group consensus sequence (MCON6). Their biochemical and immunological properties were compared with intact gp120. Results indicated better preservation of critical neutralizing epitopes on gp120-OD×3. In contrast to previous studies, our immunogens induced potent, cross-reactive nAbs in rabbits. Although nAbs primarily targeted Tier 1 viruses, they exhibited significant breadth. Epitope mapping analyses indicated that nAbs primarily targeted conserved V3 loop elements. Although the potency and breadth of nAbs were similar for all three immunogens, nAb induction kinetics indicated that gp120-OD×3 was superior to gp120-OD, suggesting that gp120-OD×3 is a promising prototype for further gp120 OD-based immunogen development. PMID:25046154

  19. Design and implementation of microcontroller-based automatic sequence counting and switching system

    Directory of Open Access Journals (Sweden)

    Joshua ABOLARINWA

    2015-05-01

    Full Text Available Technological advancement and its influence on human beings have been on the increase in recent times. Major areas of such influence include monitoring and control activities. In order to keep track of human movement in and out of a particular building, there is a need for an automatic counting system. Therefore, in this paper, we present the design and implementation of a microcontroller-based automatic sequence counting and switching system. This system was designed and developed to save cost, time, and energy, and to achieve seamless control when switching electrical appliances within a building on or off. A top-down modular design approach was used in conjunction with the versatility of the microcontroller. The system is able to monitor and sequentially count the entries and exits of people through an entrance, and then automatically control any electrical device connected to it. The tests and measurements obtained show comparative benefits from the deployment of this system, in terms of simplicity and accuracy, over a similar system that is not microcontroller-based. Therefore, this system can be deployed in commercial quantity with a wide range of applications in homes, offices and other public places.
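
    The counting-and-switching behaviour described above can be sketched in software as follows. This is a simulation under stated assumptions, not the authors' firmware: the two-sensor arrangement (an outer sensor "A" and an inner sensor "B"), the rule that the load is on whenever the count is above zero, and all names are hypothetical.

```python
class OccupancySwitch:
    """Counts entries and exits from the order in which two sensors trip.

    Assumed layout: sensor "A" is outside the doorway, sensor "B" inside.
    A then B is read as an entry, B then A as an exit.
    """

    def __init__(self):
        self.count = 0
        self._first = None  # sensor that tripped first in the current crossing

    @property
    def load_on(self):
        # Assumed switching rule: keep the appliance on while anyone is inside.
        return self.count > 0

    def on_sensor(self, sensor):
        """Process one sensor trigger and return the resulting load state."""
        if sensor not in ("A", "B"):
            raise ValueError("sensor must be 'A' or 'B'")
        if self._first is None:
            # Start of a crossing: remember which side was tripped first.
            self._first = sensor
        elif sensor != self._first:
            # The other sensor tripped: the crossing is complete.
            if self._first == "A":
                self.count += 1
            else:
                self.count = max(0, self.count - 1)  # never go below zero
            self._first = None
        # A repeated trigger of the same sensor is ignored (person lingering).
        return self.load_on
```

    Feeding the sequence A, B into `on_sensor` registers one entry and turns the load on; a later B, A registers the exit and turns it off again.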

  20. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Institute of Scientific and Technical Information of China (English)

    Pei-Xiang Ni; Xin Ding; Yin-Xin Zhang; Xue Yao; Rui-Xue Sun; Peng Wang; Yan-Ping Gong

    2015-01-01

    Background: The dilemma of pathogen identification in patients with unidentified clinical symptoms, such as fever of unknown origin, not only poses a challenge to the diagnostic and therapeutic process itself, but also to expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived from five cases from the intensive care unit. This methodological approach detects viruses and corrects the incidence of false positive detection rates of pathogens in a much shorter period. With our method followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus, and in another case, we could identify infection with Streptococcus viridians based on the culture, which was false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  1. Modular robotic intelligence system based on fuzzy reasoning and state machine sequencing

    Science.gov (United States)

    Sights, B.; Ahuja, G.; Kogut, G.; Pacis, E. B.; Everett, H. R.; Fellars, D.; Hardjadinata, S.

    2007-04-01

    The fusion of multiple behavior commands and sensor data into intelligent and cohesive robotic movement has been the focus of robot research for many years. Sequencing low level behaviors to create high level intelligence has also been researched extensively. Cohesive robotic movement is also dependent on other factors, such as environment, user intent, and perception of the environment. In this paper, a method for managing the complexity derived from the increase in sensors and perceptions is described. Our system uses fuzzy logic and a state machine to fuse multiple behaviors into an optimal response based on the robot's current task. The resulting fused behavior is filtered through fuzzy logic based obstacle avoidance to create safe movement. The system also provides easy integration with any communications protocol, plug-and-play devices, perceptions, and behaviors. Most behaviors and the obstacle avoidance parameters are easily changed through configuration files. Combined with previous work in the area of navigation and localization a very robust autonomy suite is created.

  2. FN-Identify: Novel Restriction Enzymes-Based Method for Bacterial Identification in Absence of Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Mohamed Awad

    2015-01-01

    Full Text Available Sequencing and restriction analysis of genes like 16S rRNA and HSP60 are intensively used for molecular identification in the microbial communities. With aid of the rapid progress in bioinformatics, genome sequencing became the method of choice for bacterial identification. However, the genome sequencing technology is still out of reach in the developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify exploits the gene sequences data available in GenBank and other databases and the two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested using three different and diverse bacterial populations (members of Lactobacillus, Pseudomonas, and Mycobacterium groups in an in silico analysis using restriction enzymes and sequences of 16S rRNA gene. The analysis of the restriction maps of the members of three groups using the fragment numbers information only or along with fragments sizes successfully identified all of the members of the three groups using a minimum of four and maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of FN-Identify method and its two algorithms as an alternative method that uses the standard microbiology laboratories techniques when the genome sequencing is not available.
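
    The idea of identifying organisms from restriction fragment numbers can be illustrated with a short in silico digest. This is a sketch only and does not reproduce the CreateScheme or GeneIdentify algorithms, which the abstract does not describe. The three enzymes and their recognition sites are standard; the toy sequences, function names, and the assumption of a linear molecule (fragments = sites + 1) are choices made for this example.

```python
import re

# Recognition sites of three common restriction enzymes (all palindromic,
# so scanning one strand is sufficient).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def fragment_count(seq, site):
    """Number of fragments from digesting a linear sequence at every site."""
    # A zero-width lookahead also finds overlapping occurrences.
    hits = re.findall(f"(?={site})", seq.upper())
    return len(hits) + 1


def fingerprint(seq, enzymes=ENZYMES):
    """Tuple of fragment numbers, one entry per enzyme in dict order."""
    return tuple(fragment_count(seq, site) for site in enzymes.values())


def is_discriminating(seqs, enzymes=ENZYMES):
    """True if every sequence gets a distinct fragment-number fingerprint."""
    prints = [fingerprint(s, enzymes) for s in seqs.values()]
    return len(set(prints)) == len(prints)


# Made-up toy sequences standing in for 16S rRNA genes of three taxa.
toy = {
    "taxon1": "AAGAATTCAAGGATCCAA",
    "taxon2": "AAGAATTCAAGAATTCAA",
    "taxon3": "AAAAGCTTAAAA",
}
```

    With these toy inputs, `is_discriminating(toy)` returns `True`, meaning the three enzymes separate the three sequences by fragment numbers alone. A scheme-building step would search for a small enzyme set with this property over a real sequence collection.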

  3. Improved PCR-Based Detection of Soil Transmitted Helminth Infections Using a Next-Generation Sequencing Approach to Assay Design

    Science.gov (United States)

    Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R.; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S.; Williams, Steven A.

    2016-01-01

    Background The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world’s most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Methodology/Principal Findings Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. Conclusions/Significance The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other

  4. A sequence-based genetic linkage map as a reference for Brassica rapa pseudochromosome assembly

    OpenAIRE

    Cheng Feng; Wang Qian; Liao Yongcui; Deng Jie; Wang Hui(Wendy); Liu Bo; Sun Silong; Wang Yan; Wang Xiaowu; Wu Jian

    2011-01-01

    Abstract Background Brassica rapa is an economically important crop and a model plant for studies concerning polyploidization and the evolution of extreme morphology. The multinational B. rapa Genome Sequencing Project (BrGSP) was launched in 2003. In 2008, next generation sequencing technology was used to sequence the B. rapa genome. Several maps concerning B. rapa pseudochromosome assembly have been published but their coverage of the genome is incomplete, anchoring approximately 73.6% of t...

  5. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    Full Text Available The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  6. Comparison of Illumina and 454 Deep Sequencing in Participants Failing Raltegravir-Based Antiretroviral Therapy

    OpenAIRE

    Li, Jonathan Z.; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J.; Samuel, Reshmi; Vardhanabhuti, Saran; ZHENG, LU; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R.; Daniel R Kuritzkes; Hide, Winston

    2014-01-01

    Background: The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. Methods: A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegra...

  7. Comparison of Illumina and 454 Deep Sequencing in Participants Failing Raltegravir-Based Antiretroviral Therapy

    OpenAIRE

    Li, Jonathan Z.; Brad Chapman; Patrick Charlebois; Oliver Hofmann; Brian Weiner; Porter, Alyssa J.; Reshmi Samuel; Saran Vardhanabhuti; Lu Zheng; Joseph Eron; Babafemi Taiwo; Zody, Michael C; Henn, Matthew R.; Daniel R Kuritzkes; Winston Hide

    2014-01-01

    BACKGROUND: The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. METHODS: A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegra...

  8. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Science.gov (United States)

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays.

  9. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
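
    Reconstructing a sequence from overlapping peptides can be sketched with a greedy suffix-prefix merge, the simplest textbook approach to shotgun-style assembly. This is not the method of the SAND report, whose algorithm is not given in the abstract. The peptide strings, the minimum overlap of two residues, and the function names are assumptions made for this example.

```python
def overlap(a, b, min_len=2):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_len=2):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = sorted(set(peptides))  # sorted for a deterministic result
    # Drop peptides that are fully contained in another peptide.
    frags = [p for p in frags if not any(p != q and p in q for q in frags)]
    while len(frags) > 1:
        best_k, best_a, best_b = 0, None, None
        for a in frags:
            for b in frags:
                if a != b:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_k:])
    return frags


# Made-up peptides from two hypothetical digests of the same toy protein.
digest_1 = ["MKTAYIAK", "QRQISFVK", "SHFSRQ"]
digest_2 = ["MKTAY", "IAKQRQISF", "VKSHFSRQ"]
contigs = greedy_assemble(digest_1 + digest_2)
```

    Here the peptides from the second digest bridge the cut points of the first, so the merge yields a single contig. Greedy merging can misassemble repeated regions, which is one reason practical assemblers use more careful graph-based methods.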

  10. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    NARCIS (Netherlands)

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F.; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We i

  11. Identification of Anoectochilus based on rDNA ITS sequences alignment and SELDI-TOF-MS

    OpenAIRE

    Gao, Chuan; Zhang, Fusheng; Zhang, Jun; Guo, Shunxing; Shao, Hongbo

    2009-01-01

    The internal transcribed spacer (ITS) sequence alignment and proteomic differences among Anoectochilus species were studied by means of ITS molecular identification and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Results showed that variety certification of Anoectochilus by ITS sequences cannot determine species, and that there are proteomic differences among Anoectochilus species. Moreover, proteomic fingerprints of five Anoectochilus species hav...

  12. Designing and Evaluating Research-Based Instructional Sequences for Introducing Magnetic Fields

    Science.gov (United States)

    Guisasola, Jenaro; Almudi, Jose Manuel; Ceberio, Mikel; Zubimendi, Jose Luis

    2009-01-01

    This study examines the didactic suitability of introducing a teaching sequence when teaching the concept of magnetic fields within introductory physics courses at the university level. This instructional sequence was designed taking into account students' common conceptions, an analysis of the course content, and the history of the development of…

  13. A likelihood ratio test for species membership based on DNA sequence data

    DEFF Research Database (Denmark)

    Matz, Mikhail V.; Nielsen, Rasmus

    2005-01-01

    sequence is a member of an a priori specified species. We investigate the performance of the test using coalescence simulations, as well as using the real data from butterflies and frogs representing two kinds of challenge for DNA barcoding: extremely low and extremely high levels of sequence variability....

  14. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    Science.gov (United States)

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many ongoing projects are accumulating voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited because reliable molecular markers are not available. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. In total, 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs showed the highest frequency, 60.44% in contigs and 62.16% in singletons, whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. The SSR-containing sequences were functionally categorized using gene ontology terms such as biological process, cellular component and molecular function.
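
    The SSR mining step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name find_ssrs and the minimum-repeat thresholds are hypothetical, chosen only for illustration, and only perfect (uninterrupted) repeats of motif length 1 to 6 are reported.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in seq."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of repeats.
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" reported as a dinucleotide instead of a mononucleotide.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

    Results are grouped by motif length, so counts per class (mono-, di-, trinucleotide and so on) can be tallied directly from the returned tuples.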

  15. A dated molecular phylogeny of manta and devil rays (Mobulidae) based on mitogenome and nuclear sequences

    NARCIS (Netherlands)

    Poortvliet, Marloes; Olsen, Jeanine; Croll, Donald A.; Bernardi, Giacomo; Newton, Kelly; Kollias, Spyros; O'Sullivan, John; Fernando, Daniel; Stevens, Guy; Galván Magaña, Felipe; Seret, Bernard; Wintner, Sabine; Hoarau, Galice

    2015-01-01

    Manta and devil rays are an iconic group of globally distributed pelagic filter feeders, yet their evolutionary history remains enigmatic. We employed next generation sequencing of mitogenomes for nine of the 11 recognized species and two outgroups; as well as additional Sanger sequencing of two mit

  16. Phylogenetic analysis of the Listeria monocytogenes based on sequencing of 16S rRNA and hlyA genes.

    Science.gov (United States)

    Soni, Dharmendra Kumar; Dubey, Suresh Kumar

    2014-12-01

    Listeria monocytogenes was discriminated from other Listeria species. The 16S rRNA and hlyA genes were PCR-amplified with sets of oligonucleotide primers that flank 1,500 and 456 bp fragments, respectively. Based on the differences in the 16S rRNA and hlyA genes, a total of 80 isolates from different environmental, food and clinical samples were confirmed to be L. monocytogenes. The 16S rRNA sequence similarity suggested that the isolates were similar to those previously reported by others from different habitats. The phylogenetic interrelationships of the genus Listeria were investigated by sequencing the 16S rRNA and hlyA genes. The 16S rRNA sequence indicated that the genus Listeria comprises the following closely related but distinct lines of descent: one is the L. monocytogenes species group (including L. innocua, L. ivanovii, L. seeligeri and L. welshimeri) and the other comprises the species L. grayi, L. rocourtiae and L. fleischmannii. The phylogenetic tree based on the hlyA gene sequence clearly differentiates L. monocytogenes, L. ivanovii and L. seeligeri. In the present study, we identified 80 isolates of L. monocytogenes originating from different clinical, food and environmental samples based on 16S rRNA and hlyA gene sequence similarity.

  17. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    Science.gov (United States)

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  18. GENBIT COMPRESS TOOL (GBC): A JAVA-BASED TOOL TO COMPRESS DNA SEQUENCES AND COMPUTE COMPRESSION RATIO (BITS/BASE) OF GENOMES

    Directory of Open Access Journals (Sweden)

    P.RAJA RAJESWARI

    2010-06-01

    We present a compression tool, “GenBit Compress”, for genetic sequences based on our newly proposed “GenBit Compress Algorithm”. Our tool achieves the best compression ratios for entire genomes (DNA sequences). Significantly better compression results show that the GenBit Compress algorithm is the best among the existing genome compression algorithms for non-repetitive DNA sequences in genomes. Standard compression algorithms such as gzip or compress cannot compress DNA sequences but only expand them in size. In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings which are duplicated except for a few random mutations. For this reason most DNA compressors work by searching for and encoding approximate repeats. We depart from this strategy by searching for and encoding only exact repeats. Our proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes. Sequences as long as 8 lakh (800,000) characters can be given as input. While achieving the best compression ratios for DNA sequences, our new GenBit Compress program significantly improves on the running time of all previous DNA compressors. Assigning binary bits to fragments of a DNA sequence is also a unique concept introduced in this program for the first time in DNA compression.
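
    To make the bits/base measure concrete, the following Python sketch packs each nucleotide into a fixed 2-bit code and reports the resulting compression ratio. This is only a baseline illustration, not the GenBit Compress algorithm, whose details are not given in the abstract; the function names and the code table are assumptions made for the example.

```python
# Fixed 2-bit code per base (an assumed mapping, for illustration only).
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking reads from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, n_bases):
    """Inverse of pack_2bit; n_bases trims the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])

def bits_per_base(compressed_size_bytes, n_bases):
    """Compression ratio expressed as bits of output per input base."""
    return 8 * compressed_size_bytes / n_bases

if __name__ == "__main__":
    seq = "ACGTACGTAC"
    packed = pack_2bit(seq)
    assert unpack_2bit(packed, len(seq)) == seq
    print(bits_per_base(len(packed), len(seq)))
```

    A plain text file stores one base per 8-bit character, so any scheme reporting fewer than 2 bits/base must exploit repeats or other structure beyond this fixed code.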

  19. Fuzzy time-series based on Fibonacci sequence for stock price forecasting

    Science.gov (United States)

    Chen, Tai-Liang; Cheng, Ching-Hsue; Jong Teoh, Hia

    2007-07-01

    Time-series models have been utilized to make reasonably accurate predictions in the areas of stock price movements, academic enrollments, weather, etc. For promoting the forecasting performance of fuzzy time-series models, this paper proposes a new model, which incorporates the concept of the Fibonacci sequence, the framework of Song and Chissom's model and the weighted method of Yu's model. This paper employs a 5-year period TSMC (Taiwan Semiconductor Manufacturing Company) stock price data and a 13-year period of TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) stock index data as experimental datasets. By comparing our forecasting performances with Chen's (Forecasting enrollments based on fuzzy time-series. Fuzzy Sets Syst. 81 (1996) 311-319), Yu's (Weighted fuzzy time-series models for TAIEX forecasting. Physica A 349 (2004) 609-624) and Huarng's (The application of neural networks to forecast fuzzy time series. Physica A 336 (2006) 481-491) models, we conclude that the proposed model surpasses in accuracy these conventional fuzzy time-series models.

  20. The phylogenetic placement of Siniperca obscura base on complete mitochondrial DNA sequence.

    Science.gov (United States)

    Chen, Dun-Xue; Li, Yulong; Bin, Shi-Yu; Wang, Kaizhuo; Chu, Wu-Ying; Zhang, Jian-She

    2014-06-01

    The extant freshwater sinipercids represent a group of 12 species that are endemic to East Asia. In this study, we cloned and sequenced the complete mitochondrial DNA of Siniperca obscura from the Lijiang River. The size of the complete mitochondrial genome is 16,492 bp. The mitochondrial genome contains 37 genes (13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs) and a major non-coding control region, as in other reported sinipercid fishes. Among the 13 protein-coding genes, three reading-frame overlaps were found: ATP8 and ATP6 overlap by 10 nucleotides, ND4 and ND4L overlap by 7 nucleotides, and ND5 and ND6 overlap by 5 nucleotides. Phylogenetic analyses using neighbor-joining (N-J) and maximum parsimony (MP) algorithms showed that S. chuatsi and S. kneri are sister species, next joined by S. obscura, based on the combined 12 protein-coding genes (excluding ND6).

  1. Molecular phylogeny of the large carpenter bees, genus Xylocopa (Hymenoptera: apidae), based on mitochondrial DNA sequences.

    Science.gov (United States)

    Leys, R; Cooper, S J; Schwarz, M P

    2000-12-01

    Carpenter bees, genus Xylocopa Latreille, a group of bees found on all continents, are of particular interest to behavioral ecologists because of their utility for studies of the evolution of mating strategies and sociality. This paper presents phylogenetic analyses based on sequences of two mitochondrial genes, cytochrome oxidase 1 and cytochrome b, for 22 subgenera of Xylocopa. Maximum-parsimony and maximum-likelihood methods were used to infer phylogenetic relationships. The analyses resulted in three resolved clades of subgenera: a South American group (including the subgenera Stenoxylocopa, Megaxylocopa, and Neoxylocopa), a group including the subgenera Xylocopa s.s. and Ctenoxylocopa, and an Ethiopian group (including the subgenera Afroxylocopa, Mesotrichia, Alloxylocopa, Platynopoda, Hoploxylocopa, and Koptortosoma). The relationships between the 11 other subgenera and the resolved clades are unclear. Within the Ethiopian group we found a clear separation of the African and the Oriental taxa and apparent polyphyly of the subgenus Koptortosoma. Using an evolutionary rate for ants, we investigated whether Gondwana vicariance or more recent dispersal events could best explain the present-day distribution of subgenera. Although some taxa show divergences that approach Gondwanan breakup times, most divergences between geographic groups are too recent to support a vicariance hypothesis. PMID:11133195

  2. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    Science.gov (United States)

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.
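
    As an illustration of the conservation measure mentioned above, the sketch below computes a per-position Shannon entropy profile for a set of equal-length peptide windows and the difference between two such profiles. It is a simplified reading of the abstract, not the authors' implementation; the function names and the toy peptides are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def entropy_profile(peptides):
    """Entropy at each position of equal-length, pre-aligned peptide windows."""
    return [column_entropy(col) for col in zip(*peptides)]

def entropy_difference(positive, negative):
    """Position-wise entropy of positives minus entropy of negatives."""
    pos = entropy_profile(positive)
    neg = entropy_profile(negative)
    return [p - n for p, n in zip(pos, neg)]

if __name__ == "__main__":
    # Toy windows centred on a candidate site; not real data.
    methylated = ["GRG", "GRA", "GRG", "ARG"]
    unmethylated = ["LRK", "TRE", "GRD", "ARS"]
    print(entropy_profile(methylated))
    print(entropy_difference(methylated, unmethylated))
```

    Positions where the difference is strongly negative are more conserved among the positive peptides than among the negatives, which is the kind of signal a downstream classifier such as an SVM could use as a feature.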

  3. [Comparison between Astragalus membranaceus var. mongholicus and Hedysarum polybotrys based on ITS sequences and metabolomics].

    Science.gov (United States)

    Jiao, Mei-li; Li, Zhen-yu; Zhang, Fu-sheng; Qin, Xue-mei

    2015-12-01

    Astragalus membranaceus var. mongholicus and Hedysarum polybotrys belong to different genera but have similar drug efficacy in traditional Chinese medicine theory, and H. polybotrys was previously used legally as A. membranaceus var. mongholicus. In this study, similarities and differences between them were analyzed via their ITS/ITS2 fragment information. The ITS (internal transcribed spacer) regions were amplified using polymerase chain reaction and then sequenced in both directions. The alignment length of the ITS regions was 616 bp, in which 508 loci were consistent and 103 loci were different, accounting for 82.47% and 16.72% of the total ITS nucleotides in length, respectively. As genotype determines phenotype, a 1H NMR-based metabolomic approach was further used to reveal the chemical similarities and differences between them. Thirty-four metabolites were identified in the 1H NMR spectra, and twenty-seven metabolites were common components. Amino acids, carbohydrates and other primary metabolites were similar, while a large difference existed in the flavonoids and astragalosides. This study suggests that A. membranaceus var. mongholicus and H. polybotrys show similarities and differences from molecular and chemical perspectives, which lays a foundation for elucidating the effective material basis of drugs with similar efficacy and for resource utilization. PMID:27169287
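
    The reported percentages can be checked with a few lines of Python. The counts are taken from the abstract; the observation that five aligned positions are not covered by either category is simple arithmetic, and the abstract does not say what they are.

```python
ALIGNMENT_LENGTH = 616  # bp, from the abstract
IDENTICAL = 508         # consistent loci
DIFFERENT = 103         # differing loci

def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage with two decimals."""
    return round(100 * part / whole, 2)

identical_pct = percent(IDENTICAL, ALIGNMENT_LENGTH)
different_pct = percent(DIFFERENT, ALIGNMENT_LENGTH)
# Positions in the alignment that fall into neither reported category.
unaccounted = ALIGNMENT_LENGTH - IDENTICAL - DIFFERENT

if __name__ == "__main__":
    print(identical_pct, different_pct, unaccounted)
```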

  4. A Preliminarily Phylogeny Study of the Eriobotrya Based on the nrDNA Adh Sequences

    Directory of Open Access Journals (Sweden)

    Yang XINAGHUI

    2012-11-01

    Phylogenetic relationships of the genus Eriobotrya Lindl. were examined based on the nrDNA Adh sequences. A phylogenetic tree of 14 loquat accessions (species, varieties and forma) was generated by using Photinieae serrulaia L. as an outgroup and Rhaphiolepis indica (L.) Lindl. as an ingroup, which represent the two closest genera of Eriobotrya. The results showed that these loquat accessions were divided into two main clades in the consensus tree. Clade I included E. seguinii Card and group A, formed by E. henryi Nakai, E. bengalensis Hook.f., and forma angustifolia Vidal. Clade II is composed of the other taxa, which included three groups. E. cavaleriei Rehd and E. fragrans Champ formed group B; group C consisted of E. prinoides Rehd. & Wils. var. dadunensis H.Z.Zhang and E. japonica Lindl.; and group D included E. deflexa Nakai and E. deflexa Nakai var. buisanensis Nakai. Since E. deflexa Nakai, E. deflexa Nakai var. buisanensis Nakai and E. kwangsiensis Chun were closer in the phylogenetic tree, while E. prinoides Rehd. & Wils. var. dadunensis H.Z.Zhang, E. japonica Lindl, E. prinoides Rehd & Wils and E. elliptica Lindl. were close to each other, they may be located at a similar place of the phylogenetic stage. However, E. malipoensis Kuan needs further study of its phylogenetic relationships, as it was separated from the others. The results further support the theory that E. cavaleriei Rehd could be a variety of E. fragrans Champ.

  5. A cluster finding algorithm based on the multi-band identification of red-sequence galaxies

    CERN Document Server

    Oguri, Masamune

    2014-01-01

    We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red-sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11960 deg^2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71743 clusters in the redshift range 0.1

  6. Genomic sequence-based discovery of novel angucyclinone antibiotics from marine Streptomyces sp. W007.

    Science.gov (United States)

    Zhang, Hongyu; Wang, Hongbo; Wang, Yipeng; Cui, Hongli; Xie, Zeping; Pu, Yang; Pei, Shiqian; Li, Fuchao; Qin, Song

    2012-07-01

    A large number of novel bioactive compounds were discovered from microbial secondary metabolites based on the traditional bioactivity screenings. Recent fermentation studies indicated that the crude extract of marine Streptomyces sp. W007 possessed great potential in agricultural fungal disease control against Phomopsis asparagi, Polystigma deformans, Cladosporium cucumerinum, Monilinia fructicola, and Colletotrichum lagenarium. To further evaluate the biosynthetic potential of secondary metabolites, we sequenced the genome of Streptomyces sp. W007 and analyzed the identifiable secondary metabolite gene clusters. Moreover, one gene cluster with type II PKS implied the possibility of Streptomyces sp. W007 to produce aromatic polyketide of angucyclinone antibiotics. Therefore, two novel compounds, 3-hydroxy-1-keto-3-methyl-8-methoxy-1,2,3,4-tetrahydro-benz[α]anthracene and kiamycin with potent cytotoxicities against human cancer cell lines, were isolated from the culture broth of Streptomyces sp. W007. In addition, four other known angucyclinone antibiotics were obtained. The gene cluster for these angucyclinone antibiotics could be assigned to 20 genes. This work provides powerful evidence for the interplay between genomic analysis and traditional natural product isolation research. PMID:22536997

  7. A Novel Data Assimilation Methodology for Predicting Lithology Based on Sequence Labeling Algorithms

    Science.gov (United States)

    Park, E.; Jeong, J.; Han, W. S.; Kim, K. Y.

    2014-12-01

    A hidden Markov model (HMM) and a conditional random fields (CRFs) model for lithological predictions based on multiple geophysical well-logging data are derived for dealing with directional non-stationarity through bi-directional training and conditioning. The developed models were benchmarked against their conventional counterparts, and hypothetical boreholes with the corresponding synthetic geophysical data including artificial errors were employed. In the three test scenarios devised, the average fitness and unfitness values of the developed CRFs model and HMM are 0.84 and 0.071, and 0.81 and 0.084, respectively, while those of the conventional CRFs model and HMM are 0.78 and 0.091, and 0.77 and 0.099, respectively. Comparisons of their predictabilities show that the models designed for directional non-stationarity clearly perform better than the conventional models for all tested examples. Among them, the developed linear-chain CRFs model showed the best or close to the best performance with high predictability and a low training data requirement. Keywords: one-dimensional lithological characterization, sequence labeling algorithm, conditional random fields, hidden Markov model, borehole, geophysical well-logging data.
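
    For orientation, the sketch below shows how a conventional HMM assigns a lithology label to each depth interval from a discretized well-log reading, using Viterbi decoding. It illustrates only the conventional counterpart mentioned above, not the bi-directional models developed in the study; the states, observation classes and all probabilities are invented for the example.

```python
import math

def _log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log-space)."""
    best = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for arriving in s at step t.
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical two-lithology model with a two-class gamma-ray reading.
STATES = ["sand", "clay"]
START = {"sand": 0.5, "clay": 0.5}
TRANS = {"sand": {"sand": 0.8, "clay": 0.2}, "clay": {"sand": 0.2, "clay": 0.8}}
EMIT = {"sand": {"low": 0.9, "high": 0.1}, "clay": {"low": 0.2, "high": 0.8}}

if __name__ == "__main__":
    readings = ["low", "low", "high", "high", "low"]
    print(viterbi(readings, STATES, START, TRANS, EMIT))
```

    Because the transition matrix is applied in one direction only (top to bottom of the borehole), this conventional form cannot represent the directional non-stationarity that the abstract's bi-directional training is designed to address.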

  8. Aquifer Vulnerability Assessment Based on Sequence Stratigraphic and ³⁹Ar Transport Modeling.

    Science.gov (United States)

    Sonnenborg, Torben O; Scharling, Peter B; Hinsby, Klaus; Rasmussen, Erik S; Engesgaard, Peter

    2016-03-01

    A large-scale groundwater flow and transport model is developed for a deep-seated (100 to 300 m below ground surface) sedimentary aquifer system. The model is based on a three-dimensional (3D) hydrostratigraphic model, building on a sequence stratigraphic approach. The flow model is calibrated against observations of hydraulic head and stream discharge while the credibility of the transport model is evaluated against measurements of ³⁹Ar from deep wells using alternative parameterizations of dispersivity and effective porosity. The directly simulated 3D mean age distributions and vertical fluxes are used to visualize the two-dimensional (2D)/3D age and flux distribution along transects and at the top plane of individual aquifers. The simulation results are used to assess the vulnerability of the aquifer system, which has generally been assumed to be protected by thick overlying clayey units and has therefore been proposed as a future reservoir for drinking water supply. The results indicate that on a regional scale these deep-seated aquifers are not as protected from modern surface water contamination as expected because significant leakage to the deeper aquifers occurs. The complex distribution of local and intermediate groundwater flow systems controlled by the distribution of the river network as well as the topographical variation (Tóth 1963) provides the possibility for modern water to be found in even the deepest aquifers.

  9. Recent improvements to the SMART domain-based sequence annotation resource.

    Science.gov (United States)

    Letunic, Ivica; Goodstadt, Leo; Dickens, Nicholas J; Doerks, Tobias; Schultz, Joerg; Mott, Richard; Ciccarelli, Francesca; Copley, Richard R; Ponting, Chris P; Bork, Peer

    2002-01-01

    SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk. PMID:11752305

  10. Phylogenetic relationships of graminicolous downy mildews based on cox2 sequence data.

    Science.gov (United States)

    Thines, Marco; Göker, Markus; Telle, Sabine; Ryley, Malcolm; Mathur, Kusum; Narayana, Yaladabagi D; Spring, Otmar; Thakur, Ram P

    2008-03-01

    Graminicolous downy mildews (GDM) are an understudied, yet economically important, group of plant pathogens, which are one of the major constraints to poaceous crops in the tropics and subtropics. Here we present a first molecular phylogeny based on cox2 sequences comprising all genera of the GDM currently accepted, with both lasting (Graminivora, Poakatesthia, and Viennotia) and evanescent (Peronosclerospora, Sclerophthora, and Sclerospora) sporangiophores. In addition, all other downy mildew genera currently accepted, as well as a representative sample of other oomycete taxa, have been included. It was shown that all genera of the GDM have had a long, independent evolutionary history, and that the delineation between Peronosclerospora and Sclerospora is correct. Sclerophthora was found to be a particularly divergent taxon nested within a paraphyletic Phytophthora, but without support. The results confirm that the placement of Peronosclerospora and Sclerospora in the Saprolegniomycetidae is incorrect. Sclerophthora is not closely related to Pachymetra of the family Verrucalvaceae, and also does not belong to the Saprolegniomycetidae, but shows close affinities to the Peronosporaceae. In addition, all GDM are interspersed throughout the Peronosporaceae s lat., suggesting that a separate family for the Sclerosporaceae might not be justified. PMID:18308532

  11. Recent improvements to the SMART domain-based sequence annotation resource

    Science.gov (United States)

    Letunic, Ivica; Goodstadt, Leo; Dickens, Nicholas J.; Doerks, Tobias; Schultz, Joerg; Mott, Richard; Ciccarelli, Francesca; Copley, Richard R.; Ponting, Chris P.; Bork, Peer

    2002-01-01

    SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users’ documents. A SMART mirror has been created at http://smart.ox.ac.uk. PMID:11752305

  12. Range camera calibration based on image sequences and dense comprehensive error statistics

    Science.gov (United States)

    Karel, Wilfried; Pfeifer, Norbert

    2009-01-01

    This article concentrates on the integrated self-calibration of both the interior orientation and the distance measurement system of a time-of-flight range camera (photonic mixer device). Unlike other approaches that investigate individual distortion factors separately, in the presented approach all calculations are based on the same data set that is captured without auxiliary devices serving as high-order reference, but with the camera being guided by hand. Flat, circular targets stuck on a planar whiteboard and with known positions are automatically tracked throughout the amplitude layer of long image sequences. These image observations are introduced into a bundle block adjustment, which on the one hand results in the determination of the interior orientation. Capitalizing on the known planarity of the imaged board, the reconstructed exterior orientations furthermore allow for the derivation of reference values of the actual distance observations. Eased by the automatic reconstruction of the camera's trajectory and attitude, comprehensive statistics are generated, which are accumulated into a 5-dimensional matrix in order to be manageable. The marginal distributions of this matrix are inspected for the purpose of system identification, whereupon its elements are introduced into another least-squares adjustment, finally leading to clear range correction models and parameters.

  13. GGIP: Structure and sequence-based GPCR-GPCR interaction pair predictor.

    Science.gov (United States)

    Nemoto, Wataru; Yamanishi, Yoshihiro; Limviphuvadh, Vachiranee; Saito, Akira; Toh, Hiroyuki

    2016-09-01

    G Protein-Coupled Receptors (GPCRs) are important pharmaceutical targets. More than 30% of currently marketed pharmaceutical medicines target GPCRs. Numerous studies have reported that GPCRs function not only as monomers but also as homo- or hetero-dimers or higher-order molecular complexes. Many GPCRs exert a wide variety of molecular functions by forming specific combinations of GPCR subtypes. In addition, some GPCRs are reportedly associated with diseases. GPCR oligomerization is now recognized as an important event in various biological phenomena, and many researchers are investigating this subject. We have developed a support vector machine (SVM)-based method to predict interacting pairs for GPCR oligomerization, by integrating the structure and sequence information of GPCRs. The performance of our method was evaluated by the Receiver Operating Characteristic (ROC) curve. The corresponding area under the curve was 0.938. As far as we know, this is the only prediction method for interacting pairs among GPCRs. Our method could accelerate the analyses of these interactions, and contribute to the elucidation of the global structures of the GPCR networks in membranes. Proteins 2016; 84:1224-1233. © 2016 Wiley Periodicals, Inc. PMID:27191053
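
    The area under the ROC curve quoted above can be computed from classifier scores without plotting the curve. The sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The function name and the example scores are hypothetical and unrelated to the GGIP data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # 1 = interacting pair, 0 = non-interacting pair (toy values).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
```

    The pairwise loop is quadratic in the number of examples; for large evaluation sets a sort-based rank computation gives the same value more efficiently.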

  14. Molecular phylogeny of the large carpenter bees, genus Xylocopa (Hymenoptera: apidae), based on mitochondrial DNA sequences.

    Science.gov (United States)

    Leys, R; Cooper, S J; Schwarz, M P

    2000-12-01

    Carpenter bees, genus Xylocopa Latreille, a group of bees found on all continents, are of particular interest to behavioral ecologists because of their utility for studies of the evolution of mating strategies and sociality. This paper presents phylogenetic analyses based on sequences of two mitochondrial genes, cytochrome oxidase 1 and cytochrome b, for 22 subgenera of Xylocopa. Maximum-parsimony and maximum-likelihood methods were used to infer phylogenetic relationships. The analyses resulted in three resolved clades of subgenera: a South American group (including the subgenera Stenoxylocopa, Megaxylocopa, and Neoxylocopa), a group including the subgenera Xylocopa s.s. and Ctenoxylocopa, and an Ethiopian group (including the subgenera Afroxylocopa, Mesotrichia, Alloxylocopa, Platynopoda, Hoploxylocopa, and Koptortosoma). The relationships between the 11 other subgenera and the resolved clades are unclear. Within the Ethiopian group we found a clear separation of the African and the Oriental taxa and apparent polyphyly of the subgenus Koptortosoma. Using an evolutionary rate for ants, we investigated whether Gondwana vicariance or more recent dispersal events could best explain the present-day distribution of subgenera. Although some taxa show divergences that approach Gondwanan breakup times, most divergences between geographic groups are too recent to support a vicariance hypothesis.

  15. Expanding the diversity of oenococcal bacteriophages: insights into a novel group based on the integrase sequence.

    Science.gov (United States)

    Jaomanjaka, Fety; Ballestra, Patricia; Dols-lafargue, Marguerite; Le Marrec, Claire

    2013-09-01

    Temperate bacteriophages contribute to the genetic diversity of the lactic acid bacterium Oenococcus oeni. We used a classification scheme for oenococcal prophages based on integrase gene polymorphism to analyze a collection of Oenococcus strains, mostly isolated in the area of Bordeaux, which represented the major lineages identified through MLST schemes in the species. Genome sequences of oenococcal prophages were clustered into four integrase groups (A to D), which were related to the chromosomal integration site. The prevalence of each group was determined, and we could show that members of the intB- and intC-prophage groups were rare in our panel of strains. Our study focused on the so far uncharacterized members of the intD-group. Various intD viruses could be easily isolated from wine samples, while intD lysogens could be induced to produce phages active against two permissive O. oeni isolates. These data support the role of this prophage group in the biology of O. oeni. Global alignment of three relevant intD-prophages revealed significant conservation and highlighted a number of unique ORFs that may contribute to phage and lysogen fitness.

  16. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers

    Directory of Open Access Journals (Sweden)

    Dybowski J Nikolaj

    2011-11-01

    Full Text Available Abstract. Background: Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients who might benefit from the usage of these new drugs. Results: We tested several methods to improve Bevirimat resistance prediction in HIV-1. It turned out that combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to identify the most crucial regions for Bevirimat resistance computationally, which are in line with experimental results from other studies. Conclusions: Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.

  17. Molecular phylogeny of the lionfish genera Dendrochirus and Pterois (Scorpaenidae, Pteroinae) based on mitochondrial DNA sequences.

    Science.gov (United States)

    Kochzius, Marc; Söller, Rainer; Khalaf, Maroof A; Blohm, Dietmar

    2003-09-01

    This study investigates the molecular phylogeny of seven lionfishes of the genera Dendrochirus and Pterois. MP, ML, and NJ phylogenetic analyses based on 964 bp of partial mitochondrial DNA sequences (cytochrome b and 16S rDNA) revealed two main clades: (1) "Pterois" clade (Pterois miles and Pterois volitans), and (2) "Pteropterus-Dendrochirus" clade (remainder of the sampled species). The position of Dendrochirus brachypterus either basal to the main clades or in the "Pteropterus-Dendrochirus" clade cannot be resolved. However, the molecular phylogeny did not support the current separation of the genera Pterois and Dendrochirus. The siblings P. miles and P. volitans are clearly separated and our results support the proposed allopatric or parapatric distribution in the Indian and Pacific Ocean. However, the present analysis cannot reveal whether P. miles and P. volitans are separate species or two populations of a single species, because the observed separation into different clades can be explained either by speciation or by lineage sorting. Molecular clock estimates for the siblings P. miles and P. volitans suggest a divergence time of 2.4-8.3 mya, which coincides with geological events that created vicariance between populations of the Indian and Pacific Ocean. PMID:12927126

  18. Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier.

    Science.gov (United States)

    Dhole, Kaustubh; Singh, Gurdeep; Pai, Priyadarshini P; Mondal, Sukanta

    2014-05-01

    Protein-protein interactions are of central importance for virtually every process in a living cell. Information about the interaction sites in proteins improves our understanding of disease mechanisms and can provide the basis for new therapeutic approaches. Since a multitude of unique residue-residue contacts facilitate the interactions, prediction of protein-protein interaction sites has become one of the most important and challenging problems of computational biology. Although much progress in this field has been reported, this problem is yet to be satisfactorily solved. Here, a novel method (LORIS: L1-regularized LOgistic Regression based protein-protein Interaction Sites predictor) is proposed that identifies interaction residues using sequence features and is implemented via the L1-logreg classifier. Results show that LORIS is not only quite effective but also performs better than existing state-of-the-art methods. LORIS, available as a standalone package, can be useful for facilitating drug-design and targeted-mutation studies, which require a deeper knowledge of protein interaction sites. PMID:24486250

  19. PRIMAL: Page Rank-Based Indoor Mapping and Localization Using Gene-Sequenced Unlabeled WLAN Received Signal Strength

    Directory of Open Access Journals (Sweden)

    Mu Zhou

    2015-09-01

    Full Text Available Due to the wide deployment of wireless local area networks (WLAN), received signal strength (RSS)-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL) approach that uses gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM). First, based on the observation of the motion patterns of the people in the target environment, we use the Allen logic to construct the mobility graph to characterize the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR) algorithm is proposed to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy while avoiding the intensive time and labor cost involved in conventional location fingerprinting-based indoor WLAN localization.

  20. A rank-based sequence aligner with applications in phylogenetic analysis.

    Directory of Open Access Journals (Sweden)

    Liviu P Dinu

    Full Text Available Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing [Formula: see text]-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. One set of experiments is conducted to determine the precision and the recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java and is supported on UNIX and MS Windows.
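
    The speed-up described in this abstract, storing substring positions of each read in a hash table, is a common seeding idea. The Python sketch below illustrates it under stated assumptions: the elided formula is taken to denote a substring length k, and the names `build_kmer_index` and `candidate_offsets` are hypothetical. It is not the LRD aligner's implementation and does not compute Local Rank Distance.

```python
from collections import defaultdict


def build_kmer_index(read, k):
    """Map every k-mer in `read` to the list of 0-based positions where it starts."""
    if k <= 0:
        raise ValueError("k must be positive")
    index = defaultdict(list)
    for start in range(len(read) - k + 1):
        index[read[start:start + k]].append(start)
    return dict(index)


def candidate_offsets(reference, read_index, k):
    """Count, for each reference offset, how many read k-mers would line up there."""
    votes = defaultdict(int)
    for ref_pos in range(len(reference) - k + 1):
        for read_pos in read_index.get(reference[ref_pos:ref_pos + k], ()):
            votes[ref_pos - read_pos] += 1
    return dict(votes)


# Toy sequences, for illustration only.
read = "ACGTAC"
index = build_kmer_index(read, 3)
votes = candidate_offsets("TTACGTACGG", index, 3)
best_offset = max(votes, key=votes.get)
```

    Offsets with many votes are the positions "likely to represent a good positional match"; an aligner would then score only those candidates with its full distance measure.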

  1. A sequence-based approach to identify reference genes for gene expression analysis

    Directory of Open Access Journals (Sweden)

    Chari Raj

    2010-08-01

    Full Text Available Abstract. Background: An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate. Methods: Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated. Results: We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung

  2. Viridans Group Streptococci clinical isolates: MALDI-TOF mass spectrometry versus gene sequence-based identification.

    Directory of Open Access Journals (Sweden)

    Silvia Angeletti

    Full Text Available Viridans Group Streptococci (VGS) species-level identification is fundamental for patient management. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has been used for VGS identification, but discrimination within the Mitis group proved difficult. In this study, VGS identifications with two MALDI-TOF instruments, the Biotyper (Bruker) and the VITEK MS (bioMérieux), have been compared to those derived from tuf, sodA and rpoB gene sequencing. VGS isolates were clustered and a dendrogram constructed using the Biotyper 3.0 software (Bruker). rpoB gene sequencing proved to be the most sensitive and specific molecular method for S. pneumoniae identification and was used as the reference method. The sensitivity and the specificity of the VITEK MS in S. pneumoniae identification were 100%, while the Biotyper was less specific (92.4%). In non-pneumococcal VGS strains, the group-level correlation between rpoB and the Biotyper was 100%, while the species-level correlation was 61% after database upgrading (versus 37% before upgrading). The group-level correlation between rpoB and the VITEK MS was 100%, while the species-level correlation was 36%, increasing to 69% if isolates identified as S. mitis/S. oralis are included. The less accurate performance of the VITEK MS in VGS identification within the Mitis group was due to its inability to discriminate between S. mitis and S. oralis. Conversely, the Biotyper, after the release of the upgraded database, was able to discriminate between the two species. In the dendrogram, VGS strains from the same group were grouped into the same cluster and had a good correspondence with the gene-based clustering reported by other authors, thus confirming the validity of the upgraded version of the database. Data from this study demonstrated that the MALDI-TOF technique can be a rapid and cost-saving method for VGS identification even within the Mitis group but improvements of spectra

  3. Molecular diversification based on analysis of expressed sequence tags from the venom glands of the Chinese bird spider Ornithoctonus huwena.

    Science.gov (United States)

    Jiang, Liping; Peng, Li; Chen, Jinjun; Zhang, Yongqun; Xiong, Xia; Liang, Songping

    2008-06-15

    The bird spider Ornithoctonus huwena is one of the most venomous spiders in China. Its venom has been investigated, but usually only the most abundant components have been analyzed. To characterize the primary structure of O. huwena toxins, a list of transcripts within the venom gland was made using the expressed sequence tag (EST) strategy. We generated 468 ESTs from a directional cDNA library of O. huwena venom glands. All ESTs were grouped into 24 clusters and 65 singletons, of which 68.00% of total ESTs belong to toxin-like sequences, 13.00% are similar to body peptide transcripts and 19.00% have no significant similarity to any known sequences. Precursors of all toxin-like sequences except HWTX-XI and HWTX-XIII can be classified into eight different superfamilies (HWTX-I, HWTX-II, HWTX-X, HWTX-XIV, HWTX-XV, HWTX-XVI, HWTX-XVII and HWTX-XVIII) according to the identity of their precursor sequences. The results have predictive value for the discovery of various groups of pharmacologically distinct toxins in complex venoms, and for understanding the relationship of spider toxin evolution based on the diversification of cDNA sequences, primary structure of precursor peptides, three-dimensional structure motifs and biological functions.

  4. Introduction of the hybcell-based compact sequencing technology and comparison to state-of-the-art methodologies for KRAS mutation detection.

    Science.gov (United States)

    Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus

    2015-03-01

    The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
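
    The sensitivity, specificity, and accuracy figures in this abstract follow the standard confusion-matrix definitions relative to a reference method. The Python sketch below only illustrates those definitions. The function name `diagnostic_metrics` and the counts are hypothetical and are not the study's per-method tallies, which the abstract does not give.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of reference-positive calls detected
    specificity = tn / (tn + fp)   # fraction of reference-negative calls kept negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = diagnostic_metrics(tp=45, fp=2, tn=18, fn=5)
```

    With few wild-type cases in a panel, a single false positive shifts specificity by many percentage points, which is worth keeping in mind when comparing methods on small evaluation sets.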

  5. Analysis of hepatitis B virus genotyping and drug resistance gene mutations based on massively parallel sequencing.

    Science.gov (United States)

    Han, Yingxin; Zhang, Yinxin; Mei, Yanhua; Wang, Yuqi; Liu, Tao; Guan, Yanfang; Tan, Deming; Liang, Yu; Yang, Ling; Yi, Xin

    2013-11-01

    Drug resistance to nucleoside analogs is a serious problem worldwide. Both drug resistance gene mutation detection and HBV genotyping are helpful for guiding clinical treatment. Total HBV DNA from 395 patients who were treated with single or multiple drugs including Lamivudine, Adefovir, Entecavir, Telbivudine, Tenofovir and Emtricitabine were sequenced using the HiSeq 2000 sequencing system and validated using the 3730 sequencing system. In addition, a mixed sample of HBV plasmid DNA was used to determine the cutoff value for HiSeq-sequencing, and 52 of the 395 samples were sequenced three times to evaluate the repeatability and stability of this technology. Of the 395 samples sequenced using both HiSeq and 3730 sequencing, the results from 346 were consistent, and the results from 49 were inconsistent. Among the 49 inconsistent results, 13 samples were detected as drug-resistance-positive using HiSeq but negative using 3730, and the other 36 samples showed a higher number of drug-resistance-positive gene mutations using HiSeq 2000 than using 3730. Gene mutations had an apparent frequency of 1% as assessed by the plasmid testing. Therefore, a 1% cutoff value was adopted. Furthermore, the experiment was repeated three times, and the same results were obtained in 49/52 samples using the HiSeq sequencing system. HiSeq sequencing can be used to analyze HBV gene mutations with high sensitivity, high fidelity, high throughput and automation and is a potential method for hepatitis B virus gene mutation detection and genotyping.
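
    The agreement figures in this abstract reduce to simple proportions, and the 1% cutoff acts as a filter on variant frequencies. The Python sketch below works through both. The helper names and the variant labels and frequencies are hypothetical, and treating the cutoff as inclusive is an assumption, since the abstract does not say.

```python
def proportion(part, total):
    """Fraction of `total` accounted for by `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Counts quoted in the abstract.
concordance = proportion(346, 395)    # samples with consistent HiSeq and 3730 results
repeatability = proportion(49, 52)    # samples with identical results in all three runs


def call_variants(frequencies, cutoff=0.01):
    """Keep variants whose read frequency reaches the cutoff (assumed inclusive)."""
    return {name: freq for name, freq in frequencies.items() if freq >= cutoff}


# Hypothetical variant frequencies, for illustration only.
called = call_variants({"variant_a": 0.12, "variant_b": 0.004, "variant_c": 0.01})
```

    On the quoted counts this gives a concordance of about 87.6% and a repeatability of about 94.2%.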

  6. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    Science.gov (United States)

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil

    2015-10-01

    Pandemic influenza is a major concern worldwide. The availability of advanced technologies and of the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provides a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially represent undiscovered antigenic sites in the influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K and M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. Identifying hotspots in influenza sequences is significant because they represent the regions with low antibody reactivity. We argue that the virus breaks the host immune response by mutation at these spots. Based on the discovered rules, we developed the software "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in discovery of association rules between mutation points during evolution of pandemic influenza.

  7. Three-Phase Multiple Harmonic Sequence Detection Based on Generalized Delayed Signal Superposition

    DEFF Research Database (Denmark)

    Lu, Yong; Xiao, Guochun; Wang, Xiongfei;

    2016-01-01

    Grid synchronization has always been an important challenge for three-phase grid-connected converters under unbalanced and distorted grid conditions. Moreover, how to quickly and accurately extract multiple harmonic sequence information is essential for control systems. In this paper, a three-phase multiple harmonic sequence detection method is proposed for estimating both the fundamental and harmonic sequence components under adverse grid conditions. This detection method is denoted as MG DSS-PLL since it contains Multiple Generalized Delayed Signal Superposition operators and a Phase-Locked Loop...

  8. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes.

    Science.gov (United States)

    Trollope, Kim M; van Wyk, Niël; Kotjomela, Momo A; Volschenk, Heinrich

    2015-12-01

    Sucrolytic enzymes catalyse sucrose hydrolysis or the synthesis of fructooligosaccharides (FOSs), a prebiotic in human and animal nutrition. FOS synthesis capacity differs between sucrolytic enzymes. Amino-acid-sequence-based classification of FOS-synthesizing enzymes would greatly facilitate the in silico identification of novel catalysts, as large amounts of sequence data lie untapped. The development of a bioinformatics tool to rapidly distinguish between high-level FOS-synthesizing and predominantly sucrose-hydrolysing enzymes from fungal genomic data is presented. Sequence comparison of functionally characterized enzymes displaying low- and high-level FOS synthesis revealed conserved motifs unique to each group. New light is shed on the sequence context of active site residues in three previously identified conserved motifs. We characterized two enzymes predicted to possess low- and high-level FOS synthesis activities based on their conserved motif sequences. FOS data for the enzymes confirmed our successful prediction of their FOS synthesis capacity. Structural comparison of enzymes displaying low- and high-level FOS synthesis identified steric hindrance between nystose and a long loop region present only in low-level FOS synthesizers. This loop is proposed to limit the synthesis of FOS species with higher degrees of polymerization, a phenomenon observed among enzymes displaying low-level FOS synthesis. Conserved sequence motifs surrounding catalytic residues and a distant structural determinant were identifiers of FOS synthesis capacity and allow for functional annotation of sucrolytic enzymes directly from amino acid sequence. The tool presented may also be useful to study the structure-function relationships of β-fructofuranosidases by identifying mutations present in a group of closely related enzymes displaying similar function. PMID:26426731

  10. Structural insights of microbial community of Deulajhari (India) hot spring using 16S rRNA-based metagenomic sequencing

    Directory of Open Access Journals (Sweden)

    Archana Singh

    2016-03-01

    Full Text Available Understanding the distribution of microbial communities is a major goal of microbial ecology that remains to be fully achieved. Hot springs, being hubs for thermophilic microbiota, attract the attention of microbiologists. The Deulajhari hot spring cluster is located in the Angul district of Odisha. Lying within a wooded area, the Deulajhari hot spring is also fed by plant litter, resulting in a relatively high total organic content (TOC). For the first time, the microbial composition is studied by Illumina sequencing-based biodiversity analysis, through amplicon metagenome sequencing of 16S rRNA targeting the V3-V4 region using metagenomic DNA from the hot spring sediment. Over 28 phyla were detected through the amplicon metagenome sequencing, of which the most dominant at the existing physicochemical parameters (temperature 69 °C, pH 8.09, electroconductivity 0.025 dS m−1 and total organic carbon 0.356%) were Proteobacteria (88.12%), Bacteroidetes (10.76%), Firmicutes (0.35%), Spirochetes (0.18%) and Chloroflexi (0.11%). Approximately 713 species were observed at the above physicochemical parameters. The analysis of the metagenome provides quantitative insights into microbial populations in the Deulajhari hot spring based on the sequence data. The metagenome sequence is deposited in the SRA database, available at NCBI under accession no. SRX1459736.

  11. Phylogeographic analysis of hemorrhagic fever with renal syndrome patients using multiplex PCR-based next generation sequencing.

    Science.gov (United States)

    Kim, Won-Keun; Kim, Jeong-Ah; Song, Dong Hyun; Lee, Daesang; Kim, Yong Chul; Lee, Sook-Young; Lee, Seung-Ho; No, Jin Sun; Kim, Ji Hye; Kho, Jeong Hoon; Gu, Se Hun; Jeong, Seong Tae; Wiley, Michael; Kim, Heung-Chul; Klein, Terry A; Palacios, Gustavo; Song, Jin-Won

    2016-01-01

    Emerging and re-emerging infectious diseases caused by RNA viruses pose a critical public health threat. Next generation sequencing (NGS) is a powerful technology to define genomic sequences of the viruses. Of particular interest is the use of whole genome sequencing (WGS) to perform phylogeographic analysis, which allows the detection and tracking of the emergence of viral infections. Hantaviruses (family Bunyaviridae) cause hemorrhagic fever with renal syndrome (HFRS) and hantavirus pulmonary syndrome (HPS) in humans. We propose to use WGS for the phylogeographic analysis of human hantavirus infections. A novel multiplex PCR-based NGS was developed to gather whole genome sequences of Hantaan virus (HTNV) from HFRS patients and rodent hosts in endemic areas. The obtained genomes were described for the spatial and temporal links between cases and their sources. Phylogenetic analyses demonstrated geographic clustering of HTNV strains from clinical specimens with the HTNV strains circulating in rodents, suggesting the most likely site and time of infection. Recombination analysis demonstrated a genome organization compatible with recombination of the HTNV S segment. The multiplex PCR-based NGS is useful and robust to acquire viral genomic sequences and may provide important ways to define the phylogeographical association and molecular evolution of hantaviruses. PMID:27221218

  12. A comparative study of pseudorandom sequences used in a c-VEP based BCI for online wheelchair control

    DEFF Research Database (Denmark)

    Isaksen, Jonas L.; Mohebbi, Ali; Puthusserypady, Sadasivan

    2016-01-01

    In this study, a c-VEP based BCI system was developed to run on three distinctive pseudorandom sequences, namely the m-code, the Gold-code, and the Barker-code. The Visual Evoked Potentials (VEPs) were provoked using these codes. In the online session, subjects controlled a LEGO® Mindstorms® robo...
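
    Codes such as those named in this abstract are of interest because their autocorrelation has a sharp peak and low sidelobes, which helps a decoder tell shifted or different codes apart. The Python sketch below checks that property for the length-13 Barker code. The choice of length 13 is an assumption (the abstract does not state which length was used), and the snippet is not the study's stimulation or decoding code.

```python
def aperiodic_autocorrelation(code):
    """Aperiodic autocorrelation of a +/-1 sequence for lags 0..len(code)-1."""
    n = len(code)
    return [sum(code[i] * code[i + lag] for i in range(n - lag)) for lag in range(n)]


# Length-13 Barker code, written as +1/-1 chips.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
acf = aperiodic_autocorrelation(BARKER_13)
peak = acf[0]
max_sidelobe = max(abs(value) for value in acf[1:])
```

    The peak equals the code length while every off-peak value has magnitude at most 1, which is the defining property of a Barker code.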

  13. Next-generation sequencing-based genome diagnostics across clinical genetics centers : implementation choices and their effects

    NARCIS (Netherlands)

    Vrijenhoek, Terry; Kraaijeveld, Ken; Elferink, Martin; de Ligt, Joep; Kranendonk, Elcke; Santen, Gijs; Nijman, Isaac J.; Butler, Derek; Claes, Godelieve; Costessi, Adalberto; Dorlijn, Wim; van Eyndhoven, Winfried; Halley, Dicky J. J.; van den Hout, Mirjam C. G. N.; van Hove, Steven; Johansson, Lennart F.; Jongbloed, Jan D. H.; Kamps, Rick; Kockx, Christel E. M.; de Koning, Bart; Kriek, Marjolein; Deprez, Ronald Lekanne Dit; Lunstroo, Hans; Mannens, Marcel; Mook, Olaf R.; Nelen, Marcel; Ploem, Corrette; Rijnen, Marco; Saris, Jasper J.; Sinke, Richard; Sistermans, Erik; van Slegtenhorst, Marjon; Sleutels, Frank; van der Stoep, Nienke; van Tienhoven, Marianne; Vermaat, Martijn; Vogel, Maartje; Waisfisz, Quinten; Weiss, Janneke Marjan; van den Wijngaard, Arthur; van Workum, Wilbert; Ijntema, Helger; van der Zwaag, Bert; van IJcken, Wilfred F. J.; den Dunnen, Johan T.; Veltman, Joris A.; Hennekam, Raoul; Cuppen, Edwin

    2015-01-01

    Implementation of next-generation DNA sequencing (NGS) technology into routine diagnostic genome care requires strategic choices. Instead of theoretical discussions on the consequences of such choices, we compared NGS-based diagnostic practices in eight clinical genetic centers in the Netherlands, b

  14. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

    Science.gov (United States)

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...

  15. Matrix based method for synthesis of main intensified and integrated distillation sequences

    Energy Technology Data Exchange (ETDEWEB)

    Khalili-Garakani, Amirhossein; Kasiri, Norollah [Iran University of Science and Technology (IUST), Tehran (Iran, Islamic Republic of); Ivakpour, Javad [Research Institute of Petroleum Industry (RIPI), Tehran (Iran, Islamic Republic of)

    2016-04-15

    Many studies in this area have sought a column-sequencing algorithm that lets designers and researchers generate a wide range of sequences across a broad search space, and that is as mathematical, as automated, and as general as possible for programming purposes. In the present work, an algorithm previously developed by the authors, called the matrix method, has been developed much further. The new version of the algorithm includes thermally coupled, thermodynamically equivalent, intensified, simultaneous heat- and mass-integrated, and divided-wall column sequences, which are widely applicable and offer large potential savings in capital investment, operating costs, and energy usage in industrial applications. To demonstrate the much wider search space now accessible, a three-component separation has been thoroughly examined as a case study, always resulting in an integrated sequence being proposed as the optimum.

  16. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    Science.gov (United States)

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.

  17. Chaotic-Laser-Based True Random Sequence Generation for Spread-Spectrum Communications

    Institute of Scientific and Technical Information of China (English)

    张朝霞; 周俊杰; 张东泽; 傅正; 张建忠

    2012-01-01

    We propose a scheme for spread-spectrum communications using true random sequences generated by chaotic semiconductor lasers as spreading codes. These sequences can eliminate the inherent periodicity of pseudorandom sequences, enlarge the capacity of spread-spectrum codes, improve communication security, and increase the number of users of the system. When a true random sequence with an appropriate length is used as the spread-spectrum code and the information speed is maintained constant, the system acquires a greater spread-spectrum gain and a lower bit-error ratio (BER) than the traditional spread-spectrum system. The communication security is also enhanced. The BER smoothly increases with the number of users, which indicates the good multiple-access capability of the system.
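
    The spreading and despreading steps this abstract relies on can be sketched as follows. This is a minimal illustration of direct-sequence spreading in general, not the authors' system: the operating-system entropy source stands in for the chaotic-laser generator, the channel is noiseless, and all function names are hypothetical.

```python
import secrets


def random_spreading_code(length):
    """Return a +/-1 chip sequence from the OS entropy source (a stand-in for the chaotic laser)."""
    return [1 if secrets.randbits(1) else -1 for _ in range(length)]


def spread(bits, code):
    """Map each data bit (0/1) to -1/+1 and multiply it by every chip of the code."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * chip for chip in code)
    return chips


def despread(chips, code):
    """Correlate each code-length block with the code; the sign of the sum gives the bit."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        correlation = sum(x * c for x, c in zip(chips[start:start + n], code))
        bits.append(1 if correlation > 0 else 0)
    return bits


code = random_spreading_code(64)  # 64 chips per bit is an arbitrary choice
data = [1, 0, 1, 1, 0]
transmitted = spread(data, code)
recovered = despread(transmitted, code)
```

    With the bit rate held constant, a longer code spreads each bit over more chips, which is the sense in which code length sets the spreading gain in a scheme of this kind.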

  18. What are the basic modules of implicit sequence learning? A feature-based account

    OpenAIRE

    Eberhardt, Katharina

    2015-01-01

    According to the Theory of Event Coding (TEC; Hommel et al., 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to make a conceptual difference between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele et al., 2003) which process information along single dimensions. These dimensions have remained un...

  19. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    Directory of Open Access Journals (Sweden)

    Tanase Koji

    2012-07-01

    Full Text Available Abstract Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers, are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  20. MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

    Directory of Open Access Journals (Sweden)

    Roch Philippe

    2009-02-01

    Full Text Available Abstract Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database http://mussel.cribi.unipd.it. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels.

  1. An Efficient Genotyping Method in Chicken Based on Genome Reducing and Sequencing.

    Directory of Open Access Journals (Sweden)

    Rongrong Liao

    Full Text Available Single nucleotide polymorphisms (SNPs) are essential for identifying the genetic mechanisms of complex traits. In the present study, we applied the genotyping by genome reducing and sequencing (GGRS) method to construct a 252-plex sequencing library for SNP discovery and genotyping in chicken. The library was successfully sequenced on an Illumina HiSeq 2500 sequencer with a paired-end pattern; approximately 400 million raw reads were generated, and an average of approximately 1.4 million good reads per sample were generated. A total of 91,767 SNPs were identified after strict filtering, and all of the 252 samples and all of the chromosomes were well represented. Compared with the Illumina 60K chicken SNP chip data, approximately 34,131 more SNPs were identified using GGRS, and a higher SNP density was found using GGRS, which could be beneficial for downstream analysis. Using the GGRS method, more than 3528 samples can be sequenced simultaneously, and the cost is reduced to $18 per sample. To the best of our knowledge, this study describes the first report of such highly multiplexed sequencing in chicken, indicating potential applications for genome-wide association and genomic selection in chicken.

  2. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang

    2010-11-08

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.
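
    The two compositional parameters named in this abstract, GC content and purine content at each of the three codon positions, can be measured directly from a coding sequence. The sketch below shows only that measurement and does not reproduce the authors' two models; the function name and the toy sequence are illustrative assumptions.

```python
def codon_position_composition(cds):
    """Return [(GC fraction, purine fraction)] for codon positions 1, 2 and 3.

    Assumes an in-frame coding sequence over A, C, G, T; trailing bases that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    result = []
    for position in range(3):
        bases = cds[position:usable:3]          # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        purine = sum(base in "AG" for base in bases)
        result.append((gc / len(bases), purine / len(bases)))
    return result


example = "ATGGCCAAGTAA"  # toy sequence: ATG GCC AAG TAA
composition = codon_position_composition(example)
# composition == [(0.25, 0.75), (0.25, 0.5), (0.75, 0.75)]
```

    Keeping the three positions separate matters here because, as the abstract notes, mutation and selection act differently at each codon position.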

  3. Systematics of Amaryllidaceae based on cladistic analysis of plastid sequence data.

    Science.gov (United States)

    Meerow, A W; Fay, M F; Guy, C L; Li, Q B; Zaman, F Q; Chase, M W

    1999-09-01

    Cladistic analyses of plastid DNA sequences rbcL and trnL-F are presented separately and combined for 48 genera of Amaryllidaceae and 29 genera of related asparagalean families. The combined analysis is the most highly resolved of the three and provides good support for the monophyly of Amaryllidaceae and indicates Agapanthaceae as its sister family. Alliaceae are in turn sister to the Amaryllidaceae/Agapanthaceae clade. The origins of the family appear to be western Gondwanaland (Africa), and infrafamilial relationships are resolved along biogeographic lines. Tribe Amaryllideae, primarily South African, is sister to the rest of Amaryllidaceae; this tribe is supported by numerous morphological synapomorphies as well. The remaining two African tribes of the family, Haemantheae and Cyrtantheae, are well supported, but their position relative to the Australasian Calostemmateae and a large clade comprising the Eurasian and American genera, is not yet clear. The Eurasian and American elements of the family are each monophyletic sister clades. Internal resolution of the Eurasian clade only partially supports currently accepted tribal concepts, and few conclusions can be drawn on the relationships of the genera based on these data. A monophyletic Lycorideae (Central and East Asian) is weakly supported. Galanthus and Leucojum (Galantheae pro parte) are supported as sister genera by the bootstrap. The American clade shows a higher degree of internal resolution. Hippeastreae (minus Griffinia and Worsleya) are well supported, and Zephyranthinae are resolved as a distinct subtribe. An Andean clade marked by a chromosome number of 2n = 46 (and derivatives thereof) is resolved with weak support. The plastid DNA phylogenies are discussed in the context of biogeography and character evolution in the family.

  4. HLA polymorphisms in Cabo Verde and Guiné-Bissau inferred from sequence-based typing.

    Science.gov (United States)

    Spínola, Hélder; Bruges-Armas, Jácome; Middleton, Derek; Brehm, António

    2005-10-01

    Human leukocyte antigen (HLA)-A, -B, and -DRB1 polymorphisms were examined in the Cabo Verde and Guiné-Bissau populations. The data were obtained at high-resolution level, using sequence-based typing. The most frequent allele at each locus was: A*020101 (16.7% in Guiné-Bissau and 13.5% in Cabo Verde), B*350101 (14.4% in Guiné-Bissau and 13.2% in Cabo Verde), DRB1*1304 (19.6% in Guiné-Bissau), and DRB1*1101 (10.1% in Cabo Verde). The predominant three-locus haplotype in Guiné-Bissau was A*2301-B*1503-DRB1*1101 (4.6%) and in Cabo Verde was A*3002-B*350101-DRB1*1001 (2.8%), exclusive to northwestern islands (5.6%) and absent in Guiné-Bissau. The present study corroborates historic sources and other genetic studies indicating that Cabo Verde was populated not only by Africans but also by Europeans. Haplotype and dendrogram analyses show a Caucasian genetic influence in today's gene pool of Cabo Verdeans. Haplotypes and allele frequencies present a differential distribution between southeastern and northwestern Cabo Verde islands, which could be the result of different genetic influences, founder effect, or bottlenecks. Dendrograms and principal coordinates analysis show that Guineans are more similar to North Africans than other HLA-studied sub-Saharans, probably from ancient and recent genetic contacts with other peoples, namely East Africans. PMID:16386651

  5. Modifications in SIFT-based 3D reconstruction from image sequence

    Science.gov (United States)

    Wei, Zhenzhong; Ding, Boshen; Wang, Wei

    2014-11-01

    In this paper, we aim to reconstruct 3D points of a scene from related images. The Scale Invariant Feature Transform (SIFT), a feature extraction and matching algorithm, has been refined over many years and is widely used in image alignment and stitching, image recognition, and 3D reconstruction. Because SIFT's feature extraction and matching are robust and reliable, we use it to find correspondences between images, and we describe a SIFT-based method to reconstruct sparse 3D points from ordered images. In the matching stage, we modify the procedure for finding correct correspondences and obtain a satisfactory matching result: rejecting the "questioned" points before initial matching makes the final matching more reliable. Given that SIFT is invariant to image scale, rotation, and varying environmental conditions, we propose a way to delete the duplicate reconstructed points that arise during sequential reconstruction, which improves the accuracy of the reconstruction. By removing the duplicated points, we avoid the possible collapse caused by inexact initialization or error accumulation. Our method also does not require that all reprojected points be visible at all times. "The small precision" could make a big change when the number of images increases. The paper compares the algorithm with and without the modifications. Moreover, we present an approach to evaluate the reconstruction by comparing reconstructed angles and length ratios with their actual values, using a calibration target in the scene. The proposed evaluation method is easy to carry out and of great practical value. Even without Internet image datasets, we can evaluate our own results. The whole algorithm has been tested on several image sequences, both from the Internet and from our own shots.

  6. MICA polymorphism in a population from north Morocco, Metalsa Berbers, using sequence-based typing.

    Science.gov (United States)

    Piancatelli, Daniela; Del Beato, Tiziana; Oumhani, Khadija; El Aouad, Rajae; Adorno, Domenico

    2005-08-01

    The MICA gene encodes a family of nonclassical major histocompatibility complex class I molecules. Data on MICA polymorphism in different populations are still limited. In the present study, MICA allele frequencies (af) were assessed in 82 unrelated healthy individuals from a Moroccan Berber population named Metalsa (ME) by means of sequence-based typing of exons 2, 3, 4, and 5. In consideration of the linkage disequilibrium existing between MICA and human leukocyte antigen (HLA) class I alleles, MICA/HLA-B, MICA/HLA-Cw, and MICA/HLA-A haplotype frequencies (hf) were estimated. A wide allelic distribution including 16 different MICA alleles was found in ME. The most common MICA alleles were MICA*00801 (af = 0.268), *004 (0.232), *00902 (0.140), *00901 (0.085), and *00901 (0.073). The most common MICA/HLA-B haplotypes were MICA*004-B*4403 and MICA*009-B*50 (hf = 0.113 for both these haplotypes). Some known MICA and HLA-B associations were confirmed in this population. Noteworthy was the high frequency of MICA*009 (af = 0.226); the high frequency of B*50 found in ME (af = 0.114) permitted us to evidence the associations of MICA*00902 with B*5001 (hf = 0.068) or *5002 (hf = 0.045), whereas MICA*00901 was mainly associated with B*5101 (hf = 0.038), which corresponds to the previously described association MICA*009/A6-HLA-B*51. This study extends the previous knowledge on MICA polymorphism to a North African white population and may have implications for disease associations and transplantation.

  7. Deep sequencing-based identification of small regulatory RNAs in Synechocystis sp. PCC 6803.

    Directory of Open Access Journals (Sweden)

    Wen Xu

    Full Text Available Synechocystis sp. PCC 6803 is a genetically tractable model organism for photosynthesis research. The genome of Synechocystis sp. PCC 6803 consists of a circular chromosome and seven plasmids. The importance of small regulatory RNAs (sRNAs) as mediators of a number of cellular processes in bacteria has begun to be recognized. However, little is known regarding sRNAs in Synechocystis sp. PCC 6803. To provide a comprehensive overview of sRNAs in this model organism, the sRNAs of Synechocystis sp. PCC 6803 were analyzed using deep sequencing, and 7,951,189 reads were obtained. High quality mapping reads (6,127,890) were mapped onto the genome and assembled into 16,192 transcribed regions (clusters) based on read overlap. A total number of 5211 putative sRNAs were revealed from the genome and the 4 megaplasmids, and 27 of these molecules, including four from plasmids, were confirmed by RT-PCR. In addition, possible target genes regulated by all of the putative sRNAs identified in this study were predicted by IntaRNA and analyzed for functional categorization and biological pathways, which provided evidence that sRNAs are indeed involved in many different metabolic pathways, including basic metabolic pathways, such as glycolysis/gluconeogenesis, the citrate cycle, fatty acid metabolism and adaptations to environmentally stress-induced changes. The information from this study provides a valuable reservoir for understanding the sRNA-mediated regulation of the complex physiology and metabolic processes of cyanobacteria.
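
    Assembling mapped reads into transcribed regions by read overlap, as this abstract describes, amounts to merging overlapping intervals. The sketch below is an assumption about how such clustering can be done, not the authors' pipeline: it treats reads as inclusive (start, end) coordinates on a single replicon and strand, and the function name and coordinates are hypothetical.

```python
def cluster_reads(reads):
    """Merge mapped reads into clusters of mutually overlapping intervals.

    `reads` is an iterable of (start, end) pairs with inclusive ends.
    Returns a list of (cluster_start, cluster_end, read_count) tuples.
    """
    clusters = []
    for start, end in sorted(reads):
        if clusters and start <= clusters[-1][1]:
            # The read overlaps the current cluster: extend it and count the read.
            cluster_start, cluster_end, count = clusters[-1]
            clusters[-1] = (cluster_start, max(cluster_end, end), count + 1)
        else:
            # No overlap with the previous cluster: open a new one.
            clusters.append((start, end, 1))
    return clusters


reads = [(100, 135), (120, 155), (150, 185), (400, 435), (430, 465), (900, 935)]
clusters = cluster_reads(reads)
# clusters == [(100, 185, 3), (400, 465, 2), (900, 935, 1)]
```

    Sorting first means each read only has to be compared with the most recent cluster, so the pass over the reads is linear after the sort.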

  8. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    Directory of Open Access Journals (Sweden)

    Burger Gertraud

    2003-12-01

    Full Text Available Abstract Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: http://malawimonas.bcm.umontreal.ca:8091/anabench. Please contact the authors for details about setting up a local-network AnaBench site elsewhere.

  9. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Yonghua Zhang

    2002-05-27

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approaches for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. Protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromosomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixture was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  10. RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

    Science.gov (United States)

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-01-01

    Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein–RNA structure-based models on an unprecedented scale. Availability and Implementation: Software and models are freely available at http://rck.csail.mit.edu/ Contact: bab@mit.edu Supplementary information

  11. Composition-based classification of short metagenomic sequences elucidates the landscapes of taxonomic and functional enrichment of microorganisms

    OpenAIRE

    Liu, Jiemeng; Wang, Haifeng; Yang, Hongxing; Zhang, Yizhe; Wang, Jinfeng; Zhao, Fangqing; Qi, Ji

    2012-01-01

    Compared with traditional algorithms for long metagenomic sequence classification, characterizing microorganisms’ taxonomic and functional abundance based on tens of millions of very short reads is much more challenging. We describe an efficient composition and phylogeny-based algorithm [Metagenome Composition Vector (MetaCV)] to classify very short metagenomic reads (75–100 bp) into specific taxonomic and functional groups. We applied MetaCV to the Meta-HIT data (371-Gb 75-bp reads of 109 h...

  12. Multi-wavelength photonic band gaps based on quasi-periodically poled lithium niobate ordered in Fibonacci sequences

    Institute of Scientific and Technical Information of China (English)

    Zhuoer Zhou; Jianhong Shi; Xianfeng Chen

    2009-01-01

    We demonstrate a quasi-periodic structure exhibiting multiple photonic band gaps (PBGs) based on submicron-period poled lithium niobate (LN). The structure consists of two building blocks, each containing a pair of antiparallel poled domains, arranged as a Fibonacci sequence. The gap wavelengths are analyzed with the Fibonacci sequence parameters such as the quasiperiodic indices and the average lattice parameter. The transmission properties are investigated by a traditional 4x4 matrix method. It has also been proved that the gap depth can be tuned by the lengths of poled domains.
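
    The arrangement of two building blocks as a Fibonacci sequence can be sketched as below. The concatenation rule S(j) = S(j-1) + S(j-2) is the standard Fibonacci-word construction and is assumed here, since the abstract does not state the rule used; the block labels A and B and the domain lengths in nanometres are hypothetical values chosen only for illustration.

```python
def fibonacci_word(generation):
    """Return the Fibonacci word of blocks 'A' and 'B' with S(1) = 'A', S(2) = 'AB'."""
    previous, current = "B", "A"
    for _ in range(generation - 1):
        previous, current = current, current + previous
    return current


def domain_layout(word, block_lengths):
    """Expand each block into its pair of antiparallel domains as (polarity, length) tuples."""
    layout = []
    for block in word:
        up_length, down_length = block_lengths[block]
        layout.append((+1, up_length))
        layout.append((-1, down_length))
    return layout


word = fibonacci_word(5)                           # 'ABAABABA'
blocks = {"A": (300, 300), "B": (300, 200)}        # hypothetical domain lengths in nm
layout = domain_layout(word, blocks)
average_period = sum(length for _, length in layout) / len(word)  # mean block length in nm
```

    The mean block length computed at the end plays the role of an average lattice parameter for this toy layout; the layout itself could then be fed to a transfer-matrix calculation such as the 4x4 matrix method the abstract mentions.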

  13. Compact FPGA-based pulse-sequencer and radio-frequency generator for experiments with trapped atoms

    CERN Document Server

    Pruttivarasin, Thaned

    2015-01-01

    We present a compact FPGA-based pulse sequencer and radio-frequency (RF) generator suitable for experiments with cold trapped ions and atoms. The unit is capable of outputting a pulse sequence with at least 32 TTL channels with a timing resolution of 40 ns and contains a built-in 100 MHz frequency counter for counting electrical pulses from a photo-multiplier tube (PMT). There are 16 independent direct-digital-synthesizers (DDS) RF sources with fast (rise-time of ~60 ns) amplitude switching and sub-mHz frequency tuning from 0 to 800 MHz.

  14. Compact field programmable gate array-based pulse-sequencer and radio-frequency generator for experiments with trapped atoms

    Energy Technology Data Exchange (ETDEWEB)

    Pruttivarasin, Thaned, E-mail: thaned.pruttivarasin@riken.jp [Quantum Metrology Laboratory, RIKEN, Wako-shi, Saitama 351-0198 (Japan); Katori, Hidetoshi [Quantum Metrology Laboratory, RIKEN, Wako-shi, Saitama 351-0198 (Japan); Innovative Space-Time Project, ERATO, JST, Bunkyo-ku, Tokyo 113-8656 (Japan); Department of Applied Physics, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656 (Japan)

    2015-11-15

    We present a compact field-programmable gate array (FPGA) based pulse sequencer and radio-frequency (RF) generator suitable for experiments with cold trapped ions and atoms. The unit is capable of outputting a pulse sequence with at least 32 transistor-transistor logic (TTL) channels with a timing resolution of 40 ns and contains a built-in 100 MHz frequency counter for counting electrical pulses from a photo-multiplier tube. There are 16 independent direct-digital-synthesizers RF sources with fast (rise-time of ∼60 ns) amplitude switching and sub-mHz frequency tuning from 0 to 800 MHz.
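
    The figures quoted in this abstract (40 ns timing resolution, sub-mHz tuning) can be related to standard sequencer and DDS arithmetic. The sketch below uses the textbook DDS relation f_out = FTW x f_clk / 2^N; the 2 GHz sample clock and the accumulator widths are assumptions for illustration and are not taken from the paper, and all function names are hypothetical.

```python
import math


def timing_steps(duration_s, resolution_s=40e-9):
    """Number of sequencer ticks needed for a pulse, rounded to the nearest tick."""
    return round(duration_s / resolution_s)


def dds_tuning_word(f_out_hz, f_clk_hz, accumulator_bits):
    """Frequency tuning word for the standard relation f_out = FTW * f_clk / 2**N."""
    return round(f_out_hz * 2**accumulator_bits / f_clk_hz)


def dds_resolution_hz(f_clk_hz, accumulator_bits):
    """Smallest frequency step of a DDS with an N-bit phase accumulator."""
    return f_clk_hz / 2**accumulator_bits


def min_bits_for_resolution(f_clk_hz, resolution_hz):
    """Smallest accumulator width whose frequency step is at or below the target."""
    return math.ceil(math.log2(f_clk_hz / resolution_hz))


ASSUMED_CLOCK_HZ = 2e9  # hypothetical sample clock, chosen only for this example

ticks_for_1us = timing_steps(1e-6)                               # ticks in a 1 microsecond pulse
bits_needed = min_bits_for_resolution(ASSUMED_CLOCK_HZ, 1e-3)    # width needed for 1 mHz steps
step_48bit = dds_resolution_hz(ASSUMED_CLOCK_HZ, 48)             # step with a 48-bit accumulator
```

    Under these assumptions a 32-bit accumulator would give steps of roughly half a hertz, so sub-mHz tuning implies a wider tuning word or a comparable extension.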

  15. Patch-based generation of a pseudo CT from conventional MRI sequences for MRI-only radiotherapy of the brain

    DEFF Research Database (Denmark)

    Andreasen, Daniel; Van Leemput, Koen; Hansen, Rasmus H.;

    2015-01-01

    Purpose: In radiotherapy (RT) based on magnetic resonance imaging (MRI) as the only modality, the information on electron density must be derived from the MRI scan by creating a so-called pseudo computed tomography (pCT). This is a nontrivial task, since the voxel-intensities in an MRI scan...... are not uniquely related to electron density. To solve the task, voxel-based or atlas-based models have typically been used. The voxel-based models require a specialized dual ultrashort echo time MRI sequence for bone visualization and the atlas-based models require deformable registrations of conventional MRI...... scans. In this study, we investigate the potential of a patch-based method for creating a pCT based on conventional T1-weighted MRI scans without using deformable registrations. We compare this method against two state-of-the-art methods within the voxel-based and atlas-based categories. Methods...

  16. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang L(U); Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry each stated that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution, a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. In addition, the program can be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl and implemented on the Linux operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
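    The abstract does not describe DNAskew's algorithm, so the following is only a minimal Python sketch of one common way to exploit strand asymmetry: a cumulative GC skew curve, whose extremes often lie near replication boundaries in bacterial genomes. The function names, the window size, and the use of GC skew alone are assumptions made for illustration and are not taken from DNAskew.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the cumulative GC skew, one value per non-overlapping window."""
    seq = seq.upper()
    totals = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Window skew is (G - C) / (G + C); treat a window without G or C as 0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running += skew
        totals.append(running)
    return totals


def predict_boundaries(seq, window=1000):
    """Return the window start coordinates of the minimum and maximum of the curve.

    Hypothetical helper: it only reports where the cumulative skew turns,
    and it expects a non-empty sequence.
    """
    curve = cumulative_gc_skew(seq, window)
    lo = min(range(len(curve)), key=curve.__getitem__)
    hi = max(range(len(curve)), key=curve.__getitem__)
    return lo * window, hi * window
```

    On a toy sequence whose first half is C-rich and second half is G-rich, the curve falls and then rises, so the minimum marks the switch between the two halves.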

  17. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    Science.gov (United States)

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  18. Array-based comparative genomic hybridization for the detection of DNA sequence copy number changes in Barrett's adenocarcinoma.

    Science.gov (United States)

    Albrecht, Bettina; Hausmann, Michael; Zitzelsberger, Horst; Stein, Hubert; Siewert, Jörg Rüdiger; Hopt, Ulrich; Langer, Rupert; Höfler, Heinz; Werner, Martin; Walch, Axel

    2004-07-01

    Array-based comparative genomic hybridization (aCGH) allows the identification of DNA sequence copy number changes at high resolution by co-hybridizing differentially labelled test and control DNAs to a micro-array of genomic clones. The present study has analysed a series of 23 formalin-fixed, paraffin wax-embedded tissue samples of Barrett's adenocarcinoma (BCA, n = 18) and non-neoplastic squamous oesophageal (n = 2) and gastric cardia mucosa (n = 3) by aCGH. The micro-arrays used contained 287 genomic targets covering oncogenes, tumour suppressor genes, and DNA sequences localized within chromosomal regions previously reported to be altered in BCA. DNA sequence copy number changes for a panel of approximately 50 genes were identified, most of which have not been previously described in BCA. DNA sequence copy number gains (mean 41 +/- 25/BCA) were more frequent than DNA sequence copy number losses (mean 20 +/- 15/BCA). The highest frequencies for DNA sequence copy number gains were detected for SNRPN (61%); GNLY (44%); NME1 (44%); DDX15, ABCB1 (MDR), ATM, LAMA3, MYBL2, ZNF217, and TNFRSF6B (39% each); and MSH2, TERC, SERPINE1, AFM137XA11, IGF1R, and PTPN1 (33% each). DNA sequence copy number losses were identified for PDGFB (44%); D17S125 (39%); AKT3 (28%); and RASSFI, FHIT, CDKN2A (p16), and SAS (CDK4) (28% each). In all non-neoplastic tissue samples of squamous oesophageal and gastric cardia mucosa, the measured mean ratios were 1.00 (squamous oesophageal mucosa) or 1.01 (gastric mucosa), indicating that no DNA sequence copy number changes were present. For validation, the DNA sequence copy number changes of selected clones (SNRPN, CMYC, HER2, ZNF217) detected by aCGH were confirmed by fluorescence in situ hybridization (FISH). These data show the sensitivity of aCGH for the identification of DNA sequence copy number changes at high resolution in BCA. The newly identified genes may include so far unknown biomarkers in BCA and are therefore a starting point for

  19. Effect of Addition Sequence during Neutralization and Precipitation on Iron-based Catalysts for High Temperature Shift Reaction

    Institute of Scientific and Technical Information of China (English)

    Li Wei; Zhu Jianhua; Mou Zhanjun

    2007-01-01

    The preparation of the iron-based catalysts promoted by cobalt with a small amount of copper and aluminum for the high temperature shift reaction (HTS) with different sequences of adding catalyst raw materials during neutralization and precipitation was investigated. XRD, BET and particle size distribution (PSD) were used to characterize the prepared catalysts. It was found that the catalyst crystals were all γ-Fe2O3, and the intermediate of the catalyst after aging was Fe3O4. The crystallographic form of the catalyst and its intermediate was not affected by the addition sequence in the neutralization and precipitation process. The results showed that the specific surface area and the particle size of the catalysts depended on the addition sequence to the mother liquor. Cobalt with a small amount of copper and aluminum could increase the specific surface area and decrease the particle size of catalysts.

  20. Molecular Taxonomy of Conogethes punctiferalis and Conogethes pinicolalis (Lepidoptera: Crambidae) Based on Mitochondrial DNA Sequences

    Institute of Scientific and Technical Information of China (English)

    WANG Jing; ZHANG Tian-tao; WANG Zhen-ying; HE Kang-lai; LIU Yong; LI Jing

    2014-01-01

    Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae) was originally considered as one species with fruit-feeding type (FFT) and pinaceae-feeding type (PFT), but it has subsequently been divided into two different species, Conogethes punctiferalis and Conogethes pinicolalis. The relationship between the two species was investigated by phylogenetic reconstruction using maximum-likelihood (ML) parameter estimations. The phylogenetic tree and network were constructed based upon sequence data from concatenation of three genes of mitochondrial cytochrome c oxidase subunits I, II and cytochrome b which were derived from 118 samples of C. punctiferalis and 24 samples of C. pinicolalis. The phylogenetic tree and network showed that conspecific sequences were clustering together despite intraspecific variability. Here we report the results of a combined analysis of mitochondrial DNA sequences from three genes and morphological data representing powerful evidence that C. pinicolalis and C. punctiferalis are significantly different.

  1. Plasmid-Based Materials as Multiplex Quality Controls and Calibrators for Clinical Next-Generation Sequencing Assays.

    Science.gov (United States)

    Sims, David J; Harrington, Robin D; Polley, Eric C; Forbes, Thomas D; Mehaffey, Michele G; McGregor, Paul M; Camalier, Corinne E; Harper, Kneshay N; Bouk, Courtney H; Das, Biswajit; Conley, Barbara A; Doroshow, James H; Williams, P Mickey; Lih, Chih-Jian

    2016-05-01

    Although next-generation sequencing technologies have been widely adapted for clinical diagnostic applications, an urgent need exists for multianalyte calibrator materials and controls to evaluate the performance of these assays. Control materials will also play a major role in the assessment, development, and selection of appropriate alignment and variant calling pipelines. We report an approach to provide effective multianalyte controls for next-generation sequencing assays, referred to as the control plasmid spiked-in genome (CPSG). Control plasmids that contain approximately 1000 bases of human genomic sequence with a specific mutation of interest positioned near the middle of the insert and a nearby 6-bp molecular barcode were synthesized, linearized, quantitated, and spiked into genomic DNA derived from formalin-fixed, paraffin-embedded-prepared hapmap cell lines at defined copy number ratios. Serial titration experiments demonstrated the CPSGs performed with similar efficiency of variant detection as formalin-fixed, paraffin-embedded cell line genomic DNA. Repetitive analyses of one lot of CPSGs 90 times during 18 months revealed that the reagents were stable with consistent detection of each of the plasmids at similar variant allele frequencies. CPSGs are designed to work across most next-generation sequencing methods, platforms, and data analysis pipelines. CPSGs are robust controls and can be used to evaluate the performance of different next-generation sequencing diagnostic assays, assess data analysis pipelines, and ensure robust assay performance metrics. PMID:27105923

  2. Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

    Science.gov (United States)

    Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

    2009-12-01

    A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13-PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 from the pSMED01 plasmid sequence of S. medicae WSM419 genome's sequence, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria. PMID:20112226

  3. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    Directory of Open Access Journals (Sweden)

    Xin Yi Ng

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed. In this study, an integrated algorithm is introduced to predict AMPs by integrating sequence alignment and a support vector machine (SVM)-LZ complexity pairwise algorithm. When all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test; when only sequences with less than 70% similarity are used, the sensitivity is 88.74% in the jackknife test and 78.70% in the independent test. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
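    The abstract names LZ complexity without defining it. As a rough illustration, the Python sketch below counts the phrases in the Lempel-Ziv (1976) exhaustive parsing of a string, which is one standard definition of LZ complexity. How the authors combine this quantity with sequence alignment and the SVM-pairwise scheme is not stated here, so the function name and its use on peptide strings are assumptions.

```python
def lz_complexity(s):
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting before position i
        # (the earlier copy is allowed to overlap the phrase being built).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases
```

    A repetitive string such as "aaaa" parses into two phrases, whereas less regular strings need more, which is why the count can serve as a crude measure of sequence novelty.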

  4. Palindrome analyser - A new web-based server for predicting and evaluating inverted repeats in nucleotide sequences.

    Science.gov (United States)

    Brázda, Václav; Kolomazník, Jan; Lýsek, Jiří; Hároníková, Lucia; Coufal, Jan; Št'astný, Jiří

    2016-09-30

    DNA cruciform structures play an important role in the regulation of natural processes including gene replication and expression, as well as nucleosome structure and recombination. They have also been implicated in the evolution and development of diseases such as cancer and neurodegenerative disorders. Cruciform structures are formed by inverted repeats, and their stability is enhanced by DNA supercoiling and protein binding. They have received broad attention because of their important roles in biology. Computational approaches to study inverted repeats have allowed detailed analysis of genomes. However, currently there are no easily accessible and user-friendly tools that can analyse inverted repeats, especially among long nucleotide sequences. We have developed a web-based server, Palindrome analyser, which is a user-friendly application for analysing inverted repeats in various DNA (or RNA) sequences including genome sequences and oligonucleotides. It allows users to search and retrieve desired gene/nucleotide sequence entries from the NCBI databases, and provides data on length, sequence, locations and energy required for cruciform formation. Palindrome analyser also features an interactive graphical data representation of the distribution of the inverted repeats, with options for sorting according to the length of inverted repeat, length of loop, and number of mismatches. Palindrome analyser can be accessed at http://bioinformatics.ibp.cz. PMID:27603574
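    For readers unfamiliar with inverted repeats, the idea can be sketched in a few lines of Python: an inverted repeat is a stem whose downstream arm is the reverse complement of its upstream arm, optionally separated by a short loop. This is not Palindrome analyser's implementation; the function names are hypothetical, only exact matches with a fixed stem length are reported, and mismatches and formation energy are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, max_loop=4):
    """Return (start, stem, loop) tuples for exact inverted repeats."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem + 1):
        left = seq[start:start + stem]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # the downstream arm would run past the end of the sequence
            if right == target:
                hits.append((start, stem, loop))
    return hits
```

    For example, in "TTGAATCAAAGATTCTT" the arm GAATC at position 2 pairs with GATTC after a three-base loop.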

  5. A Quantitative Tool to Distinguish Isobaric Leucine and Isoleucine Residues for Mass Spectrometry-Based De Novo Monoclonal Antibody Sequencing

    Science.gov (United States)

    Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.

    2014-07-01

    De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols, which experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity that leucine aminopeptidase shows for n-terminal Leu residues and the cleavage preference for Leu by chymotrypsin. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS) to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. The implementation of PXleS precludes the expression of several possible sequences and, therefore, reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.

  6. The feline oral microbiome: a provisional 16S rRNA gene based taxonomy with full-length reference sequences.

    Science.gov (United States)

    Dewhirst, Floyd E; Klein, Erin A; Bennett, Marie-Louise; Croft, Julie M; Harris, Stephen J; Marshall-Jones, Zoe V

    2015-02-25

    The human oral microbiome is known to play a significant role in human health and disease. While less well studied, the feline oral microbiome is thought to play a similarly important role. To determine roles oral bacteria play in health and disease, one first has to be able to accurately identify bacterial species present. 16S rRNA gene sequence information is widely used for molecular identification of bacteria and is also useful for establishing the taxonomy of novel species. The objective of this research was to obtain full 16S rRNA gene reference sequences for feline oral bacteria, place the sequences in species-level phylotypes, and create a curated 16S rRNA based taxonomy for common feline oral bacteria. Clone libraries were produced using "universal" and phylum-selective PCR primers and DNA from pooled subgingival plaque from healthy and periodontally diseased cats. Bacteria in subgingival samples were also cultivated to obtain isolates. Full-length 16S rDNA sequences were determined for clones and isolates that represent 171 feline oral taxa. A provisional curated taxonomy was developed based on the position of each taxon in 16S rRNA phylogenetic trees. The feline oral microbiome curated taxonomy and 16S rRNA gene reference set will allow investigators to refer to precisely defined bacterial taxa. A provisional name such as "Propionibacterium sp. feline oral taxon FOT-327" is an anchor to which clone, strain or GenBank names or accession numbers can point. Future next-generation-sequencing studies of feline oral bacteria will be able to map reads to taxonomically curated full-length 16S rRNA gene sequences.

  7. A work stealing based approach for enabling scalable optimal sequence homology detection

    Energy Technology Data Exchange (ETDEWEB)

    Daily, Jeffrey A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Kalyanaraman, Anantharaman [Washington State Univ., Pullman, WA (United States); Krishnamoorthy, Sriram [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Vishnu, Abhinav [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-05-01

    Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment for large-scale is currently not feasible; instead, heuristic methods are used at the expense of quality. Here, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for 2.56M sequences on up to 8K cores show parallel efficiencies of ~ 75-100%, a time-to-solution of 33s, and a rate of ~ 2.0M alignments per second.
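    The "optimal alignment algorithms" mentioned above are typically dynamic-programming methods such as Smith-Waterman. The Python sketch below computes a Smith-Waterman local alignment score for one pair of sequences; the scoring values and the linear gap penalty are assumptions for illustration, and the parallel work-stealing machinery described in the abstract is not modelled.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Each pair costs time proportional to the product of the two sequence lengths, which is why running it over all pairs of millions of sequences calls for filtering and load balancing.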

  8. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  9. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. PMID:22484086
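    The discriminatory power figures quoted above (SID) can be reproduced from type assignments with a short calculation. The Python sketch below uses the form of Simpson's index commonly applied in typing studies, 1 - sum(n_j (n_j - 1)) / (N (N - 1)); that the authors used exactly this form is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Return Simpson's index of diversity for a list of type assignments."""
    counts = Counter(type_labels).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    # Probability that two isolates drawn without replacement share a type.
    same = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return 1.0 - same
```

    The index is 0 when every isolate has the same type and 1 when every isolate has a distinct type.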

  10. Comparison of pulse sequences for R1-based electron paramagnetic resonance oxygen imaging

    Science.gov (United States)

    Epel, Boris; Halpern, Howard J.

    2015-05-01

    Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oxymetry. We have found that for small animal size objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three pulse stimulated echo sequences might find better practical usage.

  11. TAXONOMIC STATUS OF CAR BACILLUS BASED ON THE SMALL SUBUNIT RIBOSOMAL RNA SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    魏强; TsujiM; TakahashiT; IshiharaC; ItohT

    1995-01-01

    In an attempt to identify the taxonomic relationship between CAR bacillus and other bacteria, the SSU rRNA gene sequences of two CAR bacillus strains, CBM and CBR, isolated from mice and rats respectively, were used in the present study. The SSU rRNA gene sequences, approximately 1.5 kb in size and amplified from genomic DNA of both strains, were determined, and 96.8% homology was found between them. The sequences were aligned against those of most eubacteria by computer search, showing high homology with those of Flavobacter/Flexibacter species, especially Fx. sancti and Ft. ferrugineum. Phylogenetic analysis indicated that CAR bacillus belongs to a species close to the Fx. sancti and Ft. ferrugineum subdivision.

  12. PyVDT: A PsychoPy-Based Visual Sequence Detection Task

    OpenAIRE

    Hansen, Mads

    2016-01-01

    PyVDT is a computerized test consisting of two brief visual sequence detection tasks in which participants watch single digits displayed on screen and respond whenever target digit sequences (even – odd – even) are displayed. The total duration of the test is around five minutes. PyVDT is a reimplementation of the Visual Monitoring Task (VMT), a task thought to measure working memory.PyVDT uses the PsychoPy API to display digits, to plot diagnostic information, and to output log files and res...
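    The target rule (even, odd, even) is simple to state in code. The Python sketch below scores a digit stream for positions at which a target sequence is completed; it does not use the PsychoPy API, the function name is hypothetical, and counting overlapping targets separately is an assumption rather than documented PyVDT behaviour.

```python
def target_positions(digits):
    """Return the indices at which an even-odd-even run is completed."""
    hits = []
    for i in range(2, len(digits)):
        a, b, c = digits[i - 2], digits[i - 1], digits[i]
        if a % 2 == 0 and b % 2 == 1 and c % 2 == 0:
            hits.append(i)
    return hits
```

    In the stream 2, 3, 4, 5, 6 a response would be expected at the third and fifth digits, because 2-3-4 and 4-5-6 both match the rule.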

  13. Identification of Anoectochilus based on rDNA ITS sequences alignment and SELDI-TOF-MS

    Directory of Open Access Journals (Sweden)

    Chuan Gao, Fusheng Zhang, Jun Zhang, Shunxing Guo, Hongbo Shao

    2009-01-01

    The internal transcribed spacer (ITS) sequence alignment and proteomic differences among Anoectochilus species have been studied by means of ITS molecular identification and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Results showed that ITS sequences alone cannot determine species within Anoectochilus, and that there are proteomic differences among Anoectochilus species. Moreover, proteomic fingerprints of five Anoectochilus species have been established for identifying species, and the genetic relationships of the five species have been deduced from the proteomic differences among them.

  14. The Cenozoic geological evolution of the Central and Northern North Sea based on seismic sequence stratigraphy

    Energy Technology Data Exchange (ETDEWEB)

    Jordt, Henrik

    1996-03-01

    This thesis presents scientific results from seismic sequence stratigraphic investigations. These investigations and results are integrated into an ongoing mineralogical study of the Cenozoic deposits. The main results from this mineralogical study are presented and discussed. The seismic investigations have provided boundary conditions for a forward modelling study of the Cenozoic depositional history. Results from the forward modelling are presented as they emphasise the influence of tectonics on sequence development. The tectonic motions described were important for the formation of the large oil and gas fields in the North Sea.

  15. Effective Simulation of Quantum Entanglement Based on Classical Fields Modulated with Pseudorandom Phase Sequences

    CERN Document Server

    Fu, Jian; Xu, Yingying; Dong, Hongtao

    2010-01-01

    We demonstrate that n classical fields modulated with n different pseudorandom phase sequences can constitute a 2^n-dimensional Hilbert space that contains tensor product structure. By using classical fields modulated with pseudorandom phase sequences, we discuss effective simulation of Bell states and GHZ state, and apply both correlation analysis and von Neumann entropy to characterize the simulation. We obtain similar results with the cases in quantum mechanics and find that the conclusions can be easily generalized to n quantum particles. The research on simulation of quantum entanglement may be important, for it not only provides useful insights into fundamental features of quantum entanglement, but also yields new insights into quantum computation.

  16. Small Cell Layouts Based on Accounting Product Demand and Operating Sequences

    Institute of Scientific and Technical Information of China (English)

    常剑峰; 钟约先; 韩赞东

    2004-01-01

    Most current cell layout methods do not take into account the product demand and operating sequences, and may be too sophisticated for facilities with a relatively small number of products. A specific method for designing small manufacturing cells was developed especially for the press production lines, which is computationally simple, and yet considers product demand and the operating sequences. A simulated application illustrates the robustness of the layouts to demand changes. The method uses simple rules and database tools, so it is accessible to a wide range of facilities.

  17. Outline of a genome navigation system based on the properties of GA-sequences and their flanks.

    Directory of Open Access Journals (Sweden)

    Guenter Albrecht-Buehler

    protein synthesis based on the shared segments of different GA-sequences.

  18. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains : a web-based resource

    Directory of Open Access Journals (Sweden)

    Vergnaud Gilles

    2004-01-01

    Background: Polymorphic tandem repeat typing is a new generic technology which has been proved to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila, Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison. Results: In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors. Conclusions: We are presenting an internet-based resource to help develop and perform tandem repeats based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes.
The "Bacterial
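
    The strain comparison described in this record can be illustrated with a minimal Python sketch. It is not the tool behind the database: the function names are hypothetical, only perfect repeats are detected, and repeats are paired by repeat unit rather than by genomic position or flanking sequence, which is a simplifying assumption.

```python
import re

def tandem_repeats(seq, unit_len, min_copies=2):
    """Return (start, unit, copies) for perfect tandem repeats of one unit length."""
    # A unit of unit_len bases followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // unit_len)
            for m in pattern.finditer(seq)]

def predicted_size_difference(seq_a, seq_b, unit_len):
    """Size difference in bp for repeat units whose copy number differs.

    Assumption: each unit occurs at one locus per strain, so pairing by unit
    is enough for this toy example.
    """
    a = {unit: n for _, unit, n in tandem_repeats(seq_a, unit_len)}
    b = {unit: n for _, unit, n in tandem_repeats(seq_b, unit_len)}
    return {u: (a[u] - b[u]) * unit_len for u in a.keys() & b.keys() if a[u] != b[u]}

# Hypothetical sequences from two closely related strains.
strain_a = "TT" + "ACG" * 5 + "GG"
strain_b = "TT" + "ACG" * 3 + "GG"
print(predicted_size_difference(strain_a, strain_b, 3))  # {'ACG': 6}
```

    A locus with five copies in one strain and three in the other gives a predicted size difference of 6 bp, the kind of criterion the record says the database can be queried on.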

  19. Phylogeny of plastids based on cladistic analysis of gene loss inferred from complete plastid genome sequences.

    Science.gov (United States)

    Nozaki, Hisayoshi; Ohta, Njij; Matsuzaki, Motomichi; Misumi, Osami; Kuroiwa, Tsuneyoshi

    2003-10-01

    Based on the recent hypothesis on the origin of eukaryotic phototrophs, red algae, green plants, and glaucophytes constitute the "primary photosynthetic eukaryotes" (whose plastids may have originated directly from a cyanobacterium-like prokaryote via primary endosymbiosis), whereas the plastids of other lineages of eukaryotic phototrophs appear to be the result of secondary or tertiary endosymbiotic events (involving a phototrophic eukaryote and a host cell). Although phylogenetic analyses using multiple plastid genes from a wide range of eukaryotic lineages have been carried out, some of the major phylogenetic relationships of plastids remain ambiguous or conflict between different phylogenetic methods used for nucleotide or amino acid substitutions. Therefore, an alternative methodology to infer the plastid phylogeny is needed. Here, we carried out a cladistic analysis of the "loss of plastid genes" after primary endosymbiosis using complete plastid genome sequences from a wide range of eukaryotic phototrophs. Since it is extremely unlikely that plastid genes are regained during plastid evolution, we used the irreversible Camin-Sokal model for our cladistic analysis of the loss of plastid genes. The cladistic analysis of the 274 plastid protein-coding genes resolved the 20 operational taxonomic units representing a wide range of eukaryotic lineages (including three secondary plastid-containing groups) into two large monophyletic groups with high bootstrap values: one corresponded to the red lineage and the other consisted of a large clade composed of the green lineage (green plants and Euglena) and the basal glaucophyte plastid. Although the sister relationship between the green lineage and the Glaucophyta was not resolved in recent phylogenetic studies using amino acid substitutions from multiple plastid genes, it is consistent with the rbcL gene phylogeny and with a recent phylogenetic study using multiple nuclear genes. In addition, our analysis robustly
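
    To make the irreversibility assumption concrete, here is a minimal Python sketch that counts the fewest loss events needed for one gene on one fixed tree when a lost gene can never be regained. It is an illustration only, not the authors' analysis (which also searches over trees); the tree encoding, taxon labels, and function names are hypothetical.

```python
def _losses(tree, has_gene):
    """Return (losses inside subtree, True if every leaf of the subtree lacks the gene)."""
    if isinstance(tree, str):                 # a leaf is a taxon name
        return 0, not has_gene[tree]
    results = [_losses(child, has_gene) for child in tree]
    if all(absent for _, absent in results):
        return 0, True                        # defer: one loss higher up explains all leaves
    # A child whose leaves all lack the gene needs exactly one loss on its own branch.
    return sum(n + (1 if absent else 0) for n, absent in results), False

def irreversible_loss_count(tree, has_gene):
    """Minimum number of losses, assuming the root ancestor carried the gene."""
    n, absent = _losses(tree, has_gene)
    return n + (1 if absent else 0)

# Hypothetical five-taxon tree written as nested tuples.
tree = ((("A", "B"), "C"), ("D", "E"))
presence = {"A": False, "B": False, "C": True, "D": False, "E": False}
print(irreversible_loss_count(tree, presence))  # 2
```

    The gene is lost once on the branch leading to (A, B) and once on the branch leading to (D, E); no arrangement with fewer losses is possible without a regain.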

  20. Genbit Compress Tool(GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio(bits/base) of Genomes

    CERN Document Server

    Rajeswari, P Raja; Kumar, V K; 10.5121/ijcsit.2010.2313

    2010-01-01

    We present a compression tool, "GenBit Compress", for genetic sequences based on our newly proposed "GenBit Compress Algorithm". Our tool achieves the best compression ratios for entire genomes (DNA sequences). Significantly better compression results show that the GenBit Compress algorithm is the best among the remaining genome compression algorithms for non-repetitive DNA sequences in genomes. Standard compression algorithms such as gzip or compress cannot compress DNA sequences but only expand them in size. In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings which are duplicated except for a few random mutations. For this reason most DNA compressors work by searching and encoding approximate repeats. We depart from this strategy by searching and encoding only exact repeats. Our proposed algorithm achieves the best compression ratio for DNA sequences for larger genomes. As long as 8 lakh characters can be give...
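
    The bits/base figure used to compare DNA compressors can be illustrated with a small Python sketch. This is not the GenBit Compress algorithm, whose details the record does not give; it shows only the plain 2-bit-per-base packing that serves as a natural baseline, and the function names are hypothetical.

```python
# Fixed 2-bit code for the four bases (a baseline, not GenBit Compress itself).
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack_2bit(seq):
    """Pack a DNA string into bytes using two bits per base."""
    bits = 0
    for base in seq:
        bits = (bits << 2) | CODES[base]
    n_bytes = (2 * len(seq) + 7) // 8        # round up to whole bytes
    return bits.to_bytes(n_bytes, "big")

def bits_per_base(compressed_size_bytes, n_bases):
    """Compression ratio expressed as bits of output per input base."""
    return 8 * compressed_size_bytes / n_bases

packed = pack_2bit("ACGT")
print(packed, bits_per_base(len(packed), 4))  # b'\x1b' 2.0
```

    A compressor improves on this baseline only if its output needs fewer than 2 bits per base, which is why ratios in this literature are reported in bits/base.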

  1. Data Interoperability of Whole Exome Sequencing (WES) Based Mutational Burden Estimates from Different Laboratories

    Directory of Open Access Journals (Sweden)

    Ping Qiu

    2016-04-01

    Full Text Available Immune checkpoint inhibitors, which unleash a patient’s own T cells to kill tumors, are revolutionizing cancer treatment. Several independent studies suggest that higher non-synonymous mutational burden assessed by whole exome sequencing (WES) in tumors is associated with improved objective response, durable clinical benefit, and progression-free survival in immune checkpoint inhibitor treatment. Next-generation sequencing (NGS) is a promising technology being used in the clinic to direct patient treatment. Cancer genome WES poses a unique challenge due to tumor heterogeneity and sequencing artifacts introduced by formalin-fixed, paraffin-embedded (FFPE) tissue. In order to evaluate the data interoperability of WES data from different sources to survey the tumor mutational landscape, we compared WES data of several tumor/normal matched samples from five commercial vendors. A large data discrepancy was observed in the vendors’ self-reported data. Independent data analysis of the vendors’ raw NGS data shows that whole exome sequencing data from qualified vendors can be combined and analyzed uniformly to derive comparable quantitative estimates of tumor mutational burden.

  2. Molecular phylogeny of Pamphagidae (Acridoidea, Orthoptera) from China based on mitochondrial cytochrome oxidase Ⅱ sequences

    Institute of Scientific and Technical Information of China (English)

    Dao-Chuan Zhang; Hong-Yan Han; Hong Yin; Xin-Jiang Li; Zhan Yin; Xiang-Chu Yin

    2011-01-01

    Phylogenetic relationships of Pamphagidae were examined using cytochrome oxidase subunit Ⅱ (COII) mtDNA sequences (684 bp). Twenty-seven species of Acridoidea from 20 genera were sequenced to obtain mtDNA data, along with four species from the GenBank nucleotide database. The purpose of this study was to analyze the phylogenetic relationships among subfamilies within Pamphagidae and to interpret the phylogenetic position of this family within the Acridoidea superfamily. Phylogenetic trees were reconstructed using neighbor-joining (NJ), maximum parsimony (MP) and Bayesian inference (BI) methods. The 684 bp analyzed fragment included 126 parsimony informative sites. Sequences diverged 1.0%-11.1% between genera within subfamilies, and 8.8%-12.3% between subfamilies. Amino acid sequences diverged 0-6.1% between genera within subfamilies, and 0.4%-7.5% between subfamilies. Our phylogenetic trees revealed the monophyly of Pamphagidae and three distinct major groups within this family. Moreover, several well supported and stable clades were found in Pamphagidae. The global clustering results were similar to those obtained through classical morphological classification: Prionotropisinae, Thrinchinae and Pamphaginae were monophyletic groups. However, the current genus Filchnerella (Prionotropisinae) was not a monophyletic group and the genus Asiotmethis (Prionotropisinae) was a sister group of the genus Thrinchus (Thrinchinae). Further molecular and morphological studies are required to clarify the phylogenetic relationships of the genera Filchnerella and Asiotmethis.
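
    Divergence percentages like those quoted above are typically pairwise distances between aligned sequences. The record does not say which distance measure was used, so the Python sketch below assumes the simplest one, the uncorrected p-distance; the function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Percentage of differing sites between two aligned sequences.

    Assumption: uncorrected p-distance, ignoring any column with a gap ('-').
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(a != b for a, b in pairs)
    return 100.0 * differences / len(pairs)

# Two hypothetical aligned fragments differing at 2 of 10 sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```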

  3. Cladistic biogeography of Gleditsia (Leguminosae) based on ndhF and rpl16 chloroplast gene sequences.

    Science.gov (United States)

    Schnabel, A; Wendel, J F

    1998-12-01

    We used cladistic analysis of chloroplast gene sequences (ndhF and rpl16) to test biogeographic hypotheses in the woody genus Gleditsia. Previous morphological comparisons suggested the presence of two eastern Asian-eastern North American species pairs among the 13 known species, as well as other intra- and inter-continental disjunctions. Results from phylogenetic analyses, interpreted in light of the amount of sequence divergence observed, led to the following conclusions. First, there is a fundamental division of the genus into three clades, only one of which contains both Asian and North American species. Second, the widespread and polymorphic Asian species, G. japonica, is sister to the two North American species, G. triacanthos and G. aquatica, which themselves are closely related inter se, but are both polymorphic and paraphyletic. Third, the lone South American Gleditsia species, G. amorphoides, forms a clade with two eastern Asian species. Gleditsia thus appears to have only one Asian-North American disjunction and no intercontinental species pairs. Low sequence divergence between G. amorphoides and its closest Asian relatives implicates long-distance dispersal in the origin of this unusual disjunction. Sequence divergence between Asian and North American Gleditsia is much lower than between Asian and North American species of its closest relative, Gymnocladus. Estimates of Asian-North American divergence times for Gymnocladus are in general accordance with fossil data, but estimates for Gleditsia suggest recent divergences that conflict with ages of known North American Gleditsia fossils.

  4. Cladistic biogeography of Juglans (Juglandaceae) based on chloroplast DNA intergenic spacer sequences

    Science.gov (United States)

    The phylogenetic utility of sequence variation from five chloroplast DNA intergenic spacer (IGS) regions: trnT-trnF, psbA-trnH, atpB-rbcL, trnV-16S rRNA, and trnS-trnfM was examined in the genus Juglans. A total of seventeen taxa representing the four sections within Juglans and an outgroup taxon, ...

  5. Citrus plastid-related gene profiling based on expressed sequence tag analyses

    Directory of Open Access Journals (Sweden)

    Tercilio Calsa Jr.

    2007-01-01

    Full Text Available Plastid-related sequences, derived from putative nuclear or plastome genes, were searched in a large collection of expressed sequence tags (ESTs) and genomic sequences from the Citrus Biotechnology initiative in Brazil. The identified putative Citrus chloroplast gene sequences were compared to those from Arabidopsis, Eucalyptus and Pinus. Differential expression profiling for plastid-directed nuclear-encoded proteins and photosynthesis-related gene expression variation between Citrus sinensis and Citrus reticulata, when inoculated or not with Xylella fastidiosa, were also analyzed. Presumed Citrus plastome regions were more similar to Eucalyptus. Some putative genes appeared to be preferentially expressed in vegetative tissues (leaves and bark) or in reproductive organs (flowers and fruits). Genes preferentially expressed in fruit and flower may be associated with hypothetical physiological functions. Expression pattern clustering analysis suggested that photosynthesis- and carbon fixation-related genes appeared to be up- or down-regulated in a resistant or susceptible Citrus species after Xylella inoculation in comparison to non-infected controls, generating novel information which may be helpful to develop novel genetic manipulation strategies to control Citrus variegated chlorosis (CVC).

  6. Progress in environmental transcriptomics based on next-generation high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Yuanfeng Cai

    2013-07-01

    Full Text Available Environmental transcriptomics, which focuses on microbial mRNA derived from complex environmental samples using the RNA-Seq method, allows investigation of the expression and patterns of regulation of functional genes in natural microbial communities. This review outlines the basic protocol of environmental transcriptomics, from sample collection and preservation, total RNA isolation, mRNA enrichment and cDNA synthesis to high-throughput sequencing and data analysis. The main technological problems are pointed out, such as the low yield of mRNA in environmental samples, contamination of mRNA by various impurities such as humic substances, and the limited degree of rRNA removal. Recent progress in specific methodologies to improve the quantity and quality of mRNA, especially in RNA extraction, purification and the enrichment of mRNA, is outlined. Bioinformatics methods that deal with the large volume of RNA-Seq data are addressed, such as quality control of the sequence data, sequence assembly, detection and removal of rRNA, gene annotation and functional classification, and detection of differentially expressed genes. The wide application of environmental transcriptomics, including the detection of new genes, the study of gene expression and regulation of microorganisms in different environments, and the analysis of metabolic pathways of special organic substances, is also highlighted. Environmental transcriptomics, combined with the further development of sequencing technology and bioinformatics tools in the future, is likely to be comprehensively used in the study of environmental microbiology.

  7. Antibody-based screening for hereditary nonpolyposis colorectal carcinoma compared with microsatellite analysis and sequencing

    DEFF Research Database (Denmark)

    Christensen, Mariann; Katballe, Niels; Wikman, Friedrik;

    2002-01-01

    BACKGROUND: Germline mutations in the DNA mismatch repair genes, MSH2, MLH1, and others are associated with hereditary nonpolyposis colorectal cancer (HNPCC). Due to the high costs of sequencing, cheaper screening methods are needed to identify HNPCC cases. Ideally, these methods should have a hi...

  8. The Sequence Modeling Method Based on ECC in Developing Program Specifications

    Institute of Scientific and Technical Information of China (English)

    CAI Jiamei

    1999-01-01

    This article discusses the developing process of the version sequences of specifications and the formal expressions of various reconstructions including the expansion and revision of the version at each stage. The author suggests using ECC (Extended Calculus of Construction) to describe the specifications of formal system and using functional language ML to implement this developing process.

  9. Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia

    NARCIS (Netherlands)

    A. Akalin (Altuna); F.E. Garrett-Bakelman (Francine); M. Kormaksson (Matthias); J. Busuttil (Jennifer); L. Zhang (Lingling); I. Khrebtukova (Irina); T.A. Milne (Thomas); Y. Huang (Yongsheng); R.S. Biswas (Rajat); J.L. Hess (Jay); C.D. Allis (C. David); R.G. Roeder (Robert); P.J.M. Valk (Peter); B. Löwenberg (Bob); H.R. Delwel (Ruud); H.F. Fernandez (Hugo); E. Paietta (Elisabeth); M.S. Tallman (Martin); G.P. Schroth (Gary P); C.E. Mason (Christopher); A. Melnick (Ari); M.E. Figueroa (Maria Eugenia)

    2012-01-01

    We have developed an enhanced form of reduced representation bisulfite sequencing with extended genomic coverage, which resulted in greater capture of DNA methylation information of regions lying outside of traditional CpG islands. Applying this method to primary human bone marrow specim

  10. Phylogenetic inferences in Avena based on analysis of FL intron2 sequences.

    Science.gov (United States)

    Peng, Yuan-Ying; Wei, Yu-Ming; Baum, Bernard R; Yan, Ze-Hong; Lan, Xiu-Jin; Dai, Shou-Fen; Zheng, You-Liang

    2010-09-01

    The development and application of molecular methods in oats has been relatively slow compared with other crops. Results from the previous analyses have left many questions concerning species evolutionary relationships unanswered, especially regarding the origins of the B and D genomes, which are only known to be present in polyploid oat species. To investigate the species and genome relationships in genus Avena, among 13 diploid (A and C genomes), we used the second intron of the nuclear gene FLORICAULA/LEAFY (FL int2) in seven tetraploid (AB and AC genomes), and five hexaploid (ACD genome) species. The Avena FL int2 is rather long, and high levels of variation in length and sequence composition were found. Evidence for more than one copy of the FL int2 sequence was obtained for both the A and C genome groups, and the degree of divergence of the A genome copies was greater than that observed within the C genome sequences. Phylogenetic analysis of the FL int2 sequences resulted in topologies that contained four major groups; these groups reemphasize the major genomic divergence between the A and C genomes, and the close relationship among the A, B, and D genomes. However, the D genome in hexaploids more likely originated from a C genome diploid rather than the generally believed A genome, and the C genome diploid A. clauda may have played an important role in the origination of both the C and D genome in polyploids.

  11. GENETIC DIVERSITY OF KEJOBONG GOAT BASED ON MITOCHONDRIAL DNA D-LOOP SEQUENCE

    Directory of Open Access Journals (Sweden)

    M. F. Harlistyo

    2015-09-01

    Full Text Available This study aimed to determine the diversity of the mtDNA D-loop in the Kejobong goat. The complete mtDNA D-loop sequences of 12 goat blood samples from 4 different locations in Purbalingga Regency, Central Java province (the sub-districts Kejobong, Pangadegan, Bukateja, and Kaligondang) were analyzed. The mtDNA D-loop was extracted from the blood samples. The DNA obtained was amplified by the PCR (Polymerase Chain Reaction) method using the primers (5’-tcactatcagcacccaaagc-3’) as forward and (5’-ggcattttcagtgccttgct-3’) as reverse, and subsequently sequenced. Nucleotide sequencing analysis yielded a fragment 548 bp long. The nucleotides were then aligned with Capra hircus (GenBank Accession No.: KF952601.1), and there were 11 different sites in the mtDNA D-loop segment. Five sites could be used as specific markers to distinguish between Capra hircus and the Kejobong goat, namely sites 317 (A-G), 403 (T-C), 434 (T-C), 537 (C-T), and 553 (A-G). Nucleotide sequence analysis also revealed seven different haplotypes. It was concluded that the distribution of the different sites showed different haplotype patterns in the Kejobong goat.
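
    The two quantities reported in this record, sites that differ from a reference and the number of distinct haplotypes, can be computed from aligned sequences as in the Python sketch below. The sequences are short made-up examples, not the Kejobong data, and the function names are hypothetical.

```python
def variable_sites(reference, sample):
    """List (1-based position, reference base, sample base) where aligned sequences differ."""
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

def count_haplotypes(sequences):
    """Number of distinct sequences (haplotypes) among aligned samples."""
    return len(set(sequences))

# Hypothetical aligned fragments; the first entry stands in for the reference.
reference = "ACGTAC"
samples = ["ACGTAC", "ATGTAC", "ATGTAC", "ACGTGC"]
print(variable_sites(reference, samples[1]))  # [(2, 'C', 'T')]
print(count_haplotypes(samples))              # 3
```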

  12. Comparative analysis of four essential Gracilariaceae species in China based on whole transcriptomic sequencing

    Institute of Scientific and Technical Information of China (English)

    XU Jiayue; WU Shuangxiu; YU Jun; SUN Jing; YIN Jinlong; WANG Liang; WANG Xumin; LIU Tao; CHI Shan; LIU Cui; REN Lufeng

    2014-01-01

    Three Gracilaria species, G. chouae, G. blodgettii, G. vermiculophylla and a close relative species, Gracilariopsis lemaneiformis which is now nominated as Gracilaria lemaneiformis, are the typically indigenous species which are important resources for the production of special proteins, phycobilisomes, special carbohydrates, and agar in China. In this study, de novo transcriptome sequencing on these four species using the next generation sequencing technology was performed for the first time. Functional annotations on assembled sequencing reads showed that the transcriptomic profiles were quite different between G. lemaneiformis and other three Gracilaria species. Comparative analysis of differential gene expression related to carbohydrate and phycobiliprotein metabolisms also showed that the expression profiles of these essential genes were different in four species. The genes encoding allophycocyanin, phycocyanin and phycoerythrin were further examined in four species and their deduced amino acid sequences were used for phylogenetic analysis to confirm that G. lemaneiformis had close relationship to genus Gracilaria, as well as that within genus Gracilaria, G. chouae had closer relationship to G. vermiculophylla rather than to G. blodgettii. The de novo transcriptome study on four species provided a valuable genomic resource for further understanding and analysis on biological and evolutionary study among marine algae.

  13. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    Directory of Open Access Journals (Sweden)

    Seitzer Phillip

    2012-11-01

    Full Text Available Abstract Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif

  14. Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA

    DEFF Research Database (Denmark)

    Korneliussen, Thorfinn Sand

    Due to the recent advances in DNA sequencing technology genomic data are being generated at an unprecedented rate and we are gaining access to entire genomes at population level. The technology does, however, not give direct access to the genetic variation and the many levels of preprocessing...... that is required before being able to make inferences from the data introduces multiple levels of uncertainty, especially for low-depth data. Therefore methods that take into account the inherent uncertainty are needed for being able to make robust inferences in the downstream analysis of such data. This poses...... data. These methods are all based on the concept of genotype likelihoods, which provides a degree of uncertainty of the data, and we show, both through simulations and with proper high-throughput sequencing data, that for low-depth data our methods outperform existing approaches, which are based...
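
    The genotype-likelihood concept mentioned in this record can be sketched in Python as follows. The record does not spell out the thesis's models, so this uses a commonly taught simple model as an assumption: reads are independent, each read samples one of the two alleles with equal probability, and an error turns the true base into each of the other three bases with probability e/3. The function name is hypothetical.

```python
from itertools import combinations_with_replacement

def genotype_likelihoods(bases, error_rates):
    """P(observed bases | genotype) for the 10 unordered diploid genotypes.

    Assumed model: independent reads, equal sampling of both alleles, and
    per-read error rate e spread evenly over the three wrong bases.
    """
    likelihoods = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        lik = 1.0
        for base, e in zip(bases, error_rates):
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            lik *= 0.5 * p1 + 0.5 * p2
        likelihoods[a1 + a2] = lik
    return likelihoods

# Three reads covering one site: low depth leaves real uncertainty.
gl = genotype_likelihoods("AAG", [0.01, 0.01, 0.01])
print(max(gl, key=gl.get))  # AG
```

    With only three reads the heterozygote AG is the most likely genotype, but AA keeps a non-negligible likelihood; carrying all ten values forward instead of a single called genotype is the point of working with genotype likelihoods at low depth.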

  15. Non-B DNA-forming sequences and WRN deficiency independently increase the frequency of base substitution in human cells

    DEFF Research Database (Denmark)

    Bacolla, Albino; Wang, Guliang; Jain, Aklank;

    2011-01-01

    the mutation frequency, the increase afforded by WRN-KD was independent of DNA structure despite the fact that purified WRN helicase was found to resolve these structures in vitro. In U2OS cells, ~70% of mutations comprised single-base substitutions, mostly at G·C base-pairs, with the remaining ~30% being...... microdeletions. The number of mutations at G·C base-pairs in the context of NGNN/NNCN sequences correlated well with predicted free energies of base stacking and ionization potentials, suggesting a possible origin via oxidation reactions involving electron loss and subsequent electron transfer (hole migration......) between neighboring bases. A set of ~40,000 somatic mutations at G·C base pairs identified in a lung cancer genome exhibited similar correlations, implying that hole migration may also be involved. We conclude that alternative DNA conformations, WRN deficiency and lung tumorigenesis may all serve...

  16. Repair of oxidative DNA base damage in the host genome influences the HIV integration site sequence preference.

    Directory of Open Access Journals (Sweden)

    Geoffrey R Bennett

    Full Text Available Host base excision repair (BER) proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1) and mutY homolog (MYH) as well as DNA polymerase beta (Polβ). While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5'dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested Polβ DNA synthesis activity is not necessary while 5'dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

  17. Real-Time Nucleic Acid Sequence-Based Amplification Using Molecular Beacons for Detection of Enterovirus RNA in Clinical Specimens

    OpenAIRE

    Landry, Marie L.; Garner, Robin; Ferguson, David

    2005-01-01

    Real-time nucleic acid sequence-based amplification (NASBA) using molecular beacon technology (NASBA-beacon) was compared to standard NASBA with postamplification hybridization using electrochemiluminescently labeled probes (NASBA-ECL) for detection of enteroviruses (EV) in 133 cerebrospinal fluid and 27 stool samples. NASBA-ECL and NASBA-beacon were similar in sensitivity, detecting 55 (100%) and 52 (94.5%) EV-positive samples, respectively. There were no false positives. Both NASBA assays w...

  18. Navigating the Rapids: The Development of Regulated Next-Generation Sequencing-Based Clinical Trial Assays and Companion Diagnostics

    OpenAIRE

    Saumya Pant; Russell Weiner; Matthew John Marton

    2014-01-01

    Over the past decade, next-generation sequencing (NGS) technology has experienced meteoric growth in the aspects of platform, technology, and supporting bioinformatics development allowing its widespread and rapid uptake in research settings. More recently, NGS-based genomic data have been exploited to better understand disease development and patient characteristics that influence response to a given therapeutic intervention. Cancer, as a disease characterized by and driven by the tumor gene...

  19. Infrageneric Phylogeny and Temporal Divergence of Sorghum (Andropogoneae, Poaceae) Based on Low-Copy Nuclear and Plastid Sequences

    OpenAIRE

    Qing Liu; Huan Liu; Jun Wen; Peterson, Paul M.

    2014-01-01

    The infrageneric phylogeny and temporal divergence of Sorghum were explored in the present study. Sequence data of two low-copy nuclear (LCN) genes, phosphoenolpyruvate carboxylase 4 (Pepc4) and granule-bound starch synthase I (GBSSI), from 79 accessions of Sorghum plus Cleistachne sorghoides together with those from outgroups were used for maximum likelihood (ML) and Bayesian inference (BI) analyses. Bayesian dating based on three plastid DNA markers (ndhA intron, rpl32-trnL, and rps16 intro...

  20. Sequence based polymorphic (SBP) marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Full Text Available Abstract Background Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic region. Here a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained by applying Illumina's sequencing by synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative single nucleotide polymorphisms (SNPs) throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. 
This technology will particularly facilitate the development of high density molecular marker maps, essential for
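
    The key step in this record, checking whether a SNP changes a restriction enzyme recognition site between two ecotypes, can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' pipeline: the two allele sequences are made up, only the forward strand is scanned (sufficient for a palindromic site such as EcoRI's GAATTC), and the function names are hypothetical.

```python
def site_positions(seq, site):
    """0-based start positions of every occurrence of a recognition site."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def is_sbp_candidate(allele_a, allele_b, site):
    """True if the two alleles would give different digestion patterns for this site."""
    return site_positions(allele_a, site) != site_positions(allele_b, site)

ECORI = "GAATTC"                 # EcoRI recognition sequence
col0 = "TTGAATTCAA"              # hypothetical Col-0 amplicon fragment
nd0 = "TTGAGTTCAA"               # hypothetical Nd-0 fragment with an A-to-G SNP
print(site_positions(col0, ECORI), site_positions(nd0, ECORI))  # [2] []
print(is_sbp_candidate(col0, nd0, ECORI))                       # True
```

    Because one amplicon is cut and the other is not, the two ecotypes would give different band patterns on a gel, which is what makes such a SNP usable as a co-dominant marker.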

  1. A new navigation approach of terrain contour matching based on 3-D terrain reconstruction from onboard image sequence

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This article presents a passive navigation method of terrain contour matching by reconstructing the 3-D terrain from the image sequence (acquired by the onboard camera). To achieve automation and simultaneity of the image sequence processing for navigation, a correspondence registration method based on control points tracking is proposed which tracks the sparse control points through the whole image sequence and uses them as correspondence in the relation geometry solution. Besides, a key frame selection method based on the images overlapping ratio and intersecting angles is explored, thereafter the requirement for the camera system configuration is provided. The proposed method also includes an optimal local homography estimating algorithm according to the control points, which helps correctly predict points to be matched and their speed corresponding. Consequently, the real-time 3-D terrain of the trajectory thus reconstructed is matched with the referenced terrain map, and the result provides navigation information. The digital simulation experiment and the real image based experiment have verified the proposed method.
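
    The final matching step, locating a reconstructed terrain against a reference map, can be illustrated with a deliberately simplified Python sketch. The record describes matching 3-D terrain; the version below matches a 1-D elevation profile by minimizing the mean squared difference, which is an assumption made for brevity, and the function name and elevation values are hypothetical.

```python
def best_match_offset(reference, measured):
    """Slide a measured elevation profile along a reference profile.

    Returns (offset, cost), where cost is the mean squared difference at the
    best-fitting offset. A 1-D simplification of terrain contour matching.
    """
    best_offset, best_cost = None, float("inf")
    for offset in range(len(reference) - len(measured) + 1):
        cost = sum((reference[offset + i] - m) ** 2
                   for i, m in enumerate(measured)) / len(measured)
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost

# Hypothetical reference map profile and a noisy reconstructed segment.
reference = [10, 12, 15, 20, 18, 14, 11, 9]
measured = [20.5, 17.5, 14.2]
offset, cost = best_match_offset(reference, measured)
print(offset)  # 3
```

    The offset with the smallest cost is the position estimate; a real system would search over 2-D position and heading as well.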

  2. Phylogeny and divergence of Chinese Angiopteridaceae based on chloroplast DNA sequence data (rbcL and trnL-F)

    Institute of Scientific and Technical Information of China (English)

    LI ChunXiang; LU ShuGang

    2007-01-01

    Marattioid ferns are an ancient lineage of primitive vascular plants that first appeared in the middle Carboniferous. Extant members are almost exclusively restricted to tropical regions, and the species-rich family Angiopteridaceae are limited in their distribution to the eastern hemisphere; relationships within the group are currently vague. Here the phylogenetic relationship between Angiopteris Hoffm. and Archangiopteris Christ et Gies. was evaluated based on the sequence analysis of chloroplast rbcL gene and trnL-F intergenic spacer with MEGA2 and MrBayes v3.0b4. On the basis of the phylogenetic pattern and fossil record, we further estimated the divergence time for the two genera. The phylogenetic trees revealed that all species of Angiopteris and Archangiopteris in this study formed a monophyletic group with strong statistical support, but the relationship between the two genera remained unresolved based on individual sequence analysis. On the other hand, the sequence analyses of combined data set revealed that Archangiopteris species diverged first, indicating that Archangiopteris may not be a direct derivative as traditionally assumed. The clade of Angiopteris and Archangiopteris appears to have diversified in the late Oligocene (≈26 Ma) based on the molecular estimate. Thus, the evolutionary history of extant Angiopteris and Archangiopteris has been characterized by ancient origin and recent diversification, and these groups are not relic and endangered lineages as traditionally considered.

  3. A New Revised DNA Cramp Tool Based Approach of Chopping DNA Repetitive and Non-Repetitive Genome Sequences

    Directory of Open Access Journals (Sweden)

    V.Hari Prasad

    2012-11-01

    Full Text Available A tremendous amount of genetic sequence data from living organisms is generated day by day, and as it accumulates in databases their size grows exponentially. Because so many DNA sequences are stored in public databases such as NCBI, EMBL and DDBJ, archival maintenance is a tedious task. Transmitting this information from one place to another in network management systems is also a critical task. Compression is therefore needed in database optimization to improve efficiency and reduce database overhead. Various techniques have emerged for this purpose, but the results achieved are modest. Many classical algorithms fail to compress genetic sequences because of the specificity of the text encoded in DNA, and only a few of the existing techniques have achieved positive results. DNA is both repetitive and non-repetitive in nature. Our proposed technique, DNACRAMP, is applicable to repetitive and non-repetitive DNA sequences and yields a better compression ratio in terms of bits per base. It is compared with existing techniques, and we observe that ours is the optimum technique and that its compression results are on par with existing techniques.
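
    The "bits per base" figure used to rate DNA compressors can be made concrete with the plain 2-bit packing that such methods are usually measured against. The sketch below is that baseline only and is not the DNACRAMP algorithm, whose details the abstract does not give; the function names are chosen for this example, and it assumes the input contains only A, C, G and T.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte, two bits each, most significant pair first."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n_bases: int) -> str:
    """Invert pack_2bit; n_bases trims the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


def bits_per_base(packed: bytes, n_bases: int) -> float:
    """Size of the packed form divided by the number of bases it encodes."""
    return 8 * len(packed) / n_bases


seq = "ACGTACGTAC"
packed = pack_2bit(seq)
print(len(packed), unpack_2bit(packed, len(seq)) == seq)
print(bits_per_base(pack_2bit("ACGTACGT"), 8))
```

    A compressor that exploits repeats has to come in below this 2 bits per base baseline to be worthwhile.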

  4. Analysis of single nucleotide polymorphisms based on RNA sequencing data of diverse bio-geographical accessions in barley

    Science.gov (United States)

    Takahagi, Kotaro; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo; Mochida, Keiichi; Saisho, Daisuke

    2016-01-01

    Barley is one of the founder crops of Old world agriculture and has become the fourth most important cereal worldwide. Information on genome-scale DNA polymorphisms allows elucidating the evolutionary history behind domestication, as well as discovering and isolating useful genes for molecular breeding. Deep transcriptome sequencing enables the exploration of sequence variations in transcribed sequences; such analysis is particularly useful for species with large and complex genomes, such as barley. In this study, we performed RNA sequencing of 20 barley accessions, comprising representatives of several biogeographic regions and a wild ancestor. We identified 38,729 to 79,949 SNPs in the 19 domesticated accessions and 55,403 SNPs in the wild barley and revealed their genome-wide distribution using a reference genome. Genome-scale comparisons among accessions showed a clear differentiation between oriental and occidental barley populations. The results based on population structure analyses provide genome-scale properties of sub-populations grouped to oriental, occidental and marginal groups in barley. Our findings suggest that the oriental population of domesticated barley has genomic variations distinct from those in occidental groups, which might have contributed to barley’s domestication. PMID:27616653

  5. Flanking p10 contribution and sequence bias in matrix based epitope prediction: revisiting the assumption of independent binding pockets

    Directory of Open Access Journals (Sweden)

    Parry Christian S

    2008-10-01

    Full Text Available Abstract Background Eluted natural peptides from major histocompatibility molecules show patterns of conserved residues. Crystallographic structures show that the bound peptide in class II major histocompatibility complex adopts a near uniform polyproline II-like conformation. This way allele-specific favoured residues are able to anchor into pockets in the binding groove leaving other peptide side chains exposed for recognition by T cells. The anchor residues form a motif. This sequence pattern can be used to screen large sequences for potential epitopes. Quantitative matrices extend the motif idea to include the contribution of non-anchor peptide residues. This report examines two new matrices that extend the binding register to incorporate the polymorphic p10 pocket of human leukocyte antigen DR1. Their performance is quantified against experimental binding measurements and against the canonical nine-residue register matrix. Results One new matrix shows significant improvement over the base matrix; the other does not. The new matrices differ in the sequence of the peptide library. Conclusion One of the extended quantitative matrices showed significant improvement in prediction over the original nine residue matrix and over the other extended matrix. Proline in the sequence of the peptide library of the better performing matrix presumably stabilizes the peptide conformation through neighbour interactions. Such interactions may influence epitope prediction in this test of quantitative matrices. This calls into question the assumption of the independent contribution of individual binding pockets.

  6. Explicit-Explicit Sequence Calculation Method for the Wheel/rail Rolling Contact Problem Based on ANSYS/LS-DYNA

    Directory of Open Access Journals (Sweden)

    Song Hua

    2015-01-01

    Full Text Available Wheel/rail rolling contact can lead not only to rail fatigue damage but also to rail corrugation. To address the wheel/rail rolling contact problem, this paper uses the ANSYS/LS-DYNA explicit analysis software to establish a finite element model of wheel/rail rolling contact in non-linear steady-state curve negotiation, and proposes an explicit-explicit sequence calculation method that can be used to solve this model. The explicit-explicit sequence calculation method uses the explicit solver to calculate both the rail pre-stressing force and the process of wheel/rail rolling contact. Compared with the widely applied implicit-explicit sequence calculation method, the explicit-explicit sequence calculation method offers similar calculation precision with faster speed and higher efficiency, which makes it more applicable to wheel/rail rolling contact problems in non-linear steady-state curving that involve a large model or a high degree of non-linearity.

  7. Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    Full Text Available Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.

  8. Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight?

    Science.gov (United States)

    Fetrow, Jacquelyn S.; Siew, Naomi; Di Gennaro, Jeannine A.; Martinez-Yamout, Maria; Dyson, H. Jane; Skolnick, Jeffrey

    2001-01-01

    A function annotation method using the sequence-to-structure-to-function paradigm is applied to the identification of all disulfide oxidoreductases in the Saccharomyces cerevisiae genome. The method identifies 27 sequences as potential disulfide oxidoreductases. All previously known thioredoxins, glutaredoxins, and disulfide isomerases are correctly identified. Three of the 27 predictions are probable false-positives. Three novel predictions, which subsequently have been experimentally validated, are presented. Two additional novel predictions suggest a disulfide oxidoreductase regulatory mechanism for two subunits (OST3 and OST6) of the yeast oligosaccharyltransferase complex. Based on homology, this prediction can be extended to a potential tumor suppressor gene, N33, in humans, whose biochemical function was not previously known. Attempts to obtain a folded, active N33 construct to test the prediction were unsuccessful. The results show that structure prediction coupled with biochemically relevant structural motifs is a powerful method for the function annotation of genome sequences and can provide more detailed, robust predictions than function prediction methods that rely on sequence comparison alone. PMID:11316881

  9. A Small Surrogate for the Golden Angle in Time-Resolved Radial MRI Based on Generalized Fibonacci Sequences.

    Science.gov (United States)

    Wundrak, Stefan; Paul, Jan; Ulrici, Johannes; Hell, Erich; Rasche, Volker

    2015-06-01

    In golden angle radial magnetic resonance imaging a constant azimuthal radial profile spacing of 111.246...° guarantees a nearly uniform azimuthal profile distribution in k-space for an arbitrary number of radial profiles. Even though this profile order is advantageous for various real-time imaging methods, in combination with balanced steady-state free precession (SSFP) sequences the large azimuthal angle increment may lead to strong image artifacts, due to the varying eddy currents introduced by the rapidly switching gradient scheme. Based on a generalized Fibonacci sequence, a new sequence of smaller irrational angles is introduced (49.750...°, 32.039...°, 27.198...°, 23.628...°, ...). The subsequent profile orders guarantee the same sampling efficiency as the golden angle if at least a minimum number of radial profiles is used for reconstruction. The suggested angular increments are applied for dynamic imaging of the heart and the temporomandibular joint. It is shown that for balanced SSFP sequences, trajectories using the smaller golden angle surrogates strongly reduce the image artifacts, while the free retrospective choice of the reconstruction window width is maintained. PMID:25532172
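
    The angles quoted in the abstract can be reproduced numerically. The sketch below assumes the closed form psi_N = 180° / (tau + N - 1), with tau the golden ratio; the abstract itself does not state this formula, so treat it as an assumption that happens to match the listed values (N = 1 gives the golden angle, and N = 3, 5, 6, 7 give the four smaller angles). The function names are chosen for this example.

```python
import math
from typing import List

TAU = (1 + math.sqrt(5)) / 2  # golden ratio


def golden_angle_surrogate(n: int) -> float:
    """Angle increment in degrees under the assumed form 180 / (tau + n - 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 180.0 / (TAU + n - 1)


def profile_angles(n: int, count: int) -> List[float]:
    """Azimuthal angles of the first `count` radial profiles, wrapped to
    [0, 180) on the assumption that each profile is a full spoke through
    the k-space centre, so theta and theta + 180 cover the same line."""
    psi = golden_angle_surrogate(n)
    return [(k * psi) % 180.0 for k in range(count)]


for n in (1, 3, 5, 6, 7):
    print(n, golden_angle_surrogate(n))
```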

  10. Captured metagenomics: large-scale targeting of genes based on 'sequence capture' reveals functional diversity in soils.

    Science.gov (United States)

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K; Hedlund, Katarina; Ahrén, Dag

    2015-12-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances.

  11. Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity.

    Science.gov (United States)

    Bywater, Robert Paul

    2015-01-01

    While the genome for a given organism stores the information necessary for the organism to function and flourish it is the proteins that are encoded by the genome that perhaps more than anything else characterize the phenotype for that organism. It is therefore not surprising that one of the many approaches to understanding and predicting protein folding and properties has come from genomics and more specifically from multiple sequence alignments. In this work I explore ways in which data derived from sequence alignment data can be used to investigate in a predictive way three different aspects of protein structure: secondary structures, inter-residue contacts and the dynamics of switching between different states of the protein. In particular the use of Kolmogorov complexity has identified a novel pathway towards achieving these goals.
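
    As a point of reference for the title's "Shannon entropy", the per-column entropy of a multiple sequence alignment can be computed as below. This is the standard textbook measure, not necessarily the exact quantity used in the paper, and the four-sequence alignment is a toy input invented for this example.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of the symbol distribution in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(alignment: List[str]) -> List[float]:
    """Entropy of every column of an alignment of equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Toy alignment invented for this sketch: two conserved columns,
# one column split evenly between two residues, one more variable column.
toy_alignment = ["ACDA", "ACEA", "ACDC", "ACEG"]
print(alignment_entropies(toy_alignment))
```

    Fully conserved columns score zero, and the score rises as a column becomes more variable.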

  12. Photobiont diversity in lichens from metal-rich substrata based on ITS rDNA sequences.

    Science.gov (United States)

    Backor, Martin; Peksa, Ondrej; Skaloud, Pavel; Backorová, Miriam

    2010-05-01

    The photobiont is considered as the more sensitive partner of lichen symbiosis in metal pollution. For this reason the presence of a metal tolerant photobiont in lichens may be a key factor of ecological success of lichens growing on metal polluted substrata. The photobiont inventory was examined for terricolous lichen community growing in Cu mine-spoil heaps derived by historical mining. Sequences of internal transcribed spacer (ITS) were phylogenetically analyzed using maximum likelihood analyses. A total of 50 ITS algal sequences were obtained from 22 selected lichen taxa collected at three Cu mine-spoil heaps and two control localities. Algae associated with Cladonia and Stereocaulon were identified as members of several Asterochloris lineages, photobionts of cetrarioid lichens clustered with Trebouxia hypogymniae ined. We did not find close relationship between heavy metal content (in localities as well as lichen thalli) and photobiont diversity. Presence of multiple algal genotypes in single lichen thallus has been confirmed. PMID:20031214

  13. PyVDT: A PsychoPy-Based Visual Sequence Detection Task

    Directory of Open Access Journals (Sweden)

    Mads Hansen

    2016-06-01

    Full Text Available PyVDT is a computerized test consisting of two brief visual sequence detection tasks in which participants watch single digits displayed on screen and respond whenever target digit sequences (even – odd – even) are displayed. The total duration of the test is around five minutes. PyVDT is a reimplementation of the Visual Monitoring Task (VMT), a task thought to measure working memory. PyVDT uses the PsychoPy API to display digits, to plot diagnostic information, and to output log files and results. It is available for download on Figshare and GitHub. PyVDT is free software and has minimal software and hardware requirements. Thus, PyVDT provides a readily available visual monitoring task for use in experiments within cognitive science and related fields.
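
    The target rule (even, then odd, then even) can be sketched independently of PsychoPy. The code below is not taken from PyVDT: the function name is invented, overlapping targets are allowed, and 0 is counted as even, all of which are assumptions about details the abstract does not specify.

```python
from typing import List, Sequence


def target_positions(digits: Sequence[int]) -> List[int]:
    """Return the index of the last digit of every even-odd-even run.

    A participant would be expected to respond at each returned index.
    Overlapping runs (for example 2, 3, 4, 7, 8) are all reported.
    """
    hits = []
    for i in range(2, len(digits)):
        first, second, third = digits[i - 2], digits[i - 1], digits[i]
        if first % 2 == 0 and second % 2 == 1 and third % 2 == 0:
            hits.append(i)
    return hits


stream = [2, 3, 4, 7, 8, 1, 6, 5]
print(target_positions(stream))
```

    Scoring a session then amounts to comparing these indices with the indices at which the participant actually responded.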

  14. The Effect of Haptic Cues on Motor and Perceptual Based Implicit Sequence Learning

    Directory of Open Access Journals (Sweden)

    Dongwon Kim

    2014-03-01

    Full Text Available We introduced haptic cues to the serial reaction time (SRT) sequence learning task alongside the standard visual cues to assess the relative contributions of haptic and visual stimuli to the formation of motor and perceptual memories. We used motorized keys to deliver brief pulse-like displacements to the resting fingers, expecting that the proximity and similarity of these cues to the subsequent response motor actions (finger activated key-presses) would strengthen the motor memory trace in particular. We adopted the experimental protocol developed by Willingham in 1999 to explore whether haptic cues contribute differently than visual cues to the balance of motor and perceptual learning. We found that sequence learning occurs with haptic stimuli as with visual stimuli and we found that irrespective of the stimuli (visual or haptic) the serial reaction time task leads to a greater amount of motor learning than perceptual learning.

  15. rVISTA for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites

    Energy Technology Data Exchange (ETDEWEB)

    Loots, Gabriela G.; Ovcharenko, Ivan; Pachter, Lior; Dubchak, Inna; Rubin, Edward M.

    2002-03-08

    Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput discovery of cis-regulatory elements that combines transcription factor binding site prediction and the analysis of inter-species sequence conservation. Here, we illustrate the ability of rVISTA to identify true transcription factor binding sites through the analysis of AP-1 and NFAT binding sites in the 1 Mb well-annotated cytokine gene cluster1 (Hs5q31; Mm11). The exploitation of orthologous human-mouse data set resulted in the elimination of 95 percent of the 38,000 binding sites predicted upon analysis of the human sequence alone, while it identified 87 percent of the experimentally verified binding sites in this region.

  16. Secondary structure, a missing component of sequence-based minimotif definitions.

    Directory of Open Access Journals (Sweden)

    David P Sargeant

    Full Text Available Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function.

  17. Analysis of genetic relationship among Indonesian native chicken breeds based on 335 D-loop sequences

    OpenAIRE

    Sri Sulandari; M Syamsul Arifin Zein; Tike Sartika

    2008-01-01

    The mitochondrial DNA (mtDNA) D-loop segment was PCR amplified and subsequently sequenced for a total of 335 individuals from Indonesian native chicken. The individuals were drawn from sixteen populations of native chicken and three individuals of green jungle fowls (Gallus varius). Indonesian native chicken populations were: Pelung Sembawa, PL (n = 18), Pelung Cianjur, PLC (n = 29) and Arab Silver, ARS (n=30), Cemani, CM (n = 32), Gaok, GA (n = 7), Kedu Hitam, KDH (n = 11), Wareng, T & TW (n ...

  18. Deciphering Clostridium tyrobutyricum Metabolism Based on the Whole-Genome Sequence and Proteome Analyses

    OpenAIRE

    Lee, Joungmin; Jang, Yu-Sin; Han, Mee-Jung; Kim, Jin Young; Lee, Sang Yup

    2016-01-01

    ABSTRACT Clostridium tyrobutyricum is a Gram-positive anaerobic bacterium that efficiently produces butyric acid and is considered a promising host for anaerobic production of bulk chemicals. Due to limited knowledge on the genetic and metabolic characteristics of this strain, however, little progress has been made in metabolic engineering of this strain. Here we report the complete genome sequence of C. tyrobutyricum KCTC 5387 (ATCC 25755), which consists of a 3.07-Mbp chromosome and a 63-kb...

  19. Relationships of wild and domesticated rices (Oryza AA genome species) based upon whole chloroplast genome sequences

    OpenAIRE

    Wambugu, Peterson W.; Marta Brozynska; Agnelo Furtado; Daniel L. Waters; Robert J. Henry

    2015-01-01

    Rice is the most important crop in the world, acting as the staple food for over half of the world’s population. The evolutionary relationships of cultivated rice and its wild relatives have remained contentious and inconclusive. Here we report on the use of whole chloroplast sequences to elucidate the evolutionary and phylogenetic relationships in the AA genome Oryza species, representing the primary gene pool of rice. This is the first study that has produced a well resolved and strongly su...

  20. Molecular identification based on ITS sequences for Kappaphycus and Eucheuma cultivated in China

    Institute of Scientific and Technical Information of China (English)

    ZHAO Sufen; HE Peimin

    2011-01-01

    The systematic classification of the Eucheumatoideae is difficult because of their variable morphology and interpretation of reproductive structures. Kappaphycus and Eucheuma specimens cultivated on the Hainan and Fujian coast of China were introduced from Vietnam, the Philippines and Indonesia. Combined with morphological characteristics, all Kappaphycus and Eucheuma cultivated strains were identified by internal transcribed spacer (ITS) sequences. The phylogenetic tree was constructed using neighbor-joining and maximum likelihood methods. The results indicate that different ITS sequence lengths occurred in the different genera and species. An obvious difference in morphology could be found in the protuberance shape between Kappaphycus and Eucheuma. The protuberance in Eucheuma was thorn-like and in Kappaphycus was wartlike or papillate. Their ITS sequence lengths differed significantly in nucleotide variation rates up to 58.55%-63.90%. All nucleotide variations occurred in the ITS1 and ITS2 regions except for five nucleotide transversions in the 5.8S rDNA region. In addition, the difference was at the branches among congeneric species. Kappaphycus sp. had branches with small buds, while K. alvarezii did not have such a feature. The nucleotide variation rates varied from 7.02% to 7.48% among species; within the same species of the clades it was <1.20%. Eucheumatoideae algae cultivated in China consisted of three clades, K. alvarezii, Kappaphycus sp., and E. denticulatum. The results indicate that ITS sequence analysis was an effective way for identification of interspecies and intraspecies phylogenetic relationships and might provide a clue for molecular identification of algal Eucheumatoideae.
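
    A "nucleotide variation rate" between two aligned sequences can be illustrated as the percentage of compared sites that differ. The abstract does not say how its rates were calculated, so the sketch below is one common convention rather than the authors' method; in particular, skipping gap columns is an assumption, and the sequences are toy inputs invented for this example.

```python
def variation_rate(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    counting indels as differences would give higher rates).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments invented for this sketch (not real ITS sequence).
print(variation_rate("ACGTACGTAC", "ACGTTCGTAA"))
print(variation_rate("AC-TACGT", "ACGTACGA"))
```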