WorldWideScience

Sample records for biostratigraphic sequence analysis

  1. Sequence stratigraphic interpretation of parts of Anambra Basin, Nigeria using geophysical well logs and biostratigraphic data

    Science.gov (United States)

    Anakwuba, E. K.; Ajaegwu, N. E.; Ejeke, C. F.; Onyekwelu, C. U.; Chinwuko, A. I.

    2018-03-01

    The Anambra Basin constitutes the southeastern lower portion of the Benue Trough, a large structural depression that is divided into lower, middle and upper parts, and is one of the least studied inland sedimentary basins in Nigeria. Sequence stratigraphic interpretation was carried out in parts of the Anambra Basin using data from three wells (Alo-1, Igbariam-1 and Ajire-1). Geophysical well logs and biostratigraphic data were integrated in order to identify key bounding surfaces, subdivide the sediment packages, correlate sand continuity and interpret the environment of deposition in the fields. Biostratigraphic interpretation, using foraminifera and plankton population and diversity, reveals five maximum flooding surfaces (MFS) in the fields. Five sequence boundaries (SB) were also identified using the well log analysis. Four 3rd order genetic sequences bounded by maximum flooding surfaces (MFS-1 to MFS-6) were identified in the areas; four complete sequences and one incomplete sequence were identified in both the Alo-1 and Igbariam-1 wells, while Ajire-1 has no complete sequence. The systems tracts delineated comprise Lowstand Systems Tracts (progradational to aggradational to retrogradational packages), Transgressive Systems Tracts (retrogradational packages) and Highstand Systems Tracts (aggradational to progradational packages) in each well. The sand continuity across the fields reveals sands S1 to S5, where S1 is present in the Ajire-1 and Igbariam-1 wells but not in the Alo-1 well. The sands S4 to S5 run across the three fields at different depths. The formations penetrated by the wells, starting from the base, are: Nkporo Formation (Campanian), Mamu Formation (Late Campanian to Early Maastrichtian), Ajali Sandstone (Maastrichtian), Nsukka Formation (Late Maastrichtian to Early Palaeocene), Imo Formation (Palaeocene) and Nanka Sand (Eocene). The environments of deposition revealed range from coastal to bathyal. The sands of lowstand system

  2. Biostratigraphic analysis based on palynomorphs and ostracods from core 2-JNS-01PE, Lower Cretaceous, Jatobá Basin, northeastern Brazil

    Science.gov (United States)

    Nascimento, Luiz R. D. S. L.; Tomé, Maria E. T. R.; Barreto, Alcina M. F.; Holanda de Oliveira, David; Neumann, Virgínio H. M. L.

    2017-07-01

    This manuscript presents a biostratigraphic analysis based on non-marine palynomorphs and ostracods from the interval corresponding to the Aptian - lower Albian (local Alagoas stage) in the Jatobá Basin, northeastern Brazil. The data used for the analysis came from 179 samples collected in core 2-JSN-01-PE, which was drilled in the locality of Serra Negra, municipality of Ibimirim, Pernambuco. Of the 179 samples collected and prepared, only 23 presented ostracods, whereas 64 levels were carriers of organic waste, which aided in the biostratigraphic correlation of the studied core. From the palynological data, it was possible to identify the upper limit of the Inaperturopollenites turbatus (P-260) palynozone and the Sergipea variverrucata (P-270) palynozone and the lower limit of the Complicatisaccus cearensis (P-280) palynozone in core 2-JSN-01-PE. Two results can be highlighted among the palynozones: the upper limit of P-260, whose record is difficult to track in northeastern Brazil and has been identified only in the Sergipe/Alagoas Basin, and the limit between P-270 and P-280, which had not yet been identified in the Jatobá Basin. From the data based on non-marine ostracods, it was possible to identify the "Cytheridea"? spp. ex. Group 201/218 Biozone (herein designated Damonella grandiensis), code 011. This record in the Jatobá Basin is represented by a diverse fauna, in contrast with the monospecific fauna initially proposed in the Sergipe/Alagoas Basin.

  3. 361 Biostratigraphical Analysis and Palaeoenvironmental ...

    African Journals Online (AJOL)

    User

    phase with first frank marine transgressions (Cenomanian-Early Senonian), which allowed deposition of calcispherid limestones that will erode during Early Senonian, 3) a phase of active expansion and subsidence (Campanian-Maastrichtian) with transgressive marine clays over eroded surfaces affecting in some places ...

  4. Quantitative Biostratigraphic Age Control of Glacimarine Sediments, ANDRILL 1B Drillcore, McMurdo Ice Shelf

    Science.gov (United States)

    Cody, R.; Levy, R.; Crampton, J.; Wilson, G.; Naish, T.; Harwood, D.; Winter, D.; Scherer, R.

    2008-12-01

    Interpretation of glacimarine sedimentary records from Antarctic shelf drillholes has been greatly hampered by the ambiguous age of strata where erosional unconformities and coarse diamictite deposits truncate or omit the magnetostratigraphic and biostratigraphic units used for correlation. However, new quantitative biostratigraphic techniques enable the correlation of sparse, incomplete, and reworking-prone Plio-Pleistocene records of Ross Sea fossil diatom flora with the more extensively documented but potentially diachronous offshore history of species' first and last appearances (FAs and LAs). The approach uses a comprehensive regional database of fossil records and computer-automated search algorithms to (a) find the multidimensional line of correlation (LOC) that best fits local observations, and (b) map out confidence intervals based on the full range of equally parsimonious composite FA/LA sequences and local range-end adjustments. An integrated, quantitative chronostratigraphic model for the AND-1B drillcore was constructed iteratively: the initial LOC was based solely on preliminary on-ice observations of fossil diatom highest and lowest occurrences (HOs and LOs) and their correlation with a database of other local event records from 24 DVDP, CIROS, and IODP drillcore sections. The model was subsequently updated as off-ice work yielded additional biostratigraphic marker events and revised event horizons, Ar/Ar ages for volcanic material, better-constrained magnetostratigraphic interpretations, and refinements to computational/analytical methodology. The current quantitative biostratigraphic age model for the AND-1B hole integrates the local ranges of 29 diatom taxa, 5 dated ashes, and independently constrained ages of 5 paleomagnetic reversals. Results corroborate almost all of the on-ice geomagnetic polarity reversal age interpretations, but identify a previously unrecognized major disconformity (~800 kyr hiatus) near 440 mbsf.

  5. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  6. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  7. Sequence analysis on microcomputers.

    Science.gov (United States)

    Cannon, G C

    1987-10-02

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. 

  8. Image analysis for DNA sequencing

    International Nuclear Information System (INIS)

    Palaniappan, K.; Huang, T.S.

    1991-01-01

    This paper reports that there is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information

  9. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  10. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  11. Sequence Handling by Sequence Analysis Toolbox v1.0

    DEFF Research Database (Denmark)

    Ingrell, Christian Ravnsborg; Matthiesen, Rune; Jensen, Ole Nørregaard

    2006-01-01

    The fact that mass spectrometry has become a high-throughput method calls for bioinformatic tools for automated sequence handling and prediction. For efficient use of bioinformatic tools, it is important that these tools are integrated or interfaced with each other. The purpose of sequence analysis toolbox v1.0 was to have a general purpose sequence analyzing tool that can import sequences obtained by high-throughput sequencing methods. The program includes algorithms for calculation or prediction of isoelectric point, hydropathicity index, transmembrane segments, and glycosylphosphatidylinositol-anchored proteins.
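
    One of the quantities this record lists, the hydropathicity index, can be illustrated with a minimal sketch. It assumes the widely used Kyte-Doolittle scale and the GRAVY convention (mean hydropathy over all residues); the record does not say which scale the toolbox uses, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: the record does not state which scale the toolbox uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean hydropathy (GRAVY) of a protein sequence.

    Raises ValueError for an empty sequence and KeyError for any
    residue that is not one of the 20 standard one-letter codes.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

    Positive values indicate a predominantly hydrophobic sequence and negative values a hydrophilic one; a sliding-window average of the same scale is a common starting point for flagging candidate transmembrane segments.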

  12. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others]

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  13. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, and one of its forms is leukemia, better known as blood cancer. Cancer cells can be detected by analysing DNA in a laboratory test. This study focused on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by giving a negative value to each unequal base pair (mismatch) and a positive value to each identical base pair (match). The maximum positive value then marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non leukemia.
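
    The scoring and traceback described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score values (match +2, mismatch -1), the linear gap penalty of -1 and the function name `smith_waterman` are all assumptions made here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    The scoring values are illustrative assumptions; gaps are scored
    with a simple linear penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are clamped at zero, which is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum score until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    The cell with the maximum score is the end of the local alignment, and the traceback stops at the first zero cell, which is its start. Memory use grows with the product of the two sequence lengths, so this form suits short sequences only.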

  14. Fossils from Quaternary fluvial archives: Sources of biostratigraphical, biogeographical and palaeoclimatic evidence

    Science.gov (United States)

    White, Tom S.; Bridgland, David R.; Limondin-Lozouet, Nicole; Schreve, Danielle C.

    2017-06-01

    Fluvial sedimentary archives have the potential to preserve a wide variety of palaeontological evidence, ranging from robust bones and teeth found in coarse gravel aggradations to delicate insect remains and plant macrofossils from fine-grained deposits. Over the last decade, advances in Quaternary biostratigraphy based on vertebrate and invertebrate fossils (primarily mammals and molluscs) have been made in many parts of the world, resulting in improved relative chronologies for fluviatile sequences. Complementary fossil groups, such as insects, ostracods and plant macrofossils, are also increasingly used in multi-proxy palaeoclimatic and palaeoenvironmental reconstructions, allowing direct comparison of the climates and environments that prevailed at different times across widely separated regions. This paper reviews these topics on a regional basis, with an emphasis on the latest published information, and represents an update to the 2007 review compiled by the FLAG-inspired IGCP 449 biostratigraphy subgroup. Disparities in the level of detail available for different regions can largely be attributed to varying potential for preservation of fossil material, which is especially poor in areas of non-calcareous bedrock, but to some extent also reflect research priorities in different parts of the world. Recognition of the value of biostratigraphical and palaeoclimatic frameworks, which have been refined over many decades in the 'core regions' for such research (particularly for the late Middle and Late Pleistocene of NW Europe), has focussed attention on the need to accumulate similar palaeontological datasets in areas lacking such long research histories. Although the emerging datasets from these understudied regions currently allow only tentative conclusions to be drawn, they represent an important stage in the development of independent biostratigraphical and palaeoenvironmental schemes, which can then be compared and contrasted.

  15. A review of biostratigraphic studies in the olistostrome deposits of Karangsambung Formation

    Science.gov (United States)

    Hendrizan, Marfasran

    2018-02-01

    Planktonic foraminifera are widely used in the biostratigraphy of marine sediments. The foraminiferal biostratigraphy of the Karangsambung Formation has rarely been investigated by previous researchers. This review of foraminiferal biostratigraphy is intended as early groundwork for research on the ages of the Tertiary rock formations in Karangsambung. The research area was formed by an olistostrome process, which produces a sedimentary slide deposit characterized by bodies of harder rock mixed and dispersed in a matrix. Biostratigraphic studies of the Karangsambung Formation based on foraminifera and nannoplankton are still qualitative analyses using fossil biomarkers, and the age of the formation remains debatable on the basis of foraminiferal and nannofossil analysis. Two explanations have been proposed for the debated ages in the Karangsambung area: first, that the Karangsambung Formation is characterized by normal sedimentation in some places and is a product of olistostrome in other regions, such as Kali Welaran and Clebok Village; and second, that the Karangsambung Formation is an olistostrome deposit. However, the olistostrome origin was ignored in micropaleontological sampling and analysis of the matrix clays, so biostratigraphical results from those clays were treated as the product of normal sedimentation, giving an age of middle Eocene to Oligocene. We suppose that previous authors picked samples from the matrix of the Karangsambung Formation in several river sections, which would lead to misinterpretation of the age of the formation. The middle to late Eocene age is probably the date of older sediment that was reworked by the sliding and sampling process and accumulated in the Karangsambung Formation. The Karangsambung Formation is dated to the Oligocene on the basis of several calcareous nannofossil finds. Detailed micropaleontological analysis of the olistostrome deposits in the Karangsambung Formation should be re-evaluated to obtain accurate dating. Re-evaluation should start from

  16. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. Abstract: Multiple-chiller plants are commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Two observations motivate this work: (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) few studies in the current literature have systematically addressed this issue. This paper therefore presents a robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  17. Automated genome sequence analysis and annotation.

    Science.gov (United States)

    Andrade, M A; Brown, N P; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C

    1999-05-01

    Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 

  18. Biostratigraphic and morphometric analyses of specimens from the calcareous nannofossil genus Tribrachiatus

    Science.gov (United States)

    Self-Trail, Jean; Seefelt, Ellen L.; Shepherd, Claire L.; Martin, Victoria A.

    2017-01-01

    Biostratigraphic and morphometric analyses of calcareous nannofossil assemblages from one outcrop and two cored sections of lower Eocene sediments reveal the presence of two new species: Tribrachiatus lunatus sp. nov., and Tribrachiatus absidatus sp. nov. Differences between the new species and Tribrachiatus orthostylus are discussed. The first occurrence of the two new species is just below the calcareous nannofossil Zone NP11/NP12 boundary, close to the Chron 24r/23n boundary, and thus they are globally useful biostratigraphic markers.

  19. Fossil struthionid eggshells from Laetoli, Tanzania: Taxonomic and biostratigraphic significance

    Science.gov (United States)

    Harrison, Terry; Msuya, Charles P.

    2005-04-01

    Recent paleontological investigations at Laetoli and neighboring localities in northern Tanzania have produced a large collection of fossil ostrich eggshells from the Pliocene-aged Laetolil Beds (~3.5-4.5 Ma) and Ndolanya Beds (~2.6-2.7 Ma). A detailed analysis of the morphology of the eggshells and their taxonomic affinities indicates that two different species of Struthio are represented. In the Lower Laetolil Beds and in the Upper Laetolil Beds below Tuff 3 a new species is recognized: Struthio kakesiensis. This is replaced in the Upper Laetolil Beds by Struthio camelus, the modern species of ostrich. Since radiometric age determinations are available for the stratigraphic sequence at Laetoli, it is possible to precisely date the first appearance of S. camelus at ~3.6-3.8 Ma. Comparisons of the Laetoli material with specimens from the well-dated sequences at Lothagam and Kanapoi in northern Kenya allow the taxonomic and biochronological analysis to be extended back in time to the late Miocene. At about 6.5 Ma, Diamantornis and elephant birds were replaced in East Africa by ostriches belonging to the genus Struthio. Three time-successive species of ostriches are identified in the fossil record of East Africa, beginning with Struthio cf. karingarabensis (~6.5-4.2 Ma), followed by S. kakesiensis (~4.5-3.6 Ma) and then S. camelus (~3.8 Ma onwards). A similar sequence of taxa has previously been recorded from localities in Namibia, but at these sites there is no possibility to precisely calibrate the ages of the different species using radiometric dating. Nevertheless, the broadly similar evolutionary sequence and the close correspondence in inferred ages for the succession of species in East Africa and Namibia suggest that ostrich eggshells are a very useful tool for biochronological correlation of paleontological sites in sub-Saharan Africa.

  20. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
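
    The Chaos Game Representation named in this abstract can be sketched as an iterated map on the unit square. The corner assignment below (A bottom-left, C top-left, G top-right, T bottom-right), the starting point at the centre and the function name `cgr_points` are assumptions chosen for illustration; the abstract does not fix them.

```python
# Corner of the unit square assigned to each nucleotide.
# Assumption: this layout is one common convention, not taken from the abstract.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the list of CGR coordinates, one point per nucleotide.

    Each step moves halfway from the current point towards the corner
    of the next nucleotide, so the position encodes the sequence suffix.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]  # KeyError for symbols outside A, C, G, T
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

    Because each step halves the distance to a corner, the quadrant containing a point identifies the most recent nucleotide, the sub-quadrant the one before it, and so on, which is the scale-free property the abstract refers to.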

  1. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system-level PHA using the sequence tree method was developed to perform the SSA of safety-related digital I&C systems. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussion. However, the conventional PHA is not a systematic technique: the analysis results depend strongly on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. This research therefore developed a system-level, sequence-tree-based PHA that can clarify the relationships among the major digital I&C systems. The technique has two major phases. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety-related I&C system, such as the RPS. The second phase uses a sequence tree to identify which I&C systems are involved in the event, how the safety-related systems work, and how the backup systems can be activated to mitigate the consequences if the primary safety systems fail. The sequence tree is structured by the defense-in-depth echelons: the Control echelon, the Reactor trip echelon, the ESFAS echelon, and the Indication and display echelon. All the related I&C systems, including the digital systems and the analog backup systems, are allocated to their specific echelons. With this system-centric, sequence-tree-based analysis, not only can preliminary hazards be identified systematically, but the vulnerabilities of the nuclear power plant can also be recognized. An effective simplified D3 evaluation can therefore be performed as well.

  2. Information theory applications for biological sequence analysis.

    Science.gov (United States)

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
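
    The block-entropy estimation mentioned in this abstract can be sketched as follows. This is a plain plug-in (maximum-likelihood) estimate over overlapping k-mers; the function name `block_entropy` and the choice of overlapping windows are assumptions for illustration, and the review may discuss more refined estimators.

```python
import math
from collections import Counter


def block_entropy(sequence, k=1):
    """Shannon entropy, in bits, of the empirical k-mer distribution.

    Assumption: overlapping k-mers and a plug-in estimate, which is
    known to be biased downward for short sequences.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(block_entropy("ACGTACGTAC", k=1))
    print(block_entropy("ACGTACGTAC", k=2))
```

    A uniform single-base composition gives the maximum of 2 bits per symbol for a four-letter alphabet, while a homopolymer gives 0 bits.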

  3. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction: Todd R. Reed
    CONTENT-BASED IMAGE SEQUENCE REPRESENTATION: Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej
    THE COMPUTATION OF MOTION: Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang
    MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN: Luca Lucchese and Guido Maria Cortelazzo
    QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS: Gaetano Giunta
    ERROR CONCEALMENT IN DIGITAL VIDEO: Francesco G.B. De Natale
    IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE: Anil Kokaram
    VIDEO SUMMARIZATION: Cuneyt M. Taskiran and Edward

  4. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of the microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence under our proposed pipeline. This could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  5. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. With these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measuring the fitness and appropriateness between a process model and actual events. However, when multiple events share the same timestamp, the traditional approach cannot produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach uses the current control-flow technique from the process mining domain. A case study in the field of educational processes was conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequences of students, it yields measurements for curriculum development. Finally, the results of the proposed approach were verified by relevant instructors for further development.

  6. Protein sequence analysis using Hewlett-Packard biphasic sequencing cartridges in an Applied Biosystems 473A protein sequencer.

    Science.gov (United States)

    Tang, S; Mozdzanowski, J; Anumula, K R

    1999-01-01

    Protein sequence analysis using an adsorptive biphasic sequencing cartridge, a set of two coupled columns introduced by Hewlett-Packard for protein sequencing by Edman degradation, in an Applied Biosystems 473A protein sequencer has been demonstrated. Samples containing salts, detergents, excipients, etc. (e.g., formulated protein drugs) can be easily analyzed using the ABI sequencer. Simple modifications to the ABI sequencer to accommodate the cartridge extend its utility in the analysis of difficult samples. The ABI sequencer solvents and reagents were compatible with the HP cartridge for sequencing. Sequence information up to ten residues can be easily generated by this nonoptimized procedure, and it is sufficient for identifying proteins by database search and for preparing a DNA probe for cloning novel proteins.

  7. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola Cremonesi

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  8. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    . All but one sequence mapped to the MCP gene while the last sequence mapped to the Neurofilament gene. Approx. half of the sequences contained no errors while the rest differed with 88-99 percent similarity with most having 99% similarity. One sequence, when BLASTed, showed most similarity to European...... Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...

  9. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  10. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments via approximate string matching, and global alignments via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
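
    To make the longest-common-subsequence task named in this abstract concrete, here is a textbook dynamic-programming sketch. It is not BATS code and does not reflect the BATS API (the library itself is C/C++ and Perl); the function name `longest_common_subsequence` is hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b.

    Textbook O(len(a) * len(b)) dynamic programme with traceback;
    an illustration only, not the BATS implementation.
    """
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

    When several subsequences share the maximal length, the traceback returns only one of them; which one depends on the tie-breaking rule in the `elif` branch.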

  11. An analysis of sequence alignment: heuristic algorithms.

    Science.gov (United States)

    Bucak, I Ö; Uslan, V

    2010-01-01

    Sequence alignment becomes challenging with an increase in size and number of sequences. Finding optimal or near optimal solutions for sequence alignment is one of the most important operations in bioinformatics. This study aims to survey heuristics applied for the sequence alignment problem summarized in a time line.

  12. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  13. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    Science.gov (United States)

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  14. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  15. Multilocus sequence analysis of the family Halomonadaceae.

    Science.gov (United States)

    de la Haba, Rafael R; Márquez, M Carmen; Papke, R Thane; Ventosa, Antonio

    2012-03-01

    Multilocus sequence analysis (MLSA) protocols have been developed for species circumscription for many taxa. However, at present, no studies based on MLSA have been performed within any moderately halophilic bacterial group. To test the usefulness of MLSA with these kinds of micro-organisms, the family Halomonadaceae, which includes mainly halophilic bacteria, was chosen as a model. This family comprises ten genera with validly published names and 85 species of environmental, biotechnological and clinical interest. In some cases, the phylogenetic relationships between members of this family, based on 16S rRNA gene sequence comparisons, are not clear and a deep phylogenetic analysis using several housekeeping genes seemed appropriate. Here, MLSA was applied using the 16S rRNA, 23S rRNA, atpA, gyrB, rpoD and secA genes for species of the family Halomonadaceae. Phylogenetic trees based on the individual and concatenated gene sequences revealed that the family Halomonadaceae formed a monophyletic group of micro-organisms within the order Oceanospirillales. With the exception of the genera Halomonas and Modicisalibacter, all other genera within this family were phylogenetically coherent. Five of the six studied genes (16S rRNA, 23S rRNA, gyrB, rpoD and secA) showed a consistent evolutionary history. However, the results obtained with the atpA gene were different; thus, this gene may not be considered useful as an individual gene phylogenetic marker within this family. The phylogenetic methods produced variable results, with those generated from the maximum-likelihood and neighbour-joining algorithms being more similar than those obtained by maximum-parsimony methods. Horizontal gene transfer (HGT) plays an important evolutionary role in the family Halomonadaceae; however, the impact of recombination events in the phylogenetic analysis was minimized by concatenating the six loci, which agreed with the current taxonomic scheme for this family. Finally, the findings of

  16. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (e.g. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
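
    One of the measures listed in this abstract, the Allan Factor, can be sketched as below. The sketch uses the standard definition AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]), where N_k is the event count in the k-th window of length T. The function name, argument names, and the handling of the observation period are assumptions for illustration, not the authors' code.

```python
def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor of a point process for a single window length.

    Assumption: the period [t_start, t_end) is cut into contiguous,
    non-overlapping windows; any incomplete final window is dropped.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    # Mean squared difference between counts in adjacent windows.
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2.0 * mean_count)


if __name__ == "__main__":
    regular = [k + 0.5 for k in range(10)]
    print(allan_factor(regular, window=1.0, t_start=0.0, t_end=10.0))
```

    For a homogeneous Poisson process the expected value is 1 at every window length; values well above 1 point to clustering in time, and values near 0 to a regular pattern.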

  17. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  18. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  19. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies...... of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  20. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which maps QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  1. Timing of Early Aptian demise of northern Tethyan carbonate platforms - chemostratigraphic versus biostratigraphic evidence

    Science.gov (United States)

    Huck, Stefan; Immenhauser, Adrian; Heimhofer, Ulrich; Rameil, Niels

    2010-05-01

    A lively controversy still exists between different authors dealing with the timing of northern Tethyan platform drowning and the Early Aptian oceanic anoxic event (OAE 1a). To the present day, there is no consensus on whether the OAE 1a black shales must be attributed to the Deshayesites weissi or the Deshayesites deshayesi zone (see discussion in Moreno-Bedmar et al., 2009). OAE 1a black shale deposition has been traditionally attributed to the Deshayesites weissi zone (Gradstein et al., 2004). Despite this disagreement about the biostratigraphic timing, several authors postulate a relation between biotic perturbations and environmental changes linked to OAE 1a, e.g. the disappearance of coral-rudist reefs related to the demise of the northern Tethyan Urgonian platforms in the Helvetic Alps (Weissert et al., 1998; Föllmi et al., 2008). In the central and southern Tethyan realm (Istria, Oman), OAE 1a is likely expressed as the transient mass occurrence of microencrusters (Lithocodium-Bacinella) and the coeval demise of the characteristic mid-Cretaceous framework-builders (rudists, corals). Chemostratigraphic data indicate that these microbial blooms coincide with the Deshayesites weissi zone (Huck et al., 2010; Rameil et al., 2010). These observations raise the question of whether northern Tethyan platform drowning is coeval with microbial bloom periods in the central and southern Tethys. The analysis of all available literature and unpublished evidence demonstrates that well constrained age data are surprisingly scarce and controversial. The goal of the present research project is to compile a chemostratigraphic framework for the northern Tethyan platform drowning (Haute-Savoie, SE France) in order to shed light on the temporal constraints of platform drowning versus pelagic black shale deposition versus microbial blooms. Two Barremian to Aptian shoalwater sections (Cluses section, Grande Forclaz section) in the Subalpine Chains were investigated applying chemostratigraphy

  2. Efficient computational methods for sequence analysis of small RNAs

    OpenAIRE

    Cozen, Gozde

    2007-01-01

    With the discovery of small regulatory RNAs, there has been a tremendous increase in the number of RNA sequencing projects. Meanwhile, novel high-throughput sequencing technologies, which can sequence as much as 500000 small RNA sequences in one run, have emerged. The challenge of processing this rapidly growing data can be addressed by optimizing current analysis approaches for small RNA sequences. We present fast register-level methods for small RNA pairwise alignment and small RNA to genom...

  3. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature because of their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among cells. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
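
    The abstract does not say how a sequence is turned into a "genomic signal" or which wavelet is used. As a minimal sketch only, assuming a binary indicator mapping (1 where a chosen base occurs, 0 elsewhere) and a single level of the orthonormal Haar transform, the idea can be illustrated as follows. The function names `indicator_signal` and `haar_step` are hypothetical and are not taken from the paper.

```python
import math


def indicator_signal(seq, base):
    """Map a DNA string to a 0/1 signal marking positions of `base` (assumed mapping)."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar transform; len(signal) must be even."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Pairwise scaled sums give the coarse approximation,
    # pairwise scaled differences give the local detail.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


signal = indicator_signal("GATTACAG", "A")
approx, detail = haar_step(signal)
```

    Because the transform is orthonormal, the energy of `signal` equals the combined energy of `approx` and `detail`, which gives a simple sanity check before applying further levels to `approx`.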

  4. AN INTEGRATED CALCAREOUS PLANKTON BIOSTRATIGRAPHIC SCHEME AND BIOCHRONOLOGY FOR THE MEDITERRANEAN MIDDLE MIOCENE

    Directory of Open Access Journals (Sweden)

    RODOLFO SPROVIERI

    2002-07-01

    The relative position of 30 main bioevents pertaining to calcareous nannofossils and planktonic foraminifera was identified in the time interval between 13.75 Ma and 10.50 Ma, based on the quantitative study of those microfossils in three Mediterranean sections spanning the late Langhian – lower Tortonian stratigraphic interval. The events were correlated to the astronomic target curve using a cyclostratigraphic approach, resulting in a very detailed biostratigraphic and biochronologic subdivision of the interval. The zonal scheme proposed by Fornaciari et al. (1996) was adopted for the calcareous nannofossils, but three subzones were identified in the MMN7 Zone. For the planktonic foraminifera, reference is made to the zonal scheme recently proposed by Foresi et al. (1998), slightly modified in order to increase its biostratigraphic resolution. The age of all the zonal boundaries is reported.

  5. Sequencing small RNA: introduction and data analysis fundamentals.

    Science.gov (United States)

    Mehta, Jai Prakash

    2014-01-01

    Small RNAs are important transcriptional regulators within cells. With the advent of powerful Next Generation Sequencing platforms, sequencing small RNAs seems to be an obvious choice to understand their expression and its downstream effect. Additionally, sequencing provides an opportunity to identify novel and polymorphic miRNA. However, the biggest challenge is the appropriate data analysis pipeline, which is still in a phase of active development by various academic groups. This chapter describes basic and advanced steps for small RNA sequencing analysis including quality control, small RNA alignment and quantification, differential expression analysis, novel small RNA identification, target prediction, and downstream analysis. We also provide a list of various resources for small RNA analysis.

  6. Project Report: Automatic Sequence Processor Software Analysis

    Science.gov (United States)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  7. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  8. Analysis and prediction of baculovirus promoter sequences.

    Science.gov (United States)

    Xing, Ke; Deng, Riqiang; Wang, Jinwen; Feng, Jinghua; Huang, Mingsong; Wang, Xunzhang

    2005-10-01

    Consensus patterns of baculovirus sequences upstream from the translational initiation sites have been analyzed and a web tool, Local Alignment Promoter Predictor (LAPP), for the prediction of baculovirus promoter sequences has also been developed. Potential consensus sequences, i.e., TCATTGT, TCTTGTA, CTCGTAA, TCCATTT and TCATT plus TCGT in approximately 30 bp spacing context, have been found in baculovirus promoter regions, in addition to well-characterized late and early promoter elements G/T/ATAAG and TATAA, which is accompanied about 30-bp downstream by a transcriptional initiation sequence CAGT or CATT. Promoter prediction is performed by a dynamic programming algorithm based on maximal segment pair measure with scores above some cutoff against each sequence in a refined promoter database. The algorithm was able to discriminate between promoter and non-promoter sequences in a test set of baculovirus sequences with prediction specificity and sensitivity superior to that using five other eukaryotic promoter recognition programs available on the Internet. A web server that implements the LAPP with continually updated promoter database is freely available at http://life.zsu.edu.cn/LAPP/.
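
    The consensus motifs listed above can be located in an upstream region with a plain pattern scan. The sketch below is only an illustration under stated assumptions: it is an exact-match regular-expression search, not the dynamic programming, maximal-segment-pair scoring that the abstract attributes to LAPP, and the names `MOTIFS` and `scan_motifs` are hypothetical. The degenerate element G/T/ATAAG is encoded as the character class `[GTA]TAAG`.

```python
import re

# Motifs quoted in the abstract; keys are labels, values are regex patterns.
MOTIFS = {
    "TCATTGT": "TCATTGT",
    "TCTTGTA": "TCTTGTA",
    "CTCGTAA": "CTCGTAA",
    "TCCATTT": "TCCATTT",
    "G/T/ATAAG": "[GTA]TAAG",
    "TATAA": "TATAA",
}


def scan_motifs(upstream):
    """Return sorted (position, label, matched text) tuples for every motif hit."""
    text = upstream.upper()
    hits = []
    for label, pattern in MOTIFS.items():
        # A lookahead with a capture group reports overlapping occurrences too.
        for match in re.finditer("(?=(" + pattern + "))", text):
            hits.append((match.start(), label, match.group(1)))
    return sorted(hits)


for position, label, matched in scan_motifs("ccATAAGttTCATTGTaa"):
    print(position, label, matched)
```

    Positions are 0-based offsets into the supplied string. A scorer closer to what the abstract describes would also need a reference promoter database and a score cutoff, neither of which is specified here.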

  9. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  10. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly.

  11. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  12. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order...1 and 2 precede all other spikes in both s and s′). Many other sequences share this property with s and s′; in fact, we can completely characterize

  13. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  14. Bioinformatic analysis of whole genome sequencing data

    OpenAIRE

    Maqbool, Khurram

    2014-01-01

    Evolution has shaped life forms for billions of years. Domestication is an accelerated process that can be used as a model for evolutionary changes. The aim of this thesis project has been to carry out extensive bioinformatic analyses of whole genome sequencing data to reveal SNPs, InDels and selective sweeps in the chicken, pig and dog genomes. Pig genome sequencing revealed loci under selection for elongation of back and increased number of vertebrae, associated with the NR6A1, PLAG1,...

  15. [Tabular excel editor for analysis of aligned nucleotide sequences].

    Science.gov (United States)

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert the results of multiple nucleotide sequence alignments obtained with the BLAST network service into a form suitable for visual analysis and editing. Two macros for MS Excel 2007 were written. The array of aligned sequences, transformed into an Excel table and processed with these macros, is more convenient for analysis than the initial HTML data.

  16. SEQUENCE ANALYSIS OF MATURASE K (MATK): A ...

    African Journals Online (AJOL)

    Global Journal

    The application and utilization of sequence data has been found very informative in the characterization and phylogenetic relationship of different crop species. This study aimed to use bioinformatics tools to characterize the matK gene in some selected legumes with special reference to pigeon pea [Cajanus cajan ...

  17. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
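
    The construction described above can be sketched in a few lines. This is only an illustration under assumptions that the abstract does not confirm: the sequence is cut into consecutive non-overlapping words of fixed length, a transition is counted from each word to the next one, columns without outgoing transitions are replaced by a uniform column, and a damping factor of 0.85 is used. The names `google_matrix` and `pagerank` are hypothetical.

```python
def google_matrix(seq, word_len=2, alpha=0.85):
    """Return (words, G), where G[i][j] is the probability of moving from word j to word i."""
    # Cut the sequence into consecutive non-overlapping words (assumed scheme).
    ordered = [seq[k:k + word_len] for k in range(0, len(seq) - word_len + 1, word_len)]
    words = sorted(set(ordered))
    index = {w: i for i, w in enumerate(words)}
    n = len(words)

    # Count transitions between each word and the word that follows it.
    counts = [[0.0] * n for _ in range(n)]
    for current, following in zip(ordered, ordered[1:]):
        counts[index[following]][index[current]] += 1.0

    # Normalize each column and mix with a uniform jump of weight (1 - alpha).
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            s = counts[i][j] / column_sum if column_sum > 0 else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G


def pagerank(G, iterations=200):
    """Power iteration p <- G p, starting from the uniform vector."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGGTCA", word_len=2)
ranks = pagerank(G)
for word, rank in sorted(zip(words, ranks), key=lambda pair: -pair[1]):
    print(word, round(rank, 4))
```

    Every column of `G` sums to 1, so the iteration keeps `ranks` normalized. A real genome would need a sparse representation, since the number of possible words grows as 4 to the power of the word length.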

  18. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  19. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N identity matrix. Any bias in sequencing changes I_N to a non-identity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
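
    The operator picture above can be made concrete with a toy library. The sketch below is an assumption-laden illustration, not the authors' implementation: the sampling operator is realized as drawing a fixed number of reads with replacement in proportion to copy number, diagonal matrices are stored as plain lists of their diagonal entries, and the names `sample_operator`, `apply_diagonal` and the example values in `cen` are hypothetical.

```python
import random


def sample_operator(n, reads, rng):
    """Diagonal of a sampling operator: draw `reads` clones in proportion to n."""
    drawn = rng.choices(range(len(n)), weights=n, k=reads)
    counts = [0] * len(n)
    for i in drawn:
        counts[i] += 1
    # Entry i rescales n_i to the sampled count; absent sequences stay at zero.
    return [counts[i] / n[i] if n[i] > 0 else 0.0 for i in range(len(n))]


def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix, given by its diagonal, with a vector."""
    return [d * v for d, v in zip(diag, vec)]


n = [500, 300, 150, 50]        # copy numbers of a toy library with N = 4
cen = [1.0, 1.0, 0.0, 1.0]     # hypothetical censorship: the third sequence is never read
rng = random.Random(0)

censored = apply_diagonal(cen, n)                  # CEN acting on n
sa = sample_operator(censored, reads=100, rng=rng)
observed = apply_diagonal(sa, censored)            # Sa acting on CEN n
print(observed)
```

    With all entries of `cen` equal to 1, the censorship step reduces to the identity and the result corresponds to the unbiased case Seq = Sa I_N described in the abstract.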

  20. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  1. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    In Pakistan, more than 10 million people are living with hepatitis C virus (HCV) with high morbidity and mortality. The aims of the present study are to report HCV core gene sequences from the Pakistani population and perform their sequence comparison/phylogenetic analysis. The core gene of HCV has been cloned from six ...

  2. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    A genomic library was generated using HindIII and the positive clones were sequenced and analysed. The gp64 gene, encoding the baculovirus envelope protein GP64, was found in an insert. The nucleotide sequence analysis indicated that the AnpeNPV gp64 gene consists of a 1530 nucleotide open reading frame ...

  3. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Amino acid sequence analysis corresponding to the PPE proteins in H37Rv and CDC1551 strains of the Mycobacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid residue common region in 22 proteins. The pairwise sequence identities were as low as 18%.

  4. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  5. RNA Sequencing Analysis of Salivary Extracellular RNA.

    Science.gov (United States)

    Majem, Blanca; Li, Feng; Sun, Jie; Wong, David T W

    2017-01-01

    Salivary biomarkers for disease detection, diagnostic and prognostic assessments have become increasingly well established in recent years. In this chapter we explain the current leading technology that has been used to characterize salivary non-coding RNAs (ncRNAs) from the extracellular RNA (exRNA) fraction: HiSeq from Illumina® platform for RNA sequencing. Therefore, the chapter is divided into two main sections regarding the type of the library constructed (small and long ncRNA libraries), from saliva collection, RNA extraction and quantification to cDNA library generation and corresponding QCs. Using these invaluable technical tools, one can identify thousands of ncRNA species in saliva. These methods indicate that salivary exRNA provides an efficient medium for biomarker discovery of oral and systemic diseases.

  6. Strategy for the sequence analysis of heparin.

    Science.gov (United States)

    Liu, J; Desai, U R; Han, X J; Toida, T; Linhardt, R J

    1995-12-01

    The versatile biological activities of proteoglycans are mainly mediated by their glycosaminoglycan (GAG) components. Unlike proteins and nucleic acids, no satisfactory method for sequencing GAGs has been developed. This paper describes a strategy to sequence the GAG chains of heparin. Heparin, prepared from animal tissue, and processed by proteinases and endoglucuronidases, is 90% GAG heparin and 10% peptidoglycan heparin (containing small remnants of core protein). Raw porcine mucosal heparin was labelled on the amino termini of these core protein remnants with a hydrophobic, fluorescent tag [N-4-(6-dimethylamino-2-benzofuranyl) phenyl (NDBP)-isothiocyanate]. Enrichment of the NDBP-heparin using phenyl-Sepharose chromatography, followed by treatment with a mixture of heparin lyase I and III, resulted in a single NDBP-linkage region tetrasaccharide, which was characterized as deltaUAp(1-->3)-beta-D-Galp(1-->3)-beta-D-Galp(1-->4)-beta-Xylp -(1-->O-Ser-NDBP (deltaUAp is 4-deoxy-alpha-L-threo-hex-4-enopyranosyl uronic acid). Several NDBP-octasaccharides were isolated when NDBP-heparin was treated with only heparin lyase I. The structure of one of these NDBP-octasaccharides, deltaUAp2S(1-->4)-alpha-D-GlcNpAc(1-->4)-alpha-L-IdoAp (1-->4)-alpha-D-GlcNpAc6S(1-->4)-beta-D-GlcAp(1-->3)-beta-D- Galp(1-->3)-beta-D-Galp(1-->4)-beta-Xylp-(1-->O-Ser NDBP (S is sulphate, Ac is acetate), was determined by 1H-NMR and enzymatic methods. Enriched NDBP-heparin was treated with lithium hydroxide to release heparin, and the GAG chain was then labelled at xylose with 7-amino-1,3-naphthalene disulphonic acid (AGA). The resulting AGA-Xyl-heparin was sequenced on gradient PAGE using heparin lyase I and heparin lyase III. 
A predominant sequence in heparin at the protein core attachment site was deduced to be -D-GlcNp2S6S(or 6OH)(1-->4)-alpha-L-IdoAp2S(1-->4)-alpha-D-GlcNp2S6S(or 6OH)(1-->4)-alpha-L-IdoAp2S(1-->4)-alpha-D-GlcNp2S6S(or 6OH)(1-->4)-alpha-L-IdoAp2S(1-->4)-alpha-D-GlcNpAc (1

  7. Cretaceous ostracods of the Barreirinhas Basin: Taxonomy, biostratigraphic considerations and paleoenvironmental inferences

    Science.gov (United States)

    Santos Filho, M. A. B.; Fauth, G.; Piovesan, E. K.

    2017-01-01

    Ostracods are microcrustaceans that inhabit different aquatic environments and are frequently used in paleoecological interpretations and biostratigraphic studies. The Barreirinhas Basin, northern Brazil, contains a well-preserved ostracod assemblage in its sedimentary rocks of Early and Late Cretaceous age, which has so far been poorly studied. This paper contains the first taxonomic identification of the ostracod assemblages, as well as the elaboration of paleoenvironmental and biostratigraphic inferences, for the Cretaceous of the Barreirinhas Basin. The studied material consists of 147 samples from the wells 1-MAS-1A, 1-MAS-3A, 1-MAS-4A and 1-MAS-14A. A total of 495 specimens were recovered, distributed among 40 species, 16 genera and 9 families, including three new species. Based on previously established biozones for the Sergipe Basin, two biozones were identified: the Nigeroloxoconcha aff. Nigeroloxoconcha sp. GA A 22 Range Zone, of lower Cenomanian age; and the Brachycythere sapucariensis Interval Zone, of Turonian to middle Coniacian age. Finally, three distinct ostracod assemblages were defined: Assemblage 1, dominated by Conchoecia? species; Assemblage 2, well diversified but with low abundance; and Assemblage 3, with cold-water ostracods such as Krithe. Based on the ostracod assemblages identified, a middle neritic, platformal paleoenvironment was inferred for the studied interval.

  8. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  9. Initial sequencing and comparative analysis of the mouse genome.

    Science.gov (United States)

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, 
Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  10. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    Science.gov (United States)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS for the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented in the meeting.

  11. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines.
UCSF Chimera is free for non-commercial use and is

  12. Categorizing accident sequences in the external radiotherapy for risk analysis

    OpenAIRE

    Kim, Jonghyun

    2013-01-01

    Purpose: This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accident cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. These two accident analysis reports are investigated to identify accident sequences including initiating events, failure of safety measures, and co...

  13. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  14. DNA Sequence Analysis in Clinical Medicine, Proceeding Cautiously

    Directory of Open Access Journals (Sweden)

    Moyra Smith

    2017-05-01

    Full Text Available Delineation of underlying genomic and genetic factors in a specific disease may be valuable in establishing a definitive diagnosis and may guide patient management and counseling. In addition, genetic information may be useful in identification of at-risk family members. Gene mapping and initial genome sequencing data enabled the development of microarrays to analyze genomic variants. The goal of this review is to consider different generations of sequencing techniques and their application to exome sequencing and whole genome sequencing and their clinical applications. In recent decades, exome sequencing has primarily been used in patient studies. Important measures that have been developed to standardize variant calling and to assess pathogenicity of variants are discussed in some detail. Examples of cases where exome sequencing has facilitated diagnosis and led to improved medical management are presented. Whole genome sequencing and its clinical relevance are presented, particularly in the context of analysis of nucleotide and structural genomic variants in large population studies and in certain patient cohorts. Applications involving analysis of cell-free DNA in maternal blood for prenatal diagnosis of specific autosomal trisomies are reviewed. Applications of DNA sequencing to diagnosis and therapeutics of cancer are presented. Also discussed are important recent diagnostic applications of DNA sequencing in cancer, including analysis of tumor-derived cell-free DNA and exosomes that are present in body fluids. Insights gained into underlying pathogenetic mechanisms of certain complex common diseases, including schizophrenia, macular degeneration, and neurodegenerative disease, are presented. The relevance of different types of variants (rare, uncommon, and common) to disease pathogenesis, and the continuum of causality, are addressed.
Pharmacogenetic variants detected by DNA sequence analysis are gaining in importance and are particularly relevant

  15. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
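
    To make two of the measures named above concrete, the following is a minimal sketch in plain Python of an n-gram distribution and a Zipf's law coefficient, taken here to be the least-squares slope of log frequency against log rank. It does not use or reproduce the Quantiprot API; the function names and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter
import math


def ngram_counts(sequence, n):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to the classical form of Zipf's law.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy sequence, not taken from any real protein.
toy_sequence = "MKVLAAGIVMKVLAAGMKV"
bigrams = ngram_counts(toy_sequence, 2)
slope = zipf_slope(bigrams)
```

    Real protein sequences are far longer than this toy string, and a fitted slope is only meaningful when enough distinct n-grams are observed.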

  16. Genome sequencing and analysis conference grant

    Energy Technology Data Exchange (ETDEWEB)

    Venter, J.C. [ed.]

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  17. Categorizing accident sequences in the external radiotherapy for risk analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jong Hyun [KEPCO International Nuclear Graduate School (KINGS), Ulsan (Korea, Republic of)]

    2013-06-15

    This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. It reviews 59 accident cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. These two accident analysis reports are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. The study classifies the accidents by treatment stage and source of error for initiating events, type of failure in the safety measures, and type of undesirable consequence and number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. The result indicates that risk analysis needs to pay attention not only to the planning stage, but also to the calibration stage that is performed prior to the main treatment process. It also shows that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. This study is expected to provide insights into accident sequences for prospective risk analysis through the review of past experience.
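
    As a rough illustration of the event tree analysis mentioned in this abstract, the sketch below enumerates every success/failure path that follows an initiating event and multiplies the branch probabilities along each path. It assumes the safety measures fail independently of one another, and every number and name in it is hypothetical; none is taken from the study.

```python
from itertools import product


def event_tree(initiating_frequency, barrier_failure_probs):
    """Enumerate all success/failure paths through an event tree.

    Returns a dict mapping a tuple of booleans (True = that barrier failed)
    to the frequency of that path. Barriers are assumed independent.
    """
    paths = {}
    for outcome in product([False, True], repeat=len(barrier_failure_probs)):
        freq = initiating_frequency
        for failed, p_fail in zip(outcome, barrier_failure_probs):
            freq *= p_fail if failed else (1.0 - p_fail)
        paths[outcome] = freq
    return paths


# Hypothetical values for illustration only; none come from the study.
calibration_error_per_year = 0.01   # frequency of the initiating event
barriers = [0.1, 0.05]              # failure probabilities of two safety measures

paths = event_tree(calibration_error_per_year, barriers)
undetected = paths[(True, True)]    # path on which every barrier fails
```

    The path frequencies always sum to the initiating frequency, which is a convenient sanity check when branches are added by hand.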

  18. Categorizing accident sequences in the external radiotherapy for risk analysis.

    Science.gov (United States)

    Kim, Jonghyun

    2013-06-01

    This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. It reviews 59 accident cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. These two accident analysis reports are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. The study classifies the accidents by treatment stage and source of error for initiating events, type of failure in the safety measures, and type of undesirable consequence and number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. The result indicates that risk analysis needs to pay attention not only to the planning stage, but also to the calibration stage that is performed prior to the main treatment process. It also shows that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. This study is expected to provide insights into accident sequences for prospective risk analysis through the review of past experience.

  19. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.
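
    The abstract does not specify which nonlinear predictor was used. As one common choice, the sketch below shows a local nearest-neighbour predictor built on a time-delay embedding. The function names, the parameters dim, tau and k, and the logistic-map series standing in for a flow record are all assumptions for illustration.

```python
def delay_embed(series, dim, tau=1):
    """Build delay vectors (x[t], x[t-tau], ..., x[t-(dim-1)*tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]


def predict_next(series, dim=2, tau=1, k=3):
    """Predict the next value as the mean successor of the k past delay
    vectors closest to the most recent one."""
    vectors = delay_embed(series, dim, tau)
    query = vectors[-1]
    offset = (dim - 1) * tau
    scored = []
    # The last vector has no known successor, so it is not a candidate.
    for i, v in enumerate(vectors[:-1]):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        successor = series[offset + i + 1]
        scored.append((dist, successor))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    return sum(s for _, s in nearest) / len(nearest)


# Hypothetical stand-in for a flow record: a chaotic logistic-map series.
x = 0.3
series = []
for _ in range(300):
    series.append(x)
    x = 4.0 * x * (1.0 - x)

prediction = predict_next(series[:-1], dim=2, k=3)
actual = series[-1]
```

    Comparing prediction with actual over many held-out points, and over increasing lead times, is the usual way to judge whether a low-dimensional deterministic component is plausible.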

  20. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    Science.gov (United States)

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments.

  1. Molecular cloning and sequence analysis of the cat myostatin gene ...

    African Journals Online (AJOL)

    ... MEF3, MTBF, PAX3, SMAD, HBOX, HOMF and TEAF motifs. Comparative analysis for some motifs showed both conservations and differences among cat, horse, porcine and human. Key words: Cat, myostatin 5'-regulatory region, molecular cloning, sequence analysis and comparison, transcription factor binding sites.

  2. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  3. Precise age and biostratigraphic significance of the Kinney Brick Quarry Lagerstätte, Pennsylvanian of New Mexico, USA

    DEFF Research Database (Denmark)

    Lucas, Spencer G.; Allen, Bruce D.; Krainer, Karl

    2011-01-01

    The Kinney Brick Quarry is a world famous Late Pennsylvanian fossil Lagerstätte in central New Mexico, USA. The age assigned to the Kinney Brick Quarry (early-middle Virgilian) has long been based more on its inferred lithostratigraphic position than on biostratigraphic indicators at the quarry. We...

  4. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)
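
    The two ranking criteria named in this abstract can be sketched as follows. This is a minimal illustration that uses only the reported uncertainty of each peak area in the denominator; the abstract does not state the exact definitions used, so the formulas, function names and numbers below are assumptions.

```python
def z_scores(measured, sigma, reference):
    """z = (measured - reference) / sigma for each peak."""
    return [(m - r) / s for m, s, r in zip(measured, sigma, reference)]


def reduced_chi_square(measured, sigma, reference, fitted_params=0):
    """Sum of squared z-scores divided by the degrees of freedom."""
    zs = z_scores(measured, sigma, reference)
    dof = len(zs) - fitted_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return sum(z * z for z in zs) / dof


# Hypothetical peak areas (counts), reported uncertainties and reference values.
measured = [1020.0, 495.0, 210.0]
sigma = [20.0, 10.0, 10.0]
reference = [1000.0, 500.0, 200.0]

zs = z_scores(measured, sigma, reference)
chi2_red = reduced_chi_square(measured, sigma, reference)
```

    A reduced χ² close to 1 suggests that the reported uncertainties are consistent with the observed deviations; values well above 1 point to biased areas or underestimated uncertainties.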

  5. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudios Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)]

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  6. Mesozoic and Cenozoic Sediments of the Ionian Sea (Escarmed Campaigns: Malta Escarpment, Alfeo and Medina Seamounts). Biostratigraphic Analysis: Foraminifers, Nannoplankton and Microfacies

    Directory of Open Access Journals (Sweden)

    Group Escarmed

    2006-11-01

    Full Text Available The Escarmed 3 coring and dredging campaign in the Ionian Sea comes after a series of similar surveys by French and Italian vessels. They made it possible to partially reconstruct the Mesozoic and Cenozoic paleoenvironments of several margins of the deep Ionian basin (Eastern Mediterranean), especially those of the Malta escarpment, on the basis of biostratigraphic analysis. Whereas a shallow-water environment prevailed during the Triassic and the Lower to Middle Lias period, pelagic facies are observed in the Middle-Upper Jurassic series of this area, just as has been found in Sicily. During the Lower Cretaceous, gaps or erosions characterize the Malta Escarpment, whereas the Medina seamounts are subject to continuous shelf sedimentation. From the Albian up to the Miocene, sedimentation is predominantly pelagic, with local circalittoral deposits and breccias containing platform-derived elements (Maastrichtian, Eocene, Oligocene), or erosion and even hiatuses (Cenomanian-Turonian). The distinctly pelagic facies of the Pliocene follow the Upper Miocene erosion phase. Finally, significant magmatism accompanied extensional phenomena from the Upper Jurassic onward.

  7. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform-specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage
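
    The platform comparison described above rests on a Pearson correlation between per-site methylation levels from sequencing and from the array. The sketch below shows that calculation in plain Python. It is not TABSAT code, and the methylation fractions are hypothetical; only the amplicon-target arithmetic (48 samples × 53 amplicons = 2,544) comes from the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Amplicon targets as stated in the abstract: 48 samples x 53 amplicons.
amplicon_targets = 48 * 53

# Hypothetical methylation fractions at the same CpG sites on two platforms.
sequencing = [0.10, 0.35, 0.50, 0.80, 0.95]
array_450k = [0.12, 0.30, 0.55, 0.78, 0.90]
r = pearson_r(sequencing, array_450k)
```

    In a study such as this one, the coefficient would typically be computed per sample or per amplicon and then averaged, which is why the abstract reports an average value.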

  8. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    Science.gov (United States)

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  10. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  11. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background: Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for this malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods: To provide a more comprehensive and complete transcriptome of An. sinensis, RNA from eggs, larvae, pupae, male adults and female adults was pooled for cDNA preparation, sequenced using Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed for genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results: Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content of 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified a number of An. sinensis unigenes that were homologous to, or putative 1:1 orthologues of, sequences in the genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions: Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic and genomic studies, and further vector control, of An. sinensis. PMID:25000941
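
    The assembly statistics quoted in this abstract (average length, N50 and GC content) can be computed from a set of assembled sequences as sketched below. The three toy sequences are hypothetical, and N50 is taken here in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembled bases.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("empty input")


def gc_content(sequences):
    """Percentage of G and C bases across all sequences."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return 100.0 * gc / total


# Hypothetical toy unigenes, far shorter than real assembled transcripts.
unigenes = ["ATGC", "GGGCCC", "AT"]
lengths = [len(u) for u in unigenes]

mean_length = sum(lengths) / len(lengths)
assembly_n50 = n50(lengths)
assembly_gc = gc_content(unigenes)
```

    An N50 above the mean length, as reported here (711 bp against 571 bp), is typical of assemblies with many short sequences and a smaller number of long ones.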

  12. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    Molecular cloning, sequence analysis and structure prediction of the related to b0,+ amino acid transporter (rBAT) in Cyprinus carpio L. ... The amplified product was 2370 bp, including a 42 bp 5'-untranslated region, a 288 bp 3'-untranslated region, and a 2040 bp open reading frame (ORF), which encoded 679 amino acids ...
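
    The lengths quoted in this snippet are mutually consistent, which can be checked with a few lines of arithmetic. The sketch assumes that the stated ORF length includes the stop codon, which is what the figures imply; the function name is hypothetical.

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon does not encode an amino acid.
    return codons - 1 if includes_stop else codons


utr5_bp, orf_bp, utr3_bp = 42, 2040, 288   # values stated in the snippet
total_bp = utr5_bp + orf_bp + utr3_bp
protein_length = orf_protein_length(orf_bp)
```

    Here 2040 bp corresponds to 680 codons, the last of which is the stop codon, leaving 679 amino acids; the three regions add up to the 2370 bp product.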

  13. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  14. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    Amino acid sequence analysis corresponding to the PE proteins resulted in the identification of tandem repeats comprising 41–43 amino acid ..... Q9RAP8. Leptospira interrogans serovar icterohaemorrhagiae-B. ORFC protein. 5. P71003. Bacillus subtilis-B. Hypothetical 49⋅4 kDa protein. 5. ALL2941. Anabaena sp.

  15. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  16. Inter simple sequence repeat analysis of genetic diversity of five ...

    African Journals Online (AJOL)

    This paper studied the genetic diversity of five cultivated pepper species using inter simple sequence repeat (ISSR) analysis. The amplicons of 13 out of 15 designed primers were stable polymorphic and therefore were used as genetic biomarkers. 135 total clear bands were obtained, of which 102 were polymorphic bands ...

  17. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta, an essential transcription factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of the bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  18. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    Sequence analysis of mitochondrial 16S ribosomal RNA gene fragment from seven mosquito species. YOGESH S SHOUCHE* and MILIND S PATOLE. National Center for Cell Science, Pune University Campus, Pune 411 007, India. *Corresponding author (Fax, 91-20-5672259; Email, yogesh@nccs.res.in). Mosquitoes are ...

  19. Phylogenetic relationships of Malassezia species based on multilocus sequence analysis.

    Science.gov (United States)

    Castellá, Gemma; Coutinho, Selene Dall' Acqua; Cabañes, F Javier

    2014-01-01

    Members of the genus Malassezia are lipophilic basidiomycetous yeasts, which are part of the normal cutaneous microbiota of humans and other warm-blooded animals. Currently, this genus consists of 14 species that have been characterized by phenetic and molecular methods. Although several molecular methods have been used to identify and/or differentiate Malassezia species, the sequencing of the rRNA genes and the chitin synthase-2 gene (CHS2) are the most widely employed. There is little information about the β-tubulin gene in the genus Malassezia, a gene that has been used for the analysis of complex species groups. The aim of the present study was to sequence a fragment of the β-tubulin gene of Malassezia species and analyze their phylogenetic relationship using a multilocus sequence approach based on two rRNA genes (ITS including 5.8S rRNA and D1/D2 region of 26S rRNA) together with two protein encoding genes (CHS2 and β-tubulin). The phylogenetic study of the partial β-tubulin gene sequences indicated that this molecular marker can be used to assess diversity and identify new species. The multilocus sequence analysis of the four loci provides robust support to delineate species at the terminal nodes and could help to estimate divergence times for the origin and diversification of Malassezia species.

  20. The SCALE criticality safety analysis sequences: Status and future directions

    International Nuclear Information System (INIS)

    Parks, C.V.

    1993-01-01

    The Standardized Computer Analyses for Licensing Evaluation (SCALE) code system was originally conceived and developed in the late 1970s for the US Nuclear Regulatory Commission. The goal was to provide easy-to-use, yet accurate, analysis capabilities for use in evaluating the criticality safety, shielding, and heat transfer aspects of transportation packages for radioactive material. The Criticality Safety Analysis Sequences (CSAS) for SCALE were developed to "automate" problem-dependent cross-section and material processing prior to execution of the well-established XSDRNPM or KENO codes for calculation of k_eff. The criticality analysis sequences provided in SCALE-4 are summarized. The SCALE system continues to be maintained and enhanced by staff of the Computing Applications Division at Oak Ridge National Laboratory (ORNL). The purpose of this paper is to discuss recent work to improve system portability and user interfaces and to provide information on ongoing work to enhance the analysis capabilities.

  1. Complete genome sequence analysis of chicken astrovirus isolate from India.

    Science.gov (United States)

    Patel, Amrutlal K; Pandit, Ramesh J; Thakkar, Jalpa R; Hinsu, Ankit T; Pandey, Vinod C; Pal, Joy K; Prajapati, Kantilal S; Jakhesara, Subhash J; Joshi, Chaitanya G

    2017-03-01

    Chicken astroviruses have been known to cause severe disease in chickens leading to increased mortality and "white chicks" condition. Here we aim to characterize the causative agent of visceral gout suspected for astrovirus infection in broiler breeder chickens. Total RNA isolated from allantoic fluid of SPF embryo passaged with infected chicken sample was sequenced by whole genome shotgun sequencing using ion-torrent PGM platform. The sequence was analysed for the presence of coding and non-coding features, its similarity with reported isolates and epitope analysis of capsid structural protein. The consensus length of 7513 bp genome sequence of Indian isolate of chicken astrovirus was obtained after assembly of 14,121 high quality reads. The genome was comprised of 13 bp 5'-UTR, three open reading frames (ORFs) including ORF1a encoding serine protease, ORF1b encoding RNA dependent RNA polymerase (RdRp) and ORF2 encoding capsid protein, and 298 bp of 3'-UTR which harboured two corona virus stem loop II like "s2m" motifs and a poly A stretch of 19 nucleotides. The genetic analysis of CAstV/INDIA/ANAND/2016 suggested highest sequence similarity of 86.94% with the chicken astrovirus isolate CAstV/GA2011 followed by 84.76% with CAstV/4175 and 74.48% with CAstV/Poland/G059/2014 isolates. The capsid structural protein of CAstV/INDIA/ANAND/2016 showed 84.67% similarity with chicken astrovirus isolate CAstV/GA2011, 81.06% with CAstV/4175 and 41.18% with CAstV/Poland/G059/2014 isolates. However, the capsid protein sequence showed high degree of sequence identity at nucleotide level (98.64-99.32%) and at amino acids level (97.74-98.69%) with reported sequences of Indian isolates suggesting their common origin and limited sequence divergence. The epitope analysis by SVMTriP identified two unique epitopes in our isolate, seven shared epitopes among Indian isolates and two shared epitopes among all isolates except Poland isolate which carried all distinct epitopes.

  2. Phylogeny and classification of Dickeya based on multilocus sequence analysis.

    Science.gov (United States)

    Marrero, Glorimar; Schneider, Kevin L; Jenkins, Daniel M; Alvarez, Anne M

    2013-09-01

    Bacterial heart rot of pineapple reported in Hawaii in 2003 and reoccurring in 2006 was caused by an undetermined species of Dickeya. Classification of the bacterial strains isolated from infected pineapple to one of the recognized Dickeya species and their phylogenetic relationships with Dickeya were determined by a multilocus sequence analysis (MLSA), based on the partial gene sequences of dnaA, dnaJ, dnaX, gyrB and recN. Individual and concatenated gene phylogenies revealed that the strains form a clade with reference Dickeya sp. isolated from pineapple in Malaysia and are closely related to D. zeae; however, previous DNA-DNA reassociation values suggest that these strains do not meet the genomic threshold for consideration in D. zeae, and require further taxonomic analysis. An analysis of the markers used in this MLSA determined that recN was the best overall marker for resolution of species within Dickeya. Differential intraspecies resolution was observed with the other markers, suggesting that marker selection is important for defining relationships within a clade. Phylogenies produced with gene sequences from the sequenced genomes of strains D. dadantii Ech586, D. dadantii Ech703 and D. zeae Ech1591 did not place the sequenced strains with other well-characterized members of their respective species. The average nucleotide identity (ANI) and tetranucleotide frequencies determined for the sequenced strains corroborated the results of the MLSA that D. dadantii Ech586 and D. dadantii Ech703 should be reclassified as Dickeya zeae Ech586 and Dickeya paradisiaca Ech703, respectively, whereas D. zeae Ech1591 should be reclassified as Dickeya chrysanthemi Ech1591.

  3. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  4. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    System modelling using the Unified Modelling Language (UML) is a task that should be solved for software development. The more complex software becomes, the higher the requirements are for demonstrating the system to be developed, especially in its dynamic aspect, which in UML is offered by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram, due to its specific structure, is selected for a deeper analysis of the layout of its elements. The authors' research presents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyses them according to criteria required for diagram perception.

  5. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  6. A general sequence processing and analysis program for protein engineering.

    Science.gov (United States)

    Stafford, Ryan L; Zimmerman, Erik S; Hallam, Trevor J; Sato, Aaron K

    2014-10-27

    Protein engineering projects often amass numerous raw DNA sequences, but no readily available software combines sequence processing and activity correlation required for efficient lead identification. XLibraryDisplay is an open source program integrated into Microsoft Excel for Windows that automates batch sequence processing via a simple step-by-step, menu-driven graphical user interface. XLibraryDisplay accepts any DNA template which is used as a basis for trimming, filtering, translating, and aligning hundreds to thousands of sequences (raw, FASTA, or Phred PHD file formats). Key steps for library characterization through lead discovery are available including library composition analysis, filtering by experimental data, graphing and correlating to experimental data, alignment to structural data extracted from PDB files, and generation of PyMOL visualization scripts. Though larger data sets can be handled, the program is best suited for analyzing approximately 10 000 or fewer leads or naïve clones which have been characterized using Sanger sequencing and other experimental approaches. XLibraryDisplay can be downloaded for free from sourceforge.net/projects/xlibrarydisplay/ .

  7. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  8. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Advancements in sequencing techniques have led to huge volumes of sequence data being produced at a much faster rate. It is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any efficient conclusion. To cope with such an overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. The datacenter's hardware and software are collectively known as a cloud, which when publicly available is termed a public cloud. The datacenter's resources are provided in a virtual mode to clients via a service provider such as Amazon, Google or Joyent, which charges in a pay-as-you-go manner. The workload is shifted to the provider, which maintains the required hardware and software upgrades and manages them in the virtual mode. Basically, a virtual environment is created according to the needs of the user by taking permission from the datacenter via the internet, the task is performed, and the environment is deleted after the task is over. In this discussion, we focus on the basics of cloud computing and on the prerequisites and overall working of clouds. Furthermore, the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, are briefly discussed with reference to traditional workflows.

  9. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS)

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI, M. REISSMANN AND H. J. SCHWARTZ

    2006-10-01

    Myostatin, also called growth differentiation factor-8 (GDF-8), is a member of the mammalian transforming growth factor-beta (TGF-beta) superfamily, which is expressed specifically in developing and adult skeletal muscle. The muscular hypertrophy allele (mh allele) in the double-muscled breeds involves a mutation within the myostatin gene. Genomic DNA was isolated from camel hair using the NucleoSpin Tissue kit. Two animals from each of six breeds, namely Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri, were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homologous regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed a separate cluster from the pig in spite of having high homology (98%) and showed 94% homology with cattle and sheep, as reported in the literature. The sequence of the PCR-amplified part of exon 1 (256 bp) of camel myostatin was identical among the six camel breeds.

  10. Precise age and biostratigraphic significance of the Kinney Brick Quarry Lagerstätte, Pennsylvanian of New Mexico, USA

    DEFF Research Database (Denmark)

    Lucas, Spencer G.; Allen, Bruce D.; Krainer, Karl

    2011-01-01

    The Kinney Brick Quarry is a world famous Late Pennsylvanian fossil Lagerstätte in central New Mexico, USA. The age assigned to the Kinney Brick Quarry (early-middle Virgilian) has long been based more on its inferred lithostratigraphic position than on biostratigraphic indicators at the quarry. We...... have developed three datasets--stratigraphic position, fusulinids and conodonts--that indicate the Kinney Brick Quarry is older, of middle Missourian (Kasimovian) age. Our detailed local lithostratigraphic studies coupled with regional stratigraphic investigations indicate the Kinney Brick Quarry...... region (Dennis cyclothem; middle Missourian). Nonmarine biostratigraphic indicators at the Kinney Brick Quarry indicate either an imprecise age (Late Pennsylvanian: megaflora) or a slightly younger age (late Kasimovian-early Gzhelian: blattids) than do stratigraphic position and marine microfossils...

  11. Infrared thermal facial image sequence registration analysis and verification

    Science.gov (United States)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  12. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Science.gov (United States)

    2013-01-01

    Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic analysis of T

  13. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  14. Automated sequence analysis of atmospheric oxidation pathways: SEQUENCE version 1.0

    Directory of Open Access Journals (Sweden)

    T. M. Butler

    2009-10-01

    An algorithm for the sequential analysis of the atmospheric oxidation of chemical species using output from a photochemical model is presented. Starting at a "root species", the algorithm traverses all possible reaction sequences which consume this species, and lead, via intermediate products, to final products. The algorithm keeps track of the effects of all of these reactions on their respective reactants and products. Upon completion, the algorithm has built a detailed picture of the effects of the oxidation of the root species on its chemical surroundings. The output of the algorithm can be used to determine product yields, radical recycling fractions, and ozone production potentials of arbitrary chemical species.
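
    The traversal described in this abstract can be illustrated with a small sketch. It is not the SEQUENCE code itself: the mechanism format, the function name `final_yields`, and the species A to D are all hypothetical, and the sketch assumes the toy mechanism contains no reaction cycles.

```python
def final_yields(root, mechanism, amount=1.0, yields=None):
    """Follow every reaction path that consumes `root` and accumulate
    the yield of each final product (a species nothing consumes).

    `mechanism` maps a species to a list of (branching_fraction, products)
    pairs, where `products` maps product species to stoichiometric factors.
    Assumes the mechanism is acyclic (hypothetical simplification).
    """
    if yields is None:
        yields = {}
    reactions = mechanism.get(root, [])
    if not reactions:
        # No reaction consumes this species, so it counts as a final product.
        yields[root] = yields.get(root, 0.0) + amount
        return yields
    for fraction, products in reactions:
        for species, stoich in products.items():
            final_yields(species, mechanism, amount * fraction * stoich, yields)
    return yields


# Hypothetical toy mechanism: A reacts to B (75%) or to C + D (25%); B gives 2 D.
toy_mechanism = {
    "A": [(0.75, {"B": 1.0}), (0.25, {"C": 1.0, "D": 1.0})],
    "B": [(1.0, {"D": 2.0})],
}
print(final_yields("A", toy_mechanism))
```

    A real mechanism would also need cycle handling and reaction rates taken from model output, which this sketch leaves out.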

  15. A stochastic model for EEG microstate sequence analysis.

    Science.gov (United States)

    Gärtner, Matthias; Brodbeck, Verena; Laufs, Helmut; Schneider, Gaby

    2015-01-01

    The analysis of spontaneous resting state neuronal activity is assumed to give insight into the brain function. One noninvasive technique to study resting state activity is electroencephalography (EEG) with a subsequent microstate analysis. This technique reduces the recorded EEG signal to a sequence of prototypical topographical maps, which is hypothesized to capture important spatio-temporal properties of the signal. In a statistical EEG microstate analysis of healthy subjects in wakefulness and three stages of sleep, we observed a simple structure in the microstate transition matrix. It can be described with a first order Markov chain in which the transition probability from the current state (i.e., map) to a different map does not depend on the current map. The resulting transition matrix shows a high agreement with the observed transition matrix, requiring only about 2% of mass transport (1/2 L1-distance). In the second part, we introduce an extended framework in which the simple Markov chain is used to make inferences on a potential underlying time continuous process. This process cannot be directly observed and is therefore usually estimated from discrete sampling points of the EEG signal given by the local maxima of the global field power. Therefore, we propose a simple stochastic model called sampled marked intervals (SMI) model that relates the observed sequence of microstates to an assumed underlying process of background intervals and thus, complements approaches that focus on the analysis of observable microstate sequences. Copyright © 2014 Elsevier Inc. All rights reserved.
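
    The comparison between an observed transition matrix and a simple Markov model can be sketched as follows. This is one possible reading of the abstract, not the authors' estimator: the function names, the way the switching probabilities `q` are estimated, and the toy label sequence are all assumptions made for illustration.

```python
from collections import Counter


def joint_frequencies(seq, states):
    """Relative frequency of each ordered pair (current map, next map)."""
    counts = Counter(zip(seq, seq[1:]))
    total = len(seq) - 1
    return {(a, b): counts[(a, b)] / total for a in states for b in states}


def simple_model(seq, states):
    """Joint frequencies under the assumed simple model: the probability of
    switching to a different map b is a constant q[b], whatever the current
    map is. No check is made that the implied stay probability is >= 0."""
    counts = Counter(zip(seq, seq[1:]))
    n_from = Counter(seq[:-1])          # how often each map is the current one
    total = len(seq) - 1
    q = {}
    for b in states:
        switches_into_b = sum(counts[(a, b)] for a in states if a != b)
        chances = total - n_from[b]     # steps whose current map is not b
        q[b] = switches_into_b / chances if chances else 0.0
    model = {}
    for a in states:
        stay = 1.0 - sum(q[b] for b in states if b != a)
        for b in states:
            model[(a, b)] = n_from[a] / total * (stay if a == b else q[b])
    return model


def half_l1(f, g):
    """Half the L1 distance: the mass that must move to turn f into g."""
    return 0.5 * sum(abs(f[key] - g[key]) for key in f)


# Hypothetical microstate label sequence over three maps.
labels = list("AABBCAACBBAC")
observed = joint_frequencies(labels, "ABC")
print(half_l1(observed, simple_model(labels, "ABC")))
```

    A small value of `half_l1` means the constrained model reproduces the observed transitions closely, which is the sense in which the abstract reports about 2% of mass transport.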

  16. Congruence analysis of point clouds from unstable stereo image sequences

    Directory of Open Access Journals (Sweden)

    C. Jepping

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research focusses on the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.

  17. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. As recent studies show, using only the absolute spectrum in the informational spectrum analysis of protein sequences has proven to be insufficient. Therefore, CISAPS was developed to consider and provide results in three forms: absolute, real, and imaginary spectra. Biologically related features in the analysis of influenza A subtypes, presented as a case study here, can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are likely to be related to the protein feature that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
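
    The three spectrum forms named in this abstract can be illustrated with a plain discrete Fourier transform. This is a minimal sketch, not the CISAPS implementation: the function name `informational_spectrum` is hypothetical, the values in `TOY_INDEX` are invented and are not a real amino acid index, and windowing is left out.

```python
import cmath

# Hypothetical numeric values for four residues; NOT a real amino acid index.
TOY_INDEX = {"A": 1.0, "G": 2.0, "K": 3.0, "L": 4.0}


def informational_spectrum(protein, index, n_points=0):
    """Encode a protein as numbers via `index`, zero-pad it to `n_points`
    (if that is longer than the sequence), and return the absolute, real,
    and imaginary parts of its discrete Fourier transform."""
    signal = [index[residue] for residue in protein]
    n = max(n_points, len(signal))
    signal = signal + [0.0] * (n - len(signal))   # zero-padding
    dft = [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(signal))
        for k in range(n)
    ]
    return {
        "absolute": [abs(x) for x in dft],
        "real": [x.real for x in dft],
        "imaginary": [x.imag for x in dft],
    }


spec = informational_spectrum("AGKL", TOY_INDEX)
print(spec["absolute"])
```

    Keeping the real and imaginary parts separate preserves sign and phase information that the absolute spectrum discards, which is the motivation the abstract gives for reporting all three.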

  18. Ichnology applied to sequence stratigraphic analysis of Siluro-Devonian mud-dominated shelf deposits, Paraná Basin, Brazil

    Science.gov (United States)

    Sedorko, Daniel; Netto, Renata G.; Savrda, Charles E.

    2018-04-01

    Previous studies of the Paraná Supersequence (Furnas and Ponta Grossa formations) of the Paraná Basin in southern Brazil have yielded disparate sequence stratigraphic interpretations. An integrated sedimentological, paleontological, and ichnological model was created to establish a refined sequence stratigraphic framework for this succession, focusing on the Ponta Grossa Formation. Twenty-nine ichnotaxa are recognized in the Ponta Grossa Formation, recurring assemblages of which define five trace fossil suites that represent various expressions of the Skolithos, Glossifungites and Cruziana ichnofacies. Physical sedimentologic characteristics and associated softground ichnofacies provide the basis for recognizing seven facies that reflect a passive relationship to bathymetric gradients from shallow marine (shoreface) to offshore deposition. The vertical distribution of facies provides the basis for dividing the Ponta Grossa Formation into three major (3rd-order) depositional sequences (Siluro-Devonian and Devonian I and II), each containing a record of three to seven higher-order relative sea-level cycles. Major sequence boundaries, commonly coinciding with hiatuses recognized from previously published biostratigraphic data, are locally marked by firmground Glossifungites Ichnofacies associated with submarine erosion. Maximum transgressive horizons are prominently marked by unbioturbated or weakly bioturbated black shales. By integrating observations of the Ponta Grossa Formation with those recently made on the underlying marginal- to shallow-marine Furnas Formation, the entire Paraná Supersequence can be divided into four disconformity-bound sequences: a Lower Silurian (Llandovery-Wenlock) sequence, corresponding to lower and middle units of the Furnas; a Siluro-Devonian sequence (?Pridoli-Early Emsian), and Devonian sequences I (Late Emsian-Late Eifelian) and II (Late Eifelian-Early Givetian). Stratigraphic positions of sequence boundaries generally coincide with

  19. Frame to Frame Diffeomorphic Motion Analysis from Echocardiographic Sequences

    OpenAIRE

    Zhang, Zhijun; Sahn, David; Song, Xubo

    2011-01-01

    Quantitative motion analysis from echocardiography is an important yet challenging problem. We develop a motion estimation algorithm for echocardiographic image sequences based on diffeomorphic image registration in which the velocity field is spatiotemporally smooth. The novelty of this work is that instead of optimizing a functional of velocity field which consists of similarity metrics between a reference image to each of the following images (first-to-follow...

  20. The sequence and analysis of Trypanosoma brucei chromosome II

    OpenAIRE

    El-Sayed, Najib M. A.; Ghedin, Elodie; Song, Jinming; MacLeod, Annette; Bringaud, Frederic; Larkin, Christopher; Wanless, David; Peterson, Jeremy; Hou, Lihua; Taylor, Sonya; Tweedie, Alison; Biteau, Nicolas; Khalak, Hanif G.; Lin, Xiaoying; Mason, Tanya

    2003-01-01

    We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-couple...
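
    The GC skew mentioned in this abstract is commonly computed per window as (G - C) / (G + C). The sketch below shows that standard formula on an invented sequence; the function name, window size, and toy data are assumptions, not details taken from the paper.

```python
def gc_skew(seq, window=1000, step=1000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)
    computed over consecutive windows of one strand."""
    seq = seq.upper()
    result = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


# Invented toy sequence: a G-rich half followed by a C-rich half.
toy_sequence = "GGGGCATTGG" + "CCCCGATTCC"
print(gc_skew(toy_sequence, window=10, step=10))
```

    A sign change in the skew along a chromosome marks a switch in strand composition, which is the kind of asymmetry the abstract relates to the distribution of protein-coding genes.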

  1. Analysis on Response of Dynamic Systems to Pulse Sequences Excitation

    Directory of Open Access Journals (Sweden)

    Xie Lili

    2009-07-01

    Near-fault ground motions with long-period pulses can place severe demands on structures near an active fault. These pulse-type ground motions can be represented by pulse sequences with simple shapes. Half-sinusoidal pulse sequences are used to approximate recorded ground motions, and the dynamic responses of an SDOF system under the excitation of these pulse sequences are studied. Four cases are considered: (1) variation in duration of successor sub-pulse; (2) variation in duration of predecessor sub-pulse; (3) variation in amplitude of successor sub-pulse; and (4) variation in amplitude of predecessor sub-pulse. The corresponding acceleration, velocity and displacement response spectra of these pulse sequences are studied. The analysis of the SDOF system shows that in some cases the responses are strongly affected by changes in the duration and/or amplitude of the sub-pulse. The study can be useful for understanding the influence of sub-pulses in near-fault pulse-type ground motions.
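
    As an illustration of the kind of calculation described above, the following is a minimal sketch (not the authors' code) of the peak displacement of a linear SDOF oscillator excited by back-to-back half-sine sub-pulses. The function names, the unit-mass formulation, the time step, and the choice of the Newmark average-acceleration integrator are all assumptions made for this example.

```python
import math

def half_sine_sequence(t, pulses):
    """Ground acceleration at time t for half-sine sub-pulses played back to back.

    pulses is a list of (amplitude, duration) pairs, e.g. predecessor then successor.
    """
    start = 0.0
    for amplitude, duration in pulses:
        if start <= t < start + duration:
            return amplitude * math.sin(math.pi * (t - start) / duration)
        start += duration
    return 0.0  # excitation has ended

def peak_displacement(period, damping, pulses, dt=0.001, t_end=10.0):
    """Peak relative displacement of a unit-mass SDOF system.

    Integrates u'' + 2*zeta*omega*u' + omega^2*u = -a_g(t) with the Newmark
    average-acceleration scheme (beta = 1/4, gamma = 1/2).
    """
    omega = 2.0 * math.pi / period
    c = 2.0 * damping * omega      # damping coefficient per unit mass
    k = omega ** 2                 # stiffness per unit mass
    u, v = 0.0, 0.0
    a = -half_sine_sequence(0.0, pulses) - c * v - k * u
    k_eff = k + 2.0 * c / dt + 4.0 / dt ** 2
    peak = 0.0
    for i in range(1, int(t_end / dt) + 1):
        p = -half_sine_sequence(i * dt, pulses)
        p_eff = (p + 4.0 * u / dt ** 2 + 4.0 * v / dt + a
                 + c * (2.0 * u / dt + v))
        u_new = p_eff / k_eff
        v_new = 2.0 * (u_new - u) / dt - v
        a_new = p - c * v_new - k * u_new
        u, v, a = u_new, v_new, a_new
        peak = max(peak, abs(u))
    return peak
```

    Calling `peak_displacement` over a range of periods, while changing one sub-pulse's duration or amplitude at a time, would give a displacement response spectrum for each of the four cases listed in the abstract.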

  2. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs

  3. Targeted DNA methylation analysis by next-generation sequencing.

    Science.gov (United States)

    Masser, Dustin R; Stanford, David R; Freeman, Willard M

    2015-02-24

    The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
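
    The base-specific quantitation mentioned above rests on a simple ratio: after bisulfite conversion an unmethylated cytosine is read as T, while a methylated cytosine is still read as C. The sketch below is an illustration under simplifying assumptions, not the BSAS pipeline itself: reads are assumed to be already aligned to the amplicon, gap-free, and of the same length as the reference, and the function name is hypothetical.

```python
def cpg_methylation(reference, reads):
    """Return {position: methylated fraction} for each CpG cytosine in the reference.

    The fraction at a site is C / (C + T) across reads covering that position.
    """
    ref = reference.upper()
    cpg_sites = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    fractions = {}
    for pos in cpg_sites:
        bases = [read[pos].upper() for read in reads]
        methylated = bases.count("C")     # protected from conversion
        unmethylated = bases.count("T")   # converted C -> U, read as T
        if methylated + unmethylated:
            fractions[pos] = methylated / (methylated + unmethylated)
    return fractions
```

    For example, four reads showing C, T, C, C at a CpG cytosine would yield a methylated fraction of 0.75 at that position.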

  4. Analysis of integrated human papillomavirus type 16 DNA in cervical cancers: amplification of viral sequences together with cellular flanking sequences.

    Science.gov (United States)

    Wagatsuma, M; Hashimoto, K; Matsukura, T

    1990-01-01

    We have isolated four clones of integrated human papillomavirus type 16 (HPV-16) DNA from four different primary cervical cancer specimens. All clones were found to be monomeric or dimeric forms of HPV-16 DNA with cellular flanking sequences at both ends. Analysis of the viral sequences in these clones showed that E6/E7 open reading frames and the long control region were conserved and that no region specific for the integration was detected. Analysis of the cellular flanking sequences revealed no significant homology with any known human DNA sequences, except Alu sequences, and no homology among the clones, indicating no cellular sequence specific for the integration. By probing with single-copy cellular flanking sequences from the clones, it was demonstrated that the integrated HPV-16 DNAs, with different sizes in the same specimens, shared the same cellular flanking sequences at the ends. Furthermore, it was shown that the viral sequences together with cellular flanking sequences were amplified. The possible process of viral integration into cell chromosomes in cervical cancer is discussed. PMID:2153245

  5. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  6. Pan-Cancer Analysis of Genomic Sequencing Among the Elderly.

    Science.gov (United States)

    Wahl, Daniel R; Nguyen, Paul L; Santiago, Maria; Yousefi, Kasra; Davicioni, Elai; Shumway, Dean A; Speers, Corey; Mehra, Rohit; Feng, Felix Y; Osborne, Joseph R; Spratt, Daniel E

    2017-07-15

    We hypothesized that elderly patients might have age-specific genetic abnormalities yet be underrepresented in currently available sequencing repositories, which could limit the effect of sequencing efforts for this population. Leveraging The Cancer Genome Atlas (TCGA) data portal, 9 tumor types were analyzed. The frequency distribution of cancer by age was determined and compared with Surveillance, Epidemiology, and End Results data. Using the estimated median somatic mutational frequency of each tumor type, the samples needed beyond TCGA to detect a 10% mutational frequency were calculated. Microarray data from a separate prospective cohort were obtained from primary prostatectomy samples to determine whether elderly-specific transcriptomic alterations could be identified. Of the 5236 TCGA samples, 73% were from patients aged elderly patients with cancer were likely to harbor age-specific molecular abnormalities, we accessed transcriptomic data from a separate, larger database of >2000 prostate cancer samples. That analysis revealed significant differences in the expression of 10 genes in patients aged ≥70 years compared with those Elderly patients have been underrepresented in genomic sequencing studies. Our data suggest the presence of elderly-specific molecular alterations. Further dedicated efforts to understand the biology of cancer among the elderly will be important moving forward. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
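
    To see why a Markov-model Bayes classifier has boundaries that are linear in word occurrences, note that for a first-order model the log-likelihood of a sequence is a sum of log transition probabilities, one per adjacent pair, so the log-likelihood ratio between two classes is a weighted sum of two-letter word counts. The sketch below illustrates this with a toy two-letter alphabet; the transition tables are made up for the example, and the initial-state term is ignored (assumed equal in both classes).

```python
import math
from collections import Counter

def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def transition_log_likelihood(seq, trans):
    """Log-likelihood of the transitions in seq under a first-order Markov model."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def linear_score(seq, trans_a, trans_b):
    """Log-likelihood ratio (class A vs. class B) as a linear function of 2-word counts."""
    score = 0.0
    for word, count in word_counts(seq, 2).items():
        a, b = word
        weight = math.log(trans_a[a][b]) - math.log(trans_b[a][b])
        score += count * weight
    return score

# Hypothetical transition probabilities for two classes over the alphabet {A, B}.
TRANS_A = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}
TRANS_B = {"A": {"A": 0.2, "B": 0.8}, "B": {"A": 0.6, "B": 0.4}}
```

    A positive score assigns the sequence to class A and a negative score to class B; the weights depend only on the two models, not on the sequence being classified.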

  8. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599

  9. Integrated visual analysis of protein structures, sequences, and feature data.

    Science.gov (United States)

    Stolte, Christian; Sabir, Kenneth S; Heinrich, Julian; Hammang, Christopher J; Schafferhans, Andrea; O'Donoghue, Seán I

    2015-01-01

    To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.

  10. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  11. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development

    OpenAIRE

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-01-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence re...

  12. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
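
    The idea of processing reads while the data is still in transit can be illustrated with a small generator. This is a hedged sketch of the general streaming pattern, not the elastream package's API: it assumes four-line FASTQ records arriving as arbitrary text chunks, and the per-read task (GC fraction) is only a stand-in for any analysis that treats reads independently.

```python
def fastq_records(chunks):
    """Yield (header, sequence, separator, quality) tuples as soon as they are complete.

    chunks is any iterable of text fragments, e.g. pieces of a file arriving
    over a network connection; record boundaries may fall inside a chunk.
    """
    buffer = ""
    lines = []
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split("\n")  # keep the unfinished tail
        for line in complete:
            lines.append(line)
            if len(lines) == 4:
                yield tuple(lines)
                lines = []
    if buffer:                  # final line without a trailing newline
        lines.append(buffer)
    if len(lines) == 4:
        yield tuple(lines)

def gc_fraction(sequence):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def stream_gc(chunks):
    """Process each read independently while later chunks are still pending."""
    for header, sequence, _, _ in fastq_records(chunks):
        yield header, gc_fraction(sequence)
```

    Because each record is handled as soon as its fourth line arrives, computation overlaps with transfer instead of waiting for the whole file.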

  13. Next-Generation Sequence Analysis of Cancer Xenograft Models

    Science.gov (United States)

    Rossello, Fernando J.; Tothill, Richard W.; Britt, Kara; Marini, Kieren D.; Falzon, Jeanette; Thomas, David M.; Peacock, Craig D.; Marchionni, Luigi; Li, Jason; Bennett, Samara; Tantoso, Erwin; Brown, Tracey; Chan, Philip; Martelotto, Luciano G.; Watkins, D. Neil

    2013-01-01

    Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations. PMID:24086345

  14. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

  15. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    STORAGESEVER

    2010-07-19

    Jul 19, 2010 ... sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and phylogenetic ..... Tree was generated by Neighbor joining algorithm. Boot strap values are shown ... Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence ...

  16. A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.

    Science.gov (United States)

    Yin, Changchuan; Yin, Xuemeng E; Wang, Jiasong

    2014-12-01

    Alignment-free sequence analysis approaches provide important alternatives to multiple sequence alignment (MSA) in biological sequence analysis because alignment-free approaches have low computational complexity and are not dependent on a high level of sequence identity. However, most of the existing alignment-free methods do not employ the true full information content of sequences and thus cannot accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on the Ramanujan-Fourier transform (RFT), in which complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply RFT on the indicator sequences to convert them into the frequency domain. The Euclidean distance between the complete RFT coefficients of DNA sequences is used as the similarity measure. To address the different lengths of RFT coefficients in Euclidean space, we pad zeros to short DNA binary sequences so that the binary sequences equal the longest length in the comparison sequence data. Thus, the DNA sequences are compared in the same dimensional frequency space without information loss. We demonstrate the usefulness of the proposed method by presenting experimental results on hierarchical clustering of genes and genomes. The proposed method opens a new channel to biological sequence analysis, classification, and structural module identification.
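
    The pipeline described above (indicator sequences, zero padding, transform, Euclidean distance) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it uses a common finite-length form of the RFT coefficient, a_q = (1 / (phi(q) N)) * sum over n of x(n) c_q(n), where c_q is the Ramanujan sum and phi is Euler's totient; the exact normalization and the range of q used in the paper are not stated in the abstract and are assumptions here.

```python
import math

def indicator_sequences(dna, length):
    """Four binary indicator sequences (A, C, G, T), zero-padded to the given length."""
    dna = dna.upper()
    padding = [0] * (length - len(dna))
    return {base: [1 if ch == base else 0 for ch in dna] + padding
            for base in "ACGT"}

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    return sum(math.cos(2.0 * math.pi * k * n / q)
               for k in range(1, q + 1) if math.gcd(k, q) == 1)

def rft(signal, max_q):
    """Finite-length RFT coefficients a_1 .. a_max_q of a real signal."""
    n_total = len(signal)
    coefficients = []
    for q in range(1, max_q + 1):
        phi = sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)
        total = sum(x * ramanujan_sum(q, n)
                    for n, x in enumerate(signal, start=1))
        coefficients.append(total / (phi * n_total))
    return coefficients

def rft_distance(dna_a, dna_b, max_q=None):
    """Euclidean distance between the RFT coefficients of two DNA sequences."""
    length = max(len(dna_a), len(dna_b))
    if max_q is None:
        max_q = length  # assumption: one coefficient per position
    ind_a = indicator_sequences(dna_a, length)
    ind_b = indicator_sequences(dna_b, length)
    squared = 0.0
    for base in "ACGT":
        coeffs_a = rft(ind_a[base], max_q)
        coeffs_b = rft(ind_b[base], max_q)
        squared += sum((x - y) ** 2 for x, y in zip(coeffs_a, coeffs_b))
    return math.sqrt(squared)
```

    The direct summation is quadratic or worse in sequence length, so this form is only suitable for short toy sequences; a pairwise matrix of `rft_distance` values could then be passed to any hierarchical clustering routine.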

  17. Frasnian reef and basinal strata of West Central Alberta: A combined sedimentological and biostratigraphic analysis

    Energy Technology Data Exchange (ETDEWEB)

    Weissenberger, J.A.W. (Imperial Oil Resources Ltd., Calgary, AB (Canada))

    1994-03-01

    The depositional history for the Frasnian in the Nordegg area is interpreted and illustrated on cross sections and paleogeographic maps. Carbonate deposition began with the flooding of the West Alberta Arch and the deposition of the upper Swan Hills Formation during the Lower asymmetrica Zone. Transgression in the Middle asymmetrica Zone initiated the basinal Cline Channel and Duvernay Formation shale deposition, while the time equivalent Cooking Lake Formation was deposited on the drowned Swan Hills platform. The overlying lower Leduc Formation shows backstepping and aggradational reef margin stacking patterns. Maximum relief from the carbonate platform to surrounding Duvernay Formation shale during the Upper asymmetrica Zone was 100 m. Aggradation and backstepping were repeated in the Ancyrognathus triangularis Zone, with syndepositional relief reaching 170 m at the Wapiabi Gap reef margin. Platform-margin profiles were controlled by physical factors such as dominant wind direction and currents. On the Ram Range the margin backstepped, but then aggraded at Cripple Creek. At Wapiabi Gap, to the north on the Bighorn Range, the margin was dominantly aggradational. Ireton Formation shale deposition was also influenced by currents. In the Lower gigas Zone, the Leduc carbonate platform reached a maximum syndepositional relief of 220 m. A change from dominantly biohermal to biostromal platform margins occurred. A prograding wedge of Ireton Formation shale filled much of the relief in the Cline Channel, while the upper Leduc platform was drowned. Finally, the progradational Nisku Formation was deposited during the Upper gigas Zone. 70 refs., 20 figs.

  18. An Analysis of Loss of Offsite Power Sequence for the Severe Accident Analysis Database (II)

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Kim, Dong Ha

    2006-12-15

    This report contains analysis methodologies and calculation results of loss of offsite power sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as a reference plant. Based on the probabilistic safety analysis of the corresponding plant, twelve accident scenarios, which were predicted to have an occurrence frequency of more than 10^-10 /ry, have been analyzed as base cases for the loss of offsite power sequence database. The functions of the severe accident analysis database system will be to diagnose the accident from input information on the plant symptoms, to search for a corresponding scenario, and finally to provide the user with phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results for the loss of offsite power sequence in this report will be utilized as input data of the severe accident analysis database system. This report updates and complements a previously published Technical Report.

  19. Multilocus sequence analysis for Leishmania braziliensis outbreak investigation.

    Directory of Open Access Journals (Sweden)

    Mariel A Marlow

    2014-02-01

    With the emergence of leishmaniasis in new regions around the world, molecular epidemiological methods with adequate discriminatory power, reproducibility, high throughput and inter-laboratory comparability are needed for outbreak investigation of this complex parasitic disease. As multilocus sequence analysis (MLSA) has been projected as the future gold standard technique for Leishmania species characterization, we propose an MLSA panel of six housekeeping gene loci (6pgd, mpi, icd, hsp70, mdhmt, mdhnc) for investigating intraspecific genetic variation of L. (Viannia) braziliensis strains and compare the resulting genetic clusters with several epidemiological factors relevant to outbreak investigation. The recent outbreak of cutaneous leishmaniasis caused by L. (V.) braziliensis in the southern Brazilian state of Santa Catarina is used to demonstrate the applicability of this technique. Sequenced fragments from six genetic markers from 86 L. (V.) braziliensis strains from twelve Brazilian states, including 33 strains from Santa Catarina, were used to determine clonal complexes, genetic structure, and phylogenic networks. Associations between genetic clusters and networks with epidemiological characteristics of patients were investigated. MLSA revealed epidemiological patterns among L. (V.) braziliensis strains, even identifying strains from imported cases among the Santa Catarina strains that presented extensive homogeneity. Evidence presented here has demonstrated that MLSA possesses adequate discriminatory power for outbreak investigation, as well as other potential uses in the molecular epidemiology of leishmaniasis.

  20. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    The adaptation tipping point and adaptation pathway approach developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. The two approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criteria. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.
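
    The contrast between Net Present Valuation and Real Options can be made concrete with a toy calculation. The sketch below is a generic illustration, not the paper's model or its case-study data: it compares committing to a later expansion up front with keeping the option to expand only in scenarios where the expansion pays off. All cash flows, probabilities, and function names are hypothetical.

```python
def npv(cash_flows, rate):
    """Net present value of cash flows indexed by year (year 0 first)."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))

def committed_value(base_flows, scenarios, cost, decision_year, rate):
    """Value when the expansion is built in every scenario, regardless of outcome."""
    expected = sum(p * (payoff - cost) for p, payoff in scenarios)
    return npv(base_flows, rate) + expected / (1.0 + rate) ** decision_year

def flexible_value(base_flows, scenarios, cost, decision_year, rate):
    """Value when the expansion is built only if its payoff exceeds its cost."""
    expected = sum(p * max(payoff - cost, 0.0) for p, payoff in scenarios)
    return npv(base_flows, rate) + expected / (1.0 + rate) ** decision_year

# Hypothetical inputs: (probability, expansion payoff valued at the decision year).
BASE_FLOWS = [-100.0, 60.0, 60.0]
SCENARIOS = [(0.5, 80.0), (0.5, 10.0)]

value_of_flexibility = (
    flexible_value(BASE_FLOWS, SCENARIOS, cost=40.0, decision_year=1, rate=0.1)
    - committed_value(BASE_FLOWS, SCENARIOS, cost=40.0, decision_year=1, rate=0.1)
)
```

    The difference between the two values is never negative, which is the sense in which an options-style valuation captures flexibility that a single committed NPV figure does not.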

  1. Secondary structure-based analysis of mouse brain small RNA sequences obtained by using next-generation sequencing.

    Science.gov (United States)

    Kiyosawa, Hidenori; Okumura, Akio; Okui, Saya; Ushida, Chisato; Kawai, Gota

    2015-08-01

    In order to find novel structured small RNAs, next-generation sequencing was applied to small RNA fractions with lengths ranging from 40 to 140 nt and secondary structure-based clustering was performed. Sequences of structured RNAs were effectively clustered and analyzed by secondary structure. Although more than 99% of the obtained sequences were known RNAs, 16 candidate mouse structured small non-coding RNAs (MsncRs) were isolated. Based on these results, the merits of secondary structure-based analysis are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitate a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  3. Arguments for a Cluster Analysis of Nasal Consonant Sequences of ...

    African Journals Online (AJOL)

    Bantu language scholars, have among other things, debated over the issue of whether nasal and consonant sequences (NC sequences) in various Bantu languages should be considered as clusters or single segments (prenasalised stops). This paper examines these sequences as they occur in Sukwa nouns. Sukwa is a ...

  4. Design and Analysis of Single-Cell Sequencing Experiments

    NARCIS (Netherlands)

    Grün, Dominic; van Oudenaarden, Alexander

    2015-01-01

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate

  5. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP format. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  6. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    SERVER

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  7. Analysis of expressed sequence tags derived from inflorescence ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    Nov 2, 2009 ... genetically distant organisms. Clones could be isolated and partially sequenced by the thousands with the improvements in DNA sequencing technology. High-throughput DNA sequencing has greatly reduced both the cost and time involved in obtaining large ESTs data sets. There are several applications ...

  8. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  9. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  10. Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications

    OpenAIRE

    Ebhardt, H. Alexander; Tsang, Herbert H.; Dai, Denny C.; Liu, Yifeng; Bostan, Babak; Fahlman, Richard P.

    2009-01-01

    Recent advances in DNA-sequencing technology have made it possible to obtain large datasets of small RNA sequences. Here we demonstrate that not all non-perfectly matched small RNA sequences are simple technological sequencing errors, but many hold valuable biological information. Analysis of three small RNA datasets originating from Oryza sativa and Arabidopsis thaliana small RNA-sequencing projects demonstrates that many single nucleotide substitution errors overlap when aligning homologous...

  11. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development.

    Science.gov (United States)

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-12-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence repeat markers for identification of Zanthoxylum. In total, we predicted 45,057 unigenes and 22,212 protein coding sequences, approximately 90% of which showed significant similarities to known proteins in databases. Phylogenetic analysis indicated that Zanthoxylum is relatively recent and estimated to have diverged from Citrus ca. 36.5-37.7 million years ago. We also detected a whole-genome duplication event in Zanthoxylum that occurred 14 million years ago. We found no protein coding sequences that were significantly under positive selection by Ka/Ks. Simple sequence repeat analysis divided 31 Zanthoxylum cultivars and landraces into three major groups. This Zanthoxylum reference transcriptome provides crucial information for the evolutionary study of the Zanthoxylum genus and the Rutaceae family, and facilitates the establishment of more effective Zanthoxylum breeding programs.

  12. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is a computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or the user can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and the A, T, G, and C base contents with their respective percentages, along with a sequence cleaner. RDNAnalyzer was developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
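
    For readers unfamiliar with the Nussinov algorithm mentioned above, the following Python sketch shows the classic base-pair maximisation recurrence applied to DNA. It is not RDNAnalyzer's code (which is written in C# and extends the algorithm in ways the abstract does not specify); the Watson-Crick pair set, the `min_loop` parameter, and the function names are assumptions made for illustration.

```python
# Watson-Crick pairs allowed in this sketch (an assumption; no wobble pairs).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq, min_loop=3):
    """Fill the table dp[i][j] = max number of nested pairs in seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    left = dp[i + 1][k - 1]
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + left + right)
            dp[i][j] = best
    return dp


def traceback(seq, dp, min_loop=3):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                left = dp[i + 1][k - 1]
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == 1 + left + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


seq = "GGGAAATCCC"
table = nussinov(seq)
print(table[0][-1], traceback(seq, table))
```

    The `min_loop` constraint prevents a base from pairing with a near neighbour, which would imply a sterically impossible hairpin; setting it to 0 gives the unconstrained textbook recurrence.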

  13. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
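
    The N50 and mean contig length quoted above are standard assembly summary statistics. As a minimal sketch of how they are usually computed (the toy contig lengths and the function name are hypothetical, not taken from the study):

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L cover half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths supplied")


# Toy contig lengths in bp (illustrative only).
contigs = [100, 200, 300, 400, 500]
mean_length = sum(contigs) / len(contigs)
print(n50(contigs), mean_length)
```

    Because N50 weights long contigs more heavily than the mean does, an assembly with many short contigs typically shows an N50 well above its mean length, as in the record above.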

  14. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications....... In another aspect the invention relates to a database of reference sequences which can be used in the method of the invention....

  15. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Feature Extraction From DNA Sequences by Multifractal Analysis

    National Research Council Canada - National Science Library

    Zhang, H

    2001-01-01

    This paper presents feature extraction and estimation of multifractal measures of DNA sequences using a multifractal methodology and demonstrates a new scheme for identifying biological functionality...

  17. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    The finished human genome assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing...

  18. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
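
    The separation into four indicator channels can be illustrated with a short Python sketch. It shows only the binary indicator representation and a weighted recombination; the weights below are arbitrary placeholders, since the abstract does not give the adaptive mapping, and the function names are hypothetical.

```python
def indicator_channels(seq, alphabet="ACGT"):
    """Split a DNA string into one 0/1 indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in alphabet}


def weighted_signal(channels, weights):
    """Recombine the channels into one numeric signal using per-channel weights."""
    length = len(next(iter(channels.values())))
    return [sum(weights[b] * channels[b][i] for b in channels) for i in range(length)]


channels = indicator_channels("ACGTA")
# Placeholder weights; an adaptive scheme would derive these from the sequence.
signal = weighted_signal(channels, {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0})
print(channels["A"], signal)
```

    A numeric signal of this kind is what a wavelet transform would then take as input; changing the per-channel weights changes how strongly each nucleotide contributes to the transformed output.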

  19. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima

    DEFF Research Database (Denmark)

    Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.

    2000-01-01

    The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms...

  20. An Analysis of Medium Loss of Coolant Sequence for the Severe Accident Analysis DB

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of medium loss-of-coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as the reference plant. Based on the probabilistic safety analysis of the corresponding plant, 10 accident scenarios, each predicted to have an occurrence frequency of more than 10^-10 /ry, have been analyzed as base cases for the medium loss-of-coolant sequence database. The functions of the severe accident analysis database system will be to diagnose the accident from input information on the plant symptoms, to search for a corresponding scenario, and finally to provide the user with phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data for the severe accident analysis database system.

  1. An Analysis of Large Loss of Coolant Sequence for the Severe Accident Analysis DB

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of large loss-of-coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as the reference plant. Based on the probabilistic safety analysis of the corresponding plant, 14 accident scenarios, each predicted to have an occurrence frequency of more than 10^-10 /ry, have been analyzed as base cases for the large loss-of-coolant sequence database. The functions of the severe accident analysis database system will be to diagnose the accident from input information on the plant symptoms, to search for a corresponding scenario, and finally to provide the user with phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data for the severe accident analysis database system.

  2. An Analysis of Small Loss of Coolant Sequence for the Severe Accident Analysis Database

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of small loss-of-coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as the reference plant. Based on the probabilistic safety analysis of the corresponding plant, 10 accident scenarios, each predicted to have an occurrence frequency of more than 10^-9 /ry, have been analyzed as base cases for the small loss-of-coolant sequence database. The functions of the severe accident analysis database system will be to diagnose the accident from input information on the plant symptoms, to search for a corresponding scenario, and finally to provide the user with phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data for the severe accident analysis database system.

  3. Sequence and comparative analysis of Leuconostoc dairy bacteriophages.

    Science.gov (United States)

    Kot, Witold; Hansen, Lars H; Neve, Horst; Hammer, Karin; Jacobsen, Susanne; Pedersen, Per D; Sørensen, Søren J; Heller, Knut J; Vogensen, Finn K

    2014-04-17

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc mesenteroides or Leuconostoc pseudomesenteroides strains. The phages have dsDNA genomes with sizes ranging from 25.7 to 28.4 kb. Comparative genomics analysis helped classify the 9 phages into two classes, which correlates with the host species. High percentage of similarity within the classes on both nucleotide and protein levels was observed. Genome comparison also revealed very high conservation of the overall genomic organization between the classes. The genes were organized in functional modules responsible for replication, packaging, head and tail morphogenesis, cell lysis and regulation and modification, respectively. No lysogeny modules were detected. To our knowledge this report provides the first comparative genomic work done on Leuconostoc dairy phages. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence indicated that U. prolifera has the substance and energy foundation for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  5. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Background: The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome. Results: Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion: Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  6. The sequence and analysis of Trypanosoma brucei chromosome II

    Science.gov (United States)

    El-Sayed, Najib M. A.; Ghedin, Elodie; Song, Jinming; MacLeod, Annette; Bringaud, Frederic; Larkin, Christopher; Wanless, David; Peterson, Jeremy; Hou, Lihua; Taylor, Sonya; Tweedie, Alison; Biteau, Nicolas; Khalak, Hanif G.; Lin, Xiaoying; Mason, Tanya; Hannick, Linda; Caler, Elisabet; Blandin, Gaëlle; Bartholomeu, Daniella; Simpson, Anjana J.; Kaul, Samir; Zhao, Hong; Pai, Grace; Aken, Susan Van; Utterback, Teresa; Haas, Brian; Koo, Hean L.; Umayam, Lowell; Suh, Bernard; Gerrard, Caroline; Leech, Vanessa; Qi, Rong; Zhou, Shiguo; Schwartz, David; Feldblyum, Tamara; Salzberg, Steven; Tait, Andrew; Turner, C. Michael R.; Ullu, Elisabetta; White, Owen; Melville, Sara; Adams, Mark D.; Fraser, Claire M.; Donelson, John E.

    2003-01-01

    We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational ‘hot’ and ‘cold’ regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events. PMID:12907728
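
    GC skew, used above to reveal strand compositional asymmetry, is conventionally defined per window as (G - C) / (G + C). The sketch below computes it over non-overlapping windows; the window size, the toy sequence, and the function name are illustrative assumptions rather than the parameters used in the study.

```python
def gc_skew(seq, window, step=None):
    """Return (G - C) / (G + C) for each window along the sequence."""
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows with no G or C have undefined skew; report 0.0 by convention here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence: G-rich first window, C-rich second window.
print(gc_skew("GGGCAAAC", window=4))
```

    A sign change in the skew along a chromosome marks a switch in which strand is G-rich, which is the kind of asymmetry the abstract relates to the distribution of protein-coding genes.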

  7. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... 3056 Afr. J. Biotechnol. Table 1. DNA sequence of the primers tested. Name. Sequence. Accession no. ... The lack of contaminating genomic DNA was checked out by monitoring negative polymerase chain ..... variation between human and chimpanzee. Genome Res. 15: 1344-. 1356. Pomies P, Louis HA, ...

  8. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  9. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    ZHONGXIAN XU

    2017-12-18

    Dec 18, 2017 ... Next-generation sequencing technique is an efficient method for generating an enormous amount of sequence data that can represent a large number of genes and their expression levels. In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland.

  10. Sequence stratigraphy and structural analysis of the Emi field ...

    African Journals Online (AJOL)

    Highstand systems tracts and transgressive systems tracts were identified within the depositional sequences. Marker shales, characterized by index fossils Haplophragmoides-24 and Bolivina-46, were used to date the key bounding surfaces with the aid of the Niger Delta chronostratigraphic chart. Ages assigned to the sequence ...

  11. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  12. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not previously been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array and next-generation sequencing to address the whole-genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. In both cell lines, 7,484 human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched under the criteria of at least 98% identity and a minimum length of 50 bp. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from the CpG islands and from the transcription start sites and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. An auditory display tool for DNA sequence analysis.

    Science.gov (United States)

    Temple, Mark D

    2017-04-24

    DNA sonification refers to the use of an auditory display to convey the information content of DNA sequence data. Six sonification algorithms are presented that each produce an auditory display. These algorithms are logically designed from the simple through to the more complex. Three of these parse individual nucleotides, nucleotide pairs or codons into musical notes to give rise to 4, 16 or 64 notes, respectively. Codons may also be parsed degenerately into 20 notes with respect to the genetic code. Lastly, nucleotide pairs can be parsed as two separate frames or codons can be parsed as three reading frames, giving rise to multiple streams of audio. The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous since start and stop codons in any one frame have a direct effect, starting or stopping the audio in that frame while leaving the other frames unaffected. Using these methods, DNA sequences such as open reading frames or repetitive DNA sequences can be distinguished from one another. These sonification tools are available through a webpage interface in which an input DNA sequence can be processed in real time to produce an auditory display playable directly within the browser. The potential of this approach as an analytical tool is discussed with reference to auditory displays derived from test sequences including simple nucleotide sequences, repetitive DNA sequences and coding or non-coding genes. This study presents a proof of concept that some properties of a DNA sequence can be identified through sonification alone and argues for the inclusion of such tools within the toolkit of DNA sequence browsers as an adjunct to existing visual and analytical tools.
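
    The three-reading-frame parsing described above can be sketched as follows. This is a hedged illustration of the idea, not the published tool: it assumes ATG starts a stream and TAA, TAG or TGA stops it, it represents silence as `None`, and it leaves the mapping from codons to musical notes out entirely. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def codons_in_frame(seq, frame):
    """Split seq into complete codons starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def audible_codons(seq, frame):
    """Return one entry per codon: the codon if the frame is 'playing', else None."""
    playing = False
    stream = []
    for codon in codons_in_frame(seq, frame):
        if codon in STOP_CODONS:
            playing = False  # a stop codon silences this frame only
        elif codon == START_CODON:
            playing = True  # a start codon switches this frame on
        stream.append(codon if playing else None)
    return stream


seq = "ATGGCCTAAGG"
for frame in range(3):
    print(frame, audible_codons(seq, frame))
```

    Because each frame keeps its own `playing` state, an open reading frame shows up as a sustained stream in one frame while the other two stay mostly silent.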

  14. Identification and sequence analysis of Tapasin gene in guinea fowl

    Directory of Open Access Journals (Sweden)

    Varuna P. Panicker

    2014-12-01

    Aim: An attempt has been made to identify and study the nucleotide sequence variability in the exon 5 - exon 6 regions of the guinea fowl Tapasin gene. Materials and Methods: Blood samples were collected from randomly selected birds (12 guinea fowl birds) and the Tapasin gene was amplified using chicken-specific primers designed from GenBank submitted sequences. Polymerase chain reaction conditions were standardized so as to get only single amplicons. The products obtained were then cloned and sequenced, and the sequences were analyzed using suitable software. Results: The amplicon size of the Tapasin gene in guinea fowl was the same as reported in chicken, with areas of transitions and transversions. The sequence variations reported in these coding sequences might influence the protein structure, which may be correlated with the increased immune status of the bird when compared with chicken breeds. Conclusion: Tapasin is an immunologically important gene that plays an important role in the immune status of the bird. Sequence variations in the gene can be correlated with the altered immune status of the bird.

  15. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  16. Integrating bio-, chemo- and sequence stratigraphy of the Late Ordovician, Early Katian: A connection between onshore and offshore facies using carbon isotope analysis: Kentucky, Ohio, USA

    Science.gov (United States)

    Young, Allison; Brett, Carlton; McLaughlin, Patrick

    2017-04-01

    , and mineralized surfaces. They also contain well studied fossil assemblages and event beds, which, at the scale of an outcrop, allow for detailed paleoenvironmental interpretation. The offshore record of this interval, known almost exclusively from a few drill cores, displays an abrupt transition to distal, siliciclastic-dominated facies, recording a more dysoxic and organic-rich interval. Internal correlation of these shales has relied mostly on limited graptolite biostratigraphic and geochemical analysis. Here we seek to establish age relationships across a major facies transition between these two interrelated paleoenvironmental settings using high-resolution whole-rock carbon isotope analysis to integrate new and previous work on lithostratigraphy, biostratigraphy, and sequence stratigraphy of a series of cores and outcrops. Results to date demonstrate the persistence of carbon isotopic patterns (including the globally recognized GICE positive carbon isotopic excursion), permitting extension of correlation into basinal facies where tracking of stratigraphic sequences becomes difficult. A complicated relationship across the region is emerging, involving both rapid facies transitions and submarine erosional cutout of units toward the center of the Sebree Trough. This study demonstrates the utility of an integrated stratigraphic approach for establishing high-resolution regional correlations, allowing for interpretations across major facies transitions.

  17. Sequence analysis of Schmallenberg virus genomes detected in Hungary.

    Science.gov (United States)

    Fehér, Enikő; Marton, Szilvia; Tóth, Ádám György; Ursu, Krisztina; Wernike, Kerstin; Beer, Martin; Dán, Ádám; Bányai, Krisztián

    2017-12-01

    Since its emergence near the German-Dutch border in 2011, Schmallenberg virus (SBV) has been identified in many European countries. In this study, we determined the complete coding sequence of seven Hungarian SBV genomes to expand our knowledge about the genetic diversity of circulating field strains. The samples originated from the first case, an aborted cattle fetus without malformation collected in 2012, and from the blood samples of six adult cattle in 2014. The Hungarian SBV sequences shared ≥99.3% nucleotide (nt) and ≥97.8% amino acid (aa) identity with each other, and ≥98.9% nt and ≥96.7% aa identity with reference strains. Although phylogenetic analyses showed low resolution in general, the M sequences of cattle- and sheep-origin SBV strains seemed to cluster on different branches. Both common and unique mutation sites were observed in different groups of sequences, which might help in understanding the evolution of emerging SBV strains.

  18. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross- species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  19. USE OF NEXT-GENERATION SEQUENCING FOR GENOMIC ANALYSIS IN COMPLEX DISEASES

    OpenAIRE

    Sana, Maria Elena

    2013-01-01

    Since the early 1990s, Sanger method has been the gold standard methodology for sequencing analysis of DNA. Next-generation sequencing (NGS) approaches revolutionized the field of genomics over the last 5 years. These new sequencing technologies make feasible the direct and cost-effective sequencing of genomes at unprecedented scale and speed. Furthermore, the applications of these technologies are wide-spread and have been developed to explore the complex biological systems, among which RNA ...

  20. Cloning and sequence analysis of lily and tobacco guanylate kinases.

    Science.gov (United States)

    Kumar, V

    2000-03-01

    Guanylate kinase is an essential enzyme in the nucleotide biosynthetic pathway, catalyzing the reversible transfer of the terminal phosphoryl group of ATP to GMP or dGMP. This enzyme has been well studied from several organisms and many structural and functional details have been characterized. Animal GMP kinases have also been implicated in signal transduction pathways. However, the corresponding role by plant derived GMP kinases remains to be elucidated. Full-length cDNA clones encoding enzymatically active guanylate kinases were isolated from cDNA libraries of lily and tobacco. Lily cDNA is predicted to encode a 392-amino acid protein with a molecular mass of 43.1 kDa and carries amino- and carboxy- terminal extensions of the guanylate kinase (GK)-like domain. But tobacco cDNA is predicted to encode a smaller protein of 297-amino acids with a molecular mass of 32.7 kDa. The amino acid residues known to participate in the catalytic activity of functionally characterized GMP kinases, are also conserved in GK domains of LGK-1 and NGK-1. The GK domains of NGK-1, LGK-1 and previously characterized AGK-1 from Arabidopsis exhibit 74-84% identity, whereas their N- and C-terminal domains are more divergent with amino acid conservation in the order of 48-55%. Phylogenetic analysis on the deduced amino acid sequences reveals that NGK-1 and LGK-1 form one distinct subgroup along with AGK-1 and AGK-2 homologues from Arabidopsis. Isolation of GMP kinases from diverse plant species like lily and tobacco adds a new dimension in understanding their role in cell signaling pathways that are associated with plant growth and development.

  1. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...... contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop....

  2. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method, as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  3. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ‘tsunami’ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is fewer than in the related species Fusarium oxysporum, but more than in Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  4. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.
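
    The relabelling step described in this abstract can be illustrated with a short sketch. It is not the REFGEN code: the header layout (accession first, species name in square brackets), the `{species}_{accession}` template and the function name `rename_fasta` are assumptions chosen for the example.

```python
import re

def rename_fasta(lines, template="{species}_{accession}"):
    """Relabel FASTA headers from a user-defined template.

    Assumed header layout: '>ACCESSION free text [Genus species]'.
    Returns the relabelled lines and a list of (new_label, old_header)
    pairs so the original names can be recovered later.
    """
    out, mapping = [], []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            parts = header.split()
            accession = parts[0] if parts else "unknown"
            match = re.search(r"\[([^\]]+)\]", header)
            # Spaces are replaced because many tree programs reject them in labels.
            species = match.group(1).replace(" ", "_") if match else "unknown"
            label = template.format(species=species, accession=accession)
            mapping.append((label, header))
            out.append(">" + label)
        else:
            out.append(line.rstrip("\n"))
    return out, mapping

if __name__ == "__main__":
    records = [
        ">XP_001 hypothetical protein [Homo sapiens]",
        "MKT",
        ">AB123.1 kinase",
        "MAA",
    ]
    renamed, table = rename_fasta(records)
    print("\n".join(renamed))
```

    Keeping the (new label, old header) table is what would let the same mapping be applied later to the labels in a tree file.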

  5. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-05-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.

  6. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. In particular, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest in the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory on a daily basis during the breeding season with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model, even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall, 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high-throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger-based sequencing. Therefore, given the high degree of sequence conservation

  7. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results: In 14 high myopia families, we... implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.

  8. Combining text mining and sequence analysis to discover protein functional regions.

    Science.gov (United States)

    Eskin, E; Agichtein, E

    2004-01-01

    Recently presented protein sequence classification models can identify relevant regions of the sequence. This observation has many potential applications to detecting functional regions of proteins. However, identifying such sequence regions automatically is difficult in practice, as relatively few types of information have enough annotated sequences to perform this analysis. Our approach addresses this data scarcity problem by combining text and sequence analysis. First, we train a text classifier over the explicit textual annotations available for some of the sequences in the dataset, and use the trained classifier to predict the class for the rest of the unlabeled sequences. We then train a joint sequence text classifier over the text contained in the functional annotations of the sequences, and the actual sequences in this larger, automatically extended dataset. Finally, we project the classifier onto the original sequences to determine the relevant regions of the sequences. We demonstrate the effectiveness of our approach by predicting protein sub-cellular localization and determining localization specific functional regions of these proteins.

  9. Laser desorption mass spectrometry for DNA analysis and sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, C.H.; Taranenko, N.I.; Tang, K.; Allman, S.L.

    1995-03-01

    Laser desorption mass spectrometry has been considered as a potential new method for fast DNA sequencing. Our approach is to use matrix-assisted laser desorption to produce parent ions of DNA segments and a time-of-flight mass spectrometer to identify the sizes of DNA segments. Thus, the approach is similar to gel electrophoresis sequencing using Sanger's enzymatic method. However, gel, radioactive tagging, and dye labeling are not required. In addition, the sequencing process can possibly be finished within a few hundred microseconds instead of hours and days. In order to use mass spectrometry for fast DNA sequencing, the following three criteria need to be satisfied: (1) detection of large DNA segments, (2) sensitivity reaching the femtomole region, and (3) mass resolution good enough to separate DNA segments that differ by a single nucleotide. Until now it has been very difficult to detect large DNA segments by mass spectrometry, owing to the fragile chemical properties of DNA and the low detection sensitivity for DNA ions. We discovered several new matrices that increase the production of DNA ions. By innovative design of a mass spectrometer, we can increase the ion energy up to 45 keV to enhance the detection sensitivity. Recently, we succeeded in detecting a DNA segment of 500 nucleotides. The sensitivity was 100 femtomoles. Thus, we have fulfilled two key criteria for using mass spectrometry for fast DNA sequencing. The major effort in the near future is to improve the resolution. Different approaches are being pursued. When high mass resolution can be achieved and automation of sample preparation is developed, a sequencing speed of 500 megabases per year may become feasible.

  10. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well...... as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  11. SmashCell: A software framework for the analysis of single-cell amplified genome sequences

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Arumugam, Manimozhiyan; Raes, Jeroen

    2010-01-01

    SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes however is far more complicated than the analysis of those...

  12. cDNA sequence and tissue expression analysis of glucokinase from ...

    African Journals Online (AJOL)

    Yomi

    2012-01-10

    Jan 10, 2012 ... (Rattus norvegicus) were 98.1, 96.8, 80.3 and 79.8%, respectively. Phylogenetic analysis based on GK amino acid sequences. Phylogenetic analysis among eight fish species, eleven endothermic species and one amphibian species based on glucokinase amino acid sequences is shown in Figure 3.

  13. Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Neumann, Pavel; Novák, Petr; Jiang, J.

    2010-01-01

    Vol. 26, No. 1797 (2010), pp. 2101-2108 ISSN 1367-4803 R&D Projects: GA AV ČR KJB500960802; GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords: next-generation sequencing * satellite repeats * K-mer analysis Subject RIV: EB - Genetics; Molecular Biology Impact factor: 4.877, year: 2010

  14. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Background: Expressed sequence tags (ESTs) have been a cost-effective tool in molecular biology and represent an abundant, valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results: In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality EST sequences. The ESTs were assembled into 4,473 unigenes, composed of 1,492 contigs and 2,981 singletons, that have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability to peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a low level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions: We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.
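
    The abstract does not say which tool or thresholds were used to find putative SSRs in the unigenes. As a rough illustration only, a tandem-repeat scan can be written with a back-referencing regular expression; the motif lengths (2 to 6 bases), the minimum of four repeats and the function name `find_ssrs` are assumptions for this sketch, not the study's settings.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    The motif group is lazy, so the shortest repeating unit is tried first.
    Note that a homopolymer run such as AAAAAAAA would be reported here as
    the dinucleotide motif AA; a real SSR tool would treat it separately.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

if __name__ == "__main__":
    print(find_ssrs("GGACACACACACTT"))
```

    Candidate loci found this way would still need flanking primers and wet-lab testing, as the abstract describes for the 42 primer pairs.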

  15. Analysis of B-genome derived simple sequence repeat (SSR ...

    African Journals Online (AJOL)

    A study was conducted to investigate the genetic variability between 40 Musa genotypes maintained at the Musa germplasm collection of the International Institute for Tropical Agriculture, Ibadan using nine B-genome derived simple sequence repeat (SSR) markers. The nine primers produced reproducible and discrete ...

  16. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    The genome size of AnpeNPV is estimated at 128 kb. ... A genomic library was generated using HindIII and the positive clones were sequenced and analysed. ... Institute of Life Sciences, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, People's Republic of China; School of Agricultural Science and Technology, ...

  17. Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses

    Science.gov (United States)

    2012-09-01

    incapacitating illness, lack of adequate control measures, and the ease of production of large quantities of virus. Characterisation by sequencing is...ability to induce a fatal or seriously incapacitating illness, the lack of adequate control measures, and the ease of production of large...Location Family Neutralising antibody detected Culex annulirostris (mosquito) DPP1163 1987 Darwin, NT Rhabdoviridae Cattle, buffalo

  18. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    2017-12-18

    Dec 18, 2017 ... In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI ...

  19. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  20. POSA: perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, J.A.; Jungerius, B.J.; Groenen, M.A.M.

    2004-01-01

    Background - Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  1. POSA : Perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, JA; Jungerius, BJ; Groenen, MA

    2004-01-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  2. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  3. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  4. A deep sequencing analysis of transcriptomes and the development ...

    Indian Academy of Sciences (India)

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the ...

  5. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  6. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  7. Phylogenetic analysis of the Bifidobacterium genus using glycolysis enzyme sequences

    Directory of Open Access Journals (Sweden)

    Katelyn Brandt

    2016-05-01

    Full Text Available Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses, come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequence-based trees are congruent with both the classical 16S rRNA phylogeny and core genome-based strain clustering. In addition, these glycolysis markers can be used to provide insights into the adaptive evolution of this genus, especially with regard to trends towards a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode.

  8. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  9. Sequencing and Gene Expression Analysis of Leishmania tropica LACK Gene.

    Science.gov (United States)

    Hammoudeh, Nour; Kweider, Mahmoud; Abbady, Abdul-Qader; Soukkarieh, Chadi

    2014-01-01

    Leishmania Homologue of receptors for Activated C Kinase (LACK) antigen is a 36-kDa protein, which provokes a very early immune response against Leishmania infection. There are several reports on the expression of LACK through different life-cycle stages of genus Leishmania, but only a few of them have focused on L. tropica. The present study provides details of the cloning, DNA sequencing and gene expression of LACK in this parasite species. First, several local isolates of Leishmania parasites were typed in our laboratory using PCR to verify the Leishmania species. After that, the LACK gene was amplified and cloned into a vector for sequencing. Finally, the expression of this molecule in logarithmic and stationary growth phase promastigotes, as well as in amastigotes, was evaluated by Reverse Transcription-PCR (RT-PCR). The typing result confirmed that all our local isolates belong to L. tropica. The LACK gene sequence was determined and high similarity was observed with the sequences of other Leishmania species. Furthermore, the expression of the LACK gene in both promastigote and amastigote forms was confirmed. Overall, the data set the stage for future studies of the properties and immune role of LACK gene products.

  10. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  11. A Bibliometric Analysis of Global Research on Genome Sequencing ...

    African Journals Online (AJOL)

    YSHo

    This study was carried out to evaluate the global scientific production of genome sequencing research to assess the characteristics of the research performances and the research tendencies. Data were obtained from Science Citation Index Expanded database during 1991-2010. Conventional methods including document ...

  12. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    To determine the genetic relationship of the HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, a DNA fragment of 1100 bp corresponding to gG was isolated from six HSV-2 strains from infected human sera samples in Iran, amplified by PCR and sequenced for determining ...

  13. Molecular cloning and sequence analysis of the cat myostatin gene ...

    African Journals Online (AJOL)

    Administrator

    2011-09-07

    Sep 7, 2011 ... TEAF. TEA/ATTS DNA binding domain factors. TEAD.01. TEA domain-containing factors, transcriptional enhancer factors 1, 3,. 4, 5. -1129/-1117(-) transcription factor binding sites located in the cat myostatin gene upstream sequence. According to previous works, we focused on analyzing and discussing.

  14. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    ajl yemi

    2011-11-28

    Nov 28, 2011 ... Besides, one basic leucine zipper domain (bZIP) in amino acid area from 274 to 337 was found, concurring with the main characteristic of C/EBPs. Homologous comparison of the amino acid sequences from C/EBPβ cloned in this study and those from different species indicated C/EBPβ gene of Qinchuan ...

  15. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  16. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    Unknown

    Goldbach R W and Vlak J M 1999 Sequence and organization of the Spodoptera exigua multicapsid nucleopolyhedrovirus genome; J. Gen. Virol. 80 3289–3304. Jakubowska A, Oers M M, Cory J S, Ziemnick J and Vlak J M. 2005 European Leucoma salicis NPV is closely related to North American Orgyia pseudotsugata ...

  17. Integrated stratigraphy and 40Ar/39Ar chronology of early Middle Miocene sediments from DSDP Leg 42A, Site 372 (Western Mediterranean)

    NARCIS (Netherlands)

    Abdul Aziz, H.; di Stefano, A.; Foresi, L. M.; Hilgen, Frederik J.; Iaccarino, S. M.; Kuiper, K. F.; Lirer, F.; Salvatorini, G.; Turco, E.

    2008-01-01

    An integrated magneto-biostratigraphic framework is presented for Middle Miocene sediments of DSDP Site 372 located in the Western Mediterranean. Detailed biostratigraphic analysis shows a nearly complete sequence of early Middle Miocene calcareous plankton bioevents in the Mediterranean, including

  18. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    Science.gov (United States)

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden.

  19. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    Science.gov (United States)

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
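
    The quartet idea in the abstract above (two single mutations that each give a functional enzyme can be combined into a functional double mutant) can be illustrated with a minimal sketch. It assumes each variant is represented as a set of mutation labels relative to a common reference, and it only covers the simplest kind of triplet: a base variant plus two known single-step neighbours, with the combined variant missing. The function name, the data representation and the toy variant set are assumptions made for illustration; they are not taken from LacED, and the sketch does not check whether two mutations hit the same position.

```python
from itertools import combinations


def predict_from_triplets(variants):
    """Predict variants that would complete a quartet.

    A quartet here is {B, B+a, B+b, B+a+b}. If B, B+a and B+b are known
    but B+a+b is not, B+a+b is returned as a predicted variant.
    """
    known = set(variants)
    predicted = set()
    for base in known:
        # Known variants that carry exactly one mutation more than `base`.
        singles = [v for v in known if len(v) == len(base) + 1 and base < v]
        for v1, v2 in combinations(singles, 2):
            double = v1 | v2
            if double not in known:
                predicted.add(double)
    return predicted


# Toy input (illustrative only, not a LacED extract).
known_variants = [
    frozenset(),                      # reference sequence
    frozenset({"E104K"}),
    frozenset({"G238S"}),
    frozenset({"M182T"}),
    frozenset({"E104K", "G238S"}),    # already completes one quartet
]
predicted = predict_from_triplets(known_variants)
```

    In this toy set the reference, E104K, G238S and the E104K/G238S double mutant already form a complete quartet, so only the two combinations involving M182T are proposed.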

  20. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  1. XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    Directory of Open Access Journals (Sweden)

    Frank Daniel N

    2008-10-01

    Background: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results: XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion: XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  2. Addressing challenges in the production and analysis of illumina sequencing data.

    Science.gov (United States)

    Kircher, Martin; Heyn, Patricia; Kelso, Janet

    2011-07-29

    Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.
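
    The adapter problem mentioned above (adapter sequence appearing at the end of reads from short-insert molecules) is commonly handled by trimming. The following is a minimal sketch of exact-match 3' adapter trimming, not the method of the cited paper: the function name, the `min_overlap` parameter and the example reads are assumptions, and real tools also tolerate mismatches and use base qualities.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove an adapter, or a prefix of it, from the 3' end of a read.

    Exact matching only; this is a simplification for illustration.
    """
    # Case 1: the full adapter occurs in the read; keep what precedes it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter (adapter runs off
    # the end of the read). Try the longest possible prefix first.
    max_k = min(len(adapter) - 1, len(read))
    for k in range(max_k, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


adapter = "AGATCGGAAGAGC"  # example adapter string supplied by the caller
full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", adapter)
partial = trim_adapter("ACGTACGTAGATC", adapter)
clean = trim_adapter("ACGTACGT", adapter)
```

    Requiring a minimum overlap is a design trade-off: a small value removes more true adapter remnants but also clips more reads that end in a few adapter-like bases by chance.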

  3. Addressing challenges in the production and analysis of illumina sequencing data

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2011-07-01

    Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

  4. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, J; Han, C; Gordon, L A; Terry, A; Prabhakar, S; She, X; Xie, G; Hellsten, U; Chan, Y M; Altherr, M; Couronne, O; Aerts, A; Bajorek, E; Black, S; Blumer, H; Branscomb, E; Brown, N; Bruno, W J; Buckingham, J; Callen, D F; Campbell, C S; Campbell, M L; Campbell, E W; Caoile, C; Challacombe, J F; Chasteen, L A; Chertkov, O; Chi, H C; Christensen, M; Clark, L M; Cohn, J D; Denys, M; Detter, J C; Dickson, M; Dimitrijevic-Bussod, M; Escobar, J; Fawcett, J J; Flowers, D; Fotopulos, D; Glavina, T; Gomez, M; Gonzales, E; Goodstein, D; Goodwin, L A; Grady, D L; Grigoriev, I; Groza, M; Hammon, N; Hawkins, T; Haydu, L; Hildebrand, C E; Huang, W; Israni, S; Jett, J; Jewett, P B; Kadner, K; Kimball, H; Kobayashi, A; Krawczyk, M; Leyba, T; Longmire, J L; Lopez, F; Lou, Y; Lowry, S; Ludeman, T; Manohar, C F; Mark, G A; McMurray, K L; Meincke, L J; Morgan, J; Moyzis, R K; Mundt, M O; Munk, A C; Nandkeshwar, R D; Pitluck, S; Pollard, M; Predki, P; Parson-Quintana, B; Ramirez, L; Rash, S; Retterer, J; Ricke, D O; Robinson, D; Rodriguez, A; Salamov, A; Saunders, E H; Scott, D; Shough, T; Stallings, R L; Stalvey, M; Sutherland, R D; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Torney, D C; Tran-Gyamfi, M; Tsai, M; Ulanovsky, L E; Ustaszewska, A; Vo, N; White, P S; Williams, A L; Wills, P L; Wu, J; Wu, K; Yang, J; DeJong, P; Bruce, D; Doggett, N A; Deaven, L; Schmutz, J; Grimwood, J; Richardson, P; Rokhsar, D S; Eichler, E E; Gilna, P; Lucas, S M; Myers, R M; Rubin, E M; Pennacchio, L A

    2005-04-06

    Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. While the segmental duplications of chromosome 16 are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. mitoSAVE: mitochondrial sequence analysis of variants in Excel.

    Science.gov (United States)

    King, Jonathan L; Sajantila, Antti; Budowle, Bruce

    2014-09-01

    The mitochondrial genome (mtGenome) contains genetic information amenable to numerous applications such as medical research, population and evolutionary studies, and human identity testing. However, inconsistent nomenclature assignment makes haplotype comparison difficult and can lead to false exclusion of potentially useful profiles. Massively Parallel Sequencing (MPS) is a platform for sequencing large datasets and potentially whole populations with relative ease. However, the data generated are not easily parsed and interpreted. With this in mind, mitoSAVE has been developed to enable fast conversion of Variant Call Format (VCF) files. mitoSAVE is an Excel-based workbook that converts data within the VCF into mtDNA haplotypes using phylogenetically-established nomenclature as well as rule-based alignments consistent with current forensic standards. mitoSAVE is formatted for human mitochondrial genome; however, it can easily be adapted to support other reasonably small genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
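
    To make the VCF-to-haplotype conversion described above more concrete, here is a minimal sketch in Python. It is not how mitoSAVE itself works (mitoSAVE is an Excel workbook); it only shows the simplest case, in which a single-base substitution is written as position plus variant base (for example 73G). Insertions, deletions, multi-allelic sites and the rule-based alignment the abstract mentions are skipped, and the function name and the toy VCF records are assumptions.

```python
def vcf_to_haplotype(vcf_text):
    """Convert single-base substitutions in VCF text to 'position + base' calls."""
    calls = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Keep only simple substitutions; indels and multi-allelic
        # records need nomenclature rules not covered in this sketch.
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            calls.append(f"{pos}{alt}")
    return calls


# Toy records (illustrative, tab-separated as the VCF format requires).
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chrM\t73\t.\tA\tG\t.\tPASS\t.\n"
    "chrM\t263\t.\tA\tG\t.\tPASS\t.\n"
    "chrM\t309\t.\tC\tCC\t.\tPASS\t.\n"
)
haplotype = vcf_to_haplotype(toy_vcf)
```

    The third toy record is an insertion and is deliberately left out of the result.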

  6. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  7. Analysis of plastid DNA-like sequences within the nuclear genomes of higher plants.

    Science.gov (United States)

    Ayliffe, M A; Scott, N S; Timmis, J N

    1998-06-01

    A wide-ranging examination of plastid (pt)DNA sequence homologies within higher plant nuclear genomes (promiscuous DNA) was undertaken. Digestion with methylation-sensitive restriction enzymes and Southern analysis was used to distinguish plastid and nuclear DNA in order to assess the extent of variability of promiscuous sequences within and between plant species. Some species, such as Gossypium hirsutum (cotton), Nicotiana tabacum (tobacco), and Chenopodium quinoa, showed homogeneity of these sequences, while intraspecific sequence variation was observed among different cultivars of Pisum sativum (pea), Hordeum vulgare (barley), and Triticum aestivum (wheat). Hypervariability of plastid sequence homologies was identified in the nuclear genomes of Spinacia oleracea (spinach) and Beta vulgaris (beet), in which individual plants were shown to possess a unique spectrum of nuclear sequences with ptDNA homology. This hypervariability apparently extended to somatic variation in B. vulgaris. No sequences with ptDNA homology were identified by this method in the nuclear genome of Arabidopsis thaliana.

  8. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern because the amount of data generated at multiple locations is growing exponentially. There is therefore a need for techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large numbers of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. The algorithm was used to store DNA sequences ranging in size from very small to very large in a database, and both storage size and response time were calculated. The efficiency and performance of the algorithm (in size calculation, expressed as a percentage) are high when compared with other known sequential approaches.

  9. A model of the statistical power of comparative genome sequence analysis.

    OpenAIRE

    Sean R Eddy

    2005-01-01

    Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identi...

  10. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
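
    The throughput figures quoted above are consistent with each other, as a quick calculation suggests: 50 M paired-end reads of 300 bases each come to 15 Gb. The sketch below does that arithmetic and adds a rough coverage estimate; the 5 Mb genome size is a hypothetical example value, not a figure from the abstract, and the function names are illustrative.

```python
def run_yield_gb(n_reads, read_length_bp):
    """Total sequenced bases in gigabases (1 Gb = 1e9 bases)."""
    return n_reads * read_length_bp / 1e9


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth if all reads map uniformly to one genome."""
    return n_reads * read_length_bp / genome_size_bp


# 2 x 300 bp run: 50 M paired-end reads in total, 300 bases each.
yield_gb = run_yield_gb(50_000_000, 300)

# Hypothetical 5 Mb bacterial genome sequenced alone on the run.
coverage = mean_coverage(50_000_000, 300, 5_000_000)
```

    In practice a run is usually shared among many multiplexed samples, so per-sample coverage is a fraction of this idealised figure.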

  11. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome.

    Directory of Open Access Journals (Sweden)

    Michalis Kotsyfakis

    2015-05-01

    Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly overexpressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor. This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so far deposited by our previous transcriptome studies.

  12. A Markovian analysis of bacterial genome sequence constraints

    Directory of Open Access Journals (Sweden)

    Aaron D. Skewes

    2013-08-01

    The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely
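
    The transition probability matrices that the abstract above refers to can be estimated by simple counting. The following minimal sketch, assuming a plain DNA string as input, estimates the probability of each base given the preceding k bases for a chosen order k. It does not reproduce the study's significance tests or tree construction, and the function name and toy sequence are assumptions for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(seq, order):
    """Estimate P(next base | previous `order` bases) from observed counts."""
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        nxt = seq[i + order]
        # Skip windows containing ambiguous symbols such as N.
        if set(context + nxt) <= set("ACGT"):
            counts[context][nxt] += 1
    probs = {}
    for context, ctr in counts.items():
        total = sum(ctr.values())
        probs[context] = {base: ctr[base] / total for base in "ACGT"}
    return probs


toy = "ATGGCGGCGATGGCGTAA"  # toy sequence, far too short for real inference
p2 = transition_probabilities(toy, 2)  # second-order model
p3 = transition_probabilities(toy, 3)  # third-order model
```

    Each row of the resulting table is one context (for example GG in the second-order model) and sums to 1; comparing how well tables of increasing order predict held-out sequence is one way to ask whether dependence extends beyond a given order.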

  13. Analysis of next-generation sequencing data using Galaxy.

    Science.gov (United States)

    Blankenberg, Daniel; Hillman-Jackson, Jennifer

    2014-01-01

    The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is explored through several different types of example analyses. Instructions for running one's own Galaxy server on local hardware or on cloud computing resources are provided. Installing new tools into a personal Galaxy instance is also demonstrated.

  14. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  15. A cosmopolitan late Ediacaran biotic assemblage: new fossils from Nevada and Namibia support a global biostratigraphic link

    Science.gov (United States)

    Smith, E. F.; Nelson, L. L.; Tweedt, S. M.; Zeng, H.; Workman, Jeremiah B.

    2017-01-01

    Owing to the lack of temporally well-constrained Ediacaran fossil localities containing overlapping biotic assemblages, it has remained uncertain if the latest Ediacaran (ca 550–541 Ma) assemblages reflect systematic biological turnover or environmental, taphonomic or biogeographic biases. Here, we report new latest Ediacaran fossil discoveries from the lower member of the Wood Canyon Formation in Nye County, Nevada, including the first figured reports of erniettomorphs, Gaojiashania, Conotubus and other problematic fossils. The fossils are spectacularly preserved in three taphonomic windows and occur in greater than 11 stratigraphic horizons, all of which are below the first appearance of Treptichnus pedum and the nadir of a large negative δ13C excursion that is a chemostratigraphic marker of the Ediacaran–Cambrian boundary. The co-occurrence of morphologically diverse tubular fossils and erniettomorphs in Nevada provides a biostratigraphic link among latest Ediacaran fossil localities globally. Integrated with a new report of Gaojiashania from Namibia, previous fossil reports and existing age constraints, these finds demonstrate a distinctive late Ediacaran fossil assemblage comprising at least two groups of macroscopic organisms with dissimilar body plans that ecologically and temporally overlapped for at least 6 Myr at the close of the Ediacaran Period. This cosmopolitan biotic assemblage disappeared from the fossil record at the end of the Ediacaran Period, prior to the Cambrian radiation.

  16. A cosmopolitan late Ediacaran biotic assemblage: new fossils from Nevada and Namibia support a global biostratigraphic link.

    Science.gov (United States)

    Smith, E F; Nelson, L L; Tweedt, S M; Zeng, H; Workman, J B

    2017-07-12

    Owing to the lack of temporally well-constrained Ediacaran fossil localities containing overlapping biotic assemblages, it has remained uncertain if the latest Ediacaran (ca 550-541 Ma) assemblages reflect systematic biological turnover or environmental, taphonomic or biogeographic biases. Here, we report new latest Ediacaran fossil discoveries from the lower member of the Wood Canyon Formation in Nye County, Nevada, including the first figured reports of erniettomorphs, Gaojiashania, Conotubus and other problematic fossils. The fossils are spectacularly preserved in three taphonomic windows and occur in greater than 11 stratigraphic horizons, all of which are below the first appearance of Treptichnus pedum and the nadir of a large negative δ13C excursion that is a chemostratigraphic marker of the Ediacaran-Cambrian boundary. The co-occurrence of morphologically diverse tubular fossils and erniettomorphs in Nevada provides a biostratigraphic link among latest Ediacaran fossil localities globally. Integrated with a new report of Gaojiashania from Namibia, previous fossil reports and existing age constraints, these finds demonstrate a distinctive late Ediacaran fossil assemblage comprising at least two groups of macroscopic organisms with dissimilar body plans that ecologically and temporally overlapped for at least 6 Myr at the close of the Ediacaran Period. This cosmopolitan biotic assemblage disappeared from the fossil record at the end of the Ediacaran Period, prior to the Cambrian radiation. © 2017 The Author(s).

  17. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    NARCIS (Netherlands)

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F.; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We

  18. Sequence analysis of the mitochondrial genomes from Dutch pedigrees with Leber hereditary optic neuropathy

    NARCIS (Netherlands)

    Howell, Neil; Oostra, Roelof-Jan; Bolhuis, Piet A.; Spruijt, Liesbeth; Clarke, Lorne A.; Mackey, David A.; Preston, Gwen; Herrnstadt, Corinna

    2003-01-01

    The complete mitochondrial DNA (mtDNA) sequences for 63 Dutch pedigrees with Leber hereditary optic neuropathy (LHON) were determined, 56 of which carried one of the classic LHON mutations at nucleotide (nt) 3460, 11778, or 14484. Analysis of these sequences indicated that there were several

  19. High-Throughput Analysis of DNA Break-Induced Chromosome Rearrangements by Amplicon Sequencing.

    Science.gov (United States)

    Brown, Alexander J; Al-Soodani, Aneesa T; Saul, Miles; Her, Stephanie; Garcia, Juan C; Ramsden, Dale A; Her, Chengtao; Roberts, Steven A

    2018-01-01

    The mechanistic understanding of how DNA double-strand breaks (DSB) are repaired is rapidly advancing in part due to the advent of inducible site-specific break model systems as well as the employment of next-generation sequencing (NGS) technologies to sequence repair junctions at high depth. Unfortunately, the sheer volume of data produced by these methods makes it difficult to analyze the structure of repair junctions manually or with other general-purpose software. Here, we describe methods to produce amplicon libraries of DSB repair junctions for sequencing, to map the sequencing reads, and then to use a robust, custom python script, Hi-FiBR, to analyze the sequence structure of mapped reads. The Hi-FiBR analysis processes large data sets quickly and provides information such as number and type of repair events, size of deletion, size of insertion and inserted sequence, microhomology usage, and whether mismatches are due to sequencing error or biological effect. The analysis also corrects for common alignment errors generated by sequencing read mapping tools, allowing high-throughput analysis of DSB break repair fidelity to be accurately conducted regardless of which suite of NGS analysis software is available. © 2018 Elsevier Inc. All rights reserved.
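
    One of the outputs listed above, microhomology usage, can be illustrated with a minimal Python sketch. It is not the Hi-FiBR script: the function name `microhomology_length`, the coordinate convention (a deletion removes `ref[start:end]`) and the toy reference sequence are all assumptions made for illustration. The idea is that a deletion flanked by microhomology can be slid left or right without changing the repaired product, and the number of such equivalent shifts equals the microhomology length.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Count bases of microhomology at a deletion junction.

    Hypothetical helper (not part of Hi-FiBR). The deletion removes
    ref[start:end]; the repaired product is ref[:start] + ref[end:].
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")

    # Shifts to the right: the first deleted base matches the first
    # retained base after the deletion, so the junction could sit one
    # position further right and give the same product.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1

    # Shifts to the left: the last retained base before the deletion
    # matches the last deleted base.
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1

    return left + right


# Toy reference: GG + AC + TTTT + AC + GAA. Deleting "TTTTAC" or "ACTTTT"
# yields the same product, GGACGAA, with the shared "AC" as microhomology.
REF = "GGACTTTTACGAA"
with_homology = microhomology_length(REF, 4, 10)
same_junction = microhomology_length(REF, 2, 8)
no_homology = microhomology_length(REF, 4, 8)
print(with_homology, same_junction, no_homology)
```

    Counting shifts in both directions makes the result independent of where a read mapper happened to place the ambiguous deletion, which is the kind of alignment ambiguity the abstract alludes to.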

  20. Cloning and sequence analysis of H. contortus HC58cDNA gene ...

    African Journals Online (AJOL)

    Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the cathepsin B like proteases, suggesting that HC58cDNA was a member of the papain family. Keywords: Haemonchus contortus, HC58cDNA, cathepsin B like protease, papain family. Kenya Veterinarian Vol.

  1. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  2. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Results: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have a physical map of common carp. Conclusion: BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3
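
    The sequencing figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the record; the variable names are illustrative, and the implied genome size is a back-of-the-envelope estimate that assumes the reported 2.5% is exact, which it almost certainly is not.

```python
# Figures quoted in the abstract above.
clean_bes = 65_720        # clean BAC-end sequences (BES)
mean_read_bp = 647        # average read length in bp (rounded in the source)
total_bp = 42_522_168     # total clean sequence reported, in bp
genome_fraction = 0.025   # reported share of the genome (2.5%)

# Count times the rounded mean length should land close to the reported
# total; the gap reflects rounding of the mean to a whole base pair.
approx_total_bp = clean_bes * mean_read_bp
rounding_gap_bp = total_bp - approx_total_bp

# Genome size implied if total_bp really were 2.5% of the genome.
implied_genome_bp = total_bp / genome_fraction

print(f"count x mean length : {approx_total_bp:,} bp")
print(f"gap to reported     : {rounding_gap_bp:,} bp")
print(f"implied genome size : {implied_genome_bp / 1e9:.2f} Gb")
```

    The small gap between the product and the reported total is what one would expect from a mean read length rounded to the nearest base pair.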

  3. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  4. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses. A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins) or amino acid metabolism (e.g., asparagine t-RNA synthetase). This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  6. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task, but the previous version allows only a limited number of sequences as input. I implemented Peptide Pattern Recognition as multithreaded software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  7. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    Science.gov (United States)

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
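
    The redundancy estimate at the end of this abstract is simple arithmetic, sketched below in Python. Only figures from the record are used; the variable names are illustrative. Applying the rounded 22% directly gives a slightly different count from the 33,620 quoted, which suggests (as an assumption, not something the record states) that the authors used an unrounded redundancy value.

```python
# Figures quoted in the abstract above.
assembled = 43_141            # putative transcripts from the first assembly
stated_redundancy = 0.22      # redundancy found on reassembly (rounded)
stated_unique = 33_620        # unique genes quoted in the abstract
full_length = 14_409          # assemblies with a full-length insert

# Unique genes implied by the rounded 22% figure.
unique_from_rounded = assembled * (1 - stated_redundancy)

# Redundancy implied by the quoted unique-gene count.
implied_redundancy = 1 - stated_unique / assembled

# Share of assemblies containing a full-length clone ("33% of the total").
full_length_share = full_length / assembled

print(f"unique genes at exactly 22% : {unique_from_rounded:,.0f}")
print(f"redundancy implied by 33,620: {implied_redundancy:.2%}")
print(f"full-length share           : {full_length_share:.1%}")
```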

  8. Analysis and functional annotation of an expressed sequence tag collection for tropical crop sugarcane.

    Science.gov (United States)

    Vettore, André L; da Silva, Felipe R; Kemper, Edson L; Souza, Glaucia M; da Silva, Aline M; Ferro, Maria Inês T; Henrique-Silva, Flavio; Giglioti, Eder A; Lemos, Manoel V F; Coutinho, Luiz L; Nobrega, Marina P; Carrer, Helaine; França, Suzelei C; Bacci Júnior, Mauricio; Goldman, Maria Helena S; Gomes, Suely L; Nunes, Luiz R; Camargo, Luis E A; Siqueira, Walter J; Van Sluys, Marie-Anne; Thiemann, Otavio H; Kuramae, Eiko E; Santelli, Roberto V; Marino, Celso L; Targon, Maria L P N; Ferro, Jesus A; Silveira, Henrique C S; Marini, Danyelle C; Lemos, Eliana G M; Monteiro-Vitorello, Claudia B; Tambor, José H M; Carraro, Dirce M; Roberto, Patrícia G; Martins, Vanderlei G; Goldman, Gustavo H; de Oliveira, Regina C; Truffi, Daniela; Colombo, Carlos A; Rossi, Magdalena; de Araujo, Paula G; Sculaccio, Susana A; Angella, Aline; Lima, Marleide M A; de Rosa Júnior, Vicente E; Siviero, Fábio; Coscrato, Virginia E; Machado, Marcos A; Grivet, Laurent; Di Mauro, Sonia M Z; Nobrega, Francisco G; Menck, Carlos F M; Braga, Marilia D V; Telles, Guilherme P; Cara, Frank A A; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-12-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged.

  9. Biostratigraphic data from Upper Cretaceous formations-eastern Wyoming, central Colorado, and northeastern New Mexico

    Science.gov (United States)

    Merewether, E.A.; Cobban, W.A.; Obradovich, J.D.

    2011-01-01

    Lithological and paleontological studies of outcrops of Upper Cretaceous formations were conducted at 12 localities in eastern Wyoming, central Colorado, and northeastern New Mexico. The sequence extends upward from the top of the Mowry Shale, or age-equivalent rocks, through the Graneros Shale, Greenhorn Limestone, Carlile Shale, Niobrara Formation, Pierre Shale, and Fox Hills Sandstone, or age-equivalent formations, to the top of the Laramie Formation, or laterally equivalent formations. The strata are mainly siliciclastic and calcareous, with thicknesses ranging from about 3,300 ft in northeastern New Mexico to as much as 13,500 ft in eastern Wyoming. Deposition was mainly in marine environments and molluscan fossils of Cenomanian through Maastrichtian ages are common. Radiometric ages were determined from beds of bentonite that are associated with fossil zones. The Upper Cretaceous formations at the 12 study localities are herein divided into three informal time-stratigraphic units based on fossil content and contact relations with adjacent strata. The basal unit in most places extends from the base of the Graneros to the top of the Niobrara, generally to the horizon of the fossil Scaphites hippocrepis, and spans a period of about 14 million years. The middle unit generally extends from the top of the Niobrara to the approximate middle of the Pierre, the horizon of the fossil Baculites gregoryensis, and represents a period of about 5 million years. The upper unit includes strata between the middle of the Pierre and the top of the Upper Cretaceous Series, which is the top of the Laramie Formation or of laterally equivalent formations; it represents a period of deposition of as much as 11 million years. Comparisons of the collections of fossils from each outcrop with the complete sequence of Upper Cretaceous index fossils can indicate disconformable contacts and lacunae. Widespread disconformities have been found within the Carlile Shale and between the Carlile

  10. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.

    Science.gov (United States)

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-03-07

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses.

  11. Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

    Science.gov (United States)

    A training opportunity is open to a student who is highly motivated by microbial research to conduct sequence analysis, explore novel genes and metabolic pathways, validate the resulting findings using qPCR/RT-qPCR, and summarize the findings.

  12. Seismically induced accident sequence analysis of the advanced test reactor

    International Nuclear Information System (INIS)

    Khericha, S.T.; Henry, D.M.; Ravindra, M.K.; Hashimoto, P.S.; Griffin, M.J.; Tong, W.H.; Nafday, A.M.

    1991-01-01

    A seismic probabilistic risk assessment (PRA) was performed for the Department of Energy (DOE) Advanced Test Reactor (ATR) as part of the external events analysis. The risk from seismic events to the fuel in the core and in the fuel storage canal was evaluated. The key elements of this paper are the integration of seismically induced internal flood and internal fire, and the modeling of human error rates as a function of the magnitude of earthquake. The systems analysis was performed by EG&G Idaho, Inc. and the fragility analysis and quantification were performed by EQE International, Inc. (EQE).

  13. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2015-09-01

    ...Assay Kits respectively on the Qubit 2.0 Fluorometer (Life Technologies). The BioRad Experion Automated Electrophoresis System RNA kit was used to... Award number: W81XWH-14-1-0080. Title: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer.

  14. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — Although a number of airportal surface models exist and have been successfully used for analysis of airportal operations, only recently has it become possible to...

  15. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The complexity and interdependence of operations on the airport surface motivate the need for a comprehensive and detailed, yet flexible and validated analysis and...

  16. BioMatriX: Sequence analysis, structure visualization, phylogenetics ...

    African Journals Online (AJOL)

    bmx-biomatrix.blogspot.com) developed for biological science community to augment scientific research regarding genomics, proteomics, phylogenetics and linkage analysis in one platform. BioMatriX offers multi-functional services to perform ...

  17. Secure distributed genome analysis for GWAS and sequence comparison computation

    Science.gov (United States)

    2015-01-01

    Background: The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods: In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results: We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions: This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307

  18. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  19. BNL severe accident sequence experiments and analysis program

    International Nuclear Information System (INIS)

    Greene, G.A.; Ginsberg, T.; Tutu, N.K.

    1985-01-01

    Analyses of LWR degraded core accidents require mathematical characterization of two major sources of pressure and temperature loading on the reactor containment buildings: (1) steam generation from core debris-water thermal interactions and (2) molten core-concrete interactions. Experiments are in progress at BNL in support of analytical model development related to aspects of the above containment loading mechanisms. The work supports development and evaluation of the CORCON, MARCH, CONTAIN and MEDICI computer codes under development at other NRC-contractor laboratories. The thermal-hydraulic behavior of hot debris located within the reactor core region upon sudden introduction of cooling water is being investigated in a joint experimental and analytical program. This work supports development and evaluation of the SCDAP computer code being developed at EG&G to characterize in-vessel severe core damage accident sequences. Progress is described in two areas: (1) core debris thermal-hydraulic phenomenology and (2) heat transfer in core-concrete interactions.

  20. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  1. Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications.

    Science.gov (United States)

    Ebhardt, H Alexander; Tsang, Herbert H; Dai, Denny C; Liu, Yifeng; Bostan, Babak; Fahlman, Richard P

    2009-05-01

    Recent advances in DNA-sequencing technology have made it possible to obtain large datasets of small RNA sequences. Here we demonstrate that not all non-perfectly matched small RNA sequences are simple technological sequencing errors, but many hold valuable biological information. Analysis of three small RNA datasets originating from Oryza sativa and Arabidopsis thaliana small RNA-sequencing projects demonstrates that many single nucleotide substitution errors overlap when aligning homologous non-identical small RNA sequences. Investigating the sites and identities of substitution errors reveal that many potentially originate as a result of post-transcriptional modifications or RNA editing. Modifications include N1-methyl modified purine nucleotides in tRNA, potential deamination or base substitutions in micro RNAs, 3' micro RNA uridine extensions and 5' micro RNA deletions. Additionally, further analysis of large sequencing datasets reveal that the combined effects of 5' deletions and 3' uridine extensions can alter the specificity by which micro RNAs associate with different Argonaute proteins. Hence, we demonstrate that not all sequencing errors in small RNA datasets are technical artifacts, but that these actually often reveal valuable biological insights to the sites of post-transcriptional RNA modifications.

  2. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  3. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.

  4. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective method for analyzing the molecular changes in base sequence. This year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10, and 50 Gy, and the transgene was recovered from spleen genomic DNA by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. Sequence analysis of the mutants was performed using an ALFred DNA sequencer and the SequiTherm TM Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of the 50 independent Spi mutants obtained. The deletions were classified into 4 groups. Group 1 included 15 mutants characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants with a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of the gam gene and was possibly caused by oxidative breakage of DNA or recombination of DNA fragments produced by the breakage. Group 4 included 4 mutants having deletions of 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between codons 13 and 14 of the redB gene, leading to inactivation of the gam gene and the redBA gene. These results indicated that most of the Spi mutants had a deletion in the red/gam region and that the deletions in more than half of the mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  5. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
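
    As an illustration of what a degenerate recognition sequence such as GRCGYC means in practice, the following minimal Python sketch expands the IUPAC codes R (A or G) and Y (C or T) into a regular expression and lists the matching positions in a DNA string. The function names and the example sequence are hypothetical and are not taken from the record above; only the recognition sequence itself comes from the abstract.

```python
import re

# IUPAC nucleotide codes needed for GRCGYC: R = purine (A/G), Y = pyrimidine (C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_to_regex(site):
    """Translate a degenerate recognition sequence into a regular expression."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(sequence, site="GRCGYC"):
    """Return 0-based start positions of every match, including overlapping ones."""
    # A lookahead is used so that overlapping occurrences are not skipped.
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "TTGACGTCAAGGCGCCTT"  # made-up sequence for illustration only
    print(site_to_regex("GRCGYC"))  # G[AG]CG[CT]C
    print(find_sites(example))      # [2, 10]
```

    Because GRCGYC reads the same on the reverse-complement strand, scanning one strand is sufficient in this sketch; a non-palindromic site would also need a scan of the complementary strand.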

  6. [Sequence analysis of ITS2 and CO1 genes of Paragonimus harinasutai].

    Science.gov (United States)

    Qian, Bao-zhen; Sugiyama, H; Waikagul, J; Zhu, Zhi-hang

    2006-04-30

    To identify Paragonimus harinasutai from Ninghai, Zhejiang Province, China. Metacercariae were collected from the crabs Sinopotamon chekiangenes in Xixi village of Ninghai County for ITS2 sequence analysis, CO1 sequence analysis, and PCR-RFLP analysis with the endonucleases BsaHI and StuI. Results: The PCR-RFLP fingerprints were virtually the same as those of the isolate from Thailand (Nakorn-nayok). The ITS2 sequence (366 bp) and CO1 sequence (390 bp) of the metacercariae collected from Ninghai showed nucleotide identities of 95.6% and 89.5%, respectively, to the Thai isolate. The study confirmed that Paragonimus harinasutai is present in Ninghai, China, with some molecular variation in comparison to the Thai isolate.
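
    Nucleotide identity figures like those quoted above are usually computed from a pairwise alignment. The Python sketch below shows one common convention, assumed here for illustration: identical positions divided by the number of aligned columns in which neither sequence has a gap. The record does not state which convention or software the authors used, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns containing a gap in either sequence are skipped (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))    # 75.0
    print(percent_identity("AC-GT", "ACTGT"))  # 100.0 (gap column skipped)
```

    Under this convention, 350 identical positions out of 366 compared would give about 95.6%; that arithmetic is shown only to illustrate the scale of the numbers, not as a count reported in the record.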

  7. De novo structural modeling and computational sequence analysis ...

    African Journals Online (AJOL)

    Jane

    2011-07-25

    Jul 25, 2011 ... Our study was aimed towards computational proteomic analysis and 3D structural modeling of this novel bacteriocin protein encoded by the earlier aforementioned gene. Different bioinformatics tools and machine learning techniques were used for protein structural classification. De novo protein modeling ...

  8. Inter Simple Sequence Repeat (ISSR) analysis of wild and cultivated ...

    African Journals Online (AJOL)

    ONOS

    2010-08-09

    Aug 9, 2010 ... ed through interspecific hybridization of Asian and African rice, formed a cluster with Asian rice. Generally cultivated and wild species clearly observed to have separated groups in both UPGMA and neighbor joining analysis. The two methods showed almost the same tree topology with similar groupings ...

  9. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...

  10. Simple sequence repeat (SSR) markers analysis of genetic diversity ...

    African Journals Online (AJOL)

    hope&shola

    2012-04-24

    Apr 24, 2012 ... eight groups according to the k-means cluster analysis based on seed quality characters. There were significant differences among different groups (P< 0.01). The characteristics of each group were as follows; group 1: yellow seed with high oil and protein content, group 2: yellow seed with low oil and high ...

  11. Microarrays and high-throughput transcriptomic analysis in species with incomplete availability of genomic sequences.

    Science.gov (United States)

    Pariset, Lorraine; Chillemi, Giovanni; Bongiorni, Silvia; Romano Spica, Vincenzo; Valentini, Alessio

    2009-06-01

    Microarrays produce a measurement of gene expression based on the relative measures of dye intensities that correspond to the amount of target RNA. This technology is developing fast and its application is expanding from Homo sapiens to a wide number of species for which enough information on sequences and annotations exists. However, the number of species for which a dedicated platform exists is not high. The use of heterologous array hybridization, screening for gene expression in one species using an array developed for another one, is still quite frequent, even though cross-species microarray hybridization has raised many arguments. Some high-throughput methods that do not rely on knowledge of the DNA/RNA sequence exist, namely serial analysis of gene expression (SAGE), Massively Parallel Signature Sequencing (MPSS) and deep sequencing of the full transcriptome. Although very powerful, particularly the latter, they are still quite costly and cumbersome methods. In some species where genome sequences are largely unknown, several anonymous sequences are deposited in gene banks as a result of Expressed Sequence Tag (EST) sequencing projects. The EST databases represent valuable knowledge that can be exploited with some bioinformatic effort to build species-specific microarrays. We present here a method for building high-density in situ synthesized microarrays starting from available EST sequences in Ovis aries. Our data indicate that the method is very efficient and can be easily extended to other species for which genetic sequences are present in public databases but which have so far been neglected by advanced devices like microarrays. As a perspective, the approach can also be applied to species for which no sequences are available to date, thanks to high-throughput deep sequencing methods.

  12. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  13. Cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of Clostridium chauvoei

    Directory of Open Access Journals (Sweden)

    Saroj K. Dangi

    2017-09-01

    Aim: Blackleg disease is caused by Clostridium chauvoei in ruminants. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, hyaluronidases of C. chauvoei are not characterized. The present study was aimed at cloning and sequence analysis of the hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 media and confirmed by polymerase chain reaction (PCR) using primers specific for the 16-23S rDNA spacer region. The nagH gene of C. chauvoei was amplified and cloned into the pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies, followed by sequencing of the nagH gene in the construct. Results: PCR amplification yielded a nagH gene product of 1143 bp, which was cloned in a prokaryotic expression system. Colony PCR, as well as sequencing of the nagH gene, confirmed the presence of the insert. The sequence was then subjected to NCBI BLAST analysis, which confirmed that the sequence was indeed that of the nagH gene of C. chauvoei. Phylogenetic analysis of the sequence showed that it is closely related to Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for the virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.

  14. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    Science.gov (United States)

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  15. Eocene tectonic compression in Northern Zealandia: Magneto-biostratigraphic constraints from the sedimentary records of New Caledonia (Southwest Pacific Ocean)

    Science.gov (United States)

    Dallanave, E.; Agnini, C.; Pascher, K. M.; Maurizot, P.; Bachtadse, V.; Hollis, C. J.; Dickens, G. R.; Collot, J.; Sevin, B.; Strogen, D.; Monesi, E.

    2017-12-01

    Published seismic profiles acquired from the Tasman Sea and northern Zealandia area (southwest Pacific) point to a widespread Eocene convergent deformation of oceanic and continental crust, with reverse faults and uplift (Tectonic Event of the Cenozoic in the Tasman Area; TECTA). The TECTA is interpreted as the precursor of the Tonga-Kermadec subduction initiation. Grande Terre is the main island of the New Caledonia archipelago and the largest emergent portion of northern Norfolk Ridge (part of northern Zealandia). Eocene sedimentary records exposed in Grande Terre contain a transition from pelagic micrite to terrigenous-rich calciturbidites, marking a shift from passive margin to convergent tectonic regime. This could represent the local expression of the convergence inception observed on a regional scale. We conducted an integrated magneto-biostratigraphic study, based on calcareous nannofossils and radiolaria, of two early-middle Eocene records cropping out near Noumea (southwest Grande Terre) and Koumac (northwest Grande Terre). The natural remanent magnetization of the sediments is complicated by multiple vector components, likely related to the late Eocene obduction, but a characteristic remanent magnetization has been successfully isolated. Overall the record spans from magnetic polarity Chron C23n to C18n, i.e. from 51 to 39 Ma. In this robust magnetic polarity-based chronological frame, the transition from pelagic micrite to terrigenous-rich calciturbidites occurred near the top of Chron C21n and is dated at 46 Ma. Furthermore, the magnetic mineral assemblage within part of the calciturbidites consists of hematite associated with maghemite. This association indicates emergent land as the source of the terrigenous material, suggesting a considerable uplift. Because 94% of the Zealandia continent is submerged, ocean drilling is needed to gauge the full extent and timing of Eocene compressive deformation revealed by the seismic profiles acquired in the Tasman area. This is a primary aim of

  16. The scale analysis sequence for LWR fuel depletion

    International Nuclear Information System (INIS)

    Hermann, O.W.; Parks, C.V.

    1991-01-01

    The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system is used extensively to perform away-from-reactor safety analysis (particularly criticality safety, shielding, heat transfer analyses) for spent light water reactor (LWR) fuel. Spent fuel characteristics such as radiation sources, heat generation sources, and isotopic concentrations can be computed within SCALE using the SAS2 control module. A significantly enhanced version of the SAS2 control module, which is denoted as SAS2H, has been made available with the release of SCALE-4. For each time-dependent fuel composition, SAS2H performs one-dimensional (1-D) neutron transport analyses (via XSDRNPM-S) of the reactor fuel assembly using a two-part procedure with two separate unit-cell-lattice models. The cross sections derived from a transport analysis at each time step are used in a point-depletion computation (via ORIGEN-S) that produces the burnup-dependent fuel composition to be used in the next spectral calculation. A final ORIGEN-S case is used to perform the complete depletion/decay analysis using the burnup-dependent cross sections. The techniques used by SAS2H and two recent applications of the code are reviewed in this paper. 17 refs., 5 figs., 5 tabs

  17. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Daniel H Huson

    2016-06-01

    There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely-used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program MeganServer that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce.

  18. Application of sequence stratigraphy to carbonate reservoir prediction, Early Palaeozoic eastern Warburton basin, South Australia

    Energy Technology Data Exchange (ETDEWEB)

    Xiaowen S.; Stuart, W.J.

    1996-12-31

    The Early Palaeozoic Warburton Basin underlies the gas and oil producing Cooper and Eromanga Basins. Postdepositional tectonism created high potential fracture porosities, complicating the stratigraphy and making reservoir prediction difficult. Sequence stratigraphy integrating core, cuttings, well-log, seismic and biostratigraphic data has recognized a carbonate-dominated to mixed carbonate/siliciclastic supersequence comprising several depositional sequences. Biostratigraphy based on trilobites and conodonts ensures reliable well and seismic correlations across structurally complex areas. Lithofacies interpretation indicates sedimentary environments ranging from carbonate inner shelf, peritidal, shelf edge, deep outer shelf and slope to basin. Log facies show gradually upward shallowing trends or abrupt changes indicating possible sequence boundaries. With essential depositional models and sequence analysis from well data, seismic facies suggest general reflection configurations including parallel-continuous layered patterns indicating uniform neritic shelf, and mounded structures suggesting carbonate build-ups and pre-existing volcanic relief. Seismic stratigraphy also reveals inclined slope and onlapping margins of a possibly isolated platform geometry. The potential reservoirs are dolomitized carbonates containing oomoldic, vuggy, intercrystalline and fracture porosities in lowstand systems tracts either on carbonate mounds and shelf crests or below shelf edge. The source rock is a deep basinal argillaceous mudstone, and the seal is fine-grained siltstone/shale of the transgressive systems tract.

  19. Application of sequence stratigraphy to carbonate reservoir prediction, Early Palaeozoic eastern Warburton basin, South Australia

    Energy Technology Data Exchange (ETDEWEB)

    Xiaowen S.; Stuart, W.J.

    1996-01-01

    The Early Palaeozoic Warburton Basin underlies the gas and oil producing Cooper and Eromanga Basins. Postdepositional tectonism created high potential fracture porosities, complicating the stratigraphy and making reservoir prediction difficult. Sequence stratigraphy integrating core, cuttings, well-log, seismic and biostratigraphic data has recognized a carbonate-dominated to mixed carbonate/siliciclastic supersequence comprising several depositional sequences. Biostratigraphy based on trilobites and conodonts ensures reliable well and seismic correlations across structurally complex areas. Lithofacies interpretation indicates sedimentary environments ranging from carbonate inner shelf, peritidal, shelf edge, deep outer shelf and slope to basin. Log facies show gradually upward shallowing trends or abrupt changes indicating possible sequence boundaries. With essential depositional models and sequence analysis from well data, seismic facies suggest general reflection configurations including parallel-continuous layered patterns indicating uniform neritic shelf, and mounded structures suggesting carbonate build-ups and pre-existing volcanic relief. Seismic stratigraphy also reveals inclined slope and onlapping margins of a possibly isolated platform geometry. The potential reservoirs are dolomitized carbonates containing oomoldic, vuggy, intercrystalline and fracture porosities in lowstand systems tracts either on carbonate mounds and shelf crests or below shelf edge. The source rock is a deep basinal argillaceous mudstone, and the seal is fine-grained siltstone/shale of the transgressive systems tract.

  20. Multifractal analysis of DNA sequences using a novel chaos-game representation

    Science.gov (United States)

    Gutiérrez, J. M.; Rodríguez, M. A.; Abramson, G.

    2001-11-01

    We present a generalization of the standard chaos-game representation method introduced by Jeffrey. To this aim, a DNA symbolic sequence is mapped onto a singular measure on the attractor of a particular IFS model, which is a perfect statistical representation of the sequence. A multifractal analysis of the resulting measure is introduced and an interpretation of singularities in terms of mutual information and redundancy (statistical dependence) among subsequence symbols within the DNA sequence is provided. The multifractal spectrum is also shown to be more sensitive for detecting dependence structures within the DNA sequence than the averaged contribution given by redundancy. This method presents several advantages with respect to other representations such as walks or interfaces, which may introduce spurious effects. In contrast with the results obtained by other standard methods, here we note that no general statement can be made on the influence of coding and non-coding content on the correlation length of a given sequence.
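
    As a rough illustration of the starting point of this abstract, the sketch below implements the standard chaos-game representation (each base moves the current point halfway toward its corner of the unit square) and a simple box-counting measure. It is not the authors' generalized IFS construction or their multifractal analysis; the corner assignment and the function names `cgr_points` and `box_measure` are assumptions made for this example.

```python
from collections import Counter

# Corner assignment for the four bases: one common convention, assumed here.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos-game trajectory of a DNA string as (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_measure(points, k):
    """Coarse-grain points on a 2^k x 2^k grid and return normalized box masses."""
    if not points:
        return {}
    n = 2 ** k
    counts = Counter(
        (min(int(px * n), n - 1), min(int(py * n), n - 1)) for px, py in points
    )
    total = sum(counts.values())
    return {box: c / total for box, c in counts.items()}


if __name__ == "__main__":
    pts = cgr_points("ACGTTGCA")
    print(pts[:2])
    print(box_measure(pts, 1))
```

    At grid level k, each occupied box corresponds to one length-k subsequence, so the box masses are empirical k-mer frequencies; this is the kind of measure a multifractal analysis would then be applied to.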

  1. Studies of DNA dumbbells VIII. Melting analysis of DNA dumbbells with dinucleotide repeat stem sequences.

    Science.gov (United States)

    Mandell, Kathleen E; Vallone, Peter M; Owczarzy, Richard; Riccelli, Peter V; Benight, Albert S

    2006-06-15

    Melting curves and circular dichroism spectra were measured for a number of DNA dumbbell and linear molecules containing dinucleotide repeat sequences of different lengths. To study effects of different sequences on the melting and spectroscopic properties, six DNA dumbbells whose stems contain the central sequences (AA)(10), (AC)(10), (AG)(10), (AT)(10), (GC)(10), and (GG)(10) were prepared. These represent the minimal set of 10 possible dinucleotide repeats. To study effects of dinucleotide repeat length, dumbbells with the central sequences (AG)(n), n = 5 and 20, were prepared. Control molecules, dumbbells with a random central sequence, (RN)(n), n = 5, 10, and 20, were also prepared. The central sequence of each dumbbell was flanked on both sides by the same 12 base pairs and T(4) end-loops. Melting curves were measured by optical absorbance and differential scanning calorimetry in solvents containing 25, 55, 85, and 115 mM Na(+). CD spectra were collected from 20 to 45 degrees C and [Na(+)] from 25 to 115 mM. The spectral database did not reveal any apparent temperature dependence in the pretransition region. Analysis of the melting thermodynamics evaluated as a function of Na(+) provided a means for quantitatively estimating the counterion release with melting for the different sequences. Results show a very definite sequence dependence, indicating the salt-dependent properties of duplex DNA are also sequence dependent. Linear DNA molecules containing the (AG)(n) and (RN)(n), sequences, n = 5, 10, 20, and 30, were also prepared and studied. The linear DNA molecules had the exact sequences of the dumbbell stems. That is, the central repeat sequence in each linear duplex was flanked on both sides by the same 12-bp sequence. Melting and CD studies were also performed on the linear DNA molecules. Comparison of results obtained for the same sequences in dumbbell and linear molecular environments reveals several interesting features of the interplay between

  2. Exon capture and bulk segregant analysis: rapid discovery of causative mutations using high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    del Viso Florencia

    2012-11-01

    Abstract. Background: Exome sequencing has transformed human genetic analysis and may do the same for other vertebrate model systems. However, a major challenge is sifting through the large number of sequence variants to identify the causative mutation for a given phenotype. In models like Xenopus tropicalis, an incomplete and occasionally incorrect genome assembly compounds this problem. To facilitate cloning of X. tropicalis mutants identified in forward genetic screens, we sought to combine bulk segregant analysis and exome sequencing into a single step. Results: Here we report the first use of exon capture sequencing to identify mutations in a non-mammalian, vertebrate model. We demonstrate that bulk segregant analysis coupled with exon capture sequencing is not only able to identify causative mutations but can also generate linkage information, facilitate the assembly of scaffolds, identify misassemblies, and discover thousands of SNPs for fine mapping. Conclusion: Exon capture sequencing and bulk segregant analysis is a rapid, inexpensive method to clone mutants identified in forward genetic screens. With sufficient meioses, this method can be generalized to any model system with a genome assembly, polished or unpolished, and in the latter case, it also provides many critical genomic resources.

  3. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    Directory of Open Access Journals (Sweden)

    Arias Covadonga

    2007-06-01

    Abstract. Background: The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results: We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of the genome of Plasmodium falciparum, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10^-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion: This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.

  4. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis

    DEFF Research Database (Denmark)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta

    2014-01-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease d...

  5. A genome-wide analysis of FRT-like sequences in the human genome.

    Science.gov (United States)

    Shultz, Jeffry L; Voziyanova, Eugenia; Konieczka, Jay H; Voziyanov, Yuri

    2011-03-23

    Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

  6. A genome-wide analysis of FRT-like sequences in the human genome.

    Directory of Open Access Journals (Sweden)

    Jeffry L Shultz

    2011-03-01

    Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

  7. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.

    Science.gov (United States)

    Brenner, S; Johnson, M; Bridgham, J; Golda, G; Lloyd, D H; Johnson, D; Luo, S; McCurdy, S; Foy, M; Ewan, M; Roth, R; George, D; Eletr, S; Albrecht, G; Vermaas, E; Williams, S R; Moon, K; Burcham, T; Pallas, M; DuBridge, R B; Kirchner, J; Fearon, K; Mao, J; Corcoran, K

    2000-06-01

    We describe a novel sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 µm diameter microbeads. After constructing a microbead library of DNA templates by in vitro cloning, we assembled a planar array of a million template-containing microbeads in a flow cell at a density greater than 3 × 10^6 microbeads/cm^2. Sequences of the free ends of the cloned templates on each microbead were then simultaneously analyzed using a fluorescence-based signature sequencing method that does not require DNA fragment separation. Signature sequences of 16-20 bases were obtained by repeated cycles of enzymatic cleavage with a type IIs restriction endonuclease, adaptor ligation, and sequence interrogation by encoded hybridization probes. The approach was validated by sequencing over 269,000 signatures from two cDNA libraries constructed from a fully sequenced strain of Saccharomyces cerevisiae, and by measuring gene expression levels in the human cell line THP-1. The approach provides an unprecedented depth of analysis permitting application of powerful statistical techniques for discovery of functional relationships among genes, whether known or unknown beforehand, or whether expressed at high or very low levels.

  8. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    Directory of Open Access Journals (Sweden)

    Zhi Xu

    Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous, and its incidence and causes vary widely with geographical region, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in the genes which are targeted by drugs in clinical use or are under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7%) in our sample set. Thus, this study indicates the utility of a cost and time efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  9. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    Science.gov (United States)

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  10. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

    Directory of Open Access Journals (Sweden)

    Chuming Chen

    2011-04-01

    The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.

  11. DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis.

    Science.gov (United States)

    Erlich, Yaniv; Chang, Kenneth; Gordon, Assaf; Ronen, Roy; Navon, Oron; Rooks, Michelle; Hannon, Gregory J

    2009-07-01

    Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals.
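
    The idea that a specimen's identity can be encoded in its pooling pattern rather than in its own barcode can be sketched as follows. The abstract does not state the pooling design, so the modular scheme below (specimen i goes into pool i mod w for each of several pairwise coprime window sizes w) is an assumption chosen for illustration, as are the function names `build_pools` and `decode_single_carrier`. The decoder handles only the simplest case, in which a single specimen carries the sequence of interest.

```python
from math import gcd


def build_pools(n_specimens, windows):
    """Assign specimen i to pool (w, i % w) for every window size w."""
    # Pairwise coprime windows make the residue pattern unique (assumed design).
    for a in windows:
        for b in windows:
            if a != b and gcd(a, b) != 1:
                raise ValueError("window sizes must be pairwise coprime")
    return {
        (w, r): [i for i in range(n_specimens) if i % w == r]
        for w in windows
        for r in range(w)
    }


def decode_single_carrier(positive_pools, n_specimens, windows):
    """Return the specimens consistent with one positive pool per window."""
    residues = dict(positive_pools)  # window size -> residue of the positive pool
    return [
        i for i in range(n_specimens)
        if all(i % w == residues.get(w) for w in windows)
    ]


if __name__ == "__main__":
    windows = (5, 7, 8)          # 5 + 7 + 8 = 20 barcoded pools
    n = 200                      # 200 specimens, since 5 * 7 * 8 = 280 >= 200
    pools = build_pools(n, windows)
    carrier = 123
    positives = [key for key, members in pools.items() if carrier in members]
    print(decode_single_carrier(positives, n, windows))
```

    In this toy setting 20 barcodes cover 200 specimens. Resolving several carriers at once, or tolerating sequencing errors, needs a more careful decoder than this one.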

  12. DNA Sudoku—harnessing high-throughput sequencing for multiplexed specimen analysis

    Science.gov (United States)

    Erlich, Yaniv; Chang, Kenneth; Gordon, Assaf; Ronen, Roy; Navon, Oron; Rooks, Michelle; Hannon, Gregory J.

    2009-01-01

    Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals. PMID:19447965

  13. QTL analysis by sequencing of Water Use Efficiency (WUE) in potato

    DEFF Research Database (Denmark)

    Kaminski, Kacper Piotr; Sønderkær, Mads; Sørensen, Kirsten Kørup

    2013-01-01

    The traditional approach to potato breeding, the classical “mate and phenotype” approach, is relatively costly, and because phenotyping and growth capacity are limited, it is slowly being replaced by Marker Assisted Selection (MAS) breeding schemes. MAS is based on the presence of DNA polymorphic.......sparsipilum), phenotyped for water use efficiency. This population has also previously been phenotyped for the total glycoalkaloid (TGA) content....... and time consuming process. Here, a novel method for Quantitative Trait Locus (QTL) analysis has been developed, that allows for development of specific markers by use of genomic sequence reads and the recently published reference genome sequence for potato. Prior to sequencing the mapping population...

  14. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly

    OpenAIRE

    Lam, Ernest T; Hastie, Alex; Lin, Chin; Ehrlich, Dean; Das, Somes K; Austin, Michael D; Deshpande, Paru; Cao, Han; Nagarajan, Niranjan; Xiao, Ming; Kwok, Pui-Yan

    2012-01-01

    We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4...

  15. Automated sequence analysis and editing software for HIV drug resistance testing.

    Science.gov (United States)

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The objective was to develop automated sequence analysis and editing software to support high-throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention.
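
    Two of the reporting steps listed in this abstract, translating nucleotides to amino acids and flagging premature stop codons and ambiguities, can be sketched in a few lines. This is not the ART-A Software's implementation; the function name `translate_and_flag`, the in-frame input, and the choice to report any codon containing a non-ACGT symbol as "X" are assumptions made for this example. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_and_flag(seq):
    """Translate an in-frame sequence and list positions needing attention.

    Returns (protein, flags), where flags holds (codon_number, message) pairs
    with 1-based codon numbers. Trailing bases beyond a full codon are ignored.
    """
    seq = seq.upper()
    n_codons = len(seq) // 3
    protein, flags = [], []
    for i in range(n_codons):
        codon = seq[3 * i:3 * i + 3]
        aa = CODON_TABLE.get(codon, "X")  # X marks an ambiguous or mixed call
        protein.append(aa)
        if aa == "X":
            flags.append((i + 1, "ambiguous codon " + codon))
        elif aa == "*" and i < n_codons - 1:
            flags.append((i + 1, "premature stop codon " + codon))
    return "".join(protein), flags


if __name__ == "__main__":
    print(translate_and_flag("ATGTAAGGRTGA"))
```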

  16. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis.

    Science.gov (United States)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta; Mikkelsen, Martin; Johansen, Peter; Børsting, Claus; Morling, Niels

    2014-08-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72 individuals using only 24 barcoded libraries.

  17. ESSENTIALS: software for rapid analysis of high throughput transposon insertion sequencing data.

    Directory of Open Access Journals (Sweden)

    Aldert Zomer

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditionally) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of DNA flanking the transposon insertion site, assigning a measure of essentiality based on the number of reads per insertion site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy to use and automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open source, web-based software tool for researchers in the genomics field utilizing transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets and we show that the various pre- and post-processing steps on the sequence reads and count data with ESSENTIALS considerably improve the sensitivity and specificity of predicted gene essentiality.

  18. Mediterranean Neocomian belemnites. Part I : Río Argos sequence (Province of Murcia, Spain) : the Berriasian-Valanginian and the Hauterivian-Barremian boundaries

    NARCIS (Netherlands)

    Janssen, N.M.M.

    1997-01-01

    In the Río Argos sequence (Murcia, SE Spain) belemnites were collected in ammonite controlled Lower Cretaceous strata. The calibration towards ammonite biozones gives the opportunity to specify the biostratigraphical ranges of belemnites in the (western) Mediterranean. Emphasized are the Berriasian-

  19. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and by third-generation PacBio sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found that not only did the PLE exon sequences of the four genes show a high degree of homology, but the intron sequences were also highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse-complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  20. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
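
    The two headline figures in this abstract, 59.6 Gb of sequence and 24.7X average coverage, are linked by the usual relation coverage = total sequenced bases / genome length. The sketch below only works through that arithmetic; the function names are hypothetical, and the abstract does not say which genome length the authors used or whether unmapped or filtered reads were excluded, so the implied length is a back-of-envelope estimate.

```python
def mean_coverage(total_bases, genome_length):
    """Average depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_length


def implied_genome_length(total_bases, coverage):
    """Genome length implied by a reported yield and average coverage."""
    return total_bases / coverage


if __name__ == "__main__":
    total = 59.6e9  # 59.6 Gb, from the abstract
    depth = 24.7    # 24.7X, from the abstract
    length = implied_genome_length(total, depth)
    print(f"implied genome length: {length / 1e9:.2f} Gb")
```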

  1. In silico analysis of 3'-end-processing signals in Aspergillus oryzae using expressed sequence tags and genomic sequencing data.

    Science.gov (United States)

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-06-01

    To investigate 3'-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3'-untranslated region (3' UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3' UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3' UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15-30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3'-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3'-end-processing signals are similar to those in yeast and plants, some notable differences exist between them.
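    The hexanucleotide tally described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (a sequence paired with the 0-based index of its poly(A) site), the function names, and the choice to count only hexamers lying entirely inside the 15-30 nt upstream window are all assumptions.

```python
from collections import Counter


def upstream_window(seq, site, near=15, far=30):
    """Return the region 'far' to 'near' nt upstream of the poly(A) site, as RNA."""
    start = max(0, site - far)
    end = max(0, site - near)
    return seq[start:end].upper().replace("T", "U")


def hexamer_counts(records, k=6):
    """Count every k-mer occurring inside the upstream window of each record."""
    counts = Counter()
    for seq, site in records:
        window = upstream_window(seq, site)
        for i in range(len(window) - k + 1):
            counts[window[i:i + k]] += 1
    return counts


def fraction_with_motif(records, motif):
    """Fraction of records whose upstream window contains the motif at least once."""
    if not records:
        return 0.0
    hits = sum(1 for seq, site in records if motif in upstream_window(seq, site))
    return hits / len(records)


# Toy records (hypothetical data): one carries AATGAA inside the window, one does not.
records = [
    ("C" * 12 + "AATGAA" + "C" * 32, 40),
    ("C" * 50, 40),
]
counts = hexamer_counts(records)
print(counts.most_common(3))
print(fraction_with_motif(records, "AAUGAA"))
```

    Reporting the fraction of transcripts that contain a motif, rather than raw occurrence counts, matches the way the abstract quotes the AAUGAA frequency.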

  2. Biostratigraphic reappraisal of the Lower Triassic Sanga do Cabral Supersequence from South America, with a description of new material attributable to the parareptile genus Procolophon

    Science.gov (United States)

    Dias-da-Silva, Sérgio; Pinheiro, Felipe L.; Stock Da-Rosa, Átila Augusto; Martinelli, Agustín G.; Schultz, Cesar L.; Silva-Neves, Eduardo; Modesto, Sean P.

    2017-11-01

    The Sanga do Cabral Supersequence (SCS) comprises the Brazilian Sanga do Cabral Formation (SCF) and the Uruguayan Buena Vista Formation (BVF). So far, the SCS has yielded temnospondyls, parareptiles, archosauromorphs, putative synapsids, and a number of indeterminate specimens. In the absence of absolute dates for these rocks, a biostratigraphic approach is necessary to establish the ages of the SCF and the BVF. It is well established that the SCF is Early Triassic mainly due to the presence of the widespread Gondwanan reptile Procolophon trigoniceps. Conversely, the age of the BVF is the subject of great controversy, being regarded alternatively as Permian, Permo-Triassic, and Early Triassic. The BVF has yielded the definite procolophonid Pintosaurus magnidentis. Procolophonoidea is one of the most diverse and conspicuous terrestrial tetrapod groups of the Lower Triassic Lystrosaurus Assemblage Zone in the Karoo Basin of South Africa, which preserves tetrapods from the aftermath of the end-Permian extinction event. Based on a previous interpretation that the fauna of the BVF is Permian, and on the reinterpretation of disarticulated vertebrae from the SCF with 'swollen' neural arches as belonging to either seymouriamorphs or diadectomorphs, it was recently suggested that at least part of the SCF is Permian in age, which prompted this comprehensive reevaluation of both the faunal content and the geology of the SCS. Moreover, new, strikingly large procolophonid specimens (skull, vertebra, and a mandibular fragment) from the SCF are described and referred to the genus Procolophon. The large procolophonid vertebra described here contradicts the recent hypothesis that similar specimens from the SCF belong to seymouriamorphs or diadectomorphs, because its morphology is consistent with that found in Procolophon. There is not a single diagnostic specimen that supports the inference of Permian levels in the SCS.
Accordingly, because all diagnostic and biostratigraphically informative fossils

  3. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  4. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    Science.gov (United States)

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  5. A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.

    Science.gov (United States)

    Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi

    2017-01-01

    Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.

  6. Sequence Analysis of the Genome of an Oil-Bearing Tree, Jatropha curcas L.

    Science.gov (United States)

    Sato, Shusei; Hirakawa, Hideki; Isobe, Sachiko; Fukai, Eigo; Watanabe, Akiko; Kato, Midori; Kawashima, Kumiko; Minami, Chiharu; Muraki, Akiko; Nakazaki, Naomi; Takahashi, Chika; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Yamada, Manabu; Tsuruoka, Hisano; Sasamoto, Shigemi; Tabata, Satoshi; Aizu, Tomoyuki; Toyoda, Atsushi; Shin-i, Tadasu; Minakuchi, Yohei; Kohara, Yuji; Fujiyama, Asao; Tsuchimoto, Suguru; Kajiyama, Shin'ichiro; Makigano, Eri; Ohmido, Nobuko; Shibagaki, Nakako; Cartagena, Joyce A.; Wada, Naoki; Kohinata, Tsutomu; Atefeh, Alipour; Yuasa, Shota; Matsunaga, Sachihiro; Fukui, Kiichi

    2011-01-01

    The whole genome of Jatropha curcas was sequenced, using a combination of the conventional Sanger method and new-generation multiplex sequencing methods. Total length of the non-redundant sequences thus obtained was 285 858 490 bp consisting of 120 586 contigs and 29 831 singlets. They accounted for ∼95% of the gene-containing regions, with an average G + C content of 34.3%. A total of 40 929 complete and partial structures of protein encoding genes have been deduced. Comparison with genes of other plant species indicated that 1529 (4%) of the putative protein-encoding genes are specific to the Euphorbiaceae family. A high degree of microsynteny was observed with the genome of castor bean and, to a lesser extent, with those of soybean and Arabidopsis thaliana. In parallel with genome sequencing, cDNAs derived from leaf and callus tissues were subjected to pyrosequencing, and a total of 21 225 unigene data have been generated. Polymorphism analysis using microsatellite markers developed from the genomic sequence data obtained was performed with 12 J. curcas lines collected from various parts of the world to estimate their genetic diversity. The genomic sequence and accompanying information presented here are expected to serve as valuable resources for the acceleration of fundamental and applied research with J. curcas, especially in the fields of environment-related research such as biofuel production. Further information on the genomic sequences and DNA markers is available at http://www.kazusa.or.jp/jatropha/. PMID:21149391

  7. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  8. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    Science.gov (United States)

    Kharrazi, M A S; Bobojonov, V

    2012-08-16

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobicity and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye.

  9. Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

    Science.gov (United States)

    Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

    2017-03-01

    This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.

  10. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    We have used available protein sequence data for the anaphylatoxin (C5a) portion of the fifth component of human complement (residues 19-25) to synthesize a mixed-sequence oligonucleotide probe. The labeled oligonucleotide was then used to screen a human liver cDNA library, and a single candidate cDNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence obtained further predicted an arginine-rich sequence (RPRR) immediately upstream of the N-terminal threonine of C5a, indicating that the promolecule form of C5 is synthesized with a beta alpha-chain orientation as previously shown for pro-C3 and pro-C4. The C5 cDNA clone was sheared randomly by sonication...

  11. A graph theoretic approach to the analysis of DNA sequencing data.

    Science.gov (United States)

    Berno, A J

    1996-02-01

    The analysis of data from automated DNA sequencing instruments has been a limiting factor in the development of new sequencing technology. A new base-calling algorithm that is intended to be independent of any particular sequencing technology has been developed and shown to be effective with data from the Applied Biosystems 373 sequencing system. This algorithm makes use of a nonlinear deconvolution filter to detect likely oligomer events and a graph theoretic editing strategy to find the subset of those events that is most likely to correspond to the correct sequence. Metrics evaluating the quality and accuracy of the resulting sequence are also generated and have been shown to be predictive of measured error rates. Compared to the Applied Biosystems Analysis software, this algorithm generates 18% fewer insertion errors, 80% more deletion errors, and 4% fewer mismatches. The tradeoff between different types of errors can be controlled through a secondary editing step that inserts or deletes base calls depending on their associated confidence values.

  12. NEW BIOSTRATIGRAPHIC DATA FROM THE REITANO FLYSCH AUCT. (SICILY, ITALY): A KEY TO A REVISED STRATIGRAPHY OF THE SICILIDE UNITS

    Directory of Open Access Journals (Sweden)

    STEFANO TORRICELLI

    2010-07-01

    The study of palynomorphs and calcareous nannofossils recovered from the volcano-arenitic succession outcropping at Troina and Cerami (Sicily) documents Rupelian assemblages comparable to those published for the Tusa Tuffite. This new evidence, combined with petrographic, geochemical and sedimentological affinities documented in the literature, eventually proves the genetic relationships between these units. Accordingly, the new name Troina-Tusa Formation is proposed to include all these lower Oligocene volcano-sedimentary units and to replace inappropriate names formerly used. The Troina-Tusa Formation conformably lies on a mixed siliciclastic-carbonate turbidite succession, lacking volcanic detritus, reported in the literature with different names (Polizzi Formation, Varicoloured Shales, Troina-Tusa Flysch) and different ages (ranging from Eocene to Early Miocene). Palynomorphs and nannofossils recovered from its uppermost part indicate an earliest Oligocene age. The denomination Polizzi Formation is recommended for this unit, which also includes the Varicoloured Shales (Eocene-basal Oligocene). The appearance of conglomerates and volcano-arenites in the basal portion of the Troina-Tusa Formation, immediately above the top of the Polizzi Formation, marks a sudden reorganization of the Rupelian depositional systems related to the rise and erosion of a volcanic belt. Apparently, no biostratigraphically detectable hiatus is associated with this boundary. Differences in the composition of sandstones, sedimentary features and relationships with the substratum do exist between the ‘internal’ Reitano Flysch, outcropping in the type-area on the northern slope of the Nebrodi Mountains, and the volcano-arenitic successions of Cerami and Troina, reported by some authors as ‘external’ Reitano Flysch. These differences are widely documented in the literature, where the ‘internal’ Reitano Flysch is shown to lack volcanic detritus and to rest

  13. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    Directory of Open Access Journals (Sweden)

    Samuel V Angiuoli

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing

  14. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    Science.gov (United States)

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single

  15. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Directory of Open Access Journals (Sweden)

    Lalucat Jorge

    2010-04-01

    Background: The genus Pseudomonas comprises more than 100 species of environmental, clinical, agricultural, and biotechnological interest. Although the recommended method for discriminating bacterial species is DNA-DNA hybridisation, alternative techniques based on multigenic sequence analysis are becoming a common practice in bacterial species discrimination studies. Since there is no general criterion for determining which genes are more useful for species resolution, the number of strains and genes analysed is increasing continuously. As a result, sequences of different genes are dispersed throughout several databases. This sequence information needs to be collected in a common database in order to be useful for future identification-based projects. Description: The PseudoMLSA Database is a comprehensive database of multiple gene sequences from strains of Pseudomonas species. The core of the database is composed of selected gene sequences from all Pseudomonas type strains validly assigned to the genus through 2008. The database is intended to be useful for MultiLocus Sequence Analysis (MLSA) procedures, for the identification and characterisation of any Pseudomonas bacterial isolate. The sequences are available for download via a direct connection to the National Center for Biotechnology Information (NCBI). Additionally, the database includes an online BLAST interface for flexible nucleotide queries and similarity searches with the user's datasets, and provides a user-friendly output for easily parsing, navigating, and analysing BLAST results. Conclusions: The PseudoMLSA database amasses strains and sequence information of validly described Pseudomonas species, and allows free querying of the database via a user-friendly, web-based interface available at http://www.uib.es/microbiologiaBD/Welcome.html. The web-based platform enables easy retrieval at strain or gene sequence information level; including references to published peer

  16. Strategies for exome and genome sequence data analysis in disease-gene discovery projects.

    Science.gov (United States)

    Robinson, Peter N; Krawitz, P; Mundlos, S

    2011-08-01

    In whole-exome sequencing (WES), target capture methods are used to enrich the sequences of the coding regions of genes from fragmented total genomic DNA, followed by massively parallel, 'next-generation' sequencing of the captured fragments. Since its introduction in 2009, WES has been successfully used in several disease-gene discovery projects, but the analysis of whole-exome sequence data can be challenging. In this overview, we present a summary of the main computational strategies that have been applied to identify novel disease genes in whole-exome data, including intersect filters, the search for de novo mutations, and the application of linkage mapping or inference of identity-by-descent (IBD) in family studies. © 2011 John Wiley & Sons A/S.
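    Two of the strategies named above, intersect filters and the search for de novo mutations, reduce to set operations on variant calls. The sketch below is an illustration under stated assumptions rather than a description of any published tool: variants are represented as hypothetical (chromosome, position, reference allele, alternate allele) tuples, the function names are invented, and real pipelines would add quality, frequency and functional filters.

```python
def intersect_filter(affected, common=()):
    """Keep variants shared by every affected individual, minus known common variants.

    affected: list of per-individual collections of variant tuples.
    common:   variants to exclude (for example, calls seen in unaffected controls).
    """
    if not affected:
        raise ValueError("at least one affected individual is required")
    shared = set(affected[0])
    for calls in affected[1:]:
        shared &= set(calls)
    return shared - set(common)


def de_novo_candidates(child, mother, father):
    """Variants called in the child but in neither parent."""
    return set(child) - set(mother) - set(father)


# Toy variant calls (hypothetical data).
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr2", 2000, "C", "T")
v3 = ("chr3", 3000, "G", "A")

print(intersect_filter([[v1, v2, v3], [v1, v2]], common=[v2]))
print(de_novo_candidates(child=[v1, v3], mother=[v1], father=[]))
```

    Each additional affected individual can only shrink the shared set, which is why intersect filters narrow the candidate list quickly when several unrelated patients are available.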

  17. Automatic knowledge extraction in sequencing analysis with multiagent system and grid computing.

    Science.gov (United States)

    González, Roberto; Zato, Carolina; Benito, Rocío; Bajo, Javier; Hernández, Jesús M; De Paz, Juan F; Vera, Vicente; Corchado, Juan M

    2012-12-01

    Advances in bioinformatics have contributed towards a significant increase in available information. Information analysis requires the use of distributed computing systems to best engage the process of data analysis. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated to each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

  18. Automatic knowledge extraction in sequencing analysis with multiagent system and grid computing

    Directory of Open Access Journals (Sweden)

    González Roberto

    2012-12-01

    Advances in bioinformatics have contributed towards a significant increase in available information. Information analysis requires the use of distributed computing systems to best engage the process of data analysis. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated to each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

  19. Capillary electrophoresis fragment analysis and clone sequencing in detection of dynamic mutations of spinocerebellar ataxia

    Directory of Open Access Journals (Sweden)

    Yuan-yuan CHEN

    2018-04-01

    Objective: To estimate the accuracy and stability of capillary electrophoresis fragment analysis and clone sequencing in detecting dynamic mutations of spinocerebellar ataxia (SCA). Methods: Capillary electrophoresis fragment analysis and clone sequencing were used to detect trinucleotide repeat sequences in 14 SCA patients (3 cases of SCA2, 2 cases of SCA7, 7 cases of SCA8 and 2 cases of SCA17). Results: Capillary electrophoresis fragment analysis of 3 SCA2 cases showed the expanded cytosine-adenine-guanine (CAG) repeats were 31, 30 and 32, and the copy numbers from clone sequencing of 3 colonies in each case were 37/40/40, 37/38/39 and 38/39/40 respectively. Capillary electrophoresis fragment analysis of 2 SCA7 cases showed the expanded CAG repeats were 57 and 34, and the copy numbers of repeats were 69, 74, 75 in 3 colonies of one case, and was 45 in the other case. For the 7 SCA8 cases with the expanded cytosine-thymine-adenine (CTA)/cytosine-thymine-guanine (CTG) repeats of 99, 111, 104, 92, 89, 104 and 75, the results of clone sequencing were 97, 116, 104, 90, 90, 102 and 76 respectively. For 2 SCA17 cases with the short/expanded CAG repeats of 37/50 and 36/45, the results of clone sequencing were 51/50/52 and 45/44 for 3 and 2 colonies. Conclusions: Although the higher mobility of polymerase chain reaction (PCR) products containing dynamic mutation in the capillary electrophoresis fragment analysis might cause the deviation for analysis of copy numbers, the deviation was predictable and the results were repeatable. The clone sequencing results showed obvious instability, especially for SCA2 and SCA7 genes, which might be owing to their simple CAG repeats. Consequently, clone sequencing is not suited for detection of dynamic mutation, not to mention the quantitative criteria of dynamic mutation sequencing. DOI: 10.3969/j.issn.1672-6731.2018.03.008
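    For readers unfamiliar with how a repeat copy number is read off a sequenced clone, the following sketch counts the longest uninterrupted run of a repeat unit in a read. It is an assumption-laden illustration: the function name and the toy read are hypothetical, and the study's own counting rules (for example, how interrupted repeats are treated) are not given in the abstract.

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the copy number of the longest uninterrupted tandem run of 'unit'."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence.upper())]
    return max(run_lengths, default=0) // len(unit)


# Toy read (hypothetical): 31 CAG units, one CAA interruption, then 2 more CAG units.
read = "TT" + "CAG" * 31 + "CAA" + "CAG" * 2 + "GG"
print(longest_repeat_run(read))          # 31
print(longest_repeat_run(read, "CTG"))   # 0
```

    Whether an interrupting codon such as CAA ends the run or is counted within it is a scoring convention; this sketch treats it as ending the run.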

  20. Aligning to the sample-specific reference sequence to optimize the accuracy of next-generation sequencing analysis for hepatitis B virus.

    Science.gov (United States)

    Liu, Wen-Chun; Lin, Chih-Peng; Cheng, Chun-Pei; Ho, Cheng-Hsun; Lan, Kuo-Lun; Cheng, Ji-Hong; Yen, Chia-Jui; Cheng, Pin-Nan; Wu, I-Chin; Li, I-Chen; Chang, Bill Chia-Han; Tseng, Vincent S; Chiu, Yen-Cheng; Chang, Ting-Tsung

    2016-01-01

    Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is powerful for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same genotype sequences and different genotype sequences, according to the sample. Real Illumina HBV datasets from 86 patients, and simulated datasets from 158 HBV strains in the GenBank database, were used to assess mapping quality. SNV calling accuracy was evaluated using different mapping references to align Real Illumina datasets from a single HBV clone. Using the sample-specific reference sequence as a mapping reference produced the largest number of mappable reads and coverages. With a different genotype mapping reference, the consensus sequence derived from the Real Illumina datasets of the single HBV clone showed 21 false SNV callings in polymerase and surface genes, the regions most divergent between the mapping reference and this HBV clone. A ~6 % coverage of most of these false SNVs was yielded even with a same genotype mapping reference, but none with the sample-specific reference sequence. Using sample-specific reference sequences as a mapping reference in NGS analysis optimized mapping quality and the SNV calling accuracy for HBV quasispecies.

  1. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... on Tabasco), led us to the detection of a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute, supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...

  2. Non-coding sequence retrieval system for comparative genomic analysis of gene regulatory elements

    Directory of Open Access Journals (Sweden)

    Temple Matthew H

    2007-03-01

    Abstract Background Completion of the human genome sequence along with other species allows for greater understanding of the biochemical mechanisms and processes that govern healthy as well as diseased states. The large size of the genome sequences has made them difficult to study using traditional methods. There are many studies focusing on the protein coding sequences; however, not much is known about the function of non-coding regions of the genome. It has been demonstrated that parts of the non-coding region play a critical role as gene regulatory elements. Enhancers that regulate transcription processes have been found in intergenic regions. Furthermore, it is observed that regulatory elements found in non-coding regions are highly conserved across different species. However, the analysis of these regulatory elements is not as straightforward as it may first seem. The development of a centralized resource that allows for the quick and easy retrieval of non-coding sequences from multiple species and is capable of handling multi-gene queries is critical for the analysis of non-coding sequences. Here we describe the development of a web-based non-coding sequence retrieval system. Results This paper presents a Non-Coding Sequences Retrieval System (NCSRS). The NCSRS is a web-based bioinformatics tool that performs fast and convenient retrieval of non-coding and coding sequences from multiple species related to a specific gene or set of genes. This tool has compiled resources from multiple sources into one easy-to-use and convenient web-based interface. With no software installation necessary, the user needs only internet access to use this tool. Conclusion The unique features of this tool will be very helpful for those studying gene regulatory elements that exist in non-coding regions. The web-based application can be accessed on the internet at: http://cell.rutgers.edu/ncsrs/.

  3. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  4. Facies analysis and sequence stratigraphy of the Eocene successions, east Beni Suef area, eastern Desert, Egypt

    Science.gov (United States)

    Saber, Shaban G.; Salama, Yasser F.

    2017-11-01

    Three Eocene stratigraphic successions east of the Beni Suef area are described and measured. These successions are made up of four rock units that are, from base to top: Qarara (upper Lutetian), El Fashn (Bartonian), Beni Suef, and Maadi (Priabonian) formations. A detailed facies and sequence analysis unravels the stratigraphic framework and constructs a depositional model for the Middle-Upper Eocene succession. Ten microfacies types were grouped into four facies associations on a homoclinal ramp that compose the Upper Lutetian-Priabonian succession exposed in the east Beni Suef area. The depositional environment varied from a shallow to deep ramp setting. Four third-order depositional sequences were identified in the studied sections. The sequence boundaries are paleosol horizons that can be traced throughout the entire outcrop area. Missing biozones are also evidence of the sequence boundaries. The history of these sequences mirrors the eustatic sea-level changes and the local tectonics in the region. Each sequence comprises facies associations that make up lowstand and/or transgressive and highstand systems tracts. The lowstand systems tract (LST) deposits are mainly sandstone facies and occur in Sequences 3 and 4 at Gabal Abyiad and Gabal Homret Shaibun, respectively. The transgressive systems tract (TST) of Sequence 1 is dominated by nummulitic facies at Gabal Diya. The shale, mudstone and wackestone facies with planktic foraminifera and echinoids dominate the TST of Sequences 2 and 3 at Gabal Abyiad and Gabal Homret Shaibun, respectively. The highstand systems tract (HST) of the studied sections is characterized by benthic foraminifera and bryozoan wackestone and packstone facies.

  5. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Science.gov (United States)

    Liu, Hongliang; Wang, Tingting; Wang, Jinke; Quan, Fusheng; Zhang, Yong

    2013-01-01

    Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
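    The counts reported in this abstract are internally consistent, which can be confirmed with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names and the `percent` helper are made up for illustration.

```python
# Figures quoted in the abstract above.
isotigs = 13_194
singletons = 104_660
unigenes = 117_854
mapped_to_chromosomes = 97_236
matched_to_goat_genes = 35_551


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Isotigs plus singletons should account for every unigene.
total_from_parts = isotigs + singletons

mapped_pct = percent(mapped_to_chromosomes, unigenes)
matched_pct = percent(matched_to_goat_genes, unigenes)
print(total_from_parts, mapped_pct, matched_pct)
```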

  6. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  7. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
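    To make the size of the motif space concrete, the sketch below enumerates the start codons that deviate from AUG by a single base and counts the flanking contexts. It assumes the usual numbering in which the start codon occupies positions +1 to +3 and there is no position 0, so that the flanks are positions -4 to -1 plus +4; that reading, and all names in the code, are assumptions for illustration rather than details taken from the paper.

```python
from itertools import product

BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Return every codon that differs from `start` at exactly one position."""
    codons = []
    for pos, original in enumerate(start):
        for base in BASES:
            if base != original:
                codons.append(start[:pos] + base + start[pos + 1:])
    return codons


codons = near_cognate_codons()

# Assumed flanking positions: -4, -3, -2, -1 upstream and +4 downstream.
flank_contexts = ["".join(p) for p in product(BASES, repeat=5)]

# Total motifs if every codon is paired with every flanking context.
total_motifs = len(codons) * len(flank_contexts)
print(codons, len(flank_contexts), total_motifs)
```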

  8. Comparative analysis of genome sequences of the conifer tree pathogen, Heterobasidion annosum s.s.

    Directory of Open Access Journals (Sweden)

    Jaeyoung Choi

    2017-12-01

    The causal agent of root and butt rot of conifer trees, Heterobasidion annosum, is widespread in boreal forests and is responsible for annual losses of approximately 50 million euros to forest industries in Finland alone and much more at the European level. In order to further understand the pathobiology of this fungus at the genome level, a Finnish isolate of H. annosum sensu stricto (isolate 03012) was sequenced and analyzed with the genome sequences of 23 white-rot and 13 brown-rot fungi. The draft genome assembly of H. annosum has a size of 31.01 Mb, containing 11,453 predicted genes. Whole genome alignment showed that 84.38% of H. annosum genome sequences were aligned with those of previously sequenced H. irregulare TC 32-1 counterparts. The result is further supported by the protein sequence clustering analysis, which revealed that the two genomes share 6719 out of 8647 clusters. When sequencing reads of H. annosum were aligned against the genome sequences of H. irregulare, six single nucleotide polymorphisms were found in every 1 kb, on average. In addition, 98.68% of SNPs were found to be homo-variants, suggesting that the two species have long evolved from different niches. Gene family analysis revealed that most of the white-rot fungi investigated had more gene families involved in lignin degradation or modification, including laccases and peroxidase. Comparative analysis of the two Heterobasidion spp. as well as white-/brown-rot fungi would provide new insights for understanding the pathobiology of the conifer tree pathogen.

  9. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis.

    Science.gov (United States)

    Goldfeder, Rachel L; Wall, Dennis P; Khoury, Muin J; Ioannidis, John P A; Ashley, Euan A

    2017-10-15

    Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags

    Directory of Open Access Journals (Sweden)

    Moerman Donald G

    2008-07-01

    Abstract Background We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology. Results We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants and therefore, imply that these are novel genetic structures. Expressed sequence tags, which map to intergenic (around 1000) and intronic regions (around 580), may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags, which map across intron-exon boundaries (around 300), indicate possible alternative splice sites, while expressed sequence tags, which map near the ends of known transcripts (around 600), suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags, which are well conserved across different nematode species, are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms, in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage. Conclusion We have demonstrated the use of a high-throughput sequencing methodology to efficiently

  11. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  12. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis.

    Directory of Open Access Journals (Sweden)

    Li-Hui Zuo

    Elm (Ulmus) has a long history of use as a high-quality heavy hardwood famous for its resistance to drought, cold, and salt. It grows in temperate, warm temperate, and subtropical regions. This is the first report of Ulmaceae chloroplast genomes by de novo sequencing. The Ulmus chloroplast genomes exhibited a typical quadripartite structure with two single-copy regions (long single copy [LSC] and short single copy [SSC] sections) separated by a pair of inverted repeats (IRs). The lengths of the chloroplast genomes from five Ulmus ranged from 158,953 to 159,453 bp, with the largest observed in Ulmus davidiana and the smallest in Ulmus laciniata. The genomes contained 137-145 protein-coding genes, of which Ulmus davidiana var. japonica and U. davidiana had the most and U. pumila had the fewest. The five Ulmus species exhibited different evolutionary routes, as some genes had been lost. In total, 18 genes contained introns, 13 of which (trnL-TAA+, trnL-TAA-, rpoC1-, rpl2-, ndhA-, ycf1, rps12-, rps12+, trnA-TGC+, trnA-TGC-, trnV-TAC-, trnI-GAT+, and trnI-GAT) were shared among all five species. The intron of ycf1 was the longest (5,675 bp) while that of trnF-AAA was the smallest (53 bp). All Ulmus species except U. davidiana exhibited the same degree of amplification in the IR region. To determine the phylogenetic positions of the Ulmus species, we performed phylogenetic analyses using common protein-coding genes in chloroplast sequences of 42 other species published in NCBI. The cluster results showed the closest plants to Ulmaceae were Moraceae and Cannabaceae, followed by Rosaceae. Ulmaceae and Moraceae both belonged to Urticales, and the chloroplast genome clustering results were consistent with their traditional taxonomy. The results strongly supported the position of Ulmaceae as a member of the order Urticales. In addition, we found a potential error in the traditional taxonomies of U. davidiana and U. davidiana var. japonica, which should be

  13. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis.

    Science.gov (United States)

    Zuo, Li-Hui; Shang, Ai-Qin; Zhang, Shuang; Yu, Xiao-Yue; Ren, Ya-Chao; Yang, Min-Sheng; Wang, Jin-Mao

    2017-01-01

    Elm (Ulmus) has a long history of use as a high-quality heavy hardwood famous for its resistance to drought, cold, and salt. It grows in temperate, warm temperate, and subtropical regions. This is the first report of Ulmaceae chloroplast genomes by de novo sequencing. The Ulmus chloroplast genomes exhibited a typical quadripartite structure with two single-copy regions (long single copy [LSC] and short single copy [SSC] sections) separated by a pair of inverted repeats (IRs). The lengths of the chloroplast genomes from five Ulmus ranged from 158,953 to 159,453 bp, with the largest observed in Ulmus davidiana and the smallest in Ulmus laciniata. The genomes contained 137-145 protein-coding genes, of which Ulmus davidiana var. japonica and U. davidiana had the most and U. pumila had the fewest. The five Ulmus species exhibited different evolutionary routes, as some genes had been lost. In total, 18 genes contained introns, 13 of which (trnL-TAA+, trnL-TAA-, rpoC1-, rpl2-, ndhA-, ycf1, rps12-, rps12+, trnA-TGC+, trnA-TGC-, trnV-TAC-, trnI-GAT+, and trnI-GAT) were shared among all five species. The intron of ycf1 was the longest (5,675bp) while that of trnF-AAA was the smallest (53bp). All Ulmus species except U. davidiana exhibited the same degree of amplification in the IR region. To determine the phylogenetic positions of the Ulmus species, we performed phylogenetic analyses using common protein-coding genes in chloroplast sequences of 42 other species published in NCBI. The cluster results showed the closest plants to Ulmaceae were Moraceae and Cannabaceae, followed by Rosaceae. Ulmaceae and Moraceae both belonged to Urticales, and the chloroplast genome clustering results were consistent with their traditional taxonomy. The results strongly supported the position of Ulmaceae as a member of the order Urticales. In addition, we found a potential error in the traditional taxonomies of U. davidiana and U. davidiana var. japonica, which should be confirmed with a

  14. Antibody V and C domain sequence, structure, and interaction analysis with special reference to IMGT®.

    Science.gov (United States)

    Alamyar, Eltaf; Giudicelli, Véronique; Duroux, Patrice; Lefranc, Marie-Paule

    2014-01-01

    IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), created in 1989 (Centre National de la Recherche Scientifique, Montpellier University), is acknowledged as the global reference in immunogenetics and immunoinformatics. The accuracy and the consistency of the IMGT® data are based on IMGT-ONTOLOGY which bridges the gap between genes, sequences, and three-dimensional (3D) structures. Thus, receptors, chains, and domains are characterized with the same IMGT® rules and standards (IMGT standardized labels, IMGT gene and allele nomenclature, IMGT unique numbering, IMGT Collier de Perles), independently from the molecule type (genomic DNA, complementary DNA, transcript, or protein) or from the species. More particularly, IMGT® tools and databases provide a highly standardized analysis of the immunoglobulin (IG) or antibody and T cell receptor (TR) V and C domains. IMGT/V-QUEST analyzes the V domains of IG or TR rearranged nucleotide sequences, integrates the IMGT/JunctionAnalysis and IMGT/Automat tools, and provides IMGT Collier de Perles. IMGT/HighV-QUEST analyzes sequences from high-throughput sequencing (HTS) (up to 150,000 sequences per batch) and performs statistical analysis on up to 450,000 results, with the same resolution and high quality as IMGT/V-QUEST online. IMGT/DomainGapAlign analyzes amino acid sequences of V and C domains and IMGT/3Dstructure-DB and associated tools provide information on 3D structures, contact analysis, and paratope/epitope interactions. These IMGT® tools and databases, and the IMGT/mAb-DB interface with access to therapeutical antibody data, provide an invaluable help for antibody engineering and antibody humanization.

  15. Gestural overlap of stop-consonant sequences: Evidence from analysis and synthesis

    Science.gov (United States)

    Zhao, Sherry; Stevens, Kenneth N.

    2003-04-01

    This study uses an analysis-by-synthesis approach to discover possible principles governing the coordination of oral and laryngeal articulators in the production of English stop-consonant sequences. Individual recordings were made of two male and two female native American-English speakers reading phrases which include voiced and voiceless stop consonants in word-initial (V#CV) and word-final (VC#V) positions, as well as in VC#CV stop-stop consonant sequences. Articulatory timing estimates were made based on analyzing acoustic data including formant movements, closure durations, release bursts, and spectrum shape at low frequencies. Based on the gestural estimates, the same consonant sequences were generated using HLsyn, a quasiarticulatory synthesizer. The synthetic utterances were acoustically and perceptually compared to the actual utterances in order to verify and refine the articulatory timing estimates from which possible principles could be derived. Preliminary results agree with earlier findings of more overlapping of oral gestures in sequences with front-to-back order of place of articulation than those with back-to-front order [Chitoran, Goldstein, and Byrd, Lab. Phonology 7, 419-448 (2002)]. Furthermore, overlapping of laryngeal gestures is suggested by the smaller acoustical loss at the glottis in vowels after voiced-voiceless sequences than voiceless-voiceless sequences.

  16. Sequence analysis and genetic diversity of five new Indian isolates of cucumber mosaic virus.

    Science.gov (United States)

    Kumar, S; Gautam, K K; Raj, S K

    2015-12-01

    Cucumber mosaic virus (CMV) is an important virus since it causes severe losses to many economically important crops worldwide. Five new isolates of CMV were isolated from naturally infected Hippeastrum hybridum, Dahlia pinnata, Hemerocallis fulva, Acorus calamus and Typhonium trilobatum plants, all exhibiting severe leaf mosaic symptoms. For molecular identification and sequence analyses, the complete coat protein (CP) gene of these isolates was amplified by RT-PCR. The resulting amplicons were cloned and sequenced and isolates were designated as HH (KP698590), DP (JF682239), HF (KP698589), AC (KP698588) and TT (JX570732). To study genetic diversity among these isolates, the sequence data were analysed by BLASTn and multiple alignment, and phylogenetic trees were generated together with the respective sequences of other CMV isolates available in the GenBank database. The isolates under study showed 82-99% sequence diversity among them at nucleotide and amino acid levels; however, they showed close relationships with CMV isolates of subgroup IB. In alignment analysis of amino acid sequences of the HH and AC isolates, we found fifteen and twelve unique substitutions, respectively, compared to the HF, DP and TT isolates, suggesting a cause of the high genetic diversity.

  17. Characterization of squid enolase mRNA: sequence analysis, tissue distribution, and axonal localization.

    Science.gov (United States)

    Chun, J T; Gioio, A E; Crispino, M; Giuditta, A; Kaplan, B B

    1995-08-01

    Enolase is a glycolytic enzyme whose amino acid sequence is highly conserved across a wide range of animal species. In mammals, enolase is known to be a dimeric protein composed of distinct but closely related subunits: alpha (non-neuronal), beta (muscle-specific), and gamma (neuron-specific). However, little information is available on the primary sequence of enolase in invertebrates. Here we report the isolation of two overlapping cDNA clones and the putative primary structure of the enzyme from the squid (Loligo pealii) nervous system. The composite sequence of those cDNA clones is 1575 bp and contains the entire coding region (1302 bp), as well as 66 and 207 bp of 5' and 3' untranslated sequence, respectively. Cross-species comparison of enolase primary structure reveals that squid enolase shares over 70% sequence identity with vertebrate forms of the enzyme. The greatest degree of sequence similarity was found with the alpha isoform of the human homologue. Results of Northern analysis revealed a single 1.6 kb mRNA species, the relative abundance of which differs approximately 10-fold between various tissues. Interestingly, evidence derived from in situ hybridization and polymerase chain reaction experiments indicates that the mRNA encoding enolase is present in the squid giant axon.

  18. Sequencing and phylogenetic analysis of partial CXCR2 gene of Murrah buffalo

    Directory of Open Access Journals (Sweden)

    S. A. Wani

    2014-05-01

    Aim: The present study was carried out to sequence and phylogenetically analyze the CXCR2 gene of Murrah buffalo. Materials and Methods: For the present investigation, from a group of forty-eight Murrah buffaloes (Bubalus bubalis), blood samples were collected randomly from eight animals, of which four were healthy and four were mastitic. Results: The amplification of the Interleukin-8B (IL-8B) receptor gene target sequence was carried out using the primer pair in an optimized polymerase chain reaction. Partial sequencing of the IL-8B receptor gene of Bubalus bubalis (Murrah) has been done successfully. The sequences of the IL-8B receptor gene showed 99% homology to that of Bos indicus × Bos taurus, 98% to that of Bos taurus, 97% to that of Ovis aries, 93% to that of Sus scrofa, 92% to that of Equus caballus and 90% to that of Felis catus. Conclusion: From the present study it can be concluded that the PCR amplification procedure for the target region of the IL-8B receptor gene, yielding 459 bp products, has been standardized and gave consistent and specific amplification. Amplification of the partial IL-8B receptor gene (exon 2, 459 bp) using self-designed primers specific for the cattle ortholog sequence signifies that the locus is conserved in cattle and buffaloes. In the phylogenetic tree, the target sequence of the IL-8B receptor gene of Bubalus bubalis was found to be more closely related to Bos indicus × Bos taurus and Bos taurus than to Ovis aries and Sus scrofa.

  19. Comparison of standard PCR/cloning to single genome sequencing for analysis of HIV-1 populations.

    Science.gov (United States)

    Jordan, Michael R; Kearney, Mary; Palmer, Sarah; Shao, Wei; Maldarelli, Frank; Coakley, Eoin P; Chappey, Colombe; Wanke, Christine; Coffin, John M

    2010-09-01

    To compare standard PCR/cloning and single genome sequencing (SGS) in their ability to reflect actual intra-patient polymorphism of HIV-1 populations, a total of 530 HIV-1 pro-pol sequences obtained by both sequencing techniques from a set of 17 ART naïve patient specimens was analyzed. For each specimen, 12 and 15 sequences, on average, were characterized by the two techniques. Using phylogenetic analysis, tests for panmixia and entropy, and Bland-Altman plots, no difference in population structure or genetic diversity was shown in 14 of the 17 subjects. Evidence of sampling bias by the presence of subsets of identical sequences was found by either method. Overall, the study shows that neither method was more biased than the other, and providing that an adequate number of PCR templates is analyzed, and that the bulk sequencing captures the diversity of the viral population, either method is likely to provide a similar measure of population diversity. Copyright 2010 Elsevier B.V. All rights reserved.

  20. Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India

    Science.gov (United States)

    Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

    2017-12-01

    The 2001 Mw 7.7 Bhuj mainshock seismic sequence in the Kachchh area, occurring during 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis techniques. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade after the 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long sequence has been investigated for the first time for completeness magnitude Mc 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using an effective window length of 200 earthquakes with a moving window of 20 events overlapped by 180 events. The robustness of the analysis has been tested by applying the magnitude completeness correction term of 0.2 to Mc 3.0, giving Mc 3.2, and we have tested the error in the calculation of Dq for each magnitude threshold. In addition, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows that the multi-fractal dimension spectrum Dq decreases with increasing clustering of events with time before a moderate magnitude earthquake in the sequence, which alternatively accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe, and warns of the proximity of a damaging seismic event in an area.
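    The windowing scheme in this abstract (200-event windows advanced so that consecutive windows share 180 events, i.e. a step of 20 events) can be sketched as below. This is a minimal illustration and not the authors' code: the function name `sliding_windows` and the catalogue size of 1000 events are hypothetical, and the Dq computation itself is not shown.

```python
def sliding_windows(n_events, window=200, step=20):
    """Return (start, stop) index pairs for overlapping event windows.

    Consecutive windows overlap by window - step events (180 by default).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if n_events < window:
        return []
    return [(start, start + window)
            for start in range(0, n_events - window + 1, step)]


# Hypothetical catalogue of 1000 events, ordered in time.
windows = sliding_windows(1000)
overlap = windows[0][1] - windows[1][0]  # events shared by the first two windows
```

    Each (start, stop) pair would then select the events passed to whatever routine estimates the Dq spectrum for that window.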

  1. Estimation of physiological parameters using knowledge-based factor analysis of dynamic nuclear medicine image sequences

    International Nuclear Information System (INIS)

    Yap, J.T.; Chen, C.T.; Cooper, M.

    1995-01-01

    The authors have previously developed a knowledge-based method of factor analysis to analyze dynamic nuclear medicine image sequences. In this paper, the authors analyze dynamic PET cerebral glucose metabolism and neuroreceptor binding studies. These methods have shown the ability to reduce the dimensionality of the data, enhance the image quality of the sequence, and generate meaningful functional images and their corresponding physiological time functions. The new information produced by the factor analysis has now been used to improve the estimation of various physiological parameters. A principal component analysis (PCA) is first performed to identify statistically significant temporal variations and remove the uncorrelated variations (noise) due to Poisson counting statistics. The statistically significant principal components are then used to reconstruct a noise-reduced image sequence as well as provide an initial solution for the factor analysis. Prior knowledge such as the compartmental models or the requirement of positivity and simple structure can be used to constrain the analysis. These constraints are used to rotate the factors to the most physically and physiologically realistic solution. The final result is a small number of time functions (factors) representing the underlying physiological processes and their associated weighting images representing the spatial localization of these functions. Estimation of physiological parameters can then be performed using the noise-reduced image sequence generated from the statistically significant PCs and/or the final factor images and time functions. These results are compared to the parameter estimation using standard methods and the original raw image sequences. Graphical analysis was performed at the pixel level to generate comparable parametric images of the slope and intercept (influx constant and distribution volume)
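    The PCA noise-reduction step described in this abstract can be illustrated with a short sketch. It assumes the dynamic sequence is held as a NumPy array of shape (frames, pixels); the function name `pca_denoise`, the number of retained components, and the use of a plain SVD are illustrative assumptions, and the constrained factor rotation mentioned in the abstract is not reproduced here.

```python
import numpy as np


def pca_denoise(frames, n_components):
    """Reconstruct an image sequence from its leading principal components.

    frames: array of shape (n_frames, n_pixels), one flattened image per row.
    Returns an array of the same shape built from the temporal mean plus
    the first n_components principal components.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0, keepdims=True)
    centered = frames - mean
    # SVD of the centered data: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    return mean + (u[:, :k] * s[:k]) @ vt[:k]
```

    Choosing how many components count as statistically significant is the substantive decision; the sketch leaves that to the caller.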

  2. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  3. Hunting down frame shifts: Ecological analysis of diverse functional gene sequences

    Directory of Open Access Journals (Sweden)

    Michal Strejcek

    2015-11-01

    Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By the nature of the FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
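    Maximum Expected Error filtering, as named in this abstract, is commonly computed by summing the per-base error probabilities implied by Phred quality scores. The sketch below assumes Phred+33 encoded quality strings and a threshold of 1.0 expected error; both the threshold and the function names are assumptions for illustration, not values or code taken from the study.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string.

    A Phred score Q corresponds to an error probability of 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_mee(quality, max_ee=1.0, offset=33):
    """True if the read's expected number of errors does not exceed max_ee."""
    return expected_errors(quality, offset) <= max_ee


# 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
good_read = "IIII"
poor_read = "!!II"
kept = [q for q in (good_read, poor_read) if passes_mee(q)]
```

    A lower `max_ee` discards more reads but passes cleaner data to downstream steps such as pre-clustering.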

  4. Phylogenetic analysis of lack gene sequences for 22 Chinese Leishmania isolates.

    Science.gov (United States)

    Zhang, Chun-Ying; Zhou, Juan; Ding, Bin; Lu, Xiao-Jun; Xiao, Yu-Ling; Hu, Xiao-Su; Ma, Ying

    2013-07-01

    The phylogenetic relationships between Chinese Leishmania strains were investigated using lack (Leishmania homolog of receptors for activated protein kinase C) gene sequences, and the power of this gene was assessed for understanding the epidemiology and population genetics of Leishmania. The lack gene sequences from Leishmania isolates were sequenced after polymerase chain reaction (PCR) amplification. Sequence alignment was performed and a phylogenetic tree was created using the MEGA 5.0 software program. Sequences of 850 bp were analyzed for each of the Leishmania strains collected from different locations in China, and minor differences in sequences were noted between the strains. Four distinct groups formed according to differences in the sequences of the lack gene. Group I consisted of 12 isolates from Shandong, Xinjiang, Gansu and Sichuan. These strains are part of the Leishmania donovani complex and are pathogenic to humans and canines. Group II included six isolates from Xinjiang and a reference strain, Leishmania turanica. Group III contained two isolates (one from a sand fly in Xinjiang and one from a rodent in Inner Mongolia) and they were identified as Leishmania gerbilli. Finally, group IV contained a strain from a sand fly in Xinjiang and a strain from a lizard in Inner Mongolia, and these strains were found to be Sauroleishmania. The Chinese Leishmania isolates formed four groups based on differences in the sequences of the lack gene, and this result is consistent with previous studies. Phylogenetic analysis suggests that the Leishmania isolates from China are more complicated than previously thought. There is consensus between genetic clustering and identification using classical methods, which means that the lack gene yields polymorphic information that could be used for genotyping Leishmania isolates. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Estimation of a Killer Whale (Orcinus orca) Population's Diet Using Sequencing Analysis of DNA from Feces.

    Directory of Open Access Journals (Sweden)

    Michael J Ford

    Estimating diet composition is important for understanding interactions between predators and prey and thus illuminating ecosystem function. The diet of many species, however, is difficult to observe directly. Genetic analysis of fecal material collected in the field is therefore a useful tool for gaining insight into wild animal diets. In this study, we used high-throughput DNA sequencing to quantitatively estimate the diet composition of an endangered population of wild killer whales (Orcinus orca) in their summer range in the Salish Sea. We combined 175 fecal samples collected between May and September from five years between 2006 and 2011 into 13 sample groups. Two known DNA composition control groups were also created. Each group was sequenced at a ~330 bp segment of the 16S gene in the mitochondrial genome using an Illumina MiSeq sequencing system. After several quality control steps, 4,987,107 individual sequences were aligned to a custom sequence database containing 19 potential fish prey species and the most likely species of each fecal-derived sequence was determined. Based on these alignments, salmonids made up >98.6% of the total sequences and thus of the inferred diet. Of the six salmonid species, Chinook salmon made up 79.5% of the sequences, followed by coho salmon (15%). Over all years, a clear pattern emerged with Chinook salmon dominating the estimated diet early in the summer, and coho salmon contributing an average of >40% of the diet in late summer. Sockeye salmon appeared to be occasionally important, at >18% in some sample groups. Non-salmonids were rarely observed. Our results are consistent with earlier results based on surface prey remains, and confirm the importance of Chinook salmon in this population's summer diet.
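    Once each read has been assigned to its most likely prey species, the diet estimate reduces to the share of reads per species. The following sketch shows that final tally only; the function name and the ten example assignments are hypothetical and do not reproduce the study's data or its alignment step.

```python
from collections import Counter


def diet_proportions(assignments):
    """Fraction of reads assigned to each species.

    assignments: iterable of species labels, one per classified read.
    Returns an empty dict when there are no reads.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


# Hypothetical per-read species calls for one sample group.
calls = ["Chinook"] * 8 + ["coho"] * 2
proportions = diet_proportions(calls)
```

    Read shares are only a proxy for diet, which is presumably why the study also sequenced control groups of known DNA composition.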

  6. Short-Read Sequencing for Genomic Analysis of the Brown Rot Fungus Fibroporia radiculosa

    Science.gov (United States)

    J. D. Tang; A. D. Perkins; T. S. Sonstegard; S. G. Schroeder; S. C. Burgess; S. V. Diehl

    2012-01-01

    The feasibility of short-read sequencing for genomic analysis was demonstrated for Fibroporia radiculosa, a copper-tolerant fungus that causes brown rot decay of wood. The effect of read quality on genomic assembly was assessed by filtering Illumina GAIIx reads from a single run of a paired-end library (75-nucleotide read length and 300-bp fragment...

  7. Formative Research on the Simplifying Conditions Method (SCM) for Task Analysis and Sequencing.

    Science.gov (United States)

    Kim, YoungHwan; Reigeluth, Charles M.

    The Simplifying Conditions Method (SCM) is a set of guidelines for task analysis and sequencing of instructional content under the Elaboration Theory (ET). This article introduces the fundamentals of SCM and presents the findings from a formative research study on SCM. It was conducted in two distinct phases: design and instruction. In the first…

  8. Molecular cloning and sequence analysis of VP6 gene of giant ...

    African Journals Online (AJOL)

    Jane

    2011-10-24

    Oct 24, 2011 ... G), and the major structural protein of inner capsid particles (ICP), and also specific antigen of mucosa immunization that mediate specific immunological reaction. In this report, sequence analysis of VP6 gene of giant panda rotavirus was carried out. Full-length VP6 gene encoding for ICP of giant panda.

  9. Phylogenetic analysis of 23S rRNA gene sequences of some ...

    African Journals Online (AJOL)

    The phylogenetic relationships among thirteen Rhizobium leguminosarum bv. viciae isolates collected from various geographical regions were studied by analysis of the 23S rRNA sequences. The average of genetic distance among the studied isolates was very narrow (ranged from 0.00 to 0.04) and the studied isolates ...

  10. Genome-based exome sequencing analysis identifies GYG1, DIS3L ...

    Indian Academy of Sciences (India)

    Genome-based exome sequencing analysis identifies GYG1, DIS3L and DDRGK1 are associated with myocardial infarction in Koreans. JI-YOUNG LEE SANGHOON MOON YUN KYOUNG KIM SANG-HAK LEE BOK-SOO LEE MIN-YOUNG PARK JEONG EUY PARK ...

  11. Multilocus Sequence Analysis for Typing Leptospira interrogans and Leptospira kirschneri

    Science.gov (United States)

    Leon, Albertine; Pronost, Stéphane; Fortier, Guillaume; Andre-Fontaine, Geneviève; Leclercq, Roland

    2010-01-01

    Fifty-three strains belonging to the pathogenic species Leptospira interrogans and Leptospira kirschneri were analyzed by multilocus sequence analysis. The species formed two distinct branches. In the L. interrogans branch, the phylogenetic tree clustered the strains into three subgroups. Genogroups and serogroups were superimposed but not strictly. PMID:19955271

  12. Multilocus Sequence Analysis for Typing Leptospira interrogans and Leptospira kirschneri

    OpenAIRE

    Leon, Albertine; Pronost, Stéphane; Fortier, Guillaume; Andre-Fontaine, Geneviève; Leclercq, Roland

    2009-01-01

    Fifty-three strains belonging to the pathogenic species Leptospira interrogans and Leptospira kirschneri were analyzed by multilocus sequence analysis. The species formed two distinct branches. In the L. interrogans branch, the phylogenetic tree clustered the strains into three subgroups. Genogroups and serogroups were superimposed but not strictly.

  13. Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools.

    Science.gov (United States)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikolaos; Vinther, Jeppe

    2015-01-01

    RNA structure-probing data can improve the prediction of RNA secondary and tertiary structure and allow structural changes to be identified and investigated. In recent years, massive parallel sequencing has dramatically improved the throughput of RNA structure probing experiments, but at the same time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data, we have made a collection of tools, which allow raw sequencing reads to be converted to normalized probing values using different published strategies. In addition, we also provide tools for visualization of the probing data in the UCSC Genome Browser and for converting RNA coordinates to genomic coordinates and vice versa. The collection is implemented as functions in the R statistical environment and as tools in the Galaxy platform, making them easily accessible for the scientific community. We demonstrate the usefulness of the collection by applying it to the analysis of sequencing-based hydroxyl radical probing data and comparing different normalization strategies. © 2015 Elsevier Inc. All rights reserved.

  14. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    owner

    2012-07-17

    Jul 17, 2012 ... Sequence analysis of putative swrW gene required for surfactant serrawettin W1 production from Serratia marcescens. Monabel May N. Apao*, Franco G. Teves and Ma. Reina Suzette B. Madamba. Department of Biological Sciences, College of Science and Mathematics, MSU-Iligan Institute of Technology, ...

  15. Phylogenetic analysis of 23S rRNA gene sequences of some ...

    African Journals Online (AJOL)

    Tuoyo Aghomotsegin

    2016-08-31

    Aug 31, 2016 ... The phylogenetic relationships among thirteen Rhizobium leguminosarum bv. viciae isolates collected from various geographical regions were studied by analysis of the 23S rRNA sequences. The average of genetic distance among the studied isolates was very narrow (ranged from 0.00 to 0.04) and the ...

  16. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    Indian Academy of Sciences (India)

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants and ∼4.9-kb PAR1 deletion in ISS patients. J. Genet. 93, 505–508]. Introduction. Defects of the SHOX gene (short stature homeobox-containing gene), localized in the pseudoautosomal region 1 (PAR1) have ...

  17. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers. Yazun Bashir Jarrar, Ayat Ahmed Balasmeh and Wassan Jarrar. Department of Pharmacy, College of Pharmacy, AlZaytoonah University of Jordan, Amman, Jordan. ABSTRACT. The present study aimed to identify ...

  18. Analyzing Dyadic Data Using Grid-Sequence Analysis: Interdyad Differences in Intradyad Dynamics.

    Science.gov (United States)

    Brinberg, Miriam; Ram, Nilam; Hülür, Gizem; Brick, Timothy R; Gerstorf, Denis

    2017-12-15

    Spouses are proximal contexts for and influence each other's behaviors, particularly in old age. In this article, we forward an integrated approach that merges state space grid methods adapted from the dynamic systems literature with sequence analysis methods adapted from molecular biology into a "grid-sequence" method for studying interdyad differences in intradyad dynamics. Using dyadic data from 108 older couples (MAge = 75.18 years) with six within-day emotion and activity reports over 7 days, we illustrate how grid-sequence analysis can be used to identify a taxonomy of dyads with different emotion dynamics. Results provide a basis for measuring a set of dyad-level variables that capture dynamic equilibrium, daily routines, and interdyad differences. Specifically, we identified four groups of dyads who differed in how their moment-to-moment happiness was organized, with some evidence that these patterns were related to dyad-level differences in agreement on amount of time spent with partner and in subjective health. Methodologically, grid-sequence analysis extends the toolbox of techniques for analysis of dyadic experience sampling data. Substantively, we identify patterns of dyad-level microdynamics that may serve as new markers of risk/protective factors and potential points for intervention in older adults' proximal context.

  19. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data.

    NARCIS (Netherlands)

    Zomer, A.L.; Burghout, P.J.; Bootsma, H.J.; Hermans, P.W.M.; Hijum, S.A.F.T. van

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon

  20. Optimization and Comparative Analysis of Plant Organellar DNA Enrichment Methods Suitable for Next-generation Sequencing.

    Science.gov (United States)

    Miller, Marisa E; Liberatore, Katie L; Kianian, Shahryar F

    2017-07-28

    Plant organellar genomes contain large, repetitive elements that may undergo pairing or recombination to form complex structures and/or sub-genomic fragments. Organellar genomes also exist in admixtures within a given cell or tissue type (heteroplasmy), and an abundance of subtypes may change throughout development or when under stress (sub-stoichiometric shifting). Next-generation sequencing (NGS) technologies are required to obtain deeper understanding of organellar genome structure and function. Traditional sequencing studies use several methods to obtain organellar DNA: (1) If a large amount of starting tissue is used, it is homogenized and subjected to differential centrifugation and/or gradient purification. (2) If a smaller amount of tissue is used (i.e., if seeds, material, or space is limited), the same process is performed as in (1), followed by whole-genome amplification to obtain sufficient DNA. (3) Bioinformatics analysis can be used to sequence the total genomic DNA and to parse out organellar reads. All these methods have inherent challenges and tradeoffs. In (1), it may be difficult to obtain such a large amount of starting tissue; in (2), whole-genome amplification could introduce a sequencing bias; and in (3), homology between nuclear and organellar genomes could interfere with assembly and analysis. In plants with large nuclear genomes, it is advantageous to enrich for organellar DNA to reduce sequencing costs and sequence complexity for bioinformatics analyses. Here, we compare a traditional differential centrifugation method with a fourth method, an adapted CpG-methyl pulldown approach, to separate the total genomic DNA into nuclear and organellar fractions. Both methods yield sufficient DNA for NGS, DNA that is highly enriched for organellar sequences, albeit at different ratios in mitochondria and chloroplasts. 
    We present the optimization of these methods for wheat leaf tissue and discuss major advantages and disadvantages of each approach in

  1. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound changes in conformational features, such as DNA helical twist, inclination angle and stacking energy, were observed between ORC-ACS and nrACS. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  2. Cronobacter, the emergent bacterial pathogen Enterobacter sakazakii comes of age; MLST and whole genome sequence analysis.

    Science.gov (United States)

    Forsythe, Stephen J; Dickins, Benjamin; Jolley, Keith A

    2014-12-16

    Following the association of Cronobacter spp. with several publicized fatal outbreaks of meningitis and necrotising enterocolitis in neonatal intensive care units, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter, which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/) containing over 1000 isolates with metadata, along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections. Whole genome sequencing and multilocus sequence typing (MLST) support the formal recognition of the genus Cronobacter, composed of seven species, to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen.
    Whole genome sequencing has additionally shown a range

  3. Masking as an effective quality control method for next-generation sequencing data analysis.

    Science.gov (United States)

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The Perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
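    The difference between the two preprocessing methods compared in this abstract can be shown on a single read. The sketch below is not the authors' Perl script: it assumes Phred+33 quality strings and a cutoff of Q20, and the trimming variant shown (removing low quality bases from the 3' end only) is one common form chosen for illustration. All function names are hypothetical.

```python
def mask_low_quality(seq, quality, min_q=20, offset=33):
    """Replace bases whose Phred score is below min_q with 'N'.

    Read length is preserved, so positions stay aligned with the original.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have equal length")
    return "".join(base if ord(q) - offset >= min_q else "N"
                   for base, q in zip(seq, quality))


def trim_low_quality_3prime(seq, quality, min_q=20, offset=33):
    """Remove low quality bases from the 3' end, shortening the read."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have equal length")
    end = len(seq)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read, qual = "ACGTAC", "II#II#"
masked = mask_low_quality(read, qual)
trimmed_seq, trimmed_qual = trim_low_quality_3prime(read, qual)
```

    Masking keeps the internal low quality position visible as 'N', whereas this form of trimming leaves it untouched and only shortens the read.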

  4. Exploring genome wide bisulfite sequencing for DNA methylation analysis in livestock: A technical assessment

    Directory of Open Access Journals (Sweden)

    Rachael Doherty

    2014-05-01

    Recent advances made in omics technologies are contributing to a revolution in livestock selection and breeding practices. Epigenetic mechanisms, including DNA methylation, are important determinants for the control of gene expression in mammals. DNA methylation research will help our understanding of how environmental factors contribute to phenotypic variation of complex production and health traits. High-throughput sequencing is a vital tool for the comprehensive analysis of DNA methylation, and bisulfite-based strategies coupled with DNA sequencing allow for quantitative, site-specific methylation analysis at the genome level or genome wide. Reduced representation bisulfite sequencing (RRBS) and, more recently, whole genome bisulfite sequencing (WGBS) have proven to be effective techniques for studying DNA methylation in both humans and mice. Here we report the development of RRBS and WGBS for use in sheep, the first application of this technology in livestock species. Important technical issues associated with these methodologies, including fragment size selection and sequence depth, are examined and discussed.

  5. sRNAnalyzer—a flexible and customizable small RNA sequencing data analysis pipeline

    Science.gov (United States)

    Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J.

    2017-01-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline, sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. PMID:29069500

  6. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Science.gov (United States)

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline-sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Analysis of plant microbe interactions in the era of next generation sequencing technologies

    Directory of Open Access Journals (Sweden)

    Claudia Knief

    2014-05-01

    Next generation sequencing (NGS) technologies have impressively accelerated research in biological science in recent years by enabling the production of large volumes of sequence data at a drastically lower price per base compared to traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed, and methods are disclosed that help to overcome these shortcomings. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant-associated microbiota to demonstrate the worth of the new methods.

  8. Complete mitochondrial genome sequence of Marmota himalayana (Rodentia: Sciuridae) and phylogenetic analysis within Rodentia.

    Science.gov (United States)

    Chao, Q J; Li, Y D; Geng, X X; Zhang, L; Dai, X; Zhang, X; Li, J; Zhang, H J

    2014-04-14

    This is the first report of a complete mitochondrial genome sequence from Himalayan marmot (Marmota himalayana, class Marmota). We determined the M. himalayana mitochondrial (mt) genome sequence by using long-PCR methods and a primer-walking sequencing strategy with genus-specific primers. The complete mt genome of M. himalayana was 16,443 bp in length and comprised 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes, and a typical control region (CR). Gene order and orientation were identical to those in mt genomes of most vertebrates. The heavy strand showed an overall A+T content of 63.49%. AT and GC skews for the mt genome of the M. himalayana were 0.012 and -0.300, respectively, indicating a nucleotide bias against T and G. The control region was 997 bp in size and displayed some unusual features, including absence of repeated motifs and two conserved sequence blocks (CSB2 and CSB3), which is consistent with observations from two other rodent species, Sciurus vulgaris and Myoxus glis. Phylogenetic analysis of complete mt DNA sequences without the control region including 30 taxa of Rodentia was performed with Maximum-Likelihood (ML) and Bayesian Inference (BI) methods and provided strong support for Sciurognathi polyphyly and Hystricognathi monophyly. This analysis also provided evidence that M. himalayana mt DNA was closely related to that from Sciurus vulgaris (Sciuridae) and was similar to mt DNA from Myoxus glis.
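    For readers unfamiliar with the skew statistics quoted in this abstract, the following is a minimal Python sketch assuming the commonly used definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract itself does not state the formulas. The function name `skews` and the toy sequence are illustrative and are not taken from the paper.

```python
from collections import Counter


def skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for a DNA string.

    Assumed definitions (not stated in the abstract):
        AT skew = (A - T) / (A + T)
        GC skew = (G - C) / (G + C)
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# Toy sequence (not the M. himalayana genome): 4 A, 2 T, 1 G, 3 C.
at, gc = skews("AAAATTGCCC")
print(round(at, 3), round(gc, 3))  # 0.333 -0.5
```

    Under these definitions, a positive AT skew together with a negative GC skew, as reported for M. himalayana (0.012 and -0.300), corresponds to A exceeding T and C exceeding G on the strand analyzed.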

  9. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis

    Science.gov (United States)

    Simple sequence repeats (SSR) or microsatellite markers are one of the most informative and versatile DNA-based markers. The use of next-generation sequencing technologies allows whole genome sequencing and makes it possible to develop large numbers of SSRs through bioinformatic analysis of genome da...

  10. Method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation in conjunction with mass spectrometric analysis

    Science.gov (United States)

    Laskin, Julia [Richland, WA]; Futrell, Jean H [Richland, WA]

    2008-04-29

    The invention relates to a method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation (SID) in conjunction with mass spectrometric analysis. Results demonstrate formation of a wide distribution of structure-specific fragments having wide sequence coverage useful for sequencing and identifying the complex molecules.

  11. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing.

    Science.gov (United States)

    Zackay, Arie; Steinhoff, Christine

    2010-12-15

    Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.

  12. A New Borrelia Species Defined by Multilocus Sequence Analysis of Housekeeping Genes

    Science.gov (United States)

    Margos, Gabriele; Vollmer, Stephanie A.; Cornet, Muriel; Garnier, Martine; Fingerle, Volker; Wilske, Bettina; Bormane, Antra; Vitorino, Liliana; Collares-Pereira, Margarida; Drancourt, Michel; Kurtenbach, Klaus

    2009-01-01

    Analysis of Lyme borreliosis (LB) spirochetes, using a novel multilocus sequence analysis scheme, revealed that OspA serotype 4 strains (a rodent-associated ecotype) of Borrelia garinii were sufficiently genetically distinct from bird-associated B. garinii strains to deserve species status. We suggest that OspA serotype 4 strains be raised to species status and named Borrelia bavariensis sp. nov. The rooted phylogenetic trees provide novel insights into the evolutionary history of LB spirochetes. PMID:19542332

  13. A new Borrelia species defined by multilocus sequence analysis of housekeeping genes.

    Science.gov (United States)

    Margos, Gabriele; Vollmer, Stephanie A; Cornet, Muriel; Garnier, Martine; Fingerle, Volker; Wilske, Bettina; Bormane, Antra; Vitorino, Liliana; Collares-Pereira, Margarida; Drancourt, Michel; Kurtenbach, Klaus

    2009-08-01

    Analysis of Lyme borreliosis (LB) spirochetes, using a novel multilocus sequence analysis scheme, revealed that OspA serotype 4 strains (a rodent-associated ecotype) of Borrelia garinii were sufficiently genetically distinct from bird-associated B. garinii strains to deserve species status. We suggest that OspA serotype 4 strains be raised to species status and named Borrelia bavariensis sp. nov. The rooted phylogenetic trees provide novel insights into the evolutionary history of LB spirochetes.

  14. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive for rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced on an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil and French Guiana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting a common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.)

    Science.gov (United States)

    2011-01-01

    Background: Anthracnose (Colletotrichum gloeosporioides) is a major limiting factor in the production of yam (Dioscorea spp.) worldwide. Availability of high quality sequence information is necessary for designing molecular markers associated with resistance. However, very limited sequence information pertaining to yam is available at public genome databases. Therefore, this collaborative project was developed for genetic improvement and germplasm characterization of yams using molecular markers. The current investigation is focused on studying gene expression, by large scale generation of ESTs, from one susceptible (TDa 95-0310) and two resistant yam genotypes (TDa 87-01091, TDa 95-0328) challenged with the fungus. Total RNA was isolated from young leaves of resistant and susceptible genotypes and cDNA libraries were sequenced using Roche 454 technology. Results: A total of 44,757 EST sequences were generated from the cDNA libraries of the resistant and susceptible genotypes. Greater than 56% of ESTs were annotated using MapMan Mercator tool and Blast2GO search tools. Gene annotations were used to characterize the transcriptome in yam and also perform a differential gene expression analysis between the resistant and susceptible EST datasets. Mining for SSRs in the ESTs revealed 1702 unique sequences containing SSRs and 1705 SSR markers were designed using those sequences. Conclusion: We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. cDNA libraries were constructed from anthracnose fungus challenged leaf tissues for transcriptome characterization, and differential gene expression analysis. Thus, it helped in identifying unique transcripts in each library for disease resistance. These EST resources provide the basis for future microarray development, marker validation, genetic linkage mapping and QTL analysis in Dioscorea species. PMID:21303556
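    The SSR-mining step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool or parameter set: the minimum repeat counts (6 for dinucleotide and 5 for trinucleotide motifs), the restriction to perfect repeats, and the function name `find_ssrs` are all assumptions made for illustration.

```python
import re

# Hypothetical thresholds (not from the paper): minimum number of tandem
# copies required for a motif of the given length to count as an SSR.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures a motif; \1{m,} demands at least m more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy EST-like sequence: six AT repeats and five TTC repeats.
toy = "GGC" + "AT" * 6 + "GGA" + "TTC" * 5 + "AA"
print(find_ssrs(toy))  # [(3, 'AT', 6), (18, 'TTC', 5)]
```

    Dedicated SSR finders typically also handle longer motifs, compound repeats and primer design; this sketch covers only the detection of perfect di- and trinucleotide repeats.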

  16. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.)

    Directory of Open Access Journals (Sweden)

    Robert Asiedu

    2011-02-01

    Background: Anthracnose (Colletotrichum gloeosporioides) is a major limiting factor in the production of yam (Dioscorea spp.) worldwide. Availability of high quality sequence information is necessary for designing molecular markers associated with resistance. However, very limited sequence information pertaining to yam is available at public genome databases. Therefore, this collaborative project was developed for genetic improvement and germplasm characterization of yams using molecular markers. The current investigation is focused on studying gene expression, by large scale generation of ESTs, from one susceptible (TDa 95-0310) and two resistant yam genotypes (TDa 87-01091, TDa 95-0328) challenged with the fungus. Total RNA was isolated from young leaves of resistant and susceptible genotypes and cDNA libraries were sequenced using Roche 454 technology. Results: A total of 44,757 EST sequences were generated from the cDNA libraries of the resistant and susceptible genotypes. Greater than 56% of ESTs were annotated using MapMan Mercator tool and Blast2GO search tools. Gene annotations were used to characterize the transcriptome in yam and also perform a differential gene expression analysis between the resistant and susceptible EST datasets. Mining for SSRs in the ESTs revealed 1702 unique sequences containing SSRs and 1705 SSR markers were designed using those sequences. Conclusion: We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. cDNA libraries were constructed from anthracnose fungus challenged leaf tissues for transcriptome characterization, and differential gene expression analysis. Thus, it helped in identifying unique transcripts in each library for disease resistance. These EST resources provide the basis for future microarray development, marker validation, genetic linkage mapping and QTL analysis in Dioscorea species.

  17. Detection of methylation in promoter sequences by melting curve analysis-based semiquantitative real time PCR

    Directory of Open Access Journals (Sweden)

    Lázcoz Paula

    2008-02-01

    Background: We present two melting curve analysis (MCA)-based semiquantitative real time PCR techniques to detect the promoter methylation status of genes. The first, MCA-MSP, follows the same principle as standard MSP but it is performed in a real time thermal cycler with results being visualized in a melting curve. The second, MCA-Meth, uses a single pair of primers designed with no CpGs in its sequence. These primers amplify both unmethylated and methylated sequences. In clinical applications the MSP technique has revolutionized methylation detection by simplifying the analysis to a PCR-based protocol. MCA-analysis based techniques may be able to further improve and simplify methylation analyses by reducing starting DNA amounts, by introducing an all-in-one tube reaction and by eliminating a final gel stage for visualization of the result. The current study aimed at investigating the feasibility of both MCA-MSP and MCA-Meth in the analysis of promoter methylation, and at defining potential advantages and shortcomings in comparison to currently implemented techniques, i.e. bisulfite sequencing and standard MSP. Methods: The promoters of the RASSF1A (3p21.3), BLU (3p21.3) and MGMT (10q26) genes were analyzed by MCA-MSP and MCA-Meth in 13 astrocytoma samples, 6 high grade glioma cell lines and 4 neuroblastoma cell lines. The data were compared with standard MSP and validated by bisulfite sequencing. Results: Both MCA-MSP and MCA-Meth successfully determined promoter methylation. MCA-MSP provided information similar to standard MSP analyses. However, the analysis was possible in a single tube and avoided the gel stage. MCA-Meth proved to be useful in samples with intermediate methylation status, reflected by a melting curve position shift in dependence on methylation extent. Conclusion: We propose MCA-MSP and MCA-Meth as alternative or supplementary techniques to MSP or bisulfite sequencing.

  18. Biostratigraphic analysis of the top layer of sediment cores from the reference and test sites of the INDEX area

    Digital Repository Service at National Institute of Oceanography (India)

    Gupta, S.M.

    searched, identified, illustrated (Figure 1), counted, and tabulated for range charts (Figure 2). Figure 1. Scanning electron microscopic photomicrographs of the Late Neogene radiolarian index species: (1) Buccinosphaera invaginata...

  19. Phylogenetic position of Taylorella equigenitalis determined by analysis of amplified 16S ribosomal DNA sequences.

    Science.gov (United States)

    Bleumink-Pluym, N M; van Dijk, L; van Vliet, A H; van der Giessen, J W; van der Zeijst, B A

    1993-07-01

    The 16S ribosomal DNA sequence of Taylorella equigenitalis (formerly Haemophilus equigenitalis), the causative organism of contagious equine metritis, was determined. A phylogenetic analysis of this sequence revealed a phylogenetic position of T. equigenitalis in the beta subclass of the class Proteobacteria apart from the position of Haemophilus influenzae, which belongs to the gamma subclass of Proteobacteria. A close phylogenetic relationship among T. equigenitalis, Alcaligenes xylosoxidans, and Bordetella bronchiseptica was detected; Spirillum volutans and Chromobacterium fluviatile (Iodobacter fluviatile) were in the same group but slightly removed. This relationship is surprising in view of the considerable differences in the G + C contents of the genomes of these bacteria.

  20. MPSA: integrated system for multiple protein sequence analysis with client/server capabilities.

    Science.gov (United States)

    Blanchet, C; Combet, C; Geourjon, C; Deléage, G

    2000-03-01

    MPSA is stand-alone software intended for protein sequence analysis with a high integration level and Web client/server capabilities. It provides many methods and tools, which are integrated into an interactive graphical user interface. It is available for most Unix/Linux and non-Unix systems. MPSA is able to connect to a Web server (e.g. http://pbil.ibcp.fr/NPSA) in order to perform large-scale sequence comparison on up-to-date databanks. Free to academic users. http://www.ibcp.fr/mpsa/ c.blanchet@ibcp.fr

  1. Analysis of simulated image sequences from sensors for restricted-visibility operations

    Science.gov (United States)

    Kasturi, Rangachar

    1991-01-01

    A real time model of the visible output from a 94 GHz sensor, based on a radiometric simulation of the sensor, was developed. A sequence of images as seen from an aircraft as it approaches for landing was simulated using this model. Thirty frames from this sequence of 200 x 200 pixel images were analyzed to identify and track objects in the image using the Cantata image processing package within the visual programming environment provided by the Khoros software system. The image analysis operations are described.

  2. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Directory of Open Access Journals (Sweden)

    Hsiao Yu-Yun

    2011-01-01

    Background: Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC) end sequences (BESs) can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results: We used two BAC libraries (constructed using the BamHI and HindIII restriction enzymes) of Phalaenopsis equestris to generate pair-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively), at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp) in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, it was found that 641 (11.6%) were predicted to represent protein-encoding regions, whereas 1,272 (23.0%) contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively), whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs) were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6%) of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequences of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome, and will help clarify similarities and differences in genome composition between orchids and other plant species. Conclusion: Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive

  3. Deep sequencing analysis of the developing mouse brain reveals a novel microRNA

    OpenAIRE

    Ling, King-Hwa; Brautigan, Peter J; Hahn, Christopher N; Daish, Tasman; Rayner, John R; Cheah, Pike-See; Raison, Joy M; Piltz, Sandra; Mann, Jeffrey R; Mattiske, Deidre M; Thomas, Paul Q; Adelson, David L; Scott, Hamish S

    2011-01-01

    Background: MicroRNAs (miRNAs) are small non-coding RNAs that can exert multilevel inhibition/repression at a post-transcriptional or protein synthesis level during disease or development. Characterisation of miRNAs in adult mammalian brains by deep sequencing has been reported previously. However, to date, no small RNA profiling of the developing brain has been undertaken using this method. We have performed deep sequencing and small RNA analysis of a developing (E15.5) mouse brain. ...

  4. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  5. DNA sequence analysis of X-ray induced Adh null mutations in Drosophila melanogaster

    International Nuclear Information System (INIS)

    Mahmoud, J.; Fossett, N.G.; Arbour-Reily, P.; McDaniel, M.; Tucker, A.; Chang, S.H.; Lee, W.R.

    1991-01-01

    The mutational spectrum for 28 X-ray induced mutations and 2 spontaneous mutations, previously determined by genetic and cytogenetic methods, consisted of 20 multilocus deficiencies (19 induced and 1 spontaneous) and 10 intragenic mutations (9 induced and 1 spontaneous). One of the X-ray induced intragenic mutations was lost, and another was determined to be a recombinant with the allele used in the recovery scheme. The DNA sequence of two X-ray induced intragenic mutations has been published. This paper reports the results of DNA sequence analysis of the remaining intragenic mutations and a summary of the X-ray induced mutational spectrum. The combination of DNA sequence analysis with genetic complementation analysis shows a continuous distribution in size of deletions rather than two different types of mutations consisting of deletions and 'point mutations'. Sequencing is shown to be essential for detecting intragenic deletions. Of particular importance for future studies is the observation that all of the intragenic deletions consist of a direct repeat adjacent to the breakpoint with one of the repeats deleted

  6. ANTHEPROT: an integrated protein sequence analysis software with client/server capabilities.

    Science.gov (United States)

    Deléage, G; Combet, C; Blanchet, C; Geourjon, C

    2001-07-01

    Programs devoted to the analysis of protein sequences exist either as stand-alone programs or as Web servers. However, stand-alone programs can hardly accommodate for the analysis that involves comparisons on databanks, which require regular updates. Moreover, Web servers cannot be as efficient as stand-alone programs when dealing with real-time graphic display. We describe here a stand-alone software program called ANTHEPROT, which is intended to perform protein sequence analysis with a high integration level and clients/server capabilities. It is an interactive program with a graphical user interface that allows handling of protein sequence and data in a very interactive and convenient manner. It provides many methods and tools, which are integrated into a graphical user interface. ANTHEPROT is available for Windows-based systems. It is able to connect to a Web server in order to perform large-scale sequence comparison on up-to-date databanks. ANTHEPROT is freely available to academic users and may be downloaded at http://pbil.ibcp.fr/ANTHEPROT.

  7. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE: Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  8. Internal event analysis for Laguna Verde Unit 1 Nuclear Power Plant. Accident sequence quantification and results

    International Nuclear Information System (INIS)

    Huerta B, A.; Aguilar T, O.; Nunez C, A.; Lopez M, R.

    1994-01-01

    The Level 1 results of the Laguna Verde Nuclear Power Plant PRA are presented in the Internal Event Analysis for Laguna Verde Unit 1 Nuclear Power Plant, CNSNS-TR 004, in five volumes. The reports are organized as follows: CNSNS-TR 004 Volume 1: Introduction and Methodology. CNSNS-TR 004 Volume 2: Initiating Event and Accident Sequences. CNSNS-TR 004 Volume 3: System Analysis. CNSNS-TR 004 Volume 4: Accident Sequence Quantification and Results. CNSNS-TR 005 Volume 5: Appendices A, B and C. This volume presents the development of the dependent failure analysis, the treatment of the support system dependencies, the identification of the shared-components dependencies, and the treatment of the common cause failure. The identification of the main human actions considered, along with the possible recovery actions included, is also presented. The development of the data base and the assumptions and limitations in the data base are also described in this volume. The accident sequences quantification process and the resolution of the core vulnerable sequences are presented. In this volume, the source and treatment of uncertainties associated with failure rates, component unavailabilities, initiating event frequencies, and human error probabilities are also presented. Finally, the main results and conclusions for the Internal Event Analysis for Laguna Verde Nuclear Power Plant are presented. The total core damage frequency calculated is 9.03 x 10^-5 per year for internal events. The most dominant accident sequences found are the transients involving the loss of offsite power, the station blackout accidents, and the anticipated transients without SCRAM (ATWS). (Author)

  9. A functional U-statistic method for association analysis of sequencing data.

    Science.gov (United States)

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  10. Isolation and sequence analysis of the gene encoding triose phosphate isomerase from Zygosaccharomyces bailii.

    Science.gov (United States)

    Merico, A; Rodrigues, F; Côrte-Real, M; Porro, D; Ranzi, B M; Compagno, C

    2001-06-30

    The ZbTPI1 gene encoding triose phosphate isomerase (TIM) was cloned from a Zygosaccharomyces bailii genomic library by complementation of the Saccharomyces cerevisiae tpi1 mutant strain. The nucleotide sequence of a 1.5 kb fragment showed an open reading frame (ORF) of 746 bp, encoding a protein of 248 amino acid residues. The deduced amino acid sequence shares a high degree of homology with TIMs from other yeast species, including some highly conserved regions. The analysis of the promoter sequence of the ZbTPI1 revealed the presence of putative motifs known to have regulatory functions in S. cerevisiae. The GenBank Accession No. of ZbTPI1 is AF325852. Copyright 2001 John Wiley & Sons, Ltd.

  11. IS1222: analysis and distribution of a new insertion sequence in Enterobacter agglomerans 339.

    Science.gov (United States)

    Steibl, H D; Lewecke, F M

    1995-04-14

    With a length of 1221 bp and 44-bp inverted repeats with ten mismatches, IS1222 was identified as an endogenous insertion sequence in Enterobacter agglomerans 339. In this host strain, four copies were located, three on the nif plasmid pEA9 and one at the chromosome. Sequence analysis showed two consecutive open reading frames, orfA and orfB, encoding putative polypeptides of 87 and 276 amino acids. In-between both reading frames, a potential frameshift window of the homonucleotide type was postulated, followed by a pseudoknot structure and a ribosome-binding site. Based on significant homology at the sequence level and similarity of the features discussed, IS1222 was placed among the group of IS3 elements with IS407, IS476 and ISR1 being the most closely related IS. Hybridization experiments suggest that the distribution of IS1222 is limited to a group of related bacterial strains among Enterobacteriaceae.

  12. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    Science.gov (United States)

    Gillette, Todd Aaron

    Neuronal morphology is a key mediator of neuronal function, defining the profile of connectivity and shaping signal integration and propagation. Reconstructing neurite processes is technically challenging and thus data has historically been relatively sparse. Data collection and curation along with more efficient and reliable data production methods provide opportunities for the application of informatics to find new relationships and more effectively explore the field. This dissertation presents a method for aiding the development of data production as well as a novel representation and set of analyses for extracting morphological patterns. The DIADEM Challenge was organized for the purposes of determining the state of the art in automated neuronal reconstruction and what existing challenges remained. As one of the co-organizers of the Challenge, I developed the DIADEM metric, a tool designed to measure the effectiveness of automated reconstruction algorithms by comparing resulting reconstructions to expert-produced gold standards and identifying errors of various types. It has been used in the DIADEM Challenge and in the testing of several algorithms since. Further, this dissertation describes a topological sequence representation of neuronal trees amenable to various forms of sequence analysis, notably motif analysis, global pairwise alignment, clustering, and multiple sequence alignment. Motif analysis of neuronal arbors shows a large difference in bifurcation type proportions between axons and dendrites, but that relatively simple growth mechanisms account for most higher order motifs. Pairwise global alignment of topological sequences, modified from traditional sequence alignment to preserve tree relationships, enabled cluster analysis which displayed strong correspondence with known cell classes by cell type, species, and brain region. Multiple alignment of sequences in selected clusters enabled the extraction of conserved features, revealing mouse

  13. Chaos game representation of functional protein sequences, and simulation and multifractal analysis of induced measures

    International Nuclear Information System (INIS)

    Zu-Guo, Yu; Qian-Jun, Xiao; Long, Shi; Jun-Wu, Yu; Anh, Vo

    2010-01-01

    Investigating the biological function of proteins is a key aspect of protein studies. Bioinformatic methods have become important for studying the biological function of proteins. In this paper, we first give the chaos game representation (CGR) of randomly-linked functional protein sequences, then propose the use of the recurrent iterated function systems (RIFS) in fractal theory to simulate the measure based on their chaos game representations. This method helps to extract some features of functional protein sequences, and furthermore the biological functions of these proteins. Then multifractal analysis of the measures based on the CGRs of randomly-linked functional protein sequences is performed. We find that the CGRs have clear fractal patterns. The numerical results show that the RIFS can simulate the measure based on the CGR very well. The relative standard error and the estimated probability matrix in the RIFS do not depend on the order in which the functional protein sequences are linked. The estimated probability matrices in the RIFS with different biological functions are evidently different. Hence the estimated probability matrices in the RIFS can be used to characterise the difference among linked functional protein sequences with different biological functions. From the values of the D_q curves, one sees that these functional protein sequences are not completely random. The D_q of all linked functional proteins studied are multifractal-like and sufficiently smooth for the C_q (analogous to specific heat) curves to be meaningful. Furthermore, the D_q curves of the measure μ based on their CGRs for different orders of linking the functional protein sequences are almost identical if q ≥ 0. Finally, the C_q curves of all linked functional proteins resemble a classical phase transition at a critical point. (cross-disciplinary physics and related areas of science and technology)
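
    The chaos game construction mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how amino acids are assigned to corners, so the four-letter alphabet and the corner layout below are assumptions made only for illustration, and `chaos_game` and `box_measure` are hypothetical helper names. The second function computes the kind of box-counting measure on which a multifractal analysis could be based; it is not the authors' RIFS simulation.

```python
from collections import Counter

# Hypothetical corner assignment for a four-letter alphabet. The abstract does
# not state how protein sequences are encoded, so this layout is an assumption.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence, corners=CORNERS):
    """Return the chaos game points of `sequence` inside the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for symbol in sequence:
        cx, cy = corners[symbol]
        # Move halfway from the current point towards the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_measure(points, k):
    """Fraction of points that fall in each cell of a 2**k by 2**k grid."""
    n = 2 ** k
    counts = Counter(
        (min(int(px * n), n - 1), min(int(py * n), n - 1)) for px, py in points
    )
    total = len(points)
    return {cell: count / total for cell, count in counts.items()}


if __name__ == "__main__":
    pts = chaos_game("ACGT")  # synthetic toy sequence, not data from the paper
    print(pts)
    print(box_measure(pts, 1))
```

    Each point depends on the whole prefix of the sequence, which is why the occupancy of grid cells at depth k reflects the frequencies of length-k subsequences.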

  14. On nanopore DNA sequencing by signal and noise analysis of ionic current.

    Science.gov (United States)

    Wen, Chenyu; Zeng, Shuangshuang; Zhang, Zhen; Hjort, Klas; Scheicher, Ralph; Zhang, Shi-Li

    2016-05-27

    DNA sequencing, i.e., the process of determining the succession of nucleotides on a DNA strand, has become a standard aid in biomedical research and is expected to revolutionize medicine. With the capability of handling single DNA molecules, nanopore technology holds high promises to become speedier in sequencing at lower cost than what are achievable with the commercially available optics- or semiconductor-based massively parallelized technologies. Despite tremendous progress made with biological and solid-state nanopores, high error rates and large uncertainties persist with the sequencing results. Here, we employ a nano-disk model to quantitatively analyze the sequencing process by examining the variations of ionic current when a DNA strand translocates a nanopore. Our focus is placed on signal-boosting and noise-suppressing strategies in order to attain the single-nucleotide resolution. Apart from decreasing pore diameter and thickness, it is crucial to also reduce the translocation speed and facilitate a stepwise translocation. Our best-case scenario analysis points to severe challenges with employing plain nanopore technology, i.e., without recourse to any signal amplification strategy, in achieving sequencing with the desired single-nucleotide resolution. A conceptual approach based on strand synthesis in the nanopore of the translocating DNA from single-stranded to double-stranded is shown to yield a 10-fold signal amplification. Although it involves no advanced physics and is very simple in mathematics, this simple model captures the essence of nanopore sequencing and is useful in guiding the design and operation of nanopore sequencing.

  15. [Analysis of COX1 sequences of Taenia isolates from four areas of Guangxi].

    Science.gov (United States)

    Yang, Yi-Chao; Ou-Yang, Yi; Su, Ai-Rong; Wan, Xiao-Ling; Li, Shu-Lin

    2012-06-01

    To analyze the COX1 sequences of Taenia isolates from four areas of Guangxi Zhuang Autonomous Region, and to understand the distribution of Taenia asiatica in Guangxi. Patients with taeniasis in Luzhai, Rongshui, Tiandong and Sanjiang in Guangxi were treated by deworming, and the Taenia isolates were collected. Cytochrome c oxidase subunit 1 (COX1) sequences of these isolates were amplified by PCR, and the PCR products were sequenced by T-A clone sequencing. The homogeneities and genetic distances were calculated and analyzed, and phylogenetic trees were constructed with software. Meanwhile, the COX1 sequences of the isolates from the 4 areas were compared separately with the sequences of Taenia species in GenBank. The COX1 sequences of the 5 Taenia isolates collected all had the same length, 444 bp. There were 5 variable positions between the Luzhai isolate and Taenia asiatica; the homogeneity was 98.87% and the genetic distance was 0.011. Phylogenetic tree analysis showed that the Luzhai isolate and Taenia asiatica were located at the same node and were closely related. The homogeneity between Rongshui isolate A and Taenia solium was 100%, while the homogeneities of Rongshui isolate B with Taenia saginata and Taenia asiatica were 98.20% and 96.17%, respectively. The homogeneities of the Tiandong and Sanjiang isolates with Taenia solium were 99.55% and 96.40%, respectively, and the genetic distances were 0.005 and 0.037, respectively. The homogeneity between the Luzhai isolate and Taenia saginata was 96.40%. Taenia asiatica exists in Luzhai, and Taenia solium and Taenia saginata coexist in Rongshui, Guangxi Zhuang Autonomous Region.
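
    The relation between the number of variable positions, percent identity and genetic distance reported here can be checked with a short sketch. The abstract does not name the distance model, so the uncorrected p-distance used below is an assumption; `identity_and_p_distance` is a hypothetical helper, and the two sequences are synthetic placeholders that only reproduce the stated counts (5 differences in 444 bp), not the real COX1 data.

```python
def identity_and_p_distance(seq_a, seq_b):
    """Return (percent identity, uncorrected p-distance) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p_distance = differences / len(seq_a)
    return 100.0 * (1.0 - p_distance), p_distance


if __name__ == "__main__":
    # Synthetic placeholders: 444 positions, 5 of which differ.
    reference = "A" * 444
    variant = "C" * 5 + "A" * 439
    identity, distance = identity_and_p_distance(reference, variant)
    print(f"{identity:.2f}% identity, p-distance {distance:.3f}")
    # prints "98.87% identity, p-distance 0.011"
```

    Under this assumption the figures for the Luzhai isolate are reproduced exactly. For 96.40% identity the uncorrected distance would be 0.036 rather than the reported 0.037, which may indicate that a corrected substitution model was used.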

  16. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences

    International Nuclear Information System (INIS)

    Klimov, Eugene; Vinokourova, Svetlana; Moisjak, Elena; Rakhmanaliev, Elian; Kobseva, Vera; Laimins, Laimonis; Kisseljov, Fjodor; Sulimova, Galina

    2002-01-01

    In cervical tumours the integration of human papilloma viruses (HPV) transcripts often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data using a variety of techniques has demonstrated that HPV integration occurred without obvious specificity into human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences. Amplification of the INTs was done by the APOT technique. The APOT products were sequenced according to standard protocols. The analysis of the sequences was performed using BLASTN program and public databases. To localise the INTs PCR-based screening of GeneBridge4-RH-panel was used. Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and the RH-mapping techniques, mapping studies showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with the high levels of expression. Integration of HPV genomes occurs into the different human chromosomes but into regions that contain highly transcribed genes. One interpretation of these studies is that integration of HPV occurs into decondensed regions, which are more accessible for integration of foreign DNA

  17. Sequence stratigraphy and high-frequency cycles: New aspects for a quantitative evaluation of the Gulf of Suez basin, Egypt

    Energy Technology Data Exchange (ETDEWEB)

    Nio, S.D.; Yang, C.S. (International Geoservices, Leiderdorp (Netherlands)); Tewfik, N.; Darwish, M. (Earth Resource Exploration, Cairo (Egypt)); Jonkman, H. (International Geoservices, Leiderdorp (Netherlands))

    1993-09-01

    A new development in the application of sequence stratigraphic concepts in marine as well as continental basins is the recognition of high-frequency cyclic patterns in rock successions in the subsurface. Studies of six wells from the northern, central, and southern parts of the Gulf of Suez show the presence of well-preserved, high-frequency cycles with periodicities similar to the orbitally forced Milankovitch parameters. Subsurface rock successions, third-order sequences, and high-frequency cycles were compared with outcrops. After establishing the biostratigraphic framework for the above-mentioned wells, a sequence analysis was performed. Sequence boundaries and maximum flooding positions in each well were calibrated with the occurrences and evaluation of the high-frequency cycles. It became obvious that there is an intimate relationship between these high-frequency Milankovitch cycles and sequence organization. In addition, a close relationship can be observed in the subsurface as well as in outcrops between high-frequency climatic changes (connected to the Milankovitch cycles) and (litho)facies variability. Quantitative evaluations of each sequence and/or systems tract can be computed with the International Geoservices' cyclicity analysis tool (MILABAR). The results are summarized in a well composite chart, rate (NAR), and ratio of preserved time. In correlations between the wells, an accuracy of 500-100 Ka can be obtained. The quantitative evaluation of the sequence and high-frequency cycle analysis gave some new aspects concerning the (litho)facies and geodynamic development during the pre- as well as the synrift stages of the Gulf of Suez Basin.

  18. Survey of methods for integrated sequence analysis with emphasis on man-machine interaction

    Energy Technology Data Exchange (ETDEWEB)

    Kahlbom, U.; Holmgren, P. [RELCON, Stockholm (Sweden)

    1995-05-01

    This report presents a literature study concerning recently developed monotonic methodologies in the human reliability area. The work was performed by RELCON AB on commission by NKS/RAK-1, subproject 3. The topic of subproject 3 is 'Integrated Sequence Analysis with Emphasis on Man-Machine Interaction'. The purpose of the study was to compile recently developed methodologies and to propose some of these methodologies for use in the sequence analysis task. The report describes mainly non-dynamic (monotonic) methodologies. One exception is HITLINE, which is a semi-dynamic method. Reference provides a summary of approaches to dynamic analysis of man-machine-interaction, and explains the differences between monotonic and dynamic methodologies. (au) 21 refs.

  19. Survey of methods for integrated sequence analysis with emphasis on man-machine interaction

    International Nuclear Information System (INIS)

    Kahlbom, U.; Holmgren, P.

    1995-05-01

    This report presents a literature study concerning recently developed monotonic methodologies in the human reliability area. The work was performed by RELCON AB on commission by NKS/RAK-1, subproject 3. The topic of subproject 3 is 'Integrated Sequence Analysis with Emphasis on Man-Machine Interaction'. The purpose of the study was to compile recently developed methodologies and to propose some of these methodologies for use in the sequence analysis task. The report describes mainly non-dynamic (monotonic) methodologies. One exception is HITLINE, which is a semi-dynamic method. Reference provides a summary of approaches to dynamic analysis of man-machine-interaction, and explains the differences between monotonic and dynamic methodologies. (au) 21 refs

  20. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    Directory of Open Access Journals (Sweden)

    Fabrice P A David

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  1. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    Science.gov (United States)

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  2. Characterising the CRISPR immune system in Archaea using genome sequence analysis

    DEFF Research Database (Denmark)

    Shah, Shiraz Ali

    Archaea, a group of microorganisms distinct from bacteria and eukaryotes, are equipped with an adaptive immune system called the CRISPR system, which relies on an RNA interference mechanism to combat invading viruses and plasmids. Using a genome sequence analysis approach, the four components...... of archaeal genomic CRISPR loci were analysed, namely, repeats, spacers, leaders and cas genes. Based on analysis of spacer sequences it was predicted that the immune system combats viruses and plasmids by targeting their DNA. Furthermore, analysis of repeats, leaders and cas genes revealed that CRISPR...... systems exist as distinct families which have key differences between themselves. Closely related organisms were seen harbouring different CRISPR systems, while some distantly related species carried similar systems, indicating frequent horizontal exchange. Moreover, it was found that cas genes of Type I...

  3. Characterising the CRISPR immune system in Archaea using genome sequence analysis

    DEFF Research Database (Denmark)

    Shah, Shiraz Ali

    Archaea, a group of microorganisms distinct from bacteria and eukaryotes, are equipped with an adaptive immune system called the CRISPR system, which relies on an RNA interference mechanism to combat invading viruses and plasmids. Using a genome sequence analysis approach, the four components...... of archaeal genomic CRISPR loci were analysed, namely, repeats, spacers, leaders and cas genes. Based on analysis of spacer sequences it was predicted that the immune system combats viruses and plasmids by targeting their DNA. Furthermore, analysis of repeats, leaders and cas genes revealed that CRISPR...... the activity of the Type III interference complexes. This dynamic nature of the CRISPR immune systems may be a prerequisite for their continued efficacy against the ever changing threats they protect their hosts from....

  4. Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

    KAUST Repository

    Thind, Anupriya Kaur

    2018-02-08

    Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

  5. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    Science.gov (United States)

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome-wide distribution of chromatin-associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time-dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time-resolved manner. Despite such time-resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically

  6. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Background: Although a previous study demonstrated that the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences among full-length viral genomes of DENV-3. In our study, complete genomic sequences of DENV-3 strains collected from different geographical locations and isolation years were determined, and the sequence diversity as well as selection pressure sites in the DENV genome other than within the E gene were also analyzed. Results: Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that Taiwan's indigenous DENV-3 isolated from the 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case were of three different genotypes (I, II, and III), each associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among the different DENV-3 genotypes were further examined to understand global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54) and envelope protein gene regions (mean ± SD: 5.04 ± 0.32). Further analysis found that positive selection pressure on DENV-3 may occur in the non-structural protein 1 gene region, and the positive selection site was detected at position 178 of the NS1 gene. Conclusion: Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the non-structural protein along genotype II indicates that DENV-3 originating from Southeast Asia needs to be monitored for the emergence of DENV strains with epidemic potential, for better epidemic prevention and vaccine development.

  7. UNIVERSAL PRIMERS FOR THE AMPLIFICATION AND SEQUENCE ANALYSIS OF ACTIN-1 FROM DIVERSE MOSQUITO SPECIES

    Science.gov (United States)

    STALEY, MOLLY; DORMAN, KARIN S.; BARTHOLOMAY, LYRIC C.; FERNÁNDEZ-SALAS, ILDEFONSO; FARFAN-ALE, JOSE A.; LOROÑO-PINO, MARIA A.; GARCIA-REJON, JULIAN E.; IBARRA-JUAREZ, LUIS

    2010-01-01

    We report the development of universal primers for the reverse-transcription polymerase chain reaction (RT-PCR) amplification and nucleotide sequence analysis of actin cDNAs from taxonomically diverse mosquito species. Primers specific to conserved regions of the invertebrate actin-1 gene were designed after actin cDNA sequences of Anopheles gambiae, Bombyx mori, Drosophila melanogaster, and Caenorhabditis elegans. The efficacy of these primers was determined by RT-PCR with the use of total RNA from mosquitoes belonging to 30 species and 8 genera (Aedes, Anopheles, Culex, Deinocerites, Mansonia, Psorophora, Toxorhynchites, and Wyeomyia). The RT-PCR products were sequenced, and sequence data were used to design additional primers. One primer pair, denoted as Act-2F (5′-ATGGTCGGYATGGGNCAGAAGGACTC-3′) and Act-8R (5′-GATTCCATACCCAGGAAGGADGG-3′), successfully amplified an RT-PCR product of the expected size (683-nt) in all mosquito spp. tested. We propose that this primer pair can be used as an internal control to test the quality of RNA from mosquitoes collected in vector surveillance studies. These primers can also be used in molecular experiments in which the detection, amplification or silencing of a ubiquitously expressed mosquito housekeeping gene is necessary. Sequence and phylogenetic data are also presented in this report. PMID:20649132
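
    The two primers contain IUPAC ambiguity codes (Y, N and D), so each one stands for a small pool of concrete oligonucleotides. The following sketch, which is not part of the original report, counts and enumerates those pools. The helper names `degeneracy` and `expand` are hypothetical, and the hyphen inside the printed Act-8R sequence is assumed to be a line-break artifact and is omitted.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

ACT_2F = "ATGGTCGGYATGGGNCAGAAGGACTC"
# Assumption: the hyphen in the printed Act-8R sequence is a typesetting artifact.
ACT_8R = "GATTCCATACCCAGGAAGGADGG"


def degeneracy(primer):
    """Number of distinct unambiguous sequences a degenerate primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


if __name__ == "__main__":
    print(degeneracy(ACT_2F), degeneracy(ACT_8R))  # prints "8 3"
    for oligo in expand(ACT_8R):
        print(oligo)
```

    Low degeneracy (8-fold and 3-fold here) is consistent with primers targeting strongly conserved regions.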

  8. Analysis and comparison of fragrant gene sequence in some rice cultivars

    Directory of Open Access Journals (Sweden)

    Karami Noushafarin

    2016-01-01

    It is known that the fragrant trait in rice (Oryza sativa L.) is largely controlled by the fgr gene on chromosome 8, and it has been shown that an 8 bp deletion and three single nucleotide polymorphisms (SNPs) in exon 7 affect this trait. In this study, sequence alignment analysis of fgr exon 7 on chromosome 8 for 11 fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried the 3 SNPs and the 8 bp deletion in exon 7, which terminates prematurely at a TAA stop codon. However, 5 of the non-aromatic cultivars showed a sequence identical to the published sequence of Nipponbare, a non-fragrant Japonica variety. An exception among them was Bejar, which had the 8 bp deletion and 3 SNPs but was non-aromatic. Sequencing can determine the nucleotide sequence of a gene and give useful information about gene function. In silico prediction showed that the protein sequence alignments of the fgr gene for the Khazar and Domsiah genotypes were different. The complete betaine aldehyde dehydrogenase enzyme belongs to the Khazar non-fragrant genotype and has the full length of 503 amino acids, while the non-functional BADH2 enzyme of the Domsiah fragrant genotype has 251 amino acids, which results in accumulation of 2-acetyl-1-pyrroline (2AP) and produces aroma in fragrant genotypes.

  9. Comparative Analysis of Microbial Diversity in Termite Gut and Termite Nest Using Ion Sequencing.

    Science.gov (United States)

    Manjula, Arumugam; Pushpanathan, Muthuirulan; Sathyavathi, Sundararaju; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-03-01

    Termite gut and termite nest possess complex microbial communities. However, only limited information is available on the comparative investigation of termite gut- and nest-associated microbial communities. In the present study, we examined and compared the bacterial diversity of termite gut and their respective nest by high-throughput sequencing of the V3 hypervariable region of 16S rDNA. A total of 14 barcoded libraries were generated from seven termite gut samples and their respective nest samples, and sequenced using the Ion Torrent platform. The sequences of each group were pooled, which yielded 170,644 and 132,000 reads from termite gut and termite nest samples, respectively. Phylogenetic analysis revealed significant differences in the bacterial diversity and community structure between termite gut and termite nest samples. Phyla Verrucomicrobia and Acidobacteria were observed only in termite gut, whereas Synergistetes and Chlorobi were observed only in termite nest samples. These variations in microbial structure and composition could be attributed to the differences in physiological conditions prevailing in the termite gut (anoxic and alkaline) and termite nest (oxic, slightly acidic and rich in organic matter) environment. Overall, this study unmasked the complexity of the bacterial population in the respective niche. Interestingly, the majority of the sequence reads could be classified only up to the domain level, indicating the presence of a huge number of uncultivable or unidentified novel bacterial species in both termite gut and nest samples. Whole metagenome sequencing and assessing the metabolic potential of these samples will be useful for biotechnological applications.

  10. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  11. The Swiss-Army-Knife Approach to the Nearly Automatic Analysis for Microearthquake Sequences.

    Science.gov (United States)

    Kraft, T.; Simon, V.; Tormann, T.; Diehl, T.; Herrmann, M.

    2017-12-01

    Many Swiss earthquake sequences have been studied using relative location techniques, which often made it possible to constrain the active fault planes and shed light on the tectonic processes that drove the seismicity. Yet, in the majority of cases the number of located earthquakes was too small to infer the details of the space-time evolution of the sequences, or their statistical properties. Therefore, it has mostly been impossible to resolve clear patterns in the seismicity of individual sequences, which are needed to improve our understanding of the mechanisms behind them. Here we present a nearly automatic workflow that combines well-established seismological analysis techniques and significantly improves the completeness of detected and located earthquakes of a sequence. We start from the manually timed routine catalog of the Swiss Seismological Service (SED), which contains the larger events of a sequence. From these well-analyzed earthquakes we dynamically assemble a template set and perform a matched filter analysis on the station with the best SNR for the sequence and a recording history of at least 10-15 years, our typical analysis period. This usually allows us to detect events several orders of magnitude below the SED catalog detection threshold. The waveform similarity of the events is then further exploited to derive accurate and consistent magnitudes. The enhanced catalog is then analyzed statistically to derive high-resolution time-lines of the a- and b-value and consequently the occurrence probability of larger events. Many of the detected events are strong enough to be located using double-differences. No further manual interaction is needed; we simply time-shift the arrival-time pattern of the detecting template to the associated detection. Waveform similarity ensures a good approximation of the expected arrival-times, which we use to calculate event-pair arrival-time differences by cross correlation. After a SNR and cycle-skipping quality
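
    The a- and b-values mentioned above are the parameters of the Gutenberg-Richter relation log10 N(>=M) = a - b*M. The abstract does not say which estimator the authors use; the sketch below assumes the common Aki-Utsu maximum-likelihood estimate, and the function names, the completeness magnitude `mc` and the `bin_width` parameter are illustrative choices rather than part of the described workflow.

```python
import math

def b_value_aki(mags, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b-value for magnitudes at or above mc.

    bin_width is the magnitude binning of the catalog (assumed here);
    use 0.0 for continuous magnitudes.
    """
    selected = [m for m in mags if m >= mc]
    mean_mag = sum(selected) / len(selected)
    return math.log10(math.e) / (mean_mag - (mc - bin_width / 2))

def a_value(mags, mc, b):
    """a-value such that log10 N(>=mc) = a - b*mc for the observed count."""
    n = sum(1 for m in mags if m >= mc)
    return math.log10(n) + b * mc

# Toy catalog (made-up magnitudes, for illustration only).
catalog = [1.0, 1.2, 1.5, 2.0, 1.3, 0.5]
b = b_value_aki(catalog, mc=1.0, bin_width=0.0)
a = a_value(catalog, mc=1.0, b=b)
```

    Recomputing both values in a sliding window over the enhanced catalog would give time-lines of the kind the abstract refers to.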

  12. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  13. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    Full Text Available We conducted an SSR density analysis in different cassava genomic regions. The information obtained was used to compare the genomic distribution of cassava’s SSRs with those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). Coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Motif analysis of cassava’s genomic SSRs showed that the most abundant motif was AT/AT, while in intron sequences and UTR regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable resource for genetic mapping and for future studies in phylogenetic analysis and genomic evolution.
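
    An SSR density such as those quoted above is a count of repeat loci divided by sequence length in Mbp. The abstract does not name the detection tool or its thresholds, so the following is only a minimal sketch: the minimum repeat counts per motif length are hypothetical, and only perfect repeats with 2-6 bp motifs are found.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. ATAT).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

def ssr_density(seq):
    """Number of SSR loci per Mbp of sequence."""
    return len(find_ssrs(seq)) * 1e6 / len(seq)

# Toy sequence: (AT)7 and (AAG)5 embedded in short flanks.
toy = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"
ssrs = find_ssrs(toy)
```

    Running the same function separately on coding sequences, introns and UTRs would give per-region densities comparable in kind, though not in value, to those reported.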

  14. Exact combinatorial reliability analysis of dynamic systems with sequence-dependent failures

    International Nuclear Information System (INIS)

    Xing Liudong; Shrestha, Akhilesh; Dai Yuanshun

    2011-01-01

    Many real-life fault-tolerant systems are subjected to sequence-dependent failure behavior, in which the order in which the fault events occur is important to the system reliability. Such systems can be modeled by dynamic fault trees (DFT) with priority-AND (pAND) gates. Existing approaches for the reliability analysis of systems subjected to sequence-dependent failures are typically state-space-based, simulation-based or inclusion-exclusion-based methods. Those methods either suffer from the state-space explosion problem or require long computation time especially when results with high degree of accuracy are desired. In this paper, an analytical method based on sequential binary decision diagrams is proposed. The proposed approach can analyze the exact reliability of non-repairable dynamic systems subjected to the sequence-dependent failure behavior. Also, the proposed approach is combinatorial and is applicable for analyzing systems with any arbitrary component time-to-failure distributions. The application and advantages of the proposed approach are illustrated through analysis of several examples. - Highlights: → We analyze the sequence-dependent failure behavior using combinatorial models. → The method has no limitation on the type of time-to-failure distributions. → The method is analytical and based on sequential binary decision diagrams (SBDD). → The method is computationally more efficient than existing methods.
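
    To make the notion of a sequence-dependent failure concrete, consider a priority-AND gate with two independent, non-repairable components A and B: the gate fails by time t only if A fails first and B fails afterwards, so its probability is the integral of f_A(a) * [F_B(t) - F_B(a)] over a from 0 to t. The sketch below is not the sequential binary decision diagram method of the paper; it only evaluates this single-gate integral numerically for arbitrary distributions and checks it against the closed form for exponential lifetimes. All function names and rates are illustrative.

```python
import math

def pand_numeric(pdf_a, cdf_b, t, steps=20000):
    """P(A fails before B and both fail by t), midpoint-rule integration."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        total += pdf_a(a) * (cdf_b(t) - cdf_b(a))
    return total * h

def pand_exponential(lam_a, lam_b, t):
    """Closed form of the same probability for exponential lifetimes."""
    s = lam_a + lam_b
    return (lam_a / s) * (1 - math.exp(-s * t)) \
        - math.exp(-lam_b * t) * (1 - math.exp(-lam_a * t))

# Illustrative failure rates (per hour) and mission time.
lam_a, lam_b, t = 0.002, 0.001, 500.0
exact = pand_exponential(lam_a, lam_b, t)
approx = pand_numeric(lambda a: lam_a * math.exp(-lam_a * a),
                      lambda b: 1 - math.exp(-lam_b * b), t)
```

    Because exactly one of the two orderings occurs when both components have failed, the probabilities for "A then B" and "B then A" sum to F_A(t) * F_B(t), which is a convenient sanity check.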

  15. Molecular cloning, sequence characteristics, and tissue expression analysis of ECE1 gene in Tibetan pig.

    Science.gov (United States)

    Wang, Yan-Dong; Zhang, Jian; Li, Chuan-Hao; Xu, Hai-Peng; Chen, Wei; Zeng, Yong-Qing; Wang, Hui

    2015-10-25

    Low air pressure and low oxygen partial pressure at high altitude seriously affect the survival and development of human beings and animals. ECE1 is a recently discovered gene that is involved in anti-hypoxia, but the full-length cDNA sequence has not been obtained. For a better understanding of the structure and function of the ECE1 gene and to study its effect in Tibetan pig, the cDNA of the ECE1 gene from the muscle of Tibetan pig was cloned, sequenced and characterized. The ECE1 full-length cDNA sequence consists of a 2262 bp coding sequence (CDS) that encodes 753 amino acids with a molecular mass of 85.449 kDa, a 2 bp 5'UTR and a 1507 bp 3'UTR. In addition, phylogenetic tree analysis revealed that the Tibetan pig ECE1 is genetically and evolutionarily closer to the ECE1 of land mammals. Furthermore, analysis by qPCR showed that the ECE1 transcript is constitutively expressed in the 10 tissues tested: the liver, subcutaneous fat, kidney, muscle, stomach, heart, brain, spleen, pancreas, and lung. These results serve as a foundation for further insight into the Tibetan pig ECE1 gene. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Expressed sequence tag analysis of the soybean rust pathogen Phakopsora pachyrhizi.

    Science.gov (United States)

    Posada-Buitrago, Martha Lucia; Frederick, Reid D

    2005-12-01

    Soybean rust is caused by the obligate fungal pathogen Phakopsora pachyrhizi Sydow. A unidirectional cDNA library was constructed using mRNA isolated from germinating P. pachyrhizi urediniospores to identify genes expressed at this physiological stage. Single pass sequence analysis of 908 clones revealed 488 unique expressed sequence tags (ESTs, unigenes) of which 107 appeared as multiple copies. BLASTX analysis identified 189 unigenes with significant similarities (E-value < 10^-5) to sequences deposited in the NCBI non-redundant protein database. A search against the NCBI dbEST using the BLASTN algorithm revealed 32 ESTs with high or moderate similarities to plant and fungal sequences. Using the Expressed Gene Anatomy Classification, 31.7% of these ESTs were involved in primary metabolism, 14.3% in gene/protein expression, 7.4% in cell structure and growth, 6.9% in cell division, 4.8% in cell signaling/cell communication, and 4.8% in cell/organism defense. Approximately 29.6% of the identities were to hypothetical proteins and proteins with unknown function.

  17. Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2012-05-01

    Full Text Available Abstract Background: Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^-L distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. Results: The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. Conclusions: The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
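
    The suffix property stated in the Background can be checked with a few lines of code. This is a minimal sketch of the usual DNA form of CGR, in which each symbol moves the current point halfway towards a corner of the unit square; the corner assignment and the starting point (0.5, 0.5) are a common convention assumed here, not details taken from the abstract.

```python
# Conventional corner assignment for the four nucleotides (an assumption).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate of seq: the point after iterating over all symbols."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the symbol's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y

def chebyshev(p, q):
    """Largest coordinate-wise difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

# Two sequences with different prefixes but the same 7-symbol suffix.
suffix = "GATTACA"
p = cgr("ACGTTGCA" + suffix)
q = cgr("TTTT" + suffix)
distance = chebyshev(p, q)
```

    Each shared trailing symbol halves the gap between the two points, so after L shared symbols the coordinates differ by at most 2^-L; here `distance` is bounded by 2^-7.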

  18. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    Science.gov (United States)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes due to relatively small genome sizes, ability to grow in culture, and as a natural model system for measuring expression of calcification-related genes in two life stages. As Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, these data are often insufficient to identify the exact position of the transcription start site (TSS), which is needed for precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies for producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, the Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. In addition, RNA expression levels for 21 samples were obtained with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although few data are available for closely related organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  19. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
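
    Both tests operate on counts of overlapping n-tuples ("words") of the sequence. The sketch below shows one plausible way to compute a Zipf rank-frequency list and the n-gram entropy; the redundancy is taken here as 1 - H(n) / (n * log2(4)), a standard definition assumed for illustration since the abstract gives no formula, and the function names are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(seq, n):
    """Count overlapping n-tuples in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def zipf_ranking(seq, n):
    """Return (rank, word, count) triples ordered from most to least frequent."""
    ordered = sorted(ngram_counts(seq, n).items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]

def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the n-tuple frequency distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def ngram_redundancy(seq, n, alphabet_size=4):
    """1 - H(n) / H_max, with H_max = n * log2(alphabet_size) (assumed definition)."""
    return 1 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))

# Toy inputs: a maximally mixed and a fully repetitive sequence.
h_uniform = ngram_entropy("ACGT" * 10, 1)
r_repeat = ngram_redundancy("AAAAAAAA", 2)
top = zipf_ranking("AAAC", 1)[0]
```

    A Zipf plot is then the logarithm of the count against the logarithm of the rank; a straight line would indicate the power-law regime the abstract describes for noncoding regions.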

  20. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

    Science.gov (United States)

    Cowley, Lauren A; Beckett, Stephen J; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J; Gally, David L; Jenkins, Claire

    2015-04-08

    Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations; certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency, and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidean cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group, showing that these groups are specialised to infect a subset of phage types. Sequencing the typing phages has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type.

  1. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analysis repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  2. SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data

    Science.gov (United States)

    Fischer, Maria; Snajder, Rene; Pabinger, Stephan; Dander, Andreas; Schossig, Anna; Zschocke, Johannes; Trajanoski, Zlatko; Stocker, Gernot

    2012-01-01

    In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome. PMID:22870267

  3. SIMPLEX: cloud-enabled pipeline for the comprehensive analysis of exome sequencing data.

    Directory of Open Access Journals (Sweden)

    Maria Fischer

    Full Text Available In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome.

  4. Chimira: analysis of small RNA sequencing data and microRNA modifications.

    Science.gov (United States)

    Vitsios, Dimitrios M; Enright, Anton J

    2015-10-15

    Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, and CPSS for the identification of modifications. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, also allowing, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. Contact: aje@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  5. Citrate synthase gene sequence: a new tool for phylogenetic analysis and identification of Ehrlichia.

    Science.gov (United States)

    Inokuma, H; Brouqui, P; Drancourt, M; Raoult, D

    2001-09-01

    The sequences of the citrate synthase gene (gltA) of 13 ehrlichial species (Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia muris, an Ehrlichia species recently detected from Ixodes ovatus, Cowdria ruminantium, Ehrlichia phagocytophila, Ehrlichia equi, the human granulocytic ehrlichiosis [HGE] agent, Anaplasma marginale, Anaplasma centrale, Ehrlichia sennetsu, Ehrlichia risticii, and Neorickettsia helminthoeca) have been determined by degenerate PCR and the Genome Walker method. The ehrlichial gltA genes are 1,197 bp (E. sennetsu and E. risticii) to 1,254 bp (A. marginale and A. centrale) long, and GC contents of the gene vary from 30.5% (Ehrlichia sp. detected from I. ovatus) to 51.0% (A. centrale). The percent identities of the gltA nucleotide sequences among ehrlichial species were 49.7% (E. risticii versus A. centrale) to 99.8% (HGE agent versus E. equi). The percent identities of deduced amino acid sequences were 44.4% (E. sennetsu versus E. muris) to 99.5% (HGE agent versus E. equi), whereas the homology range of 16S rRNA genes was 83.5% (E. risticii versus the Ehrlichia sp. detected from I. ovatus) to 99.9% (HGE agent, E. equi, and E. phagocytophila). The architecture of the phylogenetic trees constructed by gltA nucleotide sequences or amino acid sequences was similar to that derived from the 16S rRNA gene sequences but showed more-significant bootstrap values. Based upon the alignment analysis of the ehrlichial gltA sequences, two sets of primers were designed to amplify tick-borne Ehrlichia and Neorickettsia genogroup Ehrlichia (N. helminthoeca, E. sennetsu, and E. risticii), respectively. Tick-borne Ehrlichia species were specifically identified by restriction fragment length polymorphism (RFLP) patterns of AcsI and XhoI with the exception of E. muris and the very closely related ehrlichia derived from I. ovatus for which sequence analysis of the PCR product is needed. Similarly, Neorickettsia genogroup Ehrlichia species were specifically identified by
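
    The GC contents and percent identities quoted above are simple ratios over a sequence or a pairwise alignment. As a hedged illustration: percent identity has several definitions in use, and the one below (matches divided by aligned columns, skipping columns that are gaps in both sequences) is an assumption rather than the authors' stated choice; the toy sequences are not gltA data.

```python
def gc_content(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def percent_identity(aligned_a, aligned_b):
    """Percent identical columns between two pre-aligned sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

# Toy examples (made-up sequences).
gc = gc_content("ATGC")
identity = percent_identity("ATGC-A", "ATGCTA")
```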

  6. Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

    Science.gov (United States)

    Singh, Aditya; Bhatia, Prateek

    2016-12-01

    Sanger sequencing platforms, such as Applied Biosystems instruments, generate chromatogram files. Generally, for one region of a sequence, both forward and reverse primers are used to sequence that area, which yields two sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. With its full capabilities, the Automated Sanger Analysis Pipeline (ASAP) is able to read "*.ab1" chromatogram files through a command line interface, convert them to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as a Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
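
    Two of the steps listed above, reverse-complementing the reverse read and translating in all frames, are easy to illustrate in isolation. The sketch below is not taken from the ASAP code base; it is a standalone example using the standard genetic code, and it assumes that "all frames" means three forward and three reverse-complement frames.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

frames = six_frames("ATGGCCTAA")
```

    In a pipeline of this kind the reverse read would be reverse-complemented first, so that it can be aligned with the forward read before a consensus is built.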

  7. The Use of Next Generation Sequencing and Junction Sequence Analysis Bioinformatics to Achieve Molecular Characterization of Crops Improved Through Modern Biotechnology

    Directory of Open Access Journals (Sweden)

    David Kovalic

    2012-11-01

    Full Text Available The assessment of genetically modified (GM) crops for regulatory approval currently requires a detailed molecular characterization of the DNA sequence and integrity of the transgene locus. In addition, molecular characterization is a critical component of event selection and advancement during product development. Typically, molecular characterization has relied on Southern blot analysis to establish locus and copy number along with targeted sequencing of polymerase chain reaction products spanning any inserted DNA to complete the characterization process. Here we describe the use of next generation (NexGen) sequencing and junction sequence analysis bioinformatics in a new method for achieving full molecular characterization of a GM event without the need for Southern blot analysis. In this study, we examine a typical GM soybean [Glycine max (L.) Merr.] line and demonstrate that this new method provides molecular characterization equivalent to the current Southern blot-based method. We also examine an event containing in vivo DNA rearrangement of multiple transfer DNA inserts to demonstrate that the new method is effective at identifying complex cases. Next generation sequencing and bioinformatics offer certain advantages over current approaches, most notably the simplicity, efficiency, and consistency of the method, and provide a viable alternative for efficiently and robustly achieving molecular characterization of GM crops.

  8. Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

    Science.gov (United States)

    Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

    2017-01-01

    Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genomes of an allopolyploid have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.

  9. V3 loop sequence space analysis suggests different evolutionary patterns of CCR5- and CXCR4-tropic HIV.

    Directory of Open Access Journals (Sweden)

    Katarzyna Bozek

    The V3 loop of human immunodeficiency virus type 1 (HIV-1) is critical for coreceptor binding and is the main determinant of which of the cellular coreceptors, CCR5 or CXCR4, the virus uses for cell entry. The aim of this study is to provide a large-scale data driven analysis of HIV-1 coreceptor usage with respect to the V3 loop evolution and to characterize CCR5- and CXCR4-tropic viral phenotypes previously studied in small- and medium-scale settings. We use different sequence similarity measures, phylogenetic and clustering methods in order to analyze the distribution in sequence space of roughly 1000 V3 loop sequences and their tropism phenotypes. This analysis affords a means of characterizing those sequences that are misclassified by several sequence-based coreceptor prediction methods, as well as predicting the coreceptor using the location of the sequence in sequence space and of relating this location to the CD4(+) T-cell count of the patient. We support previous findings that the usage of CCR5 is correlated with relatively high sequence conservation whereas CXCR4-tropic viruses spread over larger regions in sequence space. The incorrectly predicted sequences are mostly located in regions in which their phenotype represents the minority or in close vicinity of regions dominated by the opposite phenotype. Nevertheless, the location of the sequence in sequence space can be used to improve the accuracy of the prediction of the coreceptor usage. Sequences from patients with high CD4(+) T-cell counts are relatively highly conserved as compared to those of immunosuppressed patients. Our study thus supports hypotheses of an association of immune system depletion with an increase in V3 loop sequence variability and with the escape of the viral sequence to distant parts of the sequence space.

  10. Analysis of diversity of chromophytic phytoplankton in a mangrove ecosystem using rbcL gene sequencing.

    Science.gov (United States)

    Samanta, Brajogopal; Bhadury, Punyasloke

    2014-04-01

    Phytoplankton forms the basis of primary production in mangrove environments. The phylogeny and diversity based on the amplification and sequencing of rbcL, the gene encoding the large subunit of the key enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase, was investigated for improved understanding of the community structure and temporal trends of chromophytic eukaryotic phytoplankton assemblages in Sundarbans, the world's largest continuous mangrove. Diatoms (Bacillariophyceae) were by far the most frequently detected group in clone libraries (485 out of 525 clones), consistent with their importance as a major bloom-forming group. Other major chromophytic algal groups, including Cryptophyceae, Haptophyceae, Pelagophyceae, Eustigmatophyceae, and Raphidophyceae, which are important components of the assemblages, were detected for the first time from Sundarbans based on the rbcL approach. Many of the sequences from Sundarbans rbcL clone libraries showed identity with key bloom-forming diatom genera, namely Thalassiosira, Skeletonema and Nitzschia. Similarly, several diatom-like rbcL sequences were also detected, highlighting the need to explore diatom communities from the study area. Some of the rbcL sequences detected from Sundarbans were ubiquitous in distribution, showing 100% identity with uncultured rbcL sequences targeted previously from the Gulf of Mexico and the California upwelling system, which are geographically separated from the study area. Novel rbcL lineages were also detected, highlighting the need to culture and sequence phytoplankton from the ecoregion. Principal component analysis revealed that nitrate is an important variable that is associated with observed variation in phytoplankton assemblages (operational taxonomic units). This study applied molecular tools to highlight the ecological significance of diatoms, in addition to other chromophytic algal groups in Sundarbans. © 2014 Phycological Society of America.

  11. Long-range correlations in the fire sequences with Detrended Fluctuation Analysis

    Science.gov (United States)

    Zheng, H.; Song, W.

    2009-04-01

    Forest fires have been found to follow a good power-law relation in the frequency-size distribution over many orders of magnitude in different countries, which indicates that forest fires behave as a self-organized critical (SOC) system. In the temporal domain, the frequency-interval distributions of fires have also been found to obey a power law with periodic fluctuations. Fire sequences cannot generally be described as a Poisson point process, because the distribution of occurrence times is not homogeneous and shows clustering behavior. Power-law distributions and the scaling behavior of their parameters are therefore usually used to describe the sequence. Inter-event time series, i.e. the waiting times between consecutive events, have been studied in recent years for the similar system of earthquakes, focusing on the distributions and the intrinsic mechanism. In order to find the long-range correlations of fire sequences, we analyzed the scaling behavior of fires that occurred in some places in Asia by means of detrended fluctuation analysis (DFA), which provides information on the scaling behavior and long-range characteristics of non-stationary time series. Scaling exponents larger than 0.5 indicate the presence of persistent long-range correlations, while an exponent of 0.5 corresponds to white noise. Detailed fire data were investigated for several places and with different thresholds of burned area or losses. The results reveal the existence of long-range correlations in the fire interval sequences, and the scaling exponents are quite constant over several orders of magnitude. However, the exponents differ from each other, possibly due to the orientation of the places we analyzed and other local influencing factors: human activity, weather, economy, etc. In addition, fire sequences of different types were studied in the same way to find possible differences in long-range behavior and their possible reasons. The results seem to be helpful to understand the
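The DFA procedure referred to in this abstract can be sketched as follows: integrate the mean-subtracted series into a profile, split the profile into non-overlapping windows, remove a least-squares line from each window, and read the scaling exponent off the slope of log F(n) against log n. The code below is a minimal first-order DFA in pure Python, written as an illustration only; the function names, the window sizes, and the synthetic white-noise input are assumptions and do not come from the study.

```python
import math
import random


def _detrended_ss(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    mean_x = (n - 1) / 2.0
    mean_y = sum(segment) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(segment))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(segment))


def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent of a series (e.g. inter-event times)."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:  # cumulative sum of deviations from the mean
        running += value - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n < 4 or n_windows < 1:
            continue  # window too short to detrend, or series too short
        ss = sum(
            _detrended_ss(profile[i * n:(i + 1) * n]) for i in range(n_windows)
        )
        fluctuation = math.sqrt(ss / (n_windows * n))
        if fluctuation > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))

    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mx = sum(log_n) / len(log_n)
    my = sum(log_f) / len(log_f)
    # Least-squares slope of log F(n) versus log n.
    return sum((a - mx) * (b - my) for a, b in zip(log_n, log_f)) / sum(
        (a - mx) ** 2 for a in log_n
    )


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic stand-in
    print(round(dfa_exponent(noise, [8, 16, 32, 64, 128, 256]), 2))
```

For uncorrelated noise the estimate should land near 0.5, which is the reference value the abstract uses to separate white noise from persistent long-range correlation.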

  12. Similarity Measures of Sequence of Fuzzy Numbers and Fuzzy Risk Analysis

    Directory of Open Access Journals (Sweden)

    Zarife Zararsız

    2015-01-01

    We present methods to evaluate similarity measures between sequences of triangular fuzzy numbers as a contribution to fuzzy risk analysis. First, we calculate the COG (center of gravity) points of a sequence of triangular fuzzy numbers. Next, we present methods to measure the degree of similarity between sequences of triangular fuzzy numbers. In addition, we give an example to compare the methods mentioned in the text. Furthermore, in this paper, we deal with the (t1,t2)-type fuzzy number. By defining the algebraic operations on (t1,t2)-type fuzzy numbers, we can solve equations of the form x + u(t1,t2) = v(t1,t2), where u(t1,t2) and v(t1,t2) are fuzzy numbers. In this way, we can build an algebraic structure on fuzzy numbers. Additionally, the generalized difference sequence spaces of triangular fuzzy numbers [l∞(Ft)]B(r^,s^), [c(Ft)]B(r^,s^), and [c0(Ft)]B(r^,s^), consisting of all sequences u∗ = (u(t1,t2)k) such that B(r^,s^)u∗ is in the spaces l∞(Ft), c(Ft), and c0(Ft), respectively, have been constructed. Furthermore, some classes of matrix transformations from the space [c(Ft)]B(r^,s^) and μ(Ft) to μ(Ft) and [c(Ft)]B(r^,s^) are characterized, respectively, where μ(Ft) is any sequence space.
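As a small illustration of the first step, the sketch below computes a center-of-gravity point for each member of a sequence of triangular fuzzy numbers. It assumes each number is given as a triple (a, b, c) with a <= b <= c and a membership function of height 1, and it uses the plain geometric centroid of that triangle; the paper may define its COG points differently, and the function names here are hypothetical.

```python
def cog_point(tfn):
    """Geometric centroid of a triangular fuzzy number (a, b, c) of height 1.

    Assumption: the membership function is the triangle with vertices
    (a, 0), (b, 1), (c, 0), so the centroid is the mean of the vertices.
    """
    a, b, c = tfn
    if not a <= b <= c:
        raise ValueError("expected a <= b <= c")
    return ((a + b + c) / 3.0, 1.0 / 3.0)


def cog_sequence(tfns):
    """COG point of every triangular fuzzy number in a sequence."""
    return [cog_point(t) for t in tfns]


if __name__ == "__main__":
    for point in cog_sequence([(0.1, 0.2, 0.3), (0.2, 0.5, 0.9)]):
        print(point)
```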

  13. Detection bias in microarray and sequencing transcriptomic analysis identified by housekeeping genes.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-03-01

    This work includes the original data used to discover the gene ontology bias in transcriptomic analysis conducted by microarray and high throughput sequencing (Zhang et al., 2015) [1]. In the analysis, housekeeping genes were used to examine the differential detection ability of microarray and sequencing because these genes are probably the most reliably detected. The genes included here were compiled from 15 human housekeeping gene studies. The tables provided here comprise the detailed chromosomal location, detection breadth, normalized expression level, exon count, total exon length, and total intron length of each concerned gene and its related transcripts. We hope this information can help researchers better understand the differences in gene ontology bias we discussed (Zhang et al., 2015) [1] and can encourage further improvement of these two technology platforms.

  14. Authentication of Zanthoxylum Species Based on Integrated Analysis of Complete Chloroplast Genome Sequences and Metabolite Profiles.

    Science.gov (United States)

    Lee, Hyeon Ju; Koo, Hyun Jo; Lee, Jonghoon; Lee, Sang-Choon; Lee, Dong Young; Giang, Vo Ngoc Linh; Kim, Minjung; Shim, Hyeonah; Park, Jee Young; Yoo, Ki-Oug; Sung, Sang Hyun; Yang, Tae-Jin

    2017-11-29

    We performed chloroplast genome sequencing and comparative analysis of two Rutaceae species, Zanthoxylum schinifolium (Korean pepper tree) and Z. piperitum (Japanese pepper tree), which are medicinal and culinary crops in Asia. We identified more than 837 single nucleotide polymorphisms and 103 insertions/deletions (InDels) based on a comparison of the two chloroplast genomes and developed seven DNA markers derived from five tandem repeats and two InDel variations that discriminated between Korean Zanthoxylum species. Metabolite profile analysis pointed to three metabolic groups, one with Korean Z. piperitum samples, one with Korean Z. schinifolium samples, and the last containing all the tested Chinese Zanthoxylum species samples, which are considered to be Z. bungeanum based on our results. Two markers were capable of distinguishing among these three groups. The chloroplast genome sequences identified in this study represent a valuable genomics resource for exploring diversity in Rutaceae, and the molecular markers will be useful for authenticating dried Zanthoxylum berries in the marketplace.

  15. Genome-wide quantitative analysis of DNA methylation from bisulfite sequencing data

    Science.gov (United States)

    Akman, Kemal; Haaf, Thomas; Gravina, Silvia; Vijg, Jan; Tresch, Achim

    2014-01-01

    Summary: Here we present the open-source R/Bioconductor software package BEAT (BS-Seq Epimutation Analysis Toolkit). It implements all bioinformatics steps required for the quantitative high-resolution analysis of DNA methylation patterns from bisulfite sequencing data, including the detection of regional epimutation events, i.e. loss or gain of DNA methylation at CG positions relative to a reference. Using a binomial mixture model, the BEAT package aggregates methylation counts per genomic position, thereby compensating for low coverage, incomplete conversion and sequencing errors. Availability and implementation: BEAT is freely available as part of Bioconductor at www.bioconductor.org/packages/devel/bioc/html/BEAT.html. The package is distributed under the GNU Lesser General Public License 3.0. Contact: akman@mpipz.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24618468

  16. Screening of BRCA1 sequence variants within exon 11 by heteroduplex analysis

    Directory of Open Access Journals (Sweden)

    Lucian Negura

    2013-03-01

    Germ-line mutations of either BRCA1 or BRCA2 represent the major hereditary risk for breast and ovarian cancer. Screening for mutations in these genes is now standard practice in molecular diagnosis, opening the way to oncogenetic counselling and follow-up. Because mutations in both BRCA1 and BRCA2 are distributed throughout the loci, accepted clinical protocols involve screening their entire coding regions. Systematic Sanger sequencing is time- and money-consuming. Therefore, many pre-screening techniques have evolved over time in order to identify anomalous amplicons prior to sequencing. Because BRCA mutations are always heterozygous, heteroduplex analysis proved to be a suitable pre-screening step. We previously implemented mismatch-specific endonuclease heteroduplex analysis for BRCA1 exon 7. Here we show the utility of the same method for mutations and SNPs found in BRCA1 exon 11.

  17. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    Science.gov (United States)

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  18. Human factors review for nuclear power plant severe accident sequence analysis

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.

    1985-01-01

    The paper discusses work conducted to: (1) support the severe accident sequence analysis of a nuclear power plant transient based on an assessment of operator actions, and (2) develop a descriptive model of operator severe accident management. Operator actions during the transient are assessed using qualitative and quantitative methods. A function-oriented accident management model provides a structure for developing technical operator guidance on mitigating core damage and preventing radiological release.

  19. Splicing Express: a software suite for alternative splicing analysis using next-generation sequencing data

    OpenAIRE

    Kroll, Jose E.; Kim, Jihoon; Ohno-Machado, Lucila; de Souza, Sandro J.

    2015-01-01

    Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASEs analysis, however, represents a challenging task especially when many dif...

  20. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

    OpenAIRE

    Kubicek, Christian P.; Herrera-Estrella, Alfredo; Seidl-Seiboth, Verena; Martinez, Diego A.; Druzhinina, Irina S.; Thon, Michael; Zeilinger, Susanne; Casas-Flores, Sergio; Horwitz, Benjamin A.; Mukherjee, Prasun K.; Mukherjee, Mala; Kredics, László; Alcaraz, Luis D.; Aerts, Andrea; Antal, Zsuzsanna

    2011-01-01

    Background Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocl...

  1. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  2. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform.

    Directory of Open Access Journals (Sweden)

    Jianzi Huang

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina.

  3. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform.

    Science.gov (United States)

    Huang, Jianzi; Lu, Xiang; Zhang, Wanke; Huang, Rongfeng; Chen, Shouyi; Zheng, Yizhi

    2014-01-01

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina.
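Candidate SSR loci of the kind counted in this abstract are commonly found by scanning each sequence for short motifs repeated in tandem. The sketch below does this with regular-expression backreferences. The motif lengths and minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, not the thresholds used in the study, and the function names are hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if _is_primitive(unit):  # skip e.g. 'ATAT', already covered by 'AT'
                hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GG" + "AT" * 7 + "CC" + "CAG" * 5))
```

Dividing the total length of the scanned sequences by the number of hits gives an SSR density in the same "one locus per N kb" form reported above.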

  4. Multilocus sequence analysis of nectar pseudomonads reveals high genetic diversity and contrasting recombination patterns.

    Science.gov (United States)

    Alvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas 'sensu stricto' isolates associated with South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria.
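Counting sequence types, as in "35 different sequence types for 38 isolates", amounts to grouping isolates whose sequences are identical at every locus. The sketch below shows that bookkeeping for the four genes named in the abstract. The isolate names and the short toy sequences are made up for illustration, and the numbering of sequence types by order of first appearance is an assumption.

```python
GENES = ("rrs", "gyrB", "rpoB", "rpoD")  # loci named in the abstract


def assign_sequence_types(isolates):
    """Map each isolate to a sequence type (ST) number.

    isolates: dict of isolate name -> dict of gene name -> sequence string.
    Isolates that are identical at all loci in GENES share one ST.
    """
    st_of_profile = {}
    result = {}
    for name, loci in isolates.items():
        profile = tuple(loci[gene].upper() for gene in GENES)
        # A profile seen for the first time gets the next free ST number.
        st = st_of_profile.setdefault(profile, len(st_of_profile) + 1)
        result[name] = st
    return result


if __name__ == "__main__":
    toy = {  # hypothetical isolates with toy sequences
        "iso1": {"rrs": "ACGT", "gyrB": "GGCA", "rpoB": "TTAG", "rpoD": "CATG"},
        "iso2": {"rrs": "ACGT", "gyrB": "GGCA", "rpoB": "TTAG", "rpoD": "CATG"},
        "iso3": {"rrs": "ACGT", "gyrB": "GGCT", "rpoB": "TTAG", "rpoD": "CATG"},
    }
    types = assign_sequence_types(toy)
    print(types, "distinct STs:", len(set(types.values())))
```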

  5. VisRseq: R-based visual framework for analysis of sequencing data.

    Science.gov (United States)

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use, and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  6. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics

    Directory of Open Access Journals (Sweden)

    Galtier Nicolas

    2006-04-01

    Background A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. Results We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. Conclusion Implementation of methods aims at being both efficient and user-friendly. Special attention was given to the library design to enable easy extension and development of new methods. We defined a general hierarchy of classes that allows developers to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

  7. Sequencing and Comparative Analysis of a Conserved Syntenic Segment in the Solanaceae

    Science.gov (United States)

    Wang, Ying; Diehl, Adam; Wu, Feinan; Vrebalov, Julia; Giovannoni, James; Siepel, Adam; Tanksley, Steven D.

    2008-01-01

    Comparative genomics is a powerful tool for gaining insight into genomic function and evolution. However, in plants, sequence data that would enable detailed comparisons of both coding and noncoding regions have been limited in availability. Here we report the generation and analysis of sequences for an unduplicated conserved syntenic segment (CSS) in the genomes of five members of the agriculturally important plant family Solanaceae. This CSS includes a 105-kb region of tomato chromosome 2 and orthologous regions of the potato, eggplant, pepper, and petunia genomes. With a total neutral divergence of 0.73–0.78 substitutions/site, these sequences are similar enough that most noncoding regions can be aligned, yet divergent enough to be informative about evolutionary dynamics and selective pressures. The CSS contains 17 distinct genes with generally conserved order and orientation, but with numerous small-scale differences between species. Our analysis indicates that the last common ancestor of these species lived ∼27–36 million years ago, that more than one-third of short genomic segments (5–15 bp) are under selection, and that more than two-thirds of selected bases fall in noncoding regions. In addition, we identify genes under positive selection and analyze hundreds of conserved noncoding elements. This analysis provides a window into 30 million years of plant evolution in the absence of polyploidization. PMID:18723883

  8. VisRseq: R-based visual framework for analysis of sequencing data

    Science.gov (United States)

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use, and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. Conclusions To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights. PMID:26328469

  9. Analysis of the genome sequence of infectious hematopoietic necrosis virus HLJ-09 in China.

    Science.gov (United States)

    Wang, C; Zhao, L L; Li, Y J; Tang, L J; Qiao, X Y; Jiang, Y P; Liu, M

    2016-02-01

    Infectious hematopoietic necrosis virus (IHNV) is a highly contagious disease of juvenile salmonid fish. Six genome target fragments of the complete genome sequence of IHNV HLJ-09 were amplified by RT-PCR, and the 3'-terminal and 5'-terminal region of the genomic RNA were amplified using the RACE method. The complete genome sequence of HLJ-09 comprises 11,132 nucleotides (nt) (Accession number JX649101) and is different from that of other IHNV strains published in GenBank. Homology comparison and phylogenetic analysis of six ORF sequences were carried out using HLJ-09 and other IHNV strains published in GenBank. From phylogenetic tree analysis, the N gene, M gene, and P gene had the closest genetic relationship to IHNV-PRT from Korea. Phylogenetic analysis for the full length of the G gene showed that the HLJ-09 strain exhibited very close homology to the ChYa07, RtNag96, RtUi02, and RtGu01 strains from Korea and Japan, indicating that the HLJ-09 strain belonged to the genotype JRt. Ultimately, the Chinese IHNV HLJ-09 strain may have originated in Korea and Japan.

  10. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    Science.gov (United States)

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Background Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar, having >98% sequence identity. They encoded the same set of 112 unique genes, which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion Despite the high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome species in

  11. Biostratigraphical, paleoecological and paleobiogeographical interest of Guembelitria species across the Cretaceous-Paleogene transition at Atlantic realm (Bidart, SW France) and comparaison with Tethys realm

    Science.gov (United States)

    Gallala, N.; Zaghbib-Turki, D.; Turki, M. M.; Arenillas, I.; Arz, J. A.; Molina, E.

    2009-04-01

    At the Bidart section, the extinction rate at the K/Pg boundary reaches about 95% of the planktic foraminiferal species, whereas the Cretaceous survivors persisting through the Danian could be restricted to opportunist species such as G. cretacea and G. cf. trifolia, and probably some generalist species of Hedbergella and Heterohelix. The species Guembelitria cretacea is of biostratigraphical interest and defines the Guembelitria cretacea biozone of the lower Danian interval. This biozone is marked by the presence of Guembelitria cretacea, G. trifolia, Hedbergella holmdelensis, H. monmouthensis, Heterohelix punctulata, H. glabrans, H. labellosa, H. planata, H. pulchra, H. globulosa, and H. navarroensis. The ecological opportunist or disaster species Guembelitria cretacea (>63 µm) is present in very low frequencies (Caravaca sections (Spain). Like the guembelitrids, the hedbergellid species and perhaps a few heterohelicids could be possible survivors.

  12. Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E.; Chapman, Jaro d; Chertkov, Olga; Coutinho, Pedro M.; Cullen, Dan; Danchin, Etienne G.; Grigoriev, Igor V.; Harris, Paul; Jackson, Melissa ?.; Kubicek, Christian P.; Han, Cliff F.; Ho, Isaac; Larrando, Luis F.; Lopez de Leon, Alfredo; Magnuson, Jon K.; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf; Schmoll, Monika; Terry, Astrid ?.; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L.; Yao, Jian ?.; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris J.; Bruce, David; Kuske, Cheryl; Xie, Gary; Richardson, P. M.; Rokhsar, Daniel S.; Lucas, Susan; Rubin, Eddie M.; Dunn-Coleman, Nigel; Ward, Michael ?.; Brettin, T.

    2008-05-01

    A major thrust of the white biotechnology movement involves the development of enzyme systems which depolymerize biomass to simple sugars which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reesei genome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any fungus having the ability to hydrolyze plant cell wall polysaccharides and whose genome has been sequenced. Many genes encoding carbohydrate active enzymes are distributed non-randomly in groups or clusters that interestingly lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possible antibacterial and antifungal compounds) which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis coupled with the availability of genome sequence data provides a roadmap for construction of enhanced T. reesei strains for industrial applications.

  13. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
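
    The UHS definition in this abstract can be checked directly for very small k and L. The following is a minimal brute-force sketch, not the DOCKS heuristic: it enumerates every L-long sequence, so its cost grows as |alphabet|^L. The function names and the two-letter toy alphabet in the usage note are illustrative assumptions.

```python
from itertools import product


def first_unhit_sequence(kmers, k, L, alphabet="ACGT"):
    """Return the first L-long sequence containing no k-mer from `kmers`, or None.

    Brute force over all len(alphabet)**L sequences; only usable for tiny L.
    """
    kmers = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # A sequence is "hit" if any of its L - k + 1 windows is in the set.
        if not any(seq[i:i + k] in kmers for i in range(L - k + 1)):
            return seq
    return None


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """True if every L-long sequence over `alphabet` contains a k-mer from `kmers`."""
    return first_unhit_sequence(kmers, k, L, alphabet) is None
```

    For example, over the toy alphabet "AB" with k = 2 and L = 3, the set {AA, BB, AB} hits every sequence, while {AA, BB} misses ABA, whose windows are AB and BA.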

  14. Analysis of mutations in the entire coding sequence of the factor VIII gene

    Energy Technology Data Exchange (ETDEWEB)

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M. [Glasgow Univ. (United Kingdom)] [and others]

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  15. Phylogenetic inferences in Avena based on analysis of FL intron2 sequences.

    Science.gov (United States)

    Peng, Yuan-Ying; Wei, Yu-Ming; Baum, Bernard R; Yan, Ze-Hong; Lan, Xiu-Jin; Dai, Shou-Fen; Zheng, You-Liang

    2010-09-01

    The development and application of molecular methods in oats has been relatively slow compared with other crops. Results from the previous analyses have left many questions concerning species evolutionary relationships unanswered, especially regarding the origins of the B and D genomes, which are only known to be present in polyploid oat species. To investigate the species and genome relationships in the genus Avena, we used the second intron of the nuclear gene FLORICAULA/LEAFY (FL int2) in 13 diploid (A and C genomes), seven tetraploid (AB and AC genomes), and five hexaploid (ACD genome) species. The Avena FL int2 is rather long, and high levels of variation in length and sequence composition were found. Evidence for more than one copy of the FL int2 sequence was obtained for both the A and C genome groups, and the degree of divergence of the A genome copies was greater than that observed within the C genome sequences. Phylogenetic analysis of the FL int2 sequences resulted in topologies that contained four major groups; these groups reemphasize the major genomic divergence between the A and C genomes, and the close relationship among the A, B, and D genomes. However, the D genome in hexaploids more likely originated from a C genome diploid rather than the generally believed A genome, and the C genome diploid A. clauda may have played an important role in the origination of both the C and D genome in polyploids.

  16. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs) were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5,596 unigenes related to defense response and 1,344 unigenes related to signal transduction mechanisms (10.95%). These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.
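
    The abstract does not say how the SSRs were detected. As a rough illustration of what SSR detection involves, the sketch below scans a sequence for tandem repeats with a regular expression. The thresholds (unit length 2 to 6 bases, at least 4 copies) and the function name are assumptions chosen for the example, not the study's settings.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    Thresholds are illustrative assumptions, not values from the abstract.
    """
    seq = seq.upper()
    # Lazy unit match tries the shortest repeat unit first at each position.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter motif (e.g. "AA").
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

    On the sequence TTTAGAGAGAGAGCCC this reports one dinucleotide repeat, AG, starting at index 3 with 5 copies.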

  17. Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis

    Directory of Open Access Journals (Sweden)

    Nicholas A. Tinker

    2016-04-01

    Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species.

  18. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-01-01

    A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged from 53% to 54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification.
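
    For readers unfamiliar with the measures named in this abstract, the sketch below shows one common way to compute Chaos Game Representation coordinates, GC content, and the k-mer counts that an FCGR summarizes. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is a widely used convention assumed here; the abstract does not state which layout or software the authors used, and the function names are illustrative.

```python
from collections import Counter

# Assumed corner layout of the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre, move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / total if total else 0.0


def kmer_counts(seq, k=2):
    """Counts of overlapping k-mers (the quantities an FCGR grid displays)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))
```

    With k = 2, `kmer_counts` gives the dinucleotide frequencies, so blocks such as CC and GG can be compared directly between sequences.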

  19. Development of Transcriptomic Markers for Population Analysis Using Restriction Site Associated RNA Sequencing (RARseq).

    Directory of Open Access Journals (Sweden)

    Magdy S Alabady

    We describe restriction site associated RNA sequencing (RARseq), an RNAseq-based genotype by sequencing (GBS) method. It includes the construction of RNAseq libraries from double stranded cDNA digested with selected restriction enzymes. To test this, we constructed six single- and six dual-digested RARseq libraries from six F2 pitcher plant individuals and sequenced them on a half of a MiSeq run. On average, the de novo approach of population genome analysis detected 544 and 570 RNA SNPs, whereas the reference transcriptome-based approach revealed an average of 1907 and 1876 RNA SNPs per individual, from single- and dual-digested RARseq data, respectively. The average numbers of RNA SNPs and alleles per locus are 1.89 and 2.17, respectively. Our results suggest that the RARseq protocol allows good depth of coverage per locus for detecting RNA SNPs and polymorphic loci for population genomics and mapping analyses. In non-model systems where complete genome sequences are not always available, RARseq data can be analyzed in reference to the transcriptome. In addition to enriching for functional markers, this method may prove particularly useful in organisms where the genomes are not favorable for DNA GBS.

  20. Sequence analysis of the msp4 gene of Anaplasma ovis strains

    Science.gov (United States)

    de la Fuente, J.; Atkinson, M.W.; Naranjo, V.; Fernandez de Mera, I. G.; Mangold, A.J.; Keating, K.A.; Kocan, K.M.

    2007-01-01

    Anaplasma ovis (Rickettsiales: Anaplasmataceae) is a tick-borne pathogen of sheep, goats and wild ruminants. The genetic diversity of A. ovis strains has not been well characterized due to the lack of sequence information. In this study, we evaluated bighorn sheep (Ovis canadensis) and mule deer (Odocoileus hemionus) from Montana for infection with A. ovis by serology and sequence analysis of the msp4 gene. Antibodies to Anaplasma spp. were detected in 37% and 39% of bighorn sheep and mule deer analyzed, respectively. Four new msp4 genotypes were identified. The A. ovis msp4 sequences identified herein were analyzed together with sequences reported previously for the characterization of the genetic diversity of A. ovis strains in comparison with other Anaplasma spp. The results of these studies demonstrated that although A. ovis msp4 genotypes may vary among geographic regions and between sheep and deer hosts, the variation observed was less than the variation observed between A. marginale and A. phagocytophilum strains. The results reported herein further confirm that A. ovis infection occurs in natural wild ruminant populations in the western United States and that bighorn sheep and mule deer may serve as wildlife reservoirs of A. ovis. © 2006.

  1. HBS-Tools for Hairpin Bisulfite Sequencing Data Processing and Analysis

    Directory of Open Access Journals (Sweden)

    Ming-an Sun

    2015-01-01

    The emerging genome-wide hairpin bisulfite sequencing (hairpin-BS-Seq) technique enables the determination of the methylation pattern for DNA double strands simultaneously. Compared with traditional bisulfite sequencing (BS-Seq) techniques, hairpin-BS-Seq can determine methylation fidelity and increase mapping efficiency. However, no computational tool has been designed for the analysis of hairpin-BS-Seq data yet. Here we present HBS-tools, a set of command line based tools for the preprocessing, mapping, methylation calling, and summarizing of genome-wide hairpin-BS-Seq data. It accepts paired-end hairpin-BS-Seq reads to recover the original (pre-bisulfite-converted) sequences using global alignment and then calls the methylation statuses for cytosines on both DNA strands after mapping the original sequences to the reference genome. When applied to hairpin-BS-Seq datasets, HBS-tools showed reduced mapping time and improved mapping efficiency compared with state-of-the-art mapping tools. The HBS-tools source scripts, along with user guide and testing data, are freely available for download.
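
    The idea of recovering the pre-conversion sequence from the two linked strands can be illustrated with a small sketch. This is not the HBS-tools implementation: it assumes the two reads have already been aligned, have equal length, and that the bottom-strand read has been expressed in top-strand coordinates (so an unmethylated bottom-strand cytosine appears as G read as A). The function and parameter names are hypothetical.

```python
def recover_original(top_read, bottom_as_top):
    """Rebuild the original sequence and call methylation on both strands.

    Assumes pre-aligned, equal-length inputs in top-strand orientation.
    Returns (original_sequence, calls), where each call is
    (position, strand, methylated).
    """
    if len(top_read) != len(bottom_as_top):
        raise ValueError("reads must be aligned to the same length")
    original, calls = [], []
    for i, (a, b) in enumerate(zip(top_read.upper(), bottom_as_top.upper())):
        if a == b:
            original.append(a)
            if a == "C":    # C kept on the top strand: methylated
                calls.append((i, "+", True))
            elif a == "G":  # G kept means the bottom-strand C was protected
                calls.append((i, "-", True))
        elif a == "T" and b == "C":
            original.append("C")  # top-strand C converted: unmethylated
            calls.append((i, "+", False))
        elif a == "G" and b == "A":
            original.append("G")  # bottom-strand C converted: unmethylated
            calls.append((i, "-", False))
        else:
            original.append("N")  # disagreement not explained by conversion
    return "".join(original), calls
```

    For an original ACGT whose top-strand C is unmethylated and whose bottom-strand C is methylated, the inputs would be ATGT and ACGT, and the sketch returns ACGT with one unmethylated call on the top strand and one methylated call on the bottom strand.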

  2. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of amplified sequences were studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matk+rbcl) showed good amplification in three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.

  3. Signs of positive selection of somatic mutations in human cancers detected by EST sequence analysis

    International Nuclear Information System (INIS)

    Babenko, Vladimir N; Basu, Malay K; Kondrashov, Fyodor A; Rogozin, Igor B; Koonin, Eugene V

    2006-01-01

    Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors. These findings prompted a genome-wide analysis of possible positive selection during tumor evolution. A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to assess the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues. We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer. The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes. Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during
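
    The counting step described in this abstract, tallying synonymous and non-synonymous substitutions per sequence, can be sketched as follows. This is a simplified illustration under stated assumptions: the two sequences are in-frame and codon-aligned, the standard genetic code applies, and each differing codon is counted once even if it carries more than one base change. It does not compute the expected numbers or the statistical test used by the authors, and the function name is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(reference, observed):
    """Return (synonymous, non_synonymous) codon differences.

    Assumes both sequences are in frame and codon-aligned; codons with
    ambiguous bases or an incomplete trailing codon are skipped.
    """
    reference, observed = reference.upper(), observed.upper()
    synonymous = non_synonymous = 0
    usable = min(len(reference), len(observed)) // 3 * 3
    for i in range(0, usable, 3):
        ref_codon, obs_codon = reference[i:i + 3], observed[i:i + 3]
        if ref_codon == obs_codon:
            continue
        if ref_codon not in CODON_TABLE or obs_codon not in CODON_TABLE:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[obs_codon]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous
```

    For the reference ATGGCTAAA, the observed sequence ATGGCCAAG gives two synonymous changes (GCT to GCC, AAA to AAG), whereas ATGGATAAA gives one non-synonymous change (alanine to aspartate).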

  4. Type division and controlling factor analysis of 3rd-order sequences in marine carbonate rocks

    Directory of Open Access Journals (Sweden)

    Yunbo Zhang

    2014-03-01

    Type division and controlling factor analysis of 3rd-order sequences are of practical significance to tectonic analysis, sedimentary environment identification, and other geological research. Based on the comprehensive analysis of carbon and oxygen isotope trends, paleobathymetry and spectral-frequency of representative well logs, 3rd-order sequences can be divided into 3 types: (a) global sea level (GSL) sequence mainly controlled by GSL change; (b) tectonic sequence mainly controlled by regional tectonic activity; and (c) composite sequence jointly controlled by GSL change and regional tectonic activity. This study aims to identify the controlling factors of 3rd-order sequences and to illustrate a new method for classification of 3rd-order sequences of the middle Permian strata in the Sichuan Basin, China. The middle Permian strata in the Sichuan Basin consist of 3 basin-contrastive 3rd-order sequences, i.e., PSQ1, PSQ2 and PSQ3. Of these, PSQ1 is a GSL sequence while PSQ2 and PSQ3 are composite sequences. The results suggest that the depositional environment was stable during the deposition of PSQ1, but was activated by tectonic activity during the deposition of the middle Permian Maokou Formation.

  5. Post-contrast T1-weighted sequences in pediatric abdominal imaging: comparative analysis of three different sequences and imaging approach

    International Nuclear Information System (INIS)

    Roque, Andreia; Ramalho, Miguel; AlObaidy, Mamdoh; Heredia, Vasco; Burke, Lauren M.; De Campos, Rafael O.P.; Semelka, Richard C.

    2014-01-01

    Post-contrast T1-weighted imaging is an essential component of a comprehensive pediatric abdominopelvic MR examination. However, consistent good image quality is challenging, as respiratory motion in sedated children can substantially degrade the image quality. To compare the image quality of three different post-contrast T1-weighted imaging techniques - standard three-dimensional gradient-echo (3-D-GRE), magnetization-prepared gradient-recall echo (MP-GRE) and 3-D-GRE with radial data sampling (radial 3-D-GRE) - acquired in pediatric patients younger than 5 years of age. Sixty consecutive exams performed in 51 patients (23 females, 28 males; mean age 2.5 ± 1.4 years) constituted the final study population. Thirty-nine scans were performed at 3 T and 21 scans were performed at 1.5 T. Two different reviewers independently and blindly qualitatively evaluated all sequences to determine image quality and extent of artifacts. MP-GRE and radial 3-D-GRE sequences had the least respiratory motion (P < 0.0001). Standard 3-D-GRE sequences displayed the lowest average score ratings in hepatic and pancreatic edge definition, hepatic vessel clarity and overall image quality. Radial 3-D-GRE sequences showed the highest score ratings in overall image quality. Our preliminary results support the preference of fat-suppressed radial 3-D-GRE as the best post-contrast T1-weighted imaging approach for patients under the age of 5 years, when dynamic imaging is not essential. (orig.)

  6. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.

    Directory of Open Access Journals (Sweden)

    Malachi Griffith

    2015-08-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.

  7. NEW BIOSTRATIGRAPHIC DATA ON THE FRAZZANO' FORMATION (LONGI-TAORMINA UNIT): CONSEQUENCES ON DEFINING THE DEFORMATION AGE OF THE CALABRIA-PELORITANI ARC SOUTHERN SECTOR

    Directory of Open Access Journals (Sweden)

    PAOLA DE CAPOA

    1997-11-01

    New biostratigraphic data on the Frazzanò Flysch Formation are presented. This unit is the topmost formation of the stratigraphic succession characterizing the Longi-Taormina Unit, which in turn represents the lowest tectonic unit of the Peloritani Mountains and the only unit in the entire southern sector of the Calabria-Peloritani Arc in which Cenozoic terrains have been recognized. The age of the Frazzanò Fm., which as yet has not been well defined, is essential to ascertain the time period during which the tectogenetic phase responsible for the stacking (superposition) of the nappes in the Peloritani Mountains occurred. Coltro (1967) reported foraminiferal assemblages of Late Eocene age, but subsequently ages ranging between the Middle Eocene and the Oligocene have been proposed, none of them supported by new biostratigraphic data. The identification of some coccolithid taxa which appear in the Late Oligocene and Early Miocene allowed us to attribute an age not older than Upper Oligocene to the levels that mark the transition between the Frazzanò Fm. and the underlying Militello Formation, and an age not older than Early Aquitanian to the most recent beds of the Frazzanò Formation. Therefore, the tectogenetic phase responsible for the superposition of the nappes in the Peloritani Mountains very likely started during the Aquitanian. While these data agree with the evolution of homologous units recognised in the Betic and Rifian sectors, they challenge the Late Oligocene age ascribed to the basal levels of the Stilo-Capo d'Orlando Formation, which lies unconformably over all the tectonic units of the Calabria-Peloritani Arc and provides a chronological upper limit to their overthrusting.

  8. Purification and sequence analysis of the mRNA coding for an immunoglobulin heavy chain

    International Nuclear Information System (INIS)

    Cowan, N.J.; Secher, D.S.; Milstein, C.

    1976-01-01

    A mutant cell line (IF2) derived from the mouse myeloma MOPC 21 has been used for the isolation and sequence analysis of H-chain mRNA. The IF2 cells synthesise an H-chain of reduced size in which the CH1 homology region is missing. Sizing of the IF2 H-chain mRNA and wild-type H-chain mRNA revealed that the deletion is expressed at the mRNA level. The mutant H-chain mRNA sedimented at 16-S, enabling effective resolution from 18-S ribosomal RNA. In experiments using IF2 cells labelled with [32P]phosphate, the 16-S mRNA was purified by oligo(T)-cellulose chromatography. Polyacrylamide gel analysis of the poly(A)-containing fraction showed the presence of a single radioactive band. Comparison of the mobility of this band relative to markers of known molecular weight revealed that the molecule contained about 1,600 nucleotides. Digestion of the 32P-labelled mRNA with T1 ribonuclease and two-dimensional fractionation of the resulting oligonucleotides yielded a 'fingerprint' suitable for a preliminary sequence analysis. By using the established amino acid sequence of the IF2 H-chain and a knowledge of the genetic code, 14 oligonucleotides were assigned within the constant region and four within the variable region of the IF2 H-chain. This sequence data accounts for 19.5% of the coding region. Several other oligonucleotides, which could not be assigned within the coding region but which occurred in approximately molar yield, have also been partially characterised. These oligonucleotides are presumably derived from the untranslated regions of the mRNA. (orig.)

  9. Phylogenetic analysis of H1N1 sequences from pandemic infections during 2009 in India.

    Science.gov (United States)

    Flavia, Guntupally Balaswamy Arti; Natarajaseenivasan, Kalimuthusamy

    2011-02-15

    Since April 2009, a serious pandemic infection has spread rapidly across the world. These infections are caused by the novel swine-origin influenza A (H1N1) virus and hence are commonly called "Swine Flu". This new virus is a reassortant of avian, human and swine influenza viruses and thus has a unique genome composition. There are 16 different types of hemagglutinin (HA) and 9 different types of neuraminidase (NA) that can be genetically and antigenically differentiated. The first influenza A virus isolated from pigs was of the H1N1 subtype and these viruses have been reported to cause infection in pigs in many countries. The outbreak of this virus has been transmitted from pigs to humans. This new reassorted (exchange of genes) virus, which is the cause of the 2009 pandemic infections, has the ability to spread from human to human. This spread of infection should be brought to an end. In this study, a phylogenetic analysis of the nucleotide sequences of the RNA segments of human H1N1 viruses was carried out using MEGA version 4.0 to demonstrate the route map of infection to India. Sequences from India published in the Influenza Virus Resource (a database that integrates information gathered from the Influenza Genome Sequencing Project of the National Institute of Allergy and Infectious Diseases (NIAID) and GenBank of the NCBI) were retrieved and used for the analysis. The results showed that the various segments of the Indian isolates clustered well with sequences from American, Asian and European countries, indicating the transmission of viruses from these places to India.

  10. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.
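
    The abstract above describes mining EST sequences for di- and tri-nucleotide SSR motifs. A minimal sketch of that kind of scan is given below, assuming a simple regular-expression search. The function name `find_ssrs` and the repeat thresholds (6 repeats for di-nucleotides, 5 for tri-nucleotides) are illustrative assumptions and are not taken from the study, which does not state its mining tool or settings here.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Thresholds are hypothetical defaults, not values from the abstract:
    at least 6 repeats for 2-nt motifs and 5 repeats for 3-nt motifs.
    """
    min_repeats = min_repeats or {2: 6, 3: 5}
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # Capture one motif, then require it to repeat (min_rep - 1) more times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq.upper()):
            motif = m.group(2)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)

# Toy sequence: an (AG)7 repeat and a (GAA)5 repeat with short flanks.
example = "TTC" + "AG" * 7 + "CCT" + "GAA" * 5 + "C"
print(find_ssrs(example))
```

    The scan only reports perfect repeats; compound or interrupted SSRs would need a more permissive pattern.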

  11. SNP Analysis and Whole Exome Sequencing: Their Application in the Analysis of a Consanguineous Pedigree Segregating Ataxia

    Directory of Open Access Journals (Sweden)

    Sarah L. Nickerson

    2015-10-01

    Full Text Available Autosomal recessive cerebellar ataxia encompasses a large and heterogeneous group of neurodegenerative disorders. We employed single nucleotide polymorphism (SNP) analysis and whole exome sequencing to investigate a consanguineous Maori pedigree segregating ataxia. We identified a novel mutation in exon 10 of the SACS gene: c.7962T>G p.(Tyr2654*), establishing the diagnosis of autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS). Our findings expand both the genetic and phenotypic spectrum of this rare disorder, and highlight the value of high-density SNP analysis and whole exome sequencing as powerful and cost-effective tools in the diagnosis of genetically heterogeneous disorders such as the hereditary ataxias.

  12. Multilocus Sequence Analysis of Cercospora spp. from Different Host Plant Families

    Directory of Open Access Journals (Sweden)

    Floreta Fiska Yuliarni

    2014-06-01

    Full Text Available Identification of the genus Cercospora is still complicated due to the host preferences often being used as the main criteria to propose a new name. We determined the relationship between host plants and multilocus sequence variations (ITS rDNA including 5.8S rDNA, elongation factor 1-α, and calmodulin) in Cercospora spp. to investigate the host specificity. We used 53 strains of Cercospora spp. infecting 12 plant families for phylogenetic analysis. The sequences of 23 strains of Cercospora spp. infecting the plant families of Asteraceae, Cucurbitaceae, and Solanaceae were determined in this study. The sequences of 30 strains of Cercospora spp. infecting the plant families of Fabaceae, Amaranthaceae, Apiaceae, Plumbaginaceae, Malvaceae, Cistaceae, Plantaginaceae, Lamiaceae, and Poaceae were obtained from GenBank. The molecular phylogenetic analysis revealed that the majority of Cercospora species lack host specificity, and only C. zinniicola, C. zeina, C. zeae-maydis, C. cocciniae, and C. mikaniicola were found to be host-specific. Closely related species of Cercospora could not be distinguished using molecular analyses of ITS, EF, and CAL gene regions. The topology of the phylogenetic tree based on the CAL gene showed a better topology and Cercospora species separation than the trees developed based on the ITS rDNA region or the EF gene.

  13. Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences.

    Science.gov (United States)

    Yadav, Shailendra; Kundu, Sharbadeb; Ghosh, Sankar K; Maitra, S S

    2015-01-01

    Methanogens, a key contributor in global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity present in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques like 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences, related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic), were detected from marshland sites of Southern Assam, India. Jaccard analysis of DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. The knowledge about "methanogenic archaea composition" and "abundance" in the contrasting ecosystems like "landfill" and "marshland" may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane-dynamics and the global warming process.
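
    The abstract above mentions a Jaccard analysis of DGGE band patterns. The study used Gel2K for this; the sketch below is not that tool, only an illustration of the Jaccard similarity itself, computed on presence/absence band sets. The lane names and band positions are invented for the example.

```python
from itertools import combinations

def jaccard(bands_a, bands_b):
    """Jaccard similarity: shared bands divided by all bands seen in either lane."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    # Two empty lanes are treated as identical by convention (an assumption).
    return len(a & b) / len(union) if union else 1.0

# Hypothetical lanes: each set lists the gel positions where a band was scored.
lanes = {
    "landfill_1": {1, 2, 3, 5},
    "landfill_2": {1, 2, 3, 6},
    "marshland_1": {4, 7, 8},
}

for (name_a, a), (name_b, b) in combinations(lanes.items(), 2):
    print(name_a, name_b, round(jaccard(a, b), 2))
```

    A similarity matrix built this way is the usual input to a hierarchical clustering step, which is how band patterns end up grouped into clusters.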

  14. Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences

    Directory of Open Access Journals (Sweden)

    Shailendra Yadav

    2015-01-01

    Full Text Available Methanogens, a key contributor in global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity present in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques like 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences, related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic), were detected from marshland sites of Southern Assam, India. Jaccard analysis of DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. The knowledge about “methanogenic archaea composition” and “abundance” in the contrasting ecosystems like “landfill” and “marshland” may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane-dynamics and the global warming process.

  15. Molecular cloning and sequencing analysis of the interferon receptor (IFNAR-1) from Columba livia.

    Science.gov (United States)

    Li, Chao; Chang, Wei Shan

    2014-01-01

    The aim was to clone a partial sequence of the interferon receptor (IFNAR-1) gene of Columba livia. In order to obtain a gene fragment of a certain length (630 bp), a pair of primers was designed according to the conserved nucleotide sequence of the Gallus (EU477527.1) and Taeniopygia guttata (XM_002189232.1) IFNAR-1 gene fragments published in GenBank. Specific primers were designed by the RACE method to amplify the 3'-terminal cDNA. The Columba livia IFNAR-1 displayed 88.5%, 80.5% and 73.8% nucleotide identity to Falco peregrinus, Gallus and Taeniopygia guttata, respectively. Phylogenetic analysis of the IFNAR-1 gene showed that the Columba livia, Falco peregrinus and chicken sequences were highly homologous. We successfully obtained a partial sequence of the Columba livia IFNAR-1 gene. Analysis of the genetic tree showed that Columba livia and Falco peregrinus IFNAR-1 were highly homologous. This result can be used as a reference for further research and practical application.
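
    The nucleotide identity percentages quoted above are the kind of figure obtained by comparing two aligned sequences column by column. A minimal sketch follows, assuming the sequences are already aligned to equal length and that columns containing a gap in either sequence are skipped. Both assumptions are mine; the abstract does not say how identity was calculated, and the toy sequences are not real IFNAR-1 data.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent of identical bases over the gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

# Toy alignment: 9 gap-free columns, 8 of them identical.
print(round(percent_identity("ACGTACGT-A", "ACGTTCGTGA"), 1))
```

    Other conventions (counting gaps as mismatches, or dividing by the shorter sequence length) give different percentages, so reported identities are only comparable when the same convention is used.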

  16. HIERARCHICAL ADAPTIVE ROOD PATTERN SEARCH FOR MOTION ESTIMATION AT VIDEO SEQUENCE ANALYSIS

    Directory of Open Access Journals (Sweden)

    V. T. Nguyen

    2016-05-01

    Full Text Available Subject of Research. The paper deals with the motion estimation algorithms for the analysis of video sequences in compression standards MPEG-4 Visual and H.264. A new algorithm has been offered based on the analysis of the advantages and disadvantages of existing algorithms. Method. The algorithm is called hierarchical adaptive rood pattern search (Hierarchical ARPS, HARPS). This new algorithm includes the classic adaptive rood pattern search ARPS and hierarchical search MP (Hierarchical search or Mean pyramid). All motion estimation algorithms have been implemented using the MATLAB package and tested with several video sequences. Main Results. The criteria for evaluating the algorithms were: speed, peak signal to noise ratio, mean square error and mean absolute deviation. The proposed method showed a much better speed at a comparable error and deviation. The peak signal to noise ratio in different video sequences is better in some cases and worse in others than that of known algorithms, so it requires further investigation. Practical Relevance. Application of this algorithm in MPEG-4 and H.264 codecs instead of the standard one can significantly reduce compression time. This makes it possible to recommend it for telecommunication systems for multimedia data storing, transmission and processing.
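
    Three of the evaluation criteria named above (mean square error, mean absolute deviation and peak signal to noise ratio) have standard definitions for image blocks. The sketch below computes them for two small blocks in plain Python. It is an illustration of the metrics only, not the authors' MATLAB implementation, and it assumes 8-bit samples (peak value 255) and that "mean absolute deviation" means the mean absolute difference between corresponding pixels.

```python
import math

def block_errors(current, reference, peak=255):
    """Return (mse, mad, psnr_db) between two equally sized 2-D blocks."""
    diffs = [c - r
             for row_c, row_r in zip(current, reference)
             for c, r in zip(row_c, row_r)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n   # mean square error
    mad = sum(abs(d) for d in diffs) / n  # mean absolute difference
    # PSNR is undefined for identical blocks; report infinity in that case.
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    return mse, mad, psnr

cur = [[10, 20], [30, 40]]
ref = [[12, 20], [30, 36]]
mse, mad, psnr = block_errors(cur, ref)
print(mse, mad, round(psnr, 2))
```

    In block-matching search, MAD (or its unnormalised sum) is typically the cost minimised when choosing a motion vector, while MSE and PSNR are used to judge the quality of the motion-compensated frame.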

  17. Zooplankton diversity analysis through single-gene sequencing of a community sample

    Directory of Open Access Journals (Sweden)

    Nishida Mutsumi

    2009-09-01

    Full Text Available Background: Oceans cover more than 70% of the earth's surface and are critical for the homeostasis of the environment. Among the components of the ocean ecosystem, zooplankton play vital roles in energy and matter transfer through the system. Despite their importance, understanding of zooplankton biodiversity is limited because of their fragile nature, small body size, and the large number of species from various taxonomic phyla. Here we present the results of single-gene zooplankton community analysis using a method that determines a large number of mitochondrial COI gene sequences from a bulk zooplankton sample. This approach will enable us to estimate the species richness of almost the entire zooplankton community. Results: A sample was collected from a depth of 721 m to the surface in the western equatorial Pacific off Pohnpei Island, Micronesia, with a plankton net equipped with a 2-m² mouth opening. A total of 1,336 mitochondrial COI gene sequences were determined from the cDNA library made from the sample. From the determined sequences, the occurrence of 189 species of zooplankton was estimated. BLASTN search results showed high degrees of similarity (>98%) between the query and database for 10 species, including holozooplankton and merozooplankton. Conclusion: In conjunction with the Census of Marine Zooplankton and Barcode of Life projects, single-gene zooplankton community analysis will be a powerful tool for estimating the species richness of zooplankton communities.

  18. Comparative analysis of sequences, polymorphisms and topology of yeasts aquaporins and aquaglyceroporins.

    Science.gov (United States)

    Sabir, Farzana; Loureiro-Dias, Maria C; Prista, Catarina

    2016-05-01

    Efficient homeostasis of water and glycerol is a prerequisite for osmoregulation and other aspects of yeast life. The cellular status of these molecules is often associated with functional presence of aquaporins and aquaglyceroporins. The present study provides a detailed updated analysis of aquaporins and aquaglyceroporins in 47 yeast species. A comprehensive analysis of aquaporins and aquaglyceroporins in 38 strains of Saccharomyces cerevisiae from different ecological niches is also presented. The functionality of specific aquaporins in yeasts has been associated with their adaptation requirements in different environmental conditions. In the present study, various inactivating mutations in aquaporin sequences were found in strains of S. cerevisiae. Likewise, several new interesting polymorphisms in aquaglyceroporin sequences of some commercial wine and brewing strains, vineyard and bakery strains were also observed. Conceivably, both in the case of aquaporins and aquaglyceroporins, inactivating mutations resulted in competitive advantage in selected environments. Topology and conservation of important regulatory residues within all sequences are also analyzed. We expect that the present review may contribute to establishing the functional relevance of aquaporins/aquaglyceroporins for various aspects of yeast physiology.

  19. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    Directory of Open Access Journals (Sweden)

    W.S.I. de Silva

    2017-07-01

    Full Text Available The promoter region of a drought and abscisic acid (ABA) inducible gene, osr40c1, comprising 670 bp upstream of the putative translation start codon, was isolated from the salt-tolerant indica rice variety Pokkali. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involved in regulation of osr40c1 expression under different conditions were found in the 5′-upstream region of osr40c1. These are the ABA-responsive element, light-responsive elements (ATCT-motif, Box I, G-box, GT1-motif, Gap-box and Sp1), myeloblastosis oncogene response element (CCAAT-box), auxin responsive element (TGA-element), gibberellin-responsive element (GARE-motif) and fungal-elicitor responsive elements (Box E and Box-W1). A putative regulatory element required for the endosperm-specific pattern of gene expression, designated as the Skn-1 motif, was also detected in the Pokkali osr40c1 promoter region. In conclusion, the bioinformatic analysis of the osr40c1 promoter region isolated from indica rice variety Pokkali led to the identification of several important stress-responsive cis-acting regulatory elements, and therefore the isolated promoter sequence could be employed in rice genetic transformation to mediate expression of abiotic stress induced genes.

  20. Sequence analysis of ORF IV RTBV isolated from tungro infected Oryza sativa L. cv Ciherang

    Science.gov (United States)

    Hastilestari, Bernadetta Rina; Astuti, Dwi; Estiati, Amy; Nugroho, Satya

    2015-09-01

    Efforts to increase rice production are often constrained by pests and diseases such as Tungro. Tungro disease is caused by joint infection with two dissimilar viruses, a bacilliform DNA virus, Rice tungro bacilliform virus (RTBV), and a spherical RNA virus, Rice tungro spherical virus (RTSV), and is transmitted by the green leafhopper (Nephotettix virescens). The symptoms of the disease are caused by the presence of RTBV. The genome of RTBV consists of four open reading frames (ORFs) which encode functional proteins. Of the four, ORF IV is unique because it exists only in RTBV. The most efficient method of generating disease-resistant plants is to look for natural sources of resistance genes in wild relatives or germplasm and then transfer the gene and the accompanying resistance into cultivated crop varieties. The aim of this study is, therefore, to isolate and analyze the 1170 bp ORF IV gene of Tungro virus isolated from an Indonesian rice cultivar, Ciherang (Oryza sativa L. cv Indica). DNA sequencing analysis using BLAST showed 94% similarity with the reference sequence, GenBank accession M65026.1. The comparisons and mutation analysis of the DNA sequences are discussed in this research.

  1. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  2. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type 1 and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  3. De novo sequencing and a comprehensive analysis of purple sweet potato (Impomoea batatas L.) transcriptome.

    Science.gov (United States)

    Xie, Fuliang; Burklew, Caitlin E; Yang, Yanfang; Liu, Min; Xiao, Peng; Zhang, Baohong; Qiu, Deyou

    2012-07-01

    High-throughput RNA sequencing was performed to comprehensively analyze the transcriptome of the purple sweet potato. A total of 58,800 unigenes were obtained and ranged from 200 nt to 10,380 nt with an average length of 476 nt. The average expression of one unigene was 34 reads per kb per million reads (RPKM) with a maximum expression of 1,935 RPKM. At least 40,280 (68.5%) unigenes were identified to be protein-coding genes, in which 11,978 and 5,184 genes were homologous to Arabidopsis and rice proteins, respectively. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis showed that 19,707 (33.5%) unigenes were classified to 1,807 terms of GO including molecular functions, biological processes, and cellular components and 9,970 (17.0%) unigenes were enriched to 11,119 KEGG pathways. We found that at least 3,553 genes may be involved in the biosynthesis pathways of starch, alkaloids, anthocyanin pigments, and vitamins. Additionally, 851 potential simple sequence repeats (SSRs) were identified in all unigenes. Transcriptome sequencing on tuberous roots of the sweet potato yielded substantial transcriptional sequences and potentially useful SSR markers which provide an important data source for sweet potato research. Comparison of two RNA-sequence datasets from the purple and the yellow sweet potato showed that UDP-glucose-flavonoid 3-O-glucosyltransferase was one of the key enzymes in the pathway of anthocyanin biosynthesis and that anthocyanin-3-glucoside might be one of the major components for anthocyanin pigments in the purple sweet potato. This study contributes to understanding the molecular mechanisms of sweet potato development and metabolism and therefore increases the potential utilization of the sweet potato in food nutrition and pharmacy.
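
    Expression levels in the abstract above are given in RPKM (reads per kilobase of transcript per million mapped reads). A minimal sketch of the standard formula follows; the function name and the example counts are illustrative and are not data from the study.

```python
def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """RPKM = reads * 1e9 / (transcript length in nt * total mapped reads)."""
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)

# Hypothetical unigene: 34 reads on a 1,000-nt transcript in a 1-million-read library.
print(rpkm(34, 1000, 1_000_000))
```

    The factor of 1e9 combines the two normalisations, 1e3 for kilobases and 1e6 for millions of reads, so that values are comparable across transcripts of different lengths and libraries of different depths.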

  4. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    Science.gov (United States)

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  5. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
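
    The abstract above relies on two simple computations: cutting a genome alignment into sliding windows, and counting variable and informative sites. A minimal sketch of both is given below. The step size, the toy alignment and the function names are assumptions for illustration; the abstract gives window sizes (1,000 and 2,000 bp) but not the step. Informative sites are taken here in the usual parsimony sense (at least two states, each present in at least two sequences), and gaps or ambiguity codes are not treated specially.

```python
from collections import Counter

def sliding_windows(seq_len, size, step):
    """Return (start, end) half-open coordinates of full-size windows."""
    return [(s, s + size) for s in range(0, seq_len - size + 1, step)]

def site_counts(alignment):
    """Return (variable_sites, parsimony_informative_sites) for aligned sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

print(sliding_windows(10, 4, 2))
print(site_counts(["AACGA", "AATGA", "ATCGA", "ATTGC"]))
```

    Running `site_counts` on the slice of the alignment covered by each window gives the per-window site counts that can then be compared with the proportion of sequences found in clusters.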

  6. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Full Text Available Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs were designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker

  7. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available
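
    The file-splitting step mentioned in the abstract (improvement 2) can be illustrated with a minimal sketch. This is not Rainbow's actual code: the function name `split_fastq` and the parameter `reads_per_chunk` are hypothetical, and the sketch assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk reads.

    Assumption: every read occupies exactly four lines
    (header, sequence, '+', qualities).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        # Take the next block of whole records from the stream.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record")
        yield chunk


# Toy input: three reads, split into chunks of at most two reads each.
demo = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "GGCC", "+", "IIII",
    "@r3", "TTAA", "+", "IIII",
]
chunks = list(split_fastq(demo, 2))
```

    Each chunk could then be handed to a separate worker, which is the load-balancing idea the abstract describes; how Rainbow chooses chunk sizes is not stated in the record.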

  8. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies.

  9. Whole-exome sequencing and homozygosity analysis implicate depolarization-regulated neuronal genes in autism.

    Directory of Open Access Journals (Sweden)

    Maria H Chahrour

    Although autism has a clear genetic component, the high genetic heterogeneity of the disorder has been a challenge for the identification of causative genes. We used homozygosity analysis to identify probands from nonconsanguineous families that showed evidence of distant shared ancestry, suggesting potentially recessive mutations. Whole-exome sequencing of 16 probands revealed validated homozygous, potentially pathogenic recessive mutations that segregated perfectly with disease in 4/16 families. The candidate genes (UBE3B, CLTCL1, NCKAP5L, ZNF18) encode proteins involved in proteolysis, GTPase-mediated signaling, cytoskeletal organization, and other pathways. Furthermore, neuronal depolarization regulated the transcription of these genes, suggesting potential activity-dependent roles in neurons. We present a multidimensional strategy for filtering whole-exome sequence data to find candidate recessive mutations in autism, which may have broader applicability to other complex, heterogeneous disorders.
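
    The idea behind homozygosity analysis can be sketched as a scan for long runs of consecutive homozygous genotype calls. The example below is a toy illustration only, not the authors' method: the function name `homozygous_runs`, the two-letter genotype encoding, and the marker-count threshold `min_len` are assumptions, and real analyses typically also account for genomic distance, marker density, and genotyping error.

```python
from typing import List, Tuple


def homozygous_runs(genotypes: List[str], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_len consecutive homozygous calls.

    Assumption: each genotype is a two-character string such as "AA" or "AG".
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous run began, if any
    for i, gt in enumerate(genotypes):
        is_hom = len(gt) == 2 and gt[0] == gt[1]
        if is_hom and start is None:
            start = i
        elif not is_hom and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs


# Toy data: markers 2..5 form one run of four homozygous calls.
calls = ["AA", "AG", "CC", "TT", "GG", "AA", "CT", "GG"]
found = homozygous_runs(calls, min_len=3)
```

    In this toy input, the single-marker homozygous calls at positions 0 and 7 fall below the threshold and are ignored.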

  10. Comparative analysis of protein coding sequences from human, mouse, and the domesticated pig  

    DEFF Research Database (Denmark)

    Jørgensen, Frank Grønlund; Hobolth, Asger; Hornshøj, H.

    2005-01-01

    Background The availability of abundant sequence data from key model organisms has made large scale studies of molecular evolution an exciting possibility. Here we use full length cDNA alignments comprising more than 700,000 nucleotides from human, mouse, pig and the Japanese pufferfish Fugu...... indicate that a large fraction of these genes may have lost their function quite recently or may still be functional genes in some or all of the three mammalian species. Conclusions We present a comparative analysis of protein coding genes from three major mammalian lineages. Our study demonstrates...... the usefulness of codon-based likelihood models in detecting selection and it illustrates the value of sequencing organisms at different phylogenetic distances for comparative studies....

  11. Metatranscriptomic analysis of small RNAs present in soybean deep sequencing libraries

    Directory of Open Access Journals (Sweden)

    Lorrayne Gomes Molina

    2012-01-01

    A large number of small RNAs unrelated to the soybean genome were identified after deep sequencing of soybean small RNA libraries. A metatranscriptomic analysis was carried out to identify the origin of these sequences. Comparative analyses were made of small interfering RNAs (siRNAs) present in samples collected in open areas corresponding to soybean field plantations and in samples from soybean cultivated in greenhouses under a controlled environment. Different pathogenic, symbiotic and free-living organisms were identified from samples of both growth systems. They included viruses, bacteria and different groups of fungi. This approach can be useful not only to identify potentially unknown pathogens and pests, but also to understand the relations that soybean plants establish with microorganisms that may affect, directly or indirectly, plant health and crop production.

  12. Antibody-based screening for hereditary nonpolyposis colorectal carcinoma compared with microsatellite analysis and sequencing

    DEFF Research Database (Denmark)

    Christensen, Mariann; Katballe, Niels; Wikman, Friedrik

    2002-01-01

    BACKGROUND: Germline mutations in the DNA mismatch repair genes, MSH2, MLH1, and others are associated with hereditary nonpolyposis colorectal cancer (HNPCC). Due to the high costs of sequencing, cheaper screening methods are needed to identify HNPCC cases. Ideally, these methods should have a high...... sensitivity and identify all mutated cases without too many false-positive cases. METHODS: Sequencing was compared with microsatellite analysis and immunohistochemistry to detect the presence or absence of the mismatch repair proteins. In the current study, the authors examined 42 patients with colorectal...... with germ line mutations were detected by either immunohistochemistry or microsatellite instability, indicating that a combination of these methods may be suitable for HNPCC screening. Microsatellite instability and abnormal immunohistoc