large-scale est sequencing: Topics by WorldWideScience.org

Sample records for large-scale est sequencing

Large-scale Identification of Expressed Sequence Tags (ESTs) from Nicotiana tabacum by Normalized cDNA Library Sequencing

Directory of Open Access Journals (Sweden)

Alvarez S Perez

2014-12-01

An expressed sequence tag (EST) resource for tobacco plants (Nicotiana tabacum) was established using high-throughput sequencing of randomly selected clones from one cDNA library representing a range of plant organs (leaf, stem, root and root base). Over 5000 ESTs were generated from the 3' ends of 8000 clones, analyzed by BLAST searches and categorized functionally. All annotated ESTs were classified into 18 functional categories; unique transcripts involved in energy were the largest group, accounting for 831 (32.32%) of the annotated ESTs. After excluding 2450 non-significant tentative unique transcripts (TUTs), 100 unique sequences (1.67% of total TUTs) were identified from the N. tabacum database. In the array result, two genes strongly related to the tobacco mosaic virus (TMV) were obtained: one basic form of pathogenesis-related protein 1 precursor (TBT012G08) and ubiquitin (TBT087G01). Both were found in the variety Hongda. Some other important genes were classified into two groups: one implicated in plant development, such as genes related to photosynthesis (chlorophyll a-b binding protein, photosystem I, ferredoxin I and III, ATP synthase), and a further group of genes related to plant stress response (ubiquitin, ubiquitin-like protein SMT3, glycine-rich RNA binding protein, histones and metallothionein). Notably, two of these genes (ubiquitin-like protein SMT3 and metallothionein) have never been reported before in N. tabacum. The array results were confirmed using quantitative PCR.
Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

Science.gov (United States)

Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

2004-02-01

To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
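
The clustering criterion quoted above (95% identity over 50 bases) can be illustrated with a small check. This is only a sketch: the abstract does not name the clustering tool, and the function name `shares_identity_window` and the restriction to ungapped comparisons are assumptions made here for illustration.

```python
def shares_identity_window(a: str, b: str, window: int = 50,
                           min_identity: float = 0.95) -> bool:
    """Return True if some ungapped alignment of a and b contains `window`
    consecutive aligned bases of which at least `min_identity` match.

    Assumption: no gaps are allowed; real EST clustering tools are more
    tolerant of insertions and deletions.
    """
    a, b = a.upper(), b.upper()
    # offset = position in `a` where b[0] is placed (may be negative)
    for offset in range(-(len(b) - window), len(a) - window + 1):
        start_a = max(0, offset)
        start_b = max(0, -offset)
        overlap = min(len(a) - start_a, len(b) - start_b)
        if overlap < window:
            continue
        matches = [a[start_a + i] == b[start_b + i] for i in range(overlap)]
        count = sum(matches[:window])          # matches in the first window
        if count / window >= min_identity:
            return True
        for i in range(window, overlap):       # slide the window by one base
            count += matches[i] - matches[i - window]
            if count / window >= min_identity:
                return True
    return False


if __name__ == "__main__":
    est1 = "ACGT" * 15
    est2 = est1[:5] + "N" + est1[6:]           # one mismatch in 60 bases
    print(shares_identity_window(est1, est2))
```

Two sequences that pass this test would be candidates for the same cluster; with the default settings a 50-base window tolerates at most two mismatches.
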
Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

Directory of Open Access Journals (Sweden)

Lee Hae-Young

2006-02-01

Background: Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results: We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton (64.54%) ESTs. From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (Sus scrofa). Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion: The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the
A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

Science.gov (United States)

Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

2017-09-20

Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we report a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-recipient wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, and the marker distribution density was approximately 1.28 cM. An application case of all EST-STS markers was validated on the A. cristatum 6P chromosome. These markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provides a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.
EST2Prot: Mapping EST sequences to proteins

Directory of Open Access Journals (Sweden)

Lin David M

2006-03-01

Background: EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. Results: We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. Conclusion: EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at http://biozon.org/tools/est/.
Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

Science.gov (United States)

Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

2014-01-01

Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From these, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among 115 cannabis genotypes. The results showed that the 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.
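
The SSR identification step described above can be sketched with regular expressions. The abstract does not state which tool or repeat thresholds were used, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are hypothetical choices for illustration only; primer design is not covered.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least min_rep - 1 more copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit
            # (e.g. "AAAA" or "ATAT"); the shorter motif length reports them.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    est = "GGC" + "AAG" * 6 + "TTCA" + "CT" * 7 + "G"
    for start, motif, copies in find_ssrs(est):
        print(start, motif, copies)
```

Counting the reported motifs by length over a whole EST set would give a motif-class breakdown of the kind quoted in the abstract.
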
ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs).

Science.gov (United States)

Liang, Chun; Wang, Gang; Liu, Lin; Ji, Guoli; Fang, Lin; Liu, Yuansheng; Carter, Kikia; Webb, Jason S; Dean, Jeffrey F D

2007-05-29

With the advent of low-cost, high-throughput sequencing, the amount of public domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. While these data are widely used for characterizing various genomes, they also present a serious challenge for data quality control and validation due to their inherent deficiencies, particularly for species without genome sequences. ConiferEST is an integrated system for data reprocessing, visualization and mining of conifer ESTs. In its current release, Build 1.0, it houses 172,229 loblolly pine EST sequence reads, which were obtained from reprocessing raw DNA sequencer traces using our software, WebTraceMiner. The trace files were downloaded from NCBI Trace Archive. ConiferEST provides biologists unique, easy-to-use data visualization and mining tools for a variety of putative sequence features including cloning vector segments, adapter sequences, restriction endonuclease recognition sites, polyA and polyT runs, and their corresponding Phred quality values. Based on these putative features, verified sequence features such as 3' and/or 5' termini of cDNA inserts in either sense or non-sense strand have been identified in-silico. Interestingly, only 30.03% of the designated 3' ESTs were found to have an authenticated 5' terminus in the non-sense strand (i.e., polyT tails), while fewer than 5.34% of the designated 5' ESTs had a verified 5' terminus in the sense strand. Such previously ignored features provide valuable insight for data quality control and validation of error-prone ESTs, as well as the ability to identify novel functional motifs embedded in large EST datasets. We found that "double-termini adapters" were effective indicators of potential EST chimeras. For all sequences with in-silico verified termini/terminus, we used InterProScan to assign protein domain signatures, results of which are available for in-depth exploration using our biologist
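
As a rough illustration of one of the putative features listed above (polyA and polyT runs), the following sketch reports every uninterrupted run of A or T in a read. It is not WebTraceMiner's method: the function name `poly_runs`, the minimum run length of 10, and the requirement of exact (mismatch-free) runs are all assumptions made here.

```python
import re


def poly_runs(read: str, min_len: int = 10):
    """Return (start, end, base) for each run of A or T of at least min_len.

    Coordinates are 0-based and end-exclusive. Runs must be exact; a real
    trace-processing pipeline would likely tolerate sequencing errors.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group(0)[0])
            for m in pattern.finditer(read.upper())]


if __name__ == "__main__":
    read = "T" * 12 + "GATTACAGGC" + "A" * 15
    print(poly_runs(read))
```

A polyT run near the start of a read or a polyA run near its end is the kind of evidence that could then be combined with adapter and vector positions to call a cDNA terminus.
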
ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs)

Directory of Open Access Journals (Sweden)

Carter Kikia

2007-05-01

Background: With the advent of low-cost, high-throughput sequencing, the amount of public domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. While these data are widely used for characterizing various genomes, they also present a serious challenge for data quality control and validation due to their inherent deficiencies, particularly for species without genome sequences. Description: ConiferEST is an integrated system for data reprocessing, visualization and mining of conifer ESTs. In its current release, Build 1.0, it houses 172,229 loblolly pine EST sequence reads, which were obtained from reprocessing raw DNA sequencer traces using our software, WebTraceMiner. The trace files were downloaded from NCBI Trace Archive. ConiferEST provides biologists unique, easy-to-use data visualization and mining tools for a variety of putative sequence features including cloning vector segments, adapter sequences, restriction endonuclease recognition sites, polyA and polyT runs, and their corresponding Phred quality values. Based on these putative features, verified sequence features such as 3' and/or 5' termini of cDNA inserts in either sense or non-sense strand have been identified in-silico. Interestingly, only 30.03% of the designated 3' ESTs were found to have an authenticated 5' terminus in the non-sense strand (i.e., polyT tails), while fewer than 5.34% of the designated 5' ESTs had a verified 5' terminus in the sense strand. Such previously ignored features provide valuable insight for data quality control and validation of error-prone ESTs, as well as the ability to identify novel functional motifs embedded in large EST datasets. We found that "double-termini adapters" were effective indicators of potential EST chimeras. For all sequences with in-silico verified termini/terminus, we used InterProScan to assign protein domain signatures, results of which are available
Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

Energy Technology Data Exchange (ETDEWEB)

Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

2010-03-23

Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
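
The high-quality SNP criterion stated above (contigs with at least four sequences, minor allele present in at least two) can be written as a small filter. This is a sketch under a simplifying assumption: each aligned EST in the contig contributes exactly one base at the candidate position, so the column depth stands in for the contig's sequence count. The function name `is_high_quality_snp` is hypothetical.

```python
from collections import Counter


def is_high_quality_snp(column, min_depth: int = 4, min_minor: int = 2) -> bool:
    """Apply the depth and minor-allele thresholds to one alignment column.

    `column` holds the base each aligned EST shows at the candidate site;
    gaps and ambiguous calls are ignored (an assumption of this sketch).
    """
    bases = [b.upper() for b in column if b.upper() in "ACGT"]
    if len(bases) < min_depth:
        return False
    counts = Counter(bases).most_common()
    if len(counts) < 2:              # only one allele observed: not a SNP
        return False
    return counts[1][1] >= min_minor  # second most common allele = minor allele


if __name__ == "__main__":
    for col in ("AAGG", "AAAG", "AGG", "AAAAGG-"):
        print(col, is_high_quality_snp(col))
```

Requiring the minor allele in at least two reads guards against single-read sequencing errors being called as polymorphisms.
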
GenEST, a powerful bidirectional link between cDNA sequence data and gene expression profiles generated by cDNA-AFLP

NARCIS (Netherlands)

Qin, Ling; Prins, P.; Jones, J.T.; Popeijus, H.; Smant, G.; Bakker, J.; Helder, J.

2001-01-01

The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power
EST-PAC a web package for EST annotation and protein sequence prediction

Directory of Open Access Journals (Sweden)

Strahm Yvan

2006-10-01

With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: (1) searching local or remote biological databases for sequence similarities using BLAST services, (2) predicting protein coding sequence from EST data, and (3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.
Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

Science.gov (United States)

Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

2016-01-01

Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed sequence tags (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps and investigation of fiber growth and development in kenaf, and will also be of value to novel gene discovery and functional genomic studies. PMID:26960153
Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

Science.gov (United States)

Mackey, Aaron J; Pearson, William R

2004-10-01

Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
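
The idea of storing similarity-search results relationally and querying a subset of the library can be sketched with Python's built-in `sqlite3` module. The table and column names below (`protein`, `hit`, `taxon`, `evalue`) and the sample rows are invented for illustration; they are not the schema of seqdb_demo, which the abstract does not describe.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT, length INTEGER);
        CREATE TABLE hit (query TEXT,
                          subject TEXT REFERENCES protein(acc),
                          evalue REAL);
    """)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
        ("P1", "E. coli", 300), ("P2", "E. coli", 150), ("P3", "H. sapiens", 420),
    ])
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
        ("q1", "P1", 1e-30), ("q1", "P2", 1e-8), ("q1", "P3", 1e-50),
        ("q2", "P2", 1e-3), ("q2", "P3", 1e-12),
    ])
    return conn


def best_hits(conn: sqlite3.Connection, taxon: str, max_evalue: float = 1e-6):
    """Best (lowest E-value) hit per query, restricted to one taxon."""
    # SQLite returns h.subject from the row that holds MIN(h.evalue).
    return conn.execute("""
        SELECT h.query, h.subject, MIN(h.evalue)
        FROM hit AS h JOIN protein AS p ON p.acc = h.subject
        WHERE p.taxon = ? AND h.evalue <= ?
        GROUP BY h.query
        ORDER BY h.query
    """, (taxon, max_evalue)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(best_hits(db, "E. coli"))
    print(best_hits(db, "H. sapiens"))
```

Restricting the join to one taxon mirrors the abstract's point about focusing on library subsets most likely to contain homologs; the same stored hits can be re-queried with different filters without rerunning the search.
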
GarlicESTdb: an online database and mining tool for garlic EST sequences

Directory of Open Access Journals (Sweden)

Choi Sang-Haeng

2009-05-01

Background: Allium sativum, commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse genus containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary and medicinal purposes and for its health benefits. Currently, interest in garlic is increasing rapidly due to its nutritional and pharmaceutical value for conditions including high blood pressure and cholesterol, atherosclerosis and cancer. Even so, there are no comprehensive databases of garlic Expressed Sequence Tags (ESTs) available for gene discovery and future genome annotation efforts. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. Description: GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways; the AJAX framework was also used in part. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation
GarlicESTdb: an online database and mining tool for garlic EST sequences.

Science.gov (United States)

Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

2009-05-18

Allium sativum, commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse genus containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary and medicinal purposes and for its health benefits. Currently, interest in garlic is increasing rapidly due to its nutritional and pharmaceutical value for conditions including high blood pressure and cholesterol, atherosclerosis and cancer. Even so, there are no comprehensive databases of garlic Expressed Sequence Tags (ESTs) available for gene discovery and future genome annotation efforts. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways; the AJAX framework was also used in part. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. The Garlic
Improvement of methods for large scale sequencing; application to human Xq28

Energy Technology Data Exchange (ETDEWEB)

Gibbs, R.A.; Andersson, B.; Wentland, M.A. [Baylor College of Medicine, Houston, TX (United States)] [and others]

1994-09-01

Sequencing of a one-megabase region of Xq28, spanning the FRAXA and IDS loci, has been undertaken in order to investigate the practicality of the shotgun approach for large scale sequencing and as a platform to develop improved methods. The efficiency of several steps in the shotgun sequencing strategy has been increased using PCR-based approaches. An improved method for preparation of M13 libraries has been developed. This protocol combines a previously described adaptor-based protocol with the uracil DNA glycosylase (UDG)-cloning procedure. The efficiency of this procedure has been found to be up to 100-fold higher than that of previously used protocols. In addition, the novel protocol is more reliable and thus easy to establish in a laboratory. The method has also been adapted for the simultaneous shotgun sequencing of multiple short fragments by concentrating them before library construction. This protocol is suitable for rapid characterization of cDNA clones. A library was constructed from 15 PCR-amplified and concentrated human cDNA inserts; the insert sequences could easily be identified as separate contigs during the assembly process, and the sequence coverage was even along each fragment. Using this strategy, the fine structures of the FRAXA and IDS loci have been revealed and several EST homologies indicating novel expressed sequences have been identified. Use of PCR to close repetitive regions that are difficult to clone was tested by determination of the sequence of a cosmid mapping DXS455 in Xq28, containing a polymorphic VNTR. The region containing the VNTR was not represented in the shotgun library, but by designing PCR primers in the sequences flanking the gap and by cloning and sequencing the PCR product, the fine structure of the VNTR has been determined. It was found to be an AT-rich VNTR with a repeated 25-mer at the center.
Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project

Directory of Open Access Journals (Sweden)

Heinz Ruth A

2003-09-01

Background: Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue. They avoid redundant sequencing of clones representing the same expressed genes and maximize detection of low-abundance transcripts, thus affecting the efficiency and cost effectiveness of small scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits. Results: Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low-abundance transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library, with 62% unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs in public databases showed that 197 of the non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of new reported sequences
Novel expressed sequence tag- simple sequence repeats (EST ...

African Journals Online (AJOL)

Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...
Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

Science.gov (United States)

Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

2003-01-01

Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p ... [Sequence accession numbers for Sarcocystis neurona, Eimeria tenella, and Neospora caninum omitted.] PMID:12618375

AcEST (EST sequences of Adiantum capillus-veneris and their annotation) - AcEST | LSDB Archive [Life Science Database Archive metadata]

Lifescience Database Archive (English)

Full Text Available AcEST(EST sequences of Adiantum capillus-veneris and their annotation) Data detail Dat...a name AcEST(EST sequences of Adiantum capillus-veneris and their annotation) DOI 10.18908/lsdba.nbdc00839-0...01 Description of data contents EST sequence of Adiantum capillus-veneris and its annotation (clone ID, libr...le search URL http://togodb.biosciencedbc.jp/togodb/view/archive_acest#en Data acquisition method Capillary ...ainst UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases) Number of data entries Adiantum capillus-veneris
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

Science.gov (United States)

Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

2013-06-27

Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available
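Improvement (2) in the Rainbow abstract, splitting large sequence files so that downstream work can be balanced across workers, can be illustrated with a minimal sketch. This is not Rainbow's implementation: the function name `split_fastq` is hypothetical, and the sketch assumes the common four-line FASTQ layout (header, sequence, `+` separator, quality string) with no wrapped sequence lines.

```python
from itertools import islice
from typing import Iterable, Iterator, List

LINES_PER_RECORD = 4  # assumed layout: header, sequence, '+', qualities


def split_fastq(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines holding at most records_per_chunk records each.

    Hypothetical helper for illustration only; it never splits a record
    across two chunks, so each chunk can be processed independently.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be positive")
    it = iter(lines)
    while True:
        # Pull the next block of whole records from the input stream.
        chunk = list(islice(it, records_per_chunk * LINES_PER_RECORD))
        if not chunk:
            return
        if len(chunk) % LINES_PER_RECORD != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk
```

Because the function consumes any iterable of lines lazily, it can be fed an open file handle, and each yielded chunk could then be written to its own file or handed to a separate worker.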
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

Science.gov (United States)

Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

2014-06-01

Due to the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

Science.gov (United States)

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
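The in silico SSR mining step described in several of these records can be sketched with a regular expression that looks for a short motif repeated in tandem. The sketch below is an illustration only and does not reproduce MISA or any of the pipelines named above; the function name `find_ssrs` and the repeat thresholds (at least six dinucleotide or five trinucleotide repeats) are assumptions chosen for the example.

```python
import re
from typing import Dict, List, Tuple

# Assumed thresholds for illustration: motif length -> minimum repeat count.
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5}


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect tandem repeats.

    Hypothetical helper: only perfect, uninterrupted repeats of A/C/G/T
    motifs are reported, and homopolymer runs are skipped.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # Capture one motif, then require at least (min_count - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. 'AA' repeated is a homopolymer, not a di-repeat
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Primer design around each hit, which the abstracts also describe, would need the flanking sequence as well and is outside the scope of this sketch.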
Large scale identification and categorization of protein sequences using structured logistic regression.

Directory of Open Access Journals (Sweden)

Bjørn P Pedersen

Full Text Available BACKGROUND: Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well-suited for this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test-case that would generate important biological information as well as provide a proof-of-concept for the application of SLR to a large scale bioinformatics problem. RESULTS: Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known sequences, we analysed 9.3 million sequences in the UniProtKB and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps on organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases. CONCLUSIONS: Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.
Phylogenetic distribution of large-scale genome patchiness

Directory of Open Access Journals (Sweden)

Hackenberg Michael

2008-04-01

Full Text Available Abstract Background The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
Wheat EST resources for functional genomics of abiotic stress

Directory of Open Access Journals (Sweden)

Links Matthew G

2006-06-01

Full Text Available Abstract Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in
Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

Energy Technology Data Exchange (ETDEWEB)

Margaret Riley; Merry Buckley

2009-01-01

Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin
galaxieEST: addressing EST identity through automated phylogenetic analysis.

Science.gov (United States)

Nilsson, R Henrik; Rajashekar, Balaji; Larsson, Karl-Henrik; Ursing, Björn M

2004-07-05

Research involving expressed sequence tags (ESTs) is intricately coupled to the existence of large, well-annotated sequence repositories. Comparatively complete and satisfactory annotated public sequence libraries are, however, available only for a limited range of organisms, rendering the absence of sequences and gene structure information a tangible problem for those working with taxa lacking an EST or genome sequencing project. Paralogous genes belonging to the same gene family but distinguished by derived characteristics are particularly prone to misidentification and erroneous annotation; high but incomplete levels of sequence similarity are typically difficult to interpret and have formed the basis of many unsubstantiated assumptions of orthology. In these cases, a phylogenetic study of the query sequence together with the most similar sequences in the database may be of great value to the identification process. In order to facilitate this laborious procedure, a project to employ automated phylogenetic analysis in the identification of ESTs was initiated. galaxieEST is an open source Perl-CGI script package designed to complement traditional similarity-based identification of EST sequences through employment of automated phylogenetic analysis. It uses a series of BLAST runs as a sieve to retrieve nucleotide and protein sequences for inclusion in neighbour joining and parsimony analyses; the output includes the BLAST output, the results of the phylogenetic analyses, and the corresponding multiple alignments. galaxieEST is available as an on-line web service for identification of fungal ESTs and for download / local installation for use with any organism group at http://galaxie.cgb.ki.se/galaxieEST.html. By addressing sequence relatedness in addition to similarity, galaxieEST provides an integrative view on EST origin and identity, which may prove particularly useful in cases where similarity searches return one or more pertinent, but not full, matches and
Sequence assembly

DEFF Research Database (Denmark)

Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

2009-01-01

Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....
SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation

DEFF Research Database (Denmark)

Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik

2007-01-01

MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large scale sequencing projects. The increasing availability of sequence trace data...... manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining of polymorphisms based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non...
Rapid in silico cloning of genes using expressed sequence tags (ESTs).

Science.gov (United States)

Gill, R W; Sanseau, P

2000-01-01

Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.
Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

Science.gov (United States)

Chechetkin, V R; Lobzin, V V

2017-08-07

State-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools have led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow.

Science.gov (United States)

Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A

2006-11-23

Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". However, due to its ability to
Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

Science.gov (United States)

Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

2012-01-01

Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604
Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

Directory of Open Access Journals (Sweden)

Natalie L. Dillon

2014-01-01

Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties, and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

Directory of Open Access Journals (Sweden)

Martinez Veronica

2006-11-01

Full Text Available Abstract Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

Science.gov (United States)

Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

2013-12-01

The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
XLID-causing mutations and associated genes challenged in light of data from large-scale human exome sequencing.

Science.gov (United States)

Piton, Amélie; Redin, Claire; Mandel, Jean-Louis

2013-08-08

Because of the unbalanced sex ratio (1.3-1.4 to 1) observed in intellectual disability (ID) and the identification of large ID-affected families showing X-linked segregation, much attention has been focused on the genetics of X-linked ID (XLID). Mutations causing monogenic XLID have now been reported in over 100 genes, most of which are commonly included in XLID diagnostic gene panels. Nonetheless, the boundary between true mutations and rare non-disease-causing variants often remains elusive. The sequencing of a large number of control X chromosomes, required for avoiding false-positive results, was not systematically possible in the past. Such information is now available thanks to large-scale sequencing projects such as the National Heart, Lung, and Blood (NHLBI) Exome Sequencing Project, which provides variation information on 10,563 X chromosomes from the general population. We used this NHLBI cohort to systematically reassess the implication of 106 genes proposed to be involved in monogenic forms of XLID. We particularly question the implication in XLID of ten of them (AGTR2, MAGT1, ZNF674, SRPX2, ATP6AP2, ARHGEF6, NXF5, ZCCHC12, ZNF41, and ZNF81), in which truncating variants or previously published mutations are observed at a relatively high frequency within this cohort. We also highlight 15 other genes (CCDC22, CLIC2, CNKSR2, FRMPD4, HCFC1, IGBP1, KIAA2022, KLF8, MAOA, NAA10, NLGN3, RPL10, SHROOM4, ZDHHC15, and ZNF261) for which replication studies are warranted. We propose that similar reassessment of reported mutations (and genes) with the use of data from large-scale human exome sequencing would be relevant for a wide range of other genetic diseases. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Discovery of candidate disease genes in ENU-induced mouse mutants by large-scale sequencing, including a splice-site mutation in nucleoredoxin.

Directory of Open Access Journals (Sweden)

Melissa K Boles

2009-12-01

Full Text Available An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU) mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn), inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing.

Genomic resources for water yam (Dioscorea alata L.): analyses of EST-Sequences, De Novo sequencing and GBS libraries

Science.gov (United States)

The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...
Comparison of zero-sequence injection methods in cascaded H-bridge multilevel converters for large-scale photovoltaic integration

DEFF Research Database (Denmark)

Yu, Yifan; Konstantinou, Georgios; Townsend, Christopher David

2017-01-01

Photovoltaic (PV) power generation levels in the three phases of a multilevel cascaded H-bridge (CHB) converter can be significantly unbalanced, owing to different irradiance levels and ambient temperatures over a large-scale solar PV power plant. Injection of a zero-sequence voltage is required to maintain three-phase balanced grid currents with unbalanced power generation. This study theoretically compares power balance capabilities of various zero-sequence injection methods based on two metrics which can be easily generalised for all CHB applications to PV systems. Experimental results based on a 430 V, 10 kW, three-phase, seven-level cascaded H-bridge converter prototype confirm superior performance of the optimal zero-sequence injection technique.
Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

Science.gov (United States)

Chen, Honglin; Wang, Lixia; Wang, Suhua; Liu, Chunji; Blair, Matthew Wohlgemuth; Cheng, Xuzhen

2015-01-01

Mung bean (Vigna radiate (L.) Wilczek) is an important traditional food legume crop, with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of a high-resolution genetic linkage maps, association or comparative mapping and genetic analyses of various Vigna species.
Large-scale identification of odorant-binding proteins and chemosensory proteins from expressed sequence tags in insects

Science.gov (United States)

2009-01-01

Background Insect odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play an important role in chemical communication of insects. Gene discovery of these proteins is a time-consuming task. In recent years, expressed sequence tags (ESTs) of many insect species have accumulated, thus providing a useful resource for gene discovery. Results We have developed a computational pipeline to identify OBP and CSP genes from insect ESTs. In total, 752,841 insect ESTs were examined from 54 species covering eight Orders of Insecta. From these ESTs, 142 OBPs and 177 CSPs were identified, of which 117 OBPs and 129 CSPs are new. The complete open reading frames (ORFs) of 88 OBPs and 123 CSPs were obtained by electronic elongation. We randomly chose 26 OBPs from eight species of insects, and 21 CSPs from four species for RT-PCR validation. Twenty two OBPs and 16 CSPs were confirmed by RT-PCR, proving the efficiency and reliability of the algorithm. Together with all family members obtained from the NCBI (OBPs) or the UniProtKB (CSPs), 850 OBPs and 237 CSPs were analyzed for their structural characteristics and evolutionary relationship. Conclusions A large number of new OBPs and CSPs were found, providing the basis for deeper understanding of these proteins. In addition, the conserved motif and evolutionary analysis provide some new insights into the evolution of insect OBPs and CSPs. Motif pattern fine-tune the functions of OBPs and CSPs, leading to the minor difference in binding sex pheromone or plant volatiles in different insect Orders. PMID:20034407
Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery

Directory of Open Access Journals (Sweden)

Materne Michael

2011-05-01

Full Text Available Abstract Background Lentil (Lens culinaris Medik.) is a cool-season grain legume which provides a rich source of protein for human consumption. In terms of genomic resources, lentil is relatively underdeveloped, in comparison to other Fabaceae species, with limited available data. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality. Results Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 10^6 expressed sequence tags (ESTs). De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was sequence-analysed against genome drafts of the model legume species Medicago truncatula and Arabidopsis thaliana to identify 12,639, and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annotated from GenBank. Simple sequence repeat (SSR)-containing ESTs were identified from consensus sequences and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel of 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs obtained successful amplification, of which 47.5% detected genetic polymorphism. Conclusions A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.
Salmon louse (Lepeophtheirus salmonis) transcriptomes during post molting maturation and egg production, revealed using EST-sequencing and microarray analysis

Directory of Open Access Journals (Sweden)

Jonassen Inge

2008-03-01

Full Text Available Abstract Background Lepeophtheirus salmonis is an ectoparasitic copepod feeding on skin, mucus and blood from salmonid hosts. Initial analysis of EST sequences from pre adult and adult stages of L. salmonis revealed a large proportion of novel transcripts. In order to link unknown transcripts to biological functions we have combined EST sequencing and microarray analysis to characterize female salmon louse transcriptomes during post molting maturation and egg production. Results EST sequence analysis shows that 43% of the ESTs have no significant hits in GenBank. Sequenced ESTs assembled into 556 contigs and 1614 singletons and whenever homologous genes were identified no clear correlation with homologous genes from any specific animal group was evident. Sequence comparison of 27 L. salmonis proteins with homologous proteins in humans, zebrafish, insects and crustaceans revealed an almost identical sequence identity with all species. Microarray analysis of maturing female adult salmon lice revealed two major transcription patterns; up-regulation during the final molting followed by down regulation and female specific up regulation during post molting growth and egg production. For a third minor group of ESTs transcription decreased during molting from pre-adult II to immature adults. Genes regulated during molting typically gave hits with cuticula proteins whilst transcripts up regulated during post molting growth were female specific, including two vitellogenins. Conclusion The copepod L. salmonis contains a high level of novel genes. Among analyzed L. salmonis proteins, sequence identities with homologous proteins in crustaceans are no higher than to homologous proteins in humans. Three distinct processes, molting, post molting growth and egg production correlate with transcriptional regulation of three groups of transcripts; two including genes related to growth, one including genes related to egg production. The function of the regulated
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

Science.gov (United States)

Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

2004-01-01

Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
VESPA: Very large-scale Evolutionary and Selective Pressure Analyses

Directory of Open Access Journals (Sweden)

Andrew E. Webb

2017-06-01

Full Text Available Background Large-scale molecular evolutionary analyses of protein coding sequences require a number of preparatory inter-related steps from finding gene families, to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results We have benchmarked VESPA and our results show that the method is consistent, performs well on both large scale and smaller scale datasets, and produces results in line with previously published datasets. Discussion Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.
Flavonoid Biosynthesis Genes Putatively Identified in the Aromatic Plant Polygonum minus via Expressed Sequences Tag (EST) Analysis

Directory of Open Access Journals (Sweden)

Zamri Zainal

2012-02-01

Full Text Available P. minus is an aromatic plant, the leaf of which is widely used as a food additive and in the perfume industry. The leaf also accumulates secondary metabolites that act as active ingredients such as flavonoid. Due to limited genomic and transcriptomic data, the biosynthetic pathway of flavonoids is currently unclear. Identification of candidate genes involved in the flavonoid biosynthetic pathway will significantly contribute to understanding the biosynthesis of active compounds. We have constructed a standard cDNA library from P. minus leaves, and two normalized full-length enriched cDNA libraries were constructed from stem and root organs in order to create a gene resource for the biosynthesis of secondary metabolites, especially flavonoid biosynthesis. Thus, large‑scale sequencing of P. minus cDNA libraries identified 4196 expressed sequences tags (ESTs) which were deposited in dbEST in the National Center of Biotechnology Information (NCBI). From the three constructed cDNA libraries, 11 ESTs encoding seven genes were mapped to the flavonoid biosynthetic pathway. Finally, three flavonoid biosynthetic pathway-related ESTs, chalcone synthase, CHS (JG745304), flavonol synthase, FLS (JG705819) and leucoanthocyanidin dioxygenase, LDOX (JG745247), were selected for further examination by quantitative RT-PCR (qRT-PCR) in different P. minus organs. Expression was detected in leaf, stem and root. Gene expression studies have been initiated in order to better understand the underlying physiological processes.
Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

Directory of Open Access Journals (Sweden)

Shade Larry L

2006-06-01

Full Text Available Abstract Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

Science.gov (United States)

Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

2014-01-13

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone cannot discriminate miRNAs from other sequences sufficiently, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
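The P-value described in this abstract can be illustrated with a minimal sketch. It assumes that the mean and standard deviation of the randomized-sequence MFE distribution have already been obtained (for example by the interpolation the abstract mentions); the function name `mfe_p_value` and its parameters are hypothetical and are not taken from the authors' software.

```python
import math

def mfe_p_value(candidate_mfe, random_mean, random_sd):
    """Lower-tail probability of an MFE at least this low under a normal
    distribution fitted to randomized sequences (assumed parameters)."""
    # Standardize the candidate MFE against the randomized-sequence distribution.
    z = (candidate_mfe - random_mean) / random_sd
    # Normal CDF expressed through the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Illustrative values only: a candidate two standard deviations below the mean.
p = mfe_p_value(candidate_mfe=-40.0, random_mean=-30.0, random_sd=5.0)
```

A small P-value here only says that the candidate folds more stably than shuffled sequences of the same composition; as the abstract notes, it would be one selection parameter among several.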
TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

Science.gov (United States)

Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

2013-05-30

Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.
Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

Directory of Open Access Journals (Sweden)

Nehir Ozdemir Ozgenturk

2017-01-01

Full Text Available Kivircik sheep is an important local Turkish sheep owing to its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries, which were taken from the same Kivircik sheep mammary gland tissue at prenatal and postnatal stages, were constructed. A total of 3072 colonies that were randomly selected from the two libraries were sequenced for developing a sheep ESTs collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined by Geneious software. A total of 422 ESTs with over 80% similarity to known sequences of other organisms in NCBI were classified by the Panther database for Gene Ontology (GO) category. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited to the NCBI GenBank database (GW996847–GW999260). EST data in this study have provided a new source of information to functional genome studies of sheep.
Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

Directory of Open Access Journals (Sweden)

Qian Ding

2015-01-01

Full Text Available Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.
Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes
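As an illustration of how SSRs of at least 12 bp might be located in a unigene sequence, the following is a minimal sketch using regular expressions. The per-motif repeat thresholds in `MIN_REPEATS` and the function name `find_ssrs` are assumptions made for this example; the abstract does not state which tool or thresholds the authors used, and only perfect (uninterrupted) repeats are detected here.

```python
import re

# Assumed minimum repeat counts per motif length, chosen so that every
# reported SSR spans at least 12 bp (12x1, 6x2, 4x3, 3x4, 3x5, 2x6).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 2}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG" is really "AG").
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits

# Toy sequence containing an (AG)7 and a (GAA)4 repeat.
example = "TTC" + "AG" * 7 + "CCT" + "GAA" * 4 + "C"
ssrs = find_ssrs(example)
```

Dedicated SSR-mining tools typically also handle compound and imperfect repeats, which this sketch deliberately leaves out.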

Directory of Open Access Journals (Sweden)

Jingtao Li

2014-06-01

Full Text Available Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% of ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotated ESTs, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigene sequences, 22 simple sequence repeats (SSRs) were also identified, contributing to the study of A. canescens resources.
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

Science.gov (United States)

Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

2008-04-10

Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

Directory of Open Access Journals (Sweden)

Pardinas Jose R

2008-04-01

Full Text Available Abstract Background Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.
[EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

Science.gov (United States)

Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

2017-09-01

Ligusticum chuanxiong is a well-known traditional Chinese medicine plant. The study of its molecular marker development and germplasm resources is very important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSRs were detected and 4 073 SSR loci were identified. EST-SSR distribution and characteristic analysis results showed that the mono-nucleotide repeats were the main repeat types, accounting for 41.0%. In addition, the sequences containing SSRs were functionally annotated in Gene Ontology (GO) and KEGG pathway and were assigned to 49 GO categories and 242 KEGG pathways; among them, 2 201 sequences were annotated against the Nr database. By validating 235 EST-SSRs, 74 primer pairs were ultimately proved to have high quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with the 74 primer pairs. In both UPGMA tree and PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partially related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong were identified for the first time, and the newly developed molecular markers would contribute significantly to further genetic diversity study, purity detection, gene mapping, and molecular breeding. Copyright© by the Chinese Pharmaceutical Association.
Simultaneous identification of long similar substrings in large sets of sequences

Directory of Open Access Journals (Sweden)

Wittig Burghardt

2007-05-01

Full Text Available Abstract Background Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary especially if errors must be considered. Results We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogenous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.
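The idea of extending an exact seed while bounding the number of errors in every window can be sketched as follows. This is a simplified, mismatch-only illustration written for this listing, not the ClustDB algorithm itself: the function name `extend_with_window`, its parameters, and the choice to extend only to the right and to ignore insertions and deletions are all assumptions.

```python
from collections import deque

def extend_with_window(a, b, i, j, seed_len, window=10, max_errors=1):
    """Extend an exact seed a[i:i+seed_len] == b[j:j+seed_len] to the right
    while every run of `window` consecutive aligned positions contains at
    most `max_errors` mismatches. Returns the length of the extended match,
    trimmed so that it ends on a matching position."""
    # 1 = mismatch, 0 = match, for the most recent `window` aligned positions.
    recent = deque([0] * min(seed_len, window), maxlen=window)
    length = seed_len
    last_match = seed_len
    while i + length < len(a) and j + length < len(b):
        mismatch = int(a[i + length] != b[j + length])
        recent.append(mismatch)  # the oldest position drops out automatically
        if sum(recent) > max_errors:
            break  # this window would exceed the error budget
        length += 1
        if not mismatch:
            last_match = length
    return last_match

# Toy example: mismatches at positions 10, 16 and 17 of `a`.
a = "AAAAAAAAAATAAAAATTAA"
b = "A" * 20
extended = extend_with_window(a, b, 0, 0, seed_len=5, window=5, max_errors=1)
```

Bounding errors per window rather than over the whole alignment is what lets an extension of this kind stop at a locally error-dense region, which the abstract associates with detecting contaminating inserts.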
Large-scale analysis of peptide sequence variants: the case for high-field asymmetric waveform ion mobility spectrometry.

Science.gov (United States)

Creese, Andrew J; Smart, Jade; Cooper, Helen J

2013-05-21

Large-scale analysis of proteins by mass spectrometry is becoming increasingly routine; however, the presence of peptide isomers remains a significant challenge for both identification and quantitation in proteomics. Classes of isomers include sequence inversions, structural isomers, and localization variants. In many cases, liquid chromatography is inadequate for separation of peptide isomers. The resulting tandem mass spectra are composite, containing fragments from multiple precursor ions. The benefits of high-field asymmetric waveform ion mobility spectrometry (FAIMS) for proteomics have been demonstrated by a number of groups, but previous work has focused on extending proteome coverage generally. Here, we present a systematic study of the benefits of FAIMS for a key challenge in proteomics, that of peptide isomers. We have applied FAIMS to the analysis of a phosphopeptide library comprising the sequences GPSGXVpSXAQLX(K/R) and SXPFKXpSPLXFG(K/R), where X = ADEFGLSTVY. The library has defined limits, enabling us to draw valid conclusions regarding FAIMS performance. The library contains numerous sequence inversions and structural isomers. In addition, there are large numbers of theoretical localization variants, allowing false localization rates to be determined. The FAIMS approach is compared with reversed-phase liquid chromatography and strong cation exchange chromatography. The FAIMS approach identified 35% of the peptide library, whereas LC-MS/MS alone identified 8% and LC-MS/MS with strong cation exchange chromatography prefractionation identified 17.3% of the library.

ESAP plus: a web-based server for EST-SSR marker development.

Science.gov (United States)

Ponyared, Piyarat; Ponsawat, Jiradej; Tongsima, Sissades; Seresangtakul, Pusadee; Akkasaeng, Chutipong; Tantisuwichwong, Nathpapat

2016-12-22

Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and ease of analysis using conventional PCR amplification. To study plants with unknown genome sequence, SSR markers from Expressed Sequence Tags (ESTs), which can be obtained from the plant mRNA (converted to cDNA), must be utilized. With the advent of high-throughput sequencing technology, huge amounts of EST sequence data have been generated and are now accessible from many public databases. However, SSR marker identification from a large in-house or public EST collection requires a computational pipeline that makes use of several standard bioinformatic tools to design high-quality EST-SSR primers. Some of these computational tools are not user-friendly and must be tightly integrated with reference genomic databases. A web-based bioinformatic pipeline, called EST Analysis Pipeline Plus (ESAP Plus), was constructed to assist researchers in developing SSR markers from a large EST collection. ESAP Plus incorporates several bioinformatic scripts and some useful standard software tools necessary for the four main procedures of EST-SSR marker development, namely 1) pre-processing, 2) clustering and assembly, 3) SSR mining and 4) SSR primer design. The proposed pipeline also provides two alternative steps for reducing EST redundancy and identifying SSR loci. Using public sugarcane ESTs, ESAP Plus automatically executed the aforementioned computational pipeline via a simple web user interface, which was implemented using standard PHP, HTML, CSS and JavaScript. With ESAP Plus, users can upload raw EST data and choose various filtering options and parameters to analyze each of the four main procedures through this web interface. All input EST data and their predicted SSR results will be stored in the ESAP Plus MySQL database. Users will be notified via e-mail when the automatic process is completed and they can
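The SSR mining step named in this abstract can be sketched in Python with a regular expression. This is an illustration only, not the tool ESAP Plus actually runs: the function name `find_ssrs` and the minimum repeat counts (6 for di-, 5 for tri-, 4 for tetranucleotide units) are assumed values chosen for the example.

```python
import re

def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter motif (e.g. 'ACAC', 'AA')."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))

def find_ssrs(seq, min_repeats=None):
    """Return (start, unit, repeat_count) for perfect SSRs in a DNA sequence.

    `min_repeats` maps unit length to the minimum number of tandem copies;
    the defaults below are hypothetical thresholds, not values from the source.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # A unit of `unit_len` bases followed by at least n-1 further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)
```

The primitive-unit check keeps a dinucleotide run from being reported a second time as a tetranucleotide repeat, and it discards homopolymer runs.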
Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

Science.gov (United States)

Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

2016-12-01

Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.
A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries.

Science.gov (United States)

Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

2000-06-30

For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.
Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

Directory of Open Access Journals (Sweden)

Sugantham Priyanka Annabel

2010-10-01

Background: Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to the lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results: A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23, and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may encode unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions: The high-quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
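The counts quoted in this abstract are internally consistent, which a short Python check makes explicit. The figures are taken from the abstract; the function name `unigene_summary` and the derived ratios are additions for illustration only.

```python
def unigene_summary(contigs, singletons, ests_in_contigs):
    """Derive unigene and EST totals from contig-assembly counts.

    A unigene set is the contigs plus the singletons; every sequenced EST
    is either a member of a contig or a singleton.
    """
    return {
        "unigenes": contigs + singletons,
        "total_ests": ests_in_contigs + singletons,
        "mean_ests_per_contig": ests_in_contigs / contigs,
    }

# Figures reported for the J. curcas developing-seed library.
summary = unigene_summary(contigs=2258, singletons=4751, ests_in_contigs=7333)

# BLASTX annotation classes reported for the unigenes.
known, unknown_or_hypothetical, no_hit = 3982, 2836, 191

# Share of unigenes flagged as potential full-length genes.
full_length_fraction = 6233 / summary["unigenes"]
```

Here `summary["unigenes"]` reproduces the 7009 unigenes, `summary["total_ests"]` reproduces the 12,084 sequenced ESTs, and the three annotation classes again sum to 7009.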
Statistical processing of large image sequences.

Science.gov (United States)

Khellah, F; Fieguth, P; Murray, M J; Allen, M

2005-01-01

The dynamic estimation of large-scale stochastic image sequences, as frequently encountered in remote sensing, is important in a variety of scientific applications. However, the size of such images makes conventional dynamic estimation methods, for example, the Kalman and related filters, impractical. In this paper, we present an approach that emulates the Kalman filter, but with considerably reduced computational and storage requirements. Our approach is illustrated in the context of a 512 x 512 image sequence of ocean surface temperature. The static estimation step, the primary contribution here, uses a mixture of stationary models to accurately mimic the effect of a nonstationary prior, simplifying both computational complexity and modeling. Our approach provides an efficient, stable, positive-definite model which is consistent with the given correlation structure. Thus, the methods of this paper may find application in modeling and single-frame estimation.
Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

KAUST Repository

Brenner, Sydney

2012-10-08

Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for
Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

KAUST Repository

Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

2012-01-01

Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole
Targeted sequencing of large genomic regions with CATCH-Seq.

Directory of Open Access Journals (Sweden)

Kenneth Day

Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and to resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost-effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.
Inference of functional properties from large-scale analysis of enzyme superfamilies.

Science.gov (United States)

Brown, Shoshana D; Babbitt, Patricia C

2012-01-02

As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well studied superfamilies.
Large scale electrolysers

International Nuclear Information System (INIS)

B Bello; M Junker

2006-01-01

Hydrogen production by water electrolysis represents nearly 4% of world hydrogen production. Future development of hydrogen vehicles will require large quantities of hydrogen, and installation of large-scale hydrogen production plants will be needed. In this context, development of low-cost large-scale electrolysers that could use 'clean power' seems necessary. ALPHEA HYDROGEN, a European network and center of expertise on hydrogen and fuel cells, performed a study for its members in 2005 to evaluate the potential of large-scale electrolysers to produce hydrogen in the future. The different electrolysis technologies were compared. Then, the state of the art of the electrolysis modules currently available was surveyed. A review of the large-scale electrolysis plants that have been installed in the world was also carried out. The main projects related to large-scale electrolysis were also listed. The economics of large-scale electrolysers were discussed, and the influence of energy prices on the cost of hydrogen production by large-scale electrolysis was evaluated. (authors)
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Science.gov (United States)

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing has resulted in a shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files of more than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.)

Directory of Open Access Journals (Sweden)

Yang Jun-Bo

2010-12-01

Background: The castor bean (Ricinus communis L.), a monotypic species in the spurge family (Euphorbiaceae, 2n = 20), is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85%) in its seed oil, castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesive and biodiesel. Due to the lack of efficient molecular markers, little is known about the population genetic diversity and the genetic relationships among castor bean germplasm. Efficient and robust molecular markers are increasingly needed for breeding and improving varieties in castor bean. The advent of modern genomics has produced large amounts of publicly available DNA sequence data. In particular, expressed sequence tags (ESTs) provide valuable resources to develop gene-associated SSR markers. Results: In total, 18,928 publicly available non-redundant castor bean EST sequences, representing approximately 17.03 Mb, were evaluated and 7,732 SSR sites in 5,122 ESTs were identified by data mining. Castor bean exhibited a considerably high frequency of EST-SSRs. We developed and characterized 118 polymorphic EST-SSR markers from 379 primer pairs flanking repeats by screening 24 castor bean samples collected from different countries. A total of 350 alleles were identified from 118 polymorphic SSR loci, ranging from 2 to 6 per locus (A), with an average of 2.97. The EST-SSR markers developed displayed moderate gene diversity (He), with an average of 0.41. Genetic relationships among 24 germplasms were investigated using the genotypes of 350 alleles, showing a geographic pattern of genotypes across genetic diversity centers of castor bean. Conclusion: Castor bean EST sequences exhibited a considerably high frequency of SSR sites, and were rich resources for developing EST-SSR markers. These EST-SSR markers would be particularly
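The two marker statistics quoted in this abstract, alleles per locus (A) and gene diversity (He), can be illustrated with a brief Python sketch. The abstract does not say which estimator of He was used; the sketch assumes the common definition He = 1 - sum(p_i^2) over allele frequencies p_i at one locus, and the function name `gene_diversity` and the toy genotype data are hypothetical.

```python
from collections import Counter

def gene_diversity(alleles):
    """Expected heterozygosity at one locus: 1 minus the sum of squared allele frequencies.

    `alleles` is a flat list of allele labels observed at the locus
    (assumed definition; no small-sample correction is applied).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Mean number of alleles per locus from the totals given in the abstract.
mean_alleles_per_locus = 350 / 118

# Toy locus with two equally frequent alleles (illustrative data only).
toy_he = gene_diversity(["a", "a", "b", "b"])
```

Rounded to two decimals, `mean_alleles_per_locus` agrees with the reported average of 2.97 alleles per locus.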
Inference of Functional Properties from Large-scale Analysis of Enzyme Superfamilies

Science.gov (United States)

Brown, Shoshana D.; Babbitt, Patricia C.

2012-01-01

As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well studied superfamilies. PMID:22069325
Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

Directory of Open Access Journals (Sweden)

Marais Gabriel AB

2011-07-01

Background: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results: A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion: The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to
Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

Science.gov (United States)

2011-01-01

Background: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results: A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion: The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a
PAVE: Program for assembling and viewing ESTs

Directory of Open Access Journals (Sweden)

Bomhoff Matthew

2009-08-01

Background: New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information; hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. Results: The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs, including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. Conclusion: The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
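The mate-pair rule described in this abstract (keep the 5' and 3' sub-contigs in one contig, joined by n's when they do not overlap) can be sketched in Python. This is a simplified illustration rather than PAVE's code: it assumes both sub-contigs are already on the same strand, accepts only exact suffix-prefix overlaps, and the function name `join_mate_contigs` and the parameters `min_overlap` and `gap_len` are hypothetical.

```python
def join_mate_contigs(five_prime, three_prime, min_overlap=20, gap_len=10):
    """Join the 5' and 3' sub-contigs of one clone into a single sequence.

    If the end of `five_prime` exactly matches the start of `three_prime`
    over at least `min_overlap` bases, the two are merged on the longest
    such overlap. Otherwise they are joined by a run of 'n' characters;
    the run length `gap_len` is an arbitrary placeholder.
    """
    longest = min(len(five_prime), len(three_prime))
    for size in range(longest, min_overlap - 1, -1):
        if five_prime.endswith(three_prime[:size]):
            return five_prime + three_prime[size:]
    return five_prime + "n" * gap_len + three_prime
```

A real assembler would tolerate sequencing errors in the overlap; the exact-match test here only serves to show where the n-padded join applies.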
PAVE: program for assembling and viewing ESTs.

Science.gov (United States)

Soderlund, Carol; Johnson, Eric; Bomhoff, Matthew; Descour, Anne

2009-08-26

New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information; hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs, including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
Development of EST-derived markers in Dendrobium from EST of related taxa

OpenAIRE

Narisa Juejun; Chataporn Chunwongse; Julapark Chunwongse

2013-01-01

Public databases are useful for molecular marker development. The major aim of this study was to develop expressed sequence tag (EST)-derived markers in Dendrobium from available ESTs of Phalaenopsis and Dendrobium. A total of 6063 sequences were screened for simple sequence repeats (SSRs) and introns. Primers flanking these regions were generated and tested on genomic DNAs of Phalaenopsis and Dendrobium. Twenty-three percent of amplifiable Phalaenopsis EST-derived markers were cross-genera trans...
An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

Science.gov (United States)

Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

2009-01-21

The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. The EST collection will provide much needed genomic resources for
Analysis of a normalised expressed sequence tag (EST) library from a key pollinator, the bumblebee Bombus terrestris.

Science.gov (United States)

Sadd, Ben M; Kube, Michael; Klages, Sven; Reinhardt, Richard; Schmid-Hempel, Paul

2010-02-15

The bumblebee, Bombus terrestris (Order Hymenoptera), is of widespread importance. This species is extensively used for commercial pollination in Europe, and along with other Bombus spp. is a key member of natural pollinator assemblages. Furthermore, the species is studied in a wide variety of biological fields. The objective of this project was to create a B. terrestris EST resource that will prove to be valuable in obtaining a deeper understanding of this significant social insect. A normalised cDNA library was constructed from the thorax and abdomen of B. terrestris workers in order to enhance the discovery of rare genes. A total of 29'428 ESTs were sequenced. Subsequent clustering resulted in 13'333 unique sequences. Of these, 58.8 percent had significant similarities to known proteins, with 54.5 percent having a "best-hit" to existing Hymenoptera sequences. Comparisons with the honeybee and other insects allowed the identification of potential candidates for gene loss, pseudogene evolution, and possible incomplete annotation in the honeybee genome. Further, given the focus of much basic research and the perceived threat of disease to natural and commercial populations, the immune system of bumblebees is a particularly relevant component. Although the library is derived from unchallenged bees, we still uncover transcription of a number of immune genes spanning the principally described insect immune pathways. Additionally, the EST library provides a resource for the discovery of genetic markers that can be used in population level studies. Indeed, initial screens identified 589 simple sequence repeats and 854 potential single nucleotide polymorphisms. The resource that these B. terrestris ESTs represent is valuable for ongoing work. The ESTs provide direct evidence of transcriptionally active regions, but they will also facilitate further functional genomics, gene discovery and future genome annotation. These are important aspects in obtaining a greater

Generation and analysis of ESTs from the eastern oyster, Crassostrea virginica Gmelin and identification of microsatellite and SNP markers

Directory of Open Access Journals (Sweden)

Wallace Richard

2007-06-01

Full Text Available Abstract Background The eastern oyster, Crassostrea virginica (Gmelin 1791), is an economically important species cultured in many areas in North America. It is also ecologically important because of the impact of its filter feeding behaviour on water quality. Populations of C. virginica have been threatened by overfishing, habitat degradation, and diseases. Through genome research, strategies are being developed to reverse its population decline. However, large-scale expressed sequence tag (EST) resources have been lacking for this species. Efficient generation of EST resources from this species has been hindered by a high redundancy of transcripts. The objectives of this study were to construct a normalized cDNA library for efficient EST analysis, to generate thousands of ESTs, and to analyze the ESTs for microsatellites and potential single nucleotide polymorphisms (SNPs). Results A normalized and subtracted C. virginica cDNA library was constructed from pooled RNA isolated from hemocytes, mantle, gill, gonad and digestive tract, muscle, and a whole juvenile oyster. A total of 6,528 clones were sequenced from this library generating 5,542 high-quality EST sequences. Cluster analysis indicated the presence of 635 contigs and 4,053 singletons, generating a total of 4,688 unique sequences. About 46% (2,174) of the unique ESTs had significant hits (E-value ≤ 1e-05) to the non-redundant protein database, 1,104 of which were annotated using Gene Ontology (GO) terms. A total of 35 microsatellites were identified from the ESTs, with 18 having sufficient flanking sequences for primer design. A total of 6,533 putative SNPs were also identified using all existing and the newly generated EST resources of the eastern oysters. Conclusion A high quality normalized cDNA library was constructed. A total of 5,542 ESTs were generated representing 4,688 unique sequences. Putative microsatellite and SNP markers were identified. These genome resources provide the
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

Energy Technology Data Exchange (ETDEWEB)

Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

2010-01-27

Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66 percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in
Characterization and comparison of EST-SSR and TRAP markers for genetic analysis of the Japanese persimmon Diospyros kaki.

Science.gov (United States)

Luo, C; Zhang, F; Zhang, Q L; Guo, D Y; Luo, Z R

2013-01-09

We developed and characterized expressed sequence tags (ESTs)-simple sequence repeats (SSRs) and targeted region amplified polymorphism (TRAP) markers to examine genetic relationships in the persimmon genus Diospyros gene pool. In total, we characterized 14 EST-SSR primer pairs and 36 TRAP primer combinations, which were amplified across 20 germplasms of 4 species in the genus Diospyros. We used various genetic parameters, including effective multiplex ratio (EMR), diversity index (DI), and marker index (MI), to test the utility of these markers. TRAP markers gave higher EMR (24.85) but lower DI (0.33), compared to EST-SSRs (EMR = 3.65, DI = 0.34). TRAP gave a very high MI (8.08), which was about 6.5 times the MI of EST-SSR (1.25). These markers were utilized for phylogenetic inference of 20 genotypes of Diospyros kaki Thunb. and allied species, with the result that all kaki genotypes clustered closely and 3 allied species formed an independent group. These markers could be further exploited for large-scale genetic relationship inference.
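The relationship between the three marker parameters quoted in this abstract can be checked with a short Python sketch. It assumes the commonly used definition MI = EMR × DI, which the abstract itself does not spell out; the function names are hypothetical. Products of the rounded averages are not expected to reproduce the reported MI values exactly, since the authors may have averaged per primer combination.

```python
def marker_index(emr: float, di: float) -> float:
    """Marker index under the assumed definition MI = EMR * DI."""
    return emr * di


def fold_difference(mi_a: float, mi_b: float) -> float:
    """How many times larger mi_a is than mi_b."""
    return mi_a / mi_b


# Values quoted in the abstract: (EMR, DI, reported MI).
REPORTED = {
    "TRAP": (24.85, 0.33, 8.08),
    "EST-SSR": (3.65, 0.34, 1.25),
}

if __name__ == "__main__":
    for name, (emr, di, mi_reported) in REPORTED.items():
        print(f"{name}: EMR*DI = {marker_index(emr, di):.2f}, reported MI = {mi_reported}")
    ratio = fold_difference(REPORTED["TRAP"][2], REPORTED["EST-SSR"][2])
    print(f"TRAP / EST-SSR reported MI ratio = {ratio:.2f}")
```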
PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

Science.gov (United States)

Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

2016-01-01

Simple Sequence Repeats (SSRs), or microsatellites, are resourceful molecular genetic markers. There are only a few reports of SSR identification and development in pineapple. The complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple, which will help in deciphering the genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purposes at http://app.bioelm.com/ with a mapping tool which can develop circular maps of a selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.
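How SSRs can be located in a sequence is easy to show with a regular expression. The following Python sketch is illustrative only and is not the pipeline used for PineElm_SSRdb: the function name `find_ssrs` and the minimum repeat counts per motif length are assumptions, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative thresholds).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in `seq`."""
    thresholds = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            redundant = any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            )
            if not redundant:
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CC" + "CAG" * 5 + "T"  # made-up sequence
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Primer design, as mentioned in several of the abstracts on this page, additionally needs enough flanking sequence on both sides of each hit, which this sketch does not check.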
Large-scale transcriptome analyses reveal new genetic marker candidates of head, neck, and thyroid cancer

DEFF Research Database (Denmark)

Reis, Eduardo M; Ojopi, Elida P B; Alberto, Fernando L

2005-01-01

A detailed genome mapping analysis of 213,636 expressed sequence tags (EST) derived from nontumor and tumor tissues of the oral cavity, larynx, pharynx, and thyroid was done. Transcripts matching known human genes were identified; potential new splice variants were flagged and subjected to manual...... that can be used for future studies on the molecular basis of these tumors. Similar analysis is warranted for a number of other tumors for which large EST data sets are available....
Insertion Sequence-Caused Large Scale-Rearrangements in the Genome of Escherichia coli

Science.gov (United States)

2016-07-18

affordable approach to genome-wide characterization of genetic variation in bacterial and eukaryotic genomes (1–3). In addition to small-scale...Paired-End Reads), that uses a graph-based algorithm (27) capable of detecting most large-scale variation involving repetitive regions, including novel...Avila,P., Grinsted,J. and De La Cruz,F. (1988) Analysis of the variable endpoints generated by one-ended transposition of Tn21. J. Bacteriol., 170
Identification and characterization of two novel blaKLUC resistance genes through large-scale resistance plasmids sequencing.

Directory of Open Access Journals (Sweden)

Teng Xu

Full Text Available Plasmids are important antibiotic resistance determinant carriers that can disseminate various drug resistance genes among species or genera. By using a high throughput sequencing approach, two groups of plasmids of Escherichia coli (named E1 and E2, each consisting of 160 clinical E. coli strains isolated from different periods of time) were sequenced and analyzed. A total of 20 million reads were obtained and mapped onto the known resistance gene sequences. As a result, a total of 9 classes, including 36 types of antibiotic resistant genes, were identified. Among these genes, 25 and 27 single nucleotide polymorphisms (SNPs) appeared, of which 9 and 12 SNPs are nonsynonymous substitutions in the E1 and E2 samples, respectively. It is interesting to find that a novel genotype of blaKLUC, whose close relatives, blaKLUC-1 and blaKLUC-2, have been previously reported as carried on the Kluyvera cryocrescens chromosome and Enterobacter cloacae plasmid, was identified. It shares 99% and 98% amino acid identities with Kluc-1 and Kluc-2, respectively. Further PCR screening of 608 Enterobacteriaceae family isolates yielded a second variant (named blaKLUC-4). It was interesting to find that Kluc-3 showed resistance to several cephalosporins including cefotaxime, whereas blaKLUC-4 did not show any resistance to the antibiotics tested. This may be due to a positively charged residue, Arg, replaced by a neutral residue, Leu, at position 167, which is located within an omega-loop. This work represents large-scale studies on resistance gene distribution, diversification and genetic variation in pooled multi-drug resistance plasmids, and provides insight into the use of high throughput sequencing technology for microbial resistance gene detection.
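The distinction between synonymous and nonsynonymous SNPs drawn in this abstract can be made concrete with a small Python sketch based on the standard genetic code. The function name `classify_snp` is hypothetical, and the codons in the usage lines are invented examples: the abstract reports only the amino acid change (Arg to Leu at position 167), not the underlying nucleotide change.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon: str, alt_codon: str):
    """Return (kind, ref_aa, alt_aa) for a codon changed by a SNP."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    # Hypothetical codons: one way an Arg codon can become a Leu codon.
    print(classify_snp("CGT", "CTT"))
    # A third-position change that leaves Arg unchanged.
    print(classify_snp("CGT", "CGC"))
```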
Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

Directory of Open Access Journals (Sweden)

Min Lin

Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence
MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

Directory of Open Access Journals (Sweden)

Roch Philippe

2009-02-01

Full Text Available Abstract Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed, as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database http://mussel.cribi.unipd.it. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels.
Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

Directory of Open Access Journals (Sweden)

Arias Covadonga

2007-06-01

Full Text Available Abstract Background The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of the Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 1e-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.
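Counting ESTs with significant BLASTX hits, as reported in this abstract, comes down to filtering search results by E-value. The Python sketch below assumes the results were saved in BLAST's 12-column tabular format, where the E-value is the eleventh column, and assumes a cutoff of 1e-5; the function name `significant_queries` and the example lines are invented for illustration and are not the authors' pipeline.

```python
def significant_queries(tabular_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below `max_evalue`.

    Assumes BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    keep = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            keep.add(query_id)
    return keep


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    example = [
        "est1\tP1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t200",
        "est2\tP2\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
        "est1\tP3\t55.0\t60\t27\t0\t1\t60\t1\t60\t2e-3\t45",
    ]
    print(sorted(significant_queries(example)))
```

Collecting query IDs in a set counts each EST once, however many database hits it has.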
Comprehensive EST analysis of the symbiotic sea anemone, Anemonia viridis.

Science.gov (United States)

Sabourault, Cécile; Ganot, Philippe; Deleury, Emeline; Allemand, Denis; Furla, Paola

2009-07-23

Coral reef ecosystems are renowned for their diversity and beauty. Their immense ecological success is due to a symbiotic association between cnidarian hosts and unicellular dinoflagellate algae, known as zooxanthellae. These algae are photosynthetic and the cnidarian-zooxanthellae association is based on nutritional exchanges. Maintenance of such an intimate cellular partnership involves many crosstalks between the partners. To better characterize symbiotic relationships between a cnidarian host and its dinoflagellate symbionts, we conducted a large-scale EST study on a symbiotic sea anemone, Anemonia viridis, in which the two tissue layers (epiderm and gastroderm) can be easily separated. A single cDNA library was constructed from symbiotic tissue of sea anemones A. viridis in various environmental conditions (both normal and stressed). We generated 39,939 high quality ESTs, which were assembled into 14,504 unique sequences (UniSeqs). Sequences were analysed and sorted according to their putative origin (animal, algal or bacterial). We identified many new repeated elements in the 3'UTR of most animal genes, suggesting that these elements potentially have a biological role, especially with respect to gene expression regulation. We identified genes of animal origin that have no homolog in the non-symbiotic starlet sea anemone Nematostella vectensis genome, but in other symbiotic cnidarians, and may therefore be involved in the symbiosis relationship in A. viridis. Comparison of protein domain occurrence in A. viridis with that in N. vectensis demonstrated an increase in abundance of some molecular functions, such as protein binding or antioxidant activity, suggesting that these functions are essential for the symbiotic state and may be specific adaptations. This large dataset of sequences provides a valuable resource for future studies on symbiotic interactions in Cnidaria. The comparison with the closest available genome, the sea anemone N. vectensis, as well as
Comprehensive EST analysis of the symbiotic sea anemone, Anemonia viridis

Directory of Open Access Journals (Sweden)

Deleury Emeline

2009-07-01

Full Text Available Abstract Background Coral reef ecosystems are renowned for their diversity and beauty. Their immense ecological success is due to a symbiotic association between cnidarian hosts and unicellular dinoflagellate algae, known as zooxanthellae. These algae are photosynthetic and the cnidarian-zooxanthellae association is based on nutritional exchanges. Maintenance of such an intimate cellular partnership involves many crosstalks between the partners. To better characterize symbiotic relationships between a cnidarian host and its dinoflagellate symbionts, we conducted a large-scale EST study on a symbiotic sea anemone, Anemonia viridis, in which the two tissue layers (epiderm and gastroderm) can be easily separated. Results A single cDNA library was constructed from symbiotic tissue of sea anemones A. viridis in various environmental conditions (both normal and stressed). We generated 39,939 high quality ESTs, which were assembled into 14,504 unique sequences (UniSeqs). Sequences were analysed and sorted according to their putative origin (animal, algal or bacterial). We identified many new repeated elements in the 3'UTR of most animal genes, suggesting that these elements potentially have a biological role, especially with respect to gene expression regulation. We identified genes of animal origin that have no homolog in the non-symbiotic starlet sea anemone Nematostella vectensis genome, but in other symbiotic cnidarians, and may therefore be involved in the symbiosis relationship in A. viridis. Comparison of protein domain occurrence in A. viridis with that in N. vectensis demonstrated an increase in abundance of some molecular functions, such as protein binding or antioxidant activity, suggesting that these functions are essential for the symbiotic state and may be specific adaptations. Conclusion This large dataset of sequences provides a valuable resource for future studies on symbiotic interactions in Cnidaria. The comparison with the closest
Exploring nervous system transcriptomes during embryogenesis and metamorphosis in Xenopus tropicalis using EST analysis

Directory of Open Access Journals (Sweden)

Wegnez Maurice

2007-05-01

Full Text Available Abstract Background The western African clawed frog Xenopus tropicalis is an anuran amphibian species now used as a model in vertebrate comparative genomics. It provides the same advantages as Xenopus laevis but is diploid and has a smaller genome of 1.7 Gbp. Therefore X. tropicalis is more amenable to systematic transcriptome surveys. We initiated a large-scale partial cDNA sequencing project to provide a functional genomics resource on genes expressed in the nervous system during early embryogenesis and metamorphosis in X. tropicalis. Results A gene index was defined and analysed after the collection of over 48,785 high quality sequences. These partial cDNA sequences were obtained from an embryonic head and retina library (30,272 sequences) and from a metamorphic brain and spinal cord library (27,602 sequences). These ESTs are estimated to represent 9,693 transcripts derived from an estimated 6,000 genes. Comparison of these cDNA sequences with protein databases indicates that 46% contain their start codon. Further annotation included Gene Ontology functional classification, InterPro domain analysis, alternative splicing and non-coding RNA identification. Gene expression profiles were derived from EST counts and used to define transcripts specific to metamorphic stages of development. Moreover, these ESTs allowed identification of a set of 225 polymorphic microsatellites that can be used as genetic markers. Conclusion These cDNA sequences permit in silico cloning of numerous genes and will facilitate studies aimed at deciphering the roles of cognate genes expressed in the nervous system during neural development and metamorphosis. The genomic resources developed to study X. tropicalis biology will accelerate exploration of amphibian physiology and genetics. In particular, the model will facilitate analysis of key questions related to anuran embryogenesis and metamorphosis and its associated regulatory processes.
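Deriving expression profiles from EST counts, as this abstract describes, usually means normalizing each transcript's count by library size before comparing libraries. The Python sketch below uses the two library sizes quoted in the abstract; the library labels, the function names, the per-10,000 scaling, and the `min_count` rule for calling a transcript library-specific are all assumptions for illustration, not the authors' method.

```python
# Library sizes quoted in the abstract; the key names are hypothetical labels.
LIBRARY_SIZES = {
    "embryonic_head_retina": 30272,
    "metamorphic_brain_spinal_cord": 27602,
}


def ests_per_10k(counts):
    """Scale raw EST counts for one transcript to ESTs per 10,000 sequenced."""
    return {
        lib: 10000 * counts.get(lib, 0) / size
        for lib, size in LIBRARY_SIZES.items()
    }


def library_specific(counts, min_count=5):
    """Return the one library a transcript is confined to, or None.

    Assumed rule: at least `min_count` ESTs in one library, none in the other.
    """
    present = [lib for lib in LIBRARY_SIZES if counts.get(lib, 0) > 0]
    if len(present) == 1 and counts[present[0]] >= min_count:
        return present[0]
    return None


if __name__ == "__main__":
    example_counts = {"metamorphic_brain_spinal_cord": 12}  # made-up transcript
    print(ests_per_10k(example_counts))
    print(library_specific(example_counts))
```

A fixed count threshold is a crude stand-in for a statistical test of differential abundance; it is used here only to keep the example short.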
Generation, analysis and functional annotation of expressed sequence tags from the ectoparasitic mite Psoroptes ovis

Directory of Open Access Journals (Sweden)

Kenyon Fiona

2011-07-01

Full Text Available Abstract Background Sheep scab is caused by Psoroptes ovis and is arguably the most important ectoparasitic disease affecting sheep in the UK. The disease is highly contagious and causes considerable pruritus and irritation and is therefore a major welfare concern. Current methods of treatment are unsustainable and in order to elucidate novel methods of disease control a more comprehensive understanding of the parasite is required. To date, no full genomic DNA sequence or large scale transcript datasets are available and prior to this study only 484 P. ovis expressed sequence tags (ESTs) were accessible in public databases. Results In order to further expand upon the transcriptomic coverage of P. ovis thus facilitating novel insights into the mite biology we undertook a larger scale EST approach, incorporating newly generated and previously described P. ovis transcript data and representing the largest collection of P. ovis ESTs to date. We sequenced 1,574 ESTs and assembled these along with 484 previously generated P. ovis ESTs, which resulted in the identification of 1,545 unique P. ovis sequences. BLASTX searches identified 961 ESTs with significant hits (E-value P. ovis ESTs. Gene Ontology (GO) analysis allowed the functional annotation of 880 ESTs and included predictions of signal peptide and transmembrane domains; allowing the identification of potential P. ovis excreted/secreted factors, and mapping of metabolic pathways. Conclusions This dataset currently represents the largest collection of P. ovis ESTs, all of which are publicly available in the GenBank EST database (dbEST) (accession numbers FR748230 - FR749648). Functional analysis of this dataset identified important homologues, including house dust mite allergens and tick salivary factors. These findings offer new insights into the underlying biology of P. ovis, facilitating further investigations into mite biology and the identification of novel methods of intervention.
Sugarcane expressed sequences tags (ESTs) encoding enzymes involved in lignin biosynthesis pathways

Directory of Open Access Journals (Sweden)

Ramos Rose Lucia Braz

2001-01-01

Full Text Available Lignins are phenolic polymers found in the secondary wall of plant conductive systems where they play an important role by reducing the permeability of the cell wall to water. Lignins are also responsible for the rigidity of the cell wall and are involved in mechanisms of resistance to pathogens. The metabolic routes and enzymes involved in synthesis of lignins have been largely characterized and representative genes that encode enzymes involved in these processes have been cloned from several plant species. The synthesis of lignins is linked to the general metabolism of the phenylpropanoids in plants, having enzymes (e.g. phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H) and caffeic acid O-methyltransferase (COMT)) common to other processes as well as specific enzymes such as cinnamoyl-CoA reductase (CCR) and cinnamyl alcohol dehydrogenase (CAD). Some maize and sorghum mutants, shown to be defective in CAD and/or COMT activity, are easier to digest because they have a reduced lignin content, something which has motivated different research groups to alter the lignin content and composition of model plants by genetic engineering to try to improve, for example, the efficiency of paper pulping and digestibility. In the work reported in this paper, we have made an inventory of the sugarcane expressed sequence tags (ESTs) coding for enzymes involved in lignin metabolism which are present in the sugarcane EST genome project (SUCEST) database. Our analysis focused on the key enzymes ferulate-5-hydroxylase (F5H), caffeic acid O-methyltransferase (COMT), caffeoyl CoA O-methyltransferase (CCoAOMT), hydroxycinnamate CoA ligase (4CL), cinnamoyl-CoA reductase (CCR) and cinnamyl alcohol dehydrogenase (CAD). The comparative analysis of these genes with those described in other species could be used as molecular markers for breeding as well as for the manipulation of lignin metabolism in sugarcane.
Gene prediction in metagenomic fragments: A large scale machine learning approach

Directory of Open Access Journals (Sweden)

Morgenstern Burkhard

2008-04-01

Full Text Available Abstract Background Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of most fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and their unknown phylogenetic origin require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion Large scale machine learning methods are well-suited for gene
Discovery and functional prioritization of Parkinson's disease candidate genes from large-scale whole exome sequencing

NARCIS (Netherlands)

I. Jansen (Iris); Ye, H. (Hui); Heetveld, S. (Sasja); Lechler, M.C. (Marie C.); Michels, H. (Helen); Seinstra, R.I. (Renée I.); Lubbe, S.J. (Steven J.); Drouet, V. (Valérie); S. Lesage (Suzanne); E. Majounie (Elisa); Gibbs, J.R. (J.Raphael); M.A. Nalls (Michael); M. Ryten (Mina); Botia, J.A. (Juan A.); J. Vandrovcova (Jana); J. Simón-Sánchez (Javier); Castillo-Lizardo, M. (Melissa); P. Rizzu (Patrizia); Blauwendraat, C. (Cornelis); Chouhan, A.K. (Amit K.); Li, Y. (Yarong); Yogi, P. (Puja); N. Amin (Najaf); C.M. van Duijn (Cornelia); Morris, H.R. (Huw R.); Brice, A. (Alexis); A. Singleton (Andrew); David, D.C. (Della C.); Nollen, E.A. (Ellen A.); A. Jain (Ashok); J.M. Shulman; P. Heutink (Peter); D.G. Hernandez (Dena); S. Arepalli (Sampath); J. Brooks (Janet); Price, R. (Ryan); Nicolas, A. (Aude); S. Chong (Sean); M.R. Cookson (Mark); A. Dillman (Allissa); M. Moore (Matt); B.J. Traynor (Bryan); A. Singleton (Andrew); V. Plagnol (Vincent); Nicholas W Wood,; U.-M. Sheerin (Una-Marie); Jose M Bras,; K. Charlesworth (Kate); M. Gardner (Mac); R. Guerreiro (Rita); D. Trabzuni (Danyah); Hardy, J. (John); M. Sharma; M. Saad (Mohamad); Javier Simón-Sánchez,; C. Schulte (Claudia); J.C. Corvol (Jean-Christophe); Dürr, A. (Alexandra); M. Vidailhet (M.); S. Sveinbjörnsdóttir (Sigurlaug); R.A. Barker (Roger); Caroline H Williams-Gray,; Y. Ben-Shlomo; H.W. Berendse (Henk W.); K.D. van Dijk (Karin); D. Berg (Daniela); K. Brockmann; K.D. Wurster (Kathrin); Mätzler, W. (Walter); Gasser, T. (Thomas); M. Martinez (Maria); R.M.A. de Bie (Rob); A. Biffi (Alessandro); D. Velseboer (Daan); B.R. Bloem (Bastiaan); B. Post (Bart); M. Wickremaratchi (Mirdhu); B. van de Warrenburg (Bart); Z. Bochdanovits (Zoltan); M. von Bonin (Malte); H. Pétursson (Hjörvar); O. Riess (Olaf); D.J. Burn (David); Lubbe, S. (Steven); Cooper, J.M. (J Mark); N.H. McNeill (Nathan); Schapira, A. (Anthony); Lungu, C. (Codrin); Chen, H. (Honglei); Dong, J. (Jing); Chinnery, P.F. (Patrick F.); G. 
Hudson (Gavin); Clarke, C.E. (Carl E.); C. Moorby (Catriona); C. Counsell (Carl); P. Damier (Philippe); J.-F. Dartigues; P. Deloukas (Panagiotis); E. Gray (Emma); T. Edkins (Ted); Hunt, S.E. (Sarah E.); S.C. Potter (Simon); A. Tashakkori-Ghanbaria (Avazeh); G. Deuschl (Günther); D. Lorenz (Delia); D.T. Dexter (David); F. Durif (Frank); J. Evans (Jonathan Mark); Langford, C. (Cordelia); T. Foltynie (Thomas); A.M. Goate (Alison); C. Harris (Clare); J.J. van Hilten (Jacobus); A. Hofman (Albert); J.R. Hollenbeck (John R.); J.L. Holton (Janice); Hu, M. (Michele); X. Huang (Xiaohong); Illig, T. (Thomas); P.V. Jónsson (Pálmi); J.-C. Lambert; S.S. O'Sullivan (Sean); T. Revesz (Tamas); K. Shaw (Karen); A.J. Lees (Andrew); P. Lichtner (Peter); P. Limousin (Patricia); G. Lopez; Escott-Price, V. (Valentina); J. Pearson (Justin); N. Williams (Nigel); E. Mudanohwo (Ese); J.S. Perlmutter (Joel); Pollak, P. (Pierre); F. Rivadeneira Ramirez (Fernando); A.G. Uitterlinden (André); S.J. Sawcer (Stephen); H. Scheffer (Hans); I. Shoulson (Ira); L. Shulman (Lee); Smith, C. (Colin); R. Walker (Robert); C.C.A. Spencer (Chris C.); A. Strange (Amy); H. Stefansson (Hreinn); F. Bettella (Francesco); J-A. Zwart (John-Anker); Stockton, J.D. (Joanna D.); D. Talbot; C.M. Tanner (Carlie); F. Tison (François); S. Winder-Rhodes (Sophie); K.P. Bhatia (Kailash)

2017-01-01

Background: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we
Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

Directory of Open Access Journals (Sweden)

Richardson Annette C

2008-07-01

Full Text Available Abstract Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.
Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis

Directory of Open Access Journals (Sweden)

Hahn Daniel A

2009-05-01

Full Text Available Abstract Background Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptomic data are available for members of this genus. We used massively parallel pyrosequencing on the Roche 454-FLX platform to produce a substantial EST dataset for the flesh fly Sarcophaga crassipalpis. To maximize sequence diversity, we pooled RNA extracted from whole bodies of all life stages and normalized the cDNA pool after reverse transcription. Results We obtained 207,110 ESTs with an average read length of 241 bp. These reads assembled into 20,995 contigs and 31,056 singletons. Using BLAST searches of the NR and NT databases we were able to identify 11,757 unique gene elements (ES. crassipalpis unigenes among GO Biological Process functional groups with that of the Drosophila melanogaster transcriptome suggests that our ESTs are broadly representative of the flesh fly transcriptome. Insertion and deletion errors in 454 sequencing present a serious hurdle to comparative transcriptome analysis. Aided by a new approach to correcting for these errors, we performed a comparative analysis of genetic divergence across GO categories among S. crassipalpis, D. melanogaster, and Anopheles gambiae. The results suggest that non-synonymous substitutions occur at similar rates across categories, although genes related to response to stimuli may evolve slightly faster. In addition, we identified over 500 potential microsatellite loci and more than 12,000 SNPs among our ESTs. Conclusion Our data provides the first large-scale EST-project for flesh flies, a much-needed resource for exploring this model species. In addition, we identified a large number of potential microsatellite and SNP markers that could be used in population and systematic
Transcriptome sequencing and characterization for the sea cucumber Apostichopus japonicus (Selenka, 1867).

Directory of Open Access Journals (Sweden)

Huixia Du

Full Text Available BACKGROUND: Sea cucumbers are a special group of marine invertebrates. They occupy a taxonomic position that is believed to be important for understanding the origin and evolution of deuterostomes. Some of them, such as Apostichopus japonicus, are commercially important aquaculture species in Asian countries. Many efforts have been devoted to increasing the number of expressed sequence tags (ESTs) for A. japonicus, but a comprehensive characterization of its transcriptome is still lacking. Here, we performed large-scale transcriptome profiling and characterization by pyrosequencing diverse cDNA libraries from A. japonicus. RESULTS: In total, 1,061,078 reads were obtained by 454 sequencing of eight cDNA libraries representing different developmental stages and adult tissues in A. japonicus. These reads were assembled into 29,666 isotigs, which were further clustered into 21,071 isogroups. Nearly 40% of the isogroups showed significant matches to known proteins based on sequence similarity. Gene ontology (GO) and KEGG pathway analyses recovered diverse biological functions and processes. Candidate genes that were potentially involved in aestivation were identified. Transcriptome comparison with the sea urchin Strongylocentrotus purpuratus revealed similar patterns of GO term representation. In addition, 4,882 putative orthologous genes were identified, of which 202 were not present in the non-echinoderm organisms. More than 700 simple sequence repeats (SSRs) and 54,000 single nucleotide polymorphisms (SNPs) were detected in the A. japonicus transcriptome. CONCLUSION: Pyrosequencing proved to be efficient in rapidly identifying a large set of genes for the sea cucumber A. japonicus. Through large-scale transcriptome sequencing as well as public EST data integration, we performed a comprehensive characterization of the A. japonicus transcriptome and identified candidate aestivation-related genes. A large number of potential genetic

Genome-wide analysis of immune system genes by EST profiling

Science.gov (United States)

Giallourakis, Cosmas; Benita, Yair; Molinie, Benoit; Cao, Zhifang; Despo, Orion; Pratt, Henry E.; Zukerberg, Lawrence R.; Daly, Mark J.; Rioux, John D.; Xavier, Ramnik J.

2013-01-01

Profiling studies of mRNA and miRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as ENCODE have demonstrated the benefit of coupling RNA-Seq analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including non-coding RNAs. As a result, we have established the Immunogene database, representing an integrated EST “road map” of gene expression in human immune cells, which can be used to further investigate the function of coding and non-coding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes. PMID:23616578
Large-scale gene function analysis with the PANTHER classification system.

Science.gov (United States)

Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

2013-08-01

The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
Parallel Index and Query for Large Scale Data Analysis

Energy Technology Data Exchange (ETDEWEB)

Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat,; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

2011-07-18

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
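As a rough illustration of the kind of index-and-query technique this record refers to, the sketch below builds a small binned bitmap index and answers a threshold query with it. FastBit is known for bitmap indexing, but this is not FastBit's or FastQuery's API: the function names, the bin edges and the use of plain Python integers as uncompressed bitmaps are all assumptions made for the example.

```python
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """Return one bitmap (a Python int) per bin; bit i is set when values[i] is in that bin.

    Bin 0 holds values below bin_edges[0]; the last bin holds values >= bin_edges[-1].
    """
    bitmaps = [0] * (len(bin_edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(bin_edges, v)] |= 1 << i
    return bitmaps


def query_greater_than(values, bin_edges, bitmaps, threshold):
    """Return the row ids i with values[i] > threshold."""
    edge_bin = bisect_right(bin_edges, threshold)  # the bin that contains the threshold
    hits = 0
    # Every bin above the edge bin qualifies entirely: OR the bitmaps, no data access needed.
    for b in range(edge_bin + 1, len(bitmaps)):
        hits |= bitmaps[b]
    # Rows in the edge bin are only candidates and must be checked against the raw values.
    candidates = bitmaps[edge_bin]
    i = 0
    while candidates:
        if candidates & 1 and values[i] > threshold:
            hits |= 1 << i
        candidates >>= 1
        i += 1
    return [i for i in range(len(values)) if (hits >> i) & 1]


if __name__ == "__main__":
    energies = [0.5, 3.2, 7.9, 1.1, 9.4, 4.0]  # hypothetical particle attribute
    edges = [2.0, 4.0, 6.0, 8.0]               # hypothetical bin boundaries
    index = build_bitmap_index(energies, edges)
    print(query_greater_than(energies, edges, index, 3.5))
```

The point of the design is that most of the query is answered by bitwise operations on the index alone; only the single bin straddling the threshold requires reading raw values.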
Large-scale solar purchasing

International Nuclear Information System (INIS)

1999-01-01

The principal objective of the project was to participate in the definition of a new IEA task concerning solar procurement ("the Task") and to assess whether involvement in the task would be in the interest of the UK active solar heating industry. The project also aimed to assess the importance of large scale solar purchasing to UK active solar heating market development and to evaluate the level of interest in large scale solar purchasing amongst potential large scale purchasers (in particular housing associations and housing developers). A further aim of the project was to consider means of stimulating large scale active solar heating purchasing activity within the UK. (author)
A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica

Directory of Open Access Journals (Sweden)

Ueno Saneyoshi

2012-04-01

Full Text Available Abstract Background Microsatellites or simple sequence repeats (SSRs) in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating the development of a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica). Results We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. 4,059 SSRs were found in 3,694 (4.54%) contigs, giving an SSR frequency lower than that in seven other plant species with gene indices (5.4–21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared to 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, which was found in 655 (46.3%) di-SSRs, followed by the AAG motif, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, but 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3′ untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented within SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly-developed CMiB, which combines several open tools. Markers resulting from both pipelines showed no differences in PCR success rate and polymorphisms, but PCR success and polymorphism were significantly affected by the expected PCR product size
A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica

Science.gov (United States)

2012-01-01

Background Microsatellites or simple sequence repeats (SSRs) in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating the development of a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica). Results We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. 4,059 SSRs were found in 3,694 (4.54%) contigs, giving an SSR frequency lower than that in seven other plant species with gene indices (5.4–21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared to 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, which was found in 655 (46.3%) di-SSRs, followed by the AAG motif, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, but 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3′ untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented within SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly-developed CMiB, which combines several open tools. Markers resulting from both pipelines showed no differences in PCR success rate and polymorphisms, but PCR success and polymorphism were significantly affected by the expected PCR product size and number of SSR
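The SSR screening step described in this record can be pictured with a minimal regular-expression sketch. It is not the read2Marker or CMiB pipeline, whose internals the abstract does not give; the function name `find_ssrs`, the minimum repeat counts (six for di-SSRs, five for tri-SSRs) and the toy contig are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Find perfect di- and tri-nucleotide repeats in a DNA sequence.

    min_repeats maps repeat-unit length to the minimum number of tandem copies
    (illustrative defaults, not taken from the abstract).
    Returns a sorted list of (start, motif, copies) tuples with 0-based starts.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if len(set(motif)) == 1:
                continue  # a homopolymer run such as AAAAAA is not a di- or tri-SSR
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    contig = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"  # toy sequence
    for start, motif, copies in find_ssrs(contig):
        print(start, motif, copies)
```

Running the same function over every contig and counting the contigs with at least one hit would give an SSR frequency comparable in spirit to the 4.54% reported above, although the actual figure depends on the thresholds used.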
Sampling gene diversity across the supergroup Amoebozoa: large EST data sets from Acanthamoeba castellanii, Hartmannella vermiformis, Physarum polycephalum, Hyperamoeba dachnaya and Hyperamoeba sp.

Science.gov (United States)

Watkins, Russell F; Gray, Michael W

2008-04-01

From comparative analysis of EST data for five taxa within the eukaryotic supergroup Amoebozoa, including two free-living amoebae (Acanthamoeba castellanii, Hartmannella vermiformis) and three slime molds (Physarum polycephalum, Hyperamoeba dachnaya and Hyperamoeba sp.), we obtained new broad-range perspectives on the evolution and biosynthetic capacity of this assemblage. Together with genome sequences for the amoebozoans Dictyostelium discoideum and Entamoeba histolytica, and including partial genome sequence available for A. castellanii, we used the EST data to identify genes that appear to be exclusive to the supergroup, and to specific clades therein. Many of these genes are likely involved in cell-cell communication or differentiation. In examining on a broad scale a number of characters that previously have been considered in simpler cross-species comparisons, typically between Dictyostelium and Entamoeba, we find that Amoebozoa as a whole exhibits striking variation in the number and distribution of biosynthetic pathways, for example, ones for certain critical stress-response molecules, including trehalose and mannitol. Finally, we report additional compelling cases of lateral gene transfer within Amoebozoa, further emphasizing that although this process has influenced genome evolution in all examined amoebozoan taxa, it has done so to a variable extent.
Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

Directory of Open Access Journals (Sweden)

Ouyang Shu

2005-09-01

Full Text Available Abstract Background The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.
Development and Evaluation of a Novel Set of EST-SSR Markers Based on Transcriptome Sequences of Black Locust (Robinia pseudoacacia L.).

Science.gov (United States)

Guo, Qi; Wang, Jin-Xing; Su, Li-Zhuo; Lv, Wei; Sun, Yu-Han; Li, Yun

2017-07-07

Black locust (Robinia pseudoacacia L. of the family Fabaceae) is an ecologically and economically important deciduous tree. However, few genomic resources are available for this forest species, and few effective expressed sequence tag-derived simple sequence repeat (EST-SSR) markers have been developed to date. In this study, paired-end sequencing was used to sequence transcriptomes of R. pseudoacacia on the Illumina HiSeq™ 2000 platform, and EST-SSR loci were identified by de novo assembly. Furthermore, a total of 1697 primer pairs were successfully designed, from which 286 primers met the selection screening criteria; 94 pairs were randomly selected and tested for validation using polymerase chain reaction amplification. Forty-five primers were verified as polymorphic, with clear bands. The polymorphism information content values were 0.033-0.765, the number of alleles per locus ranged from 2 to 10, and the observed and expected heterozygosities were 0.000-0.931 and 0.035-0.810, respectively, indicating a high level of informativeness. Subsequently, the 45 polymorphic EST-SSR loci were tested for amplification efficiency, using the verified primers, in an additional nine species of Leguminosae; 23 loci were amplified in more than three species, of which two loci were amplified successfully in all species. These EST-SSR markers provide a valuable tool for investigating the genetic diversity and population structure of R. pseudoacacia, constructing a DNA fingerprint database, performing quantitative trait locus mapping, and preserving genetic information.
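The per-locus statistics quoted in this record (number of alleles, observed and expected heterozygosity, polymorphism information content) can be computed from diploid genotype calls as sketched below. The abstract does not say which estimators the authors used, so the formulas here are assumptions: expected heterozygosity is taken as He = 1 - Σ p_i² without a small-sample correction, and PIC follows the commonly used form PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function name and the genotype data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarise one SSR locus from diploid genotypes given as (allele, allele) pairs."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]

    # Expected heterozygosity (assumed estimator: no small-sample correction).
    he = 1.0 - sum(p * p for p in freqs)
    # PIC subtracts the share of matings that are uninformative for linkage.
    pic = he - sum(2.0 * p * p * q * q for p, q in combinations(freqs, 2))
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    return {"n_alleles": len(freqs), "Ho": ho, "He": he, "PIC": pic}


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) scored at one locus in four individuals.
    calls = [(150, 150), (150, 154), (154, 154), (150, 154)]
    print(locus_stats(calls))
```

With two equally frequent alleles, as in the toy data, PIC reaches its two-allele maximum of 0.375; values such as the 0.765 reported above require loci with several alleles.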
Peanut (Arachis hypogaea) Expressed Sequence Tag Project: Progress and Application

Directory of Open Access Journals (Sweden)

Suping Feng

2012-01-01

Full Text Available Many plant ESTs have been sequenced as an alternative to whole genome sequences, including those of peanut, because of its genome size and complexity. The US peanut research community held the historic 2004 Atlanta Genomics Workshop and named the EST project as a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have thus been shown to have a wide range of applications and fulfilled an essential role at a time of need. The EST project then contributed to the second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or transcriptomes will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function.
Large Scale Chromosome Folding Is Stable against Local Changes in Chromatin Structure.

Directory of Open Access Journals (Sweden)

Ana-Maria Florescu

2016-06-01

Full Text Available Characterizing the link between small-scale chromatin structure and large-scale chromosome folding during interphase is a prerequisite for understanding transcription. Yet, this link remains poorly investigated. Here, we introduce a simple biophysical model where interphase chromosomes are described in terms of the folding of chromatin sequences composed of alternating blocks of fibers with different thicknesses and flexibilities, and we use it to study the influence of sequence disorder on chromosome behaviors in space and time. By employing extensive computer simulations, we thus demonstrate that chromosomes undergo noticeable conformational changes only on length-scales smaller than 10^5 basepairs and time-scales shorter than a few seconds, and we suggest there might exist effective upper bounds to the detection of chromosome reorganization in eukaryotes. We prove the relevance of our framework by modeling recent experimental FISH data on murine chromosomes.
Insights into SCP/TAPS proteins of liver flukes based on large-scale bioinformatic analyses of sequence datasets.

Directory of Open Access Journals (Sweden)

Cinzia Cantacessi

Full Text Available BACKGROUND: SCP/TAPS proteins of parasitic helminths have been proposed to play key roles in fundamental biological processes linked to the invasion of and establishment in their mammalian host animals, such as the transition from free-living to parasitic stages and the modulation of host immune responses. Despite the evidence that SCP/TAPS proteins of parasitic nematodes are involved in host-parasite interactions, there is a paucity of information on this protein family for parasitic trematodes of socio-economic importance. METHODOLOGY/PRINCIPAL FINDINGS: We conducted the first large-scale study of SCP/TAPS proteins of a range of parasitic trematodes of both human and veterinary importance (including the liver flukes Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica and F. gigantica as well as the blood flukes Schistosoma mansoni, S. japonicum and S. haematobium). We mined all current transcriptomic and/or genomic sequence datasets from public databases, predicted secondary structures of full-length protein sequences, undertook systematic phylogenetic analyses and investigated the differential transcription of SCP/TAPS genes in O. viverrini and F. hepatica, with an emphasis on those that are up-regulated in the developmental stages infecting the mammalian host. CONCLUSIONS: This work, which sheds new light on SCP/TAPS proteins, guides future structural and functional explorations of key SCP/TAPS molecules associated with diseases caused by flatworms. Future fundamental investigations of these molecules in parasites and the integration of structural and functional data could lead to new approaches for the control of parasitic diseases.
An EST dataset for Metasequoia glyptostroboides buds: the first EST resource for molecular genomics studies in Metasequoia.

Science.gov (United States)

Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying

2013-03-01

The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.
Energy transfers in large-scale and small-scale dynamos

Science.gov (United States)

Samtaney, Ravi; Kumar, Rohit; Verma, Mahendra

2015-11-01

We present the energy transfers, mainly energy fluxes and shell-to-shell energy transfers, in small-scale dynamo (SSD) and large-scale dynamo (LSD) using numerical simulations of MHD turbulence for Pm = 20 (SSD) and for Pm = 0.2 (LSD) on a 1024^3 grid. For SSD, we demonstrate that the magnetic energy growth is caused by nonlocal energy transfers from the large-scale or forcing-scale velocity field to the small-scale magnetic field. The peak of these energy transfers moves towards lower wavenumbers as the dynamo evolves, which is the reason for the growth of the magnetic fields at the large scales. The energy transfers U2U (velocity to velocity) and B2B (magnetic to magnetic) are forward and local. For LSD, we show that the magnetic energy growth takes place via energy transfers from the large-scale velocity field to the large-scale magnetic field. We observe forward U2U and B2B energy flux, similar to SSD.
Detection of large-scale concentric gravity waves from a Chinese airglow imager network

Science.gov (United States)

Lai, Chang; Yue, Jia; Xu, Jiyao; Yuan, Wei; Li, Qinzeng; Liu, Xiao

2018-06-01

Concentric gravity waves (CGWs) contain a broad spectrum of horizontal wavelengths and periods due to their instantaneous localized sources (e.g., deep convection, volcanic eruptions, or earthquakes). However, it is difficult to observe large-scale gravity waves of >100 km wavelength from the ground because of the limited field of view of a single camera and local bad weather. Previously, complete large-scale CGW imagery could only be captured by satellite observations. In the present study, we developed a novel method that assembles separate images and applies low-pass filtering to obtain temporal and spatial information about complete large-scale CGWs from a network of all-sky airglow imagers. Coordinated observations from five all-sky airglow imagers in Northern China were assembled and processed to study large-scale CGWs over a wide area (1800 km × 1400 km), focusing on the same two CGW events as Xu et al. (2015). Our algorithms yielded images of large-scale CGWs by filtering out the small-scale CGWs. The wavelengths, wave speeds, and periods of CGWs were measured from a sequence of consecutive assembled images. Overall, the assembling and low-pass filtering algorithms can expand the airglow imager network to its full capacity regarding the detection of large-scale gravity waves.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

Science.gov (United States)

Cao, Yinhe; Tung, Wen-Wen; Gao, J B

2004-01-01

With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
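As a rough illustration of the idea of a recurrence time in a DNA sequence, the sketch below counts the distance between successive occurrences of the same k-letter word. This is only one simple reading of the term and is an assumption on our part; the function name `recurrence_times`, the word length `k`, and the toy sequence are all hypothetical, and the abstract does not specify the authors' actual statistic or codon index.

```python
from collections import Counter


def recurrence_times(seq, k=3):
    """Histogram of distances between successive occurrences of each k-mer.

    Hypothetical helper: a tandem repeat with period p contributes many
    recurrences at distance p, so peaks in the histogram hint at periodicity.
    """
    last_seen = {}      # k-mer -> index of its most recent occurrence
    counts = Counter()  # recurrence distance -> number of times observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            counts[i - last_seen[word]] += 1
        last_seen[word] = i
    return counts


# Toy example: a perfect period-3 repeat yields recurrences only at distance 3.
histogram = recurrence_times("ACGACGACG", k=3)
print(dict(histogram))
```

A single pass with a dictionary keeps the cost linear in sequence length, which is in the spirit of the whole-genome-on-a-PC claim, though the real method may differ substantially.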
Musical Scales in Tone Sequences Improve Temporal Accuracy.

Science.gov (United States)

Li, Min S; Di Luca, Massimiliano

2018-01-01

Predicting the time of stimulus onset is a key component in perception. Previous investigations of perceived timing have focused on the effect of stimulus properties such as rhythm and temporal irregularity, but the influence of non-temporal properties and their role in predicting stimulus timing has not been exhaustively considered. The present study aims to understand how a non-temporal pattern in a sequence of regularly timed stimuli could improve or bias the detection of temporal deviations. We presented interspersed sequences of 3, 4, 5, and 6 auditory tones where only the timing of the last stimulus could slightly deviate from isochrony. Participants reported whether the last tone was 'earlier' or 'later' relative to the expected regular timing. In two conditions, the tones composing the sequence were either organized into musical scales or they were random tones. In one experiment, all sequences ended with the same tone; in the other experiment, each sequence ended with a different tone. Results indicate higher discriminability of anisochrony with musical scales and with longer sequences, irrespective of the knowledge of the final tone. Such an outcome suggests that the predictability of non-temporal properties, as enabled by the musical scale pattern, can be a factor in determining the sensitivity of time judgments.
A comprehensive assessment of the transcriptome of cork oak (Quercus suber) through EST sequencing.

Science.gov (United States)

Pereira-Leal, José B; Abreu, Isabel A; Alabaça, Cláudia S; Almeida, Maria Helena; Almeida, Paulo; Almeida, Tânia; Amorim, Maria Isabel; Araújo, Susana; Azevedo, Herlânder; Badia, Aleix; Batista, Dora; Bohn, Andreas; Capote, Tiago; Carrasquinho, Isabel; Chaves, Inês; Coelho, Ana Cristina; Costa, Maria Manuela Ribeiro; Costa, Rita; Cravador, Alfredo; Egas, Conceição; Faro, Carlos; Fortes, Ana M; Fortunato, Ana S; Gaspar, Maria João; Gonçalves, Sónia; Graça, José; Horta, Marília; Inácio, Vera; Leitão, José M; Lino-Neto, Teresa; Marum, Liliana; Matos, José; Mendonça, Diogo; Miguel, Andreia; Miguel, Célia M; Morais-Cecílio, Leonor; Neves, Isabel; Nóbrega, Filomena; Oliveira, Maria Margarida; Oliveira, Rute; Pais, Maria Salomé; Paiva, Jorge A; Paulo, Octávio S; Pinheiro, Miguel; Raimundo, João A P; Ramalho, José C; Ribeiro, Ana I; Ribeiro, Teresa; Rocheta, Margarida; Rodrigues, Ana Isabel; Rodrigues, José C; Saibo, Nelson J M; Santo, Tatiana E; Santos, Ana Margarida; Sá-Pereira, Paula; Sebastiana, Mónica; Simões, Fernanda; Sobral, Rómulo S; Tavares, Rui; Teixeira, Rita; Varela, Carolina; Veloso, Maria Manuela; Ricardo, Cândido P P

2014-05-15

Cork oak (Quercus suber) is one of the rare trees with the ability to produce cork, a material widely used to make wine bottle stoppers, flooring and insulation materials, among many other uses. The molecular mechanisms of cork formation are still poorly understood, in great part due to the difficulty in studying a species with a long life-cycle and for which there is scarce molecular/genomic information. Cork oak forests are of great ecological importance and represent a major economic and social resource in Southern Europe and Northern Africa. However, global warming is threatening the cork oak forests by imposing thermal, hydric and many types of novel biotic stresses. Despite the economic and social value of the Q. suber species, few genomic resources have been developed, useful for biotechnological applications and improved forest management. We generated in excess of 7 million sequence reads, by pyrosequencing 21 normalized cDNA libraries derived from multiple Q. suber tissues and organs, developmental stages and physiological conditions. We deployed a stringent sequence processing and assembly pipeline that resulted in the identification of ~159,000 unigenes. These were annotated according to their similarity to known plant genes, to known Interpro domains, GO classes and E.C. numbers. The phylogenetic extent of this EST set was investigated, and we found that cork oak revealed a significant new gene space that is not covered by other model species or EST sequencing projects. The raw data, as well as the full annotated assembly, are now available to the community in a dedicated web portal at http://www.corkoakdb.org. This genomic resource represents the first transcriptome study in a cork producing species. It can be explored to develop new tools and approaches to understand stress responses and developmental processes in forest trees, as well as the molecular cascades underlying cork differentiation and disease response.
Development of EST-derived markers in Dendrobium from EST of related taxa

Directory of Open Access Journals (Sweden)

Narisa Juejun

2013-04-01

Full Text Available Public databases are useful for molecular marker development. The major aim of this study was to develop expressed sequence tag (EST)-derived markers in Dendrobium from available ESTs of Phalaenopsis and Dendrobium. A total of 6063 sequences were screened for simple sequence repeats (SSRs) and introns. Primers flanking these regions were generated and tested on genomic DNAs of Phalaenopsis and Dendrobium. Twenty-three percent of amplifiable Phalaenopsis EST-derived markers were cross-genera transferable to Dendrobium. Forty-one markers from both Phalaenopsis and Dendrobium that amplified in Dendrobium were assessed on six commercial cultivars and six wild accessions. All of them were transferable among Dendrobium species. High polymorphism and heterozygosity were observed within wild accessions. Sixteen polymorphic markers were evaluated for linkage analysis on an F1 segregating population. Seven markers were mapped into three linkage groups, two of which showed syntenic relationship between Dendrobium and rice. This relationship will facilitate further quantitative trait loci (QTL) mapping and comparative genomic studies of Dendrobium. Our results indicate that Phalaenopsis EST-derived markers are valuable tools for genetic research and breeding applications in Dendrobium.
Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

Science.gov (United States)

Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

2012-07-02

Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

Directory of Open Access Journals (Sweden)

Tanase Koji

2012-07-01

Full Text Available Abstract Background: Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results: We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions: We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
Algorithm of search and track of static and moving large-scale objects

Directory of Open Access Journals (Sweden)

Kalyaev Anatoly

2017-01-01

Full Text Available We suggest an algorithm for processing an image sequence in order to search for and track static and moving large-scale objects. A possible software implementation of the algorithm, based on multithreaded CUDA processing, is suggested. An experimental analysis of the suggested implementation is performed.
Large-scale data analytics

CERN Document Server

Gkoulalas-Divanis, Aris

2014-01-01

Provides cutting-edge research in large-scale data analytics from diverse scientific areas. Surveys varied subject areas and reports on individual results of research in the field. Shares many tips and insights into large-scale data analytics from authors and editors with long-term experience and specialization in the field.
CitEST libraries

Directory of Open Access Journals (Sweden)

Maria Luísa P. Natividade Targon

2007-01-01

Full Text Available In order to obtain a better understanding of what citrus is, 33 cDNA libraries were constructed from different citrus species and genera. Total RNA was extracted from fruits, leaves, flowers, bark, seeds and roots, subjected or not to different biotic and abiotic stresses (pathogens and drought) and at several developmental stages. To identify putative promoter sequences, as well as molecular markers that could be useful for breeding programs, one shotgun library was prepared from sweet orange (Citrus sinensis var. Olimpia). In addition, EST libraries were also constructed for a citrus pathogen, the oomycete Phytophthora parasitica, in either virulent or avirulent form. A total of 286,559 cDNA clones from citrus were sequenced from their 5’ end, generating 242,790 valid reads of citrus. A total of 9,504 sequences were produced in the shotgun library and the valid reads were assembled using CAP3. In this procedure, we obtained 1,131 contigs and 4,083 singletons. A total of 19,200 cDNA clones from P. parasitica were sequenced, resulting in 16,400 valid reads. The ESTs generated in this project constitute, to our knowledge, the largest citrus sequence database in the world.
Analysis and functional annotation of expressed sequence tags (ESTs) from multiple tissues of oil palm (Elaeis guineensis Jacq.)

Directory of Open Access Journals (Sweden)

Lee Weng-Wah

2007-10-01

Full Text Available Abstract Background: Oil palm is the second largest source of edible oil which contributes to approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results: In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL2, AGL20), LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP), etc. The majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP, were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals, were also uncovered among the oil palm ESTs. Conclusion: The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map
New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships

Science.gov (United States)

Brown, Shoshana D.; Babbitt, Patricia C.

2014-01-01

Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes. PMID:25210038
Large-scale grid management

International Nuclear Information System (INIS)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-01-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) broader focus (Broad Band, Multi Utility,...) and (2) bigger units with large networks and more customers. Research done by SINTEF Energy Research shows so far that the approaches within large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

Science.gov (United States)

Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

2017-10-24

High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
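The four constraints named in this abstract (GC content, homopolymer length, Hamming distance, blacklisted subsequences) can be sketched in a few lines of Python. The code below is a minimal illustration of a naive, greedy filter of the kind the abstract uses as a baseline; it is not the authors' framework. All function names, the GC window of 40 to 60%, the maximum homopolymer run of 2, and the minimum pairwise distance of 3 are assumptions chosen for the example.

```python
from itertools import groupby


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def passes_filters(seq, gc_range=(0.4, 0.6), max_run=2, blacklist=()):
    """Per-barcode checks; thresholds are illustrative assumptions."""
    low, high = gc_range
    if not low <= gc_content(seq) <= high:
        return False
    if max_homopolymer(seq) > max_run:
        return False
    return not any(bad in seq for bad in blacklist)


def build_library(candidates, min_distance=3, **filters):
    """Greedy selection: keep a candidate only if it is far from all kept ones.

    Every candidate is compared with every accepted barcode, so the cost
    grows quadratically with library size.
    """
    library = []
    for seq in candidates:
        if not passes_filters(seq, **filters):
            continue
        if all(hamming(seq, kept) >= min_distance for kept in library):
            library.append(seq)
    return library


print(build_library(["ACGTAC", "ACGTAG", "TGCATG", "AAAGCC"]))
```

The all-against-all distance check is what makes this approach slow at scale, which is the cost a dedicated framework would need to avoid when targeting millions of barcodes.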
Expressed sequence tag (EST) analysis of two subspecies of Metarhizium anisopliae reveals a plethora of secreted proteins with potential activity in insect hosts.

Science.gov (United States)

Freimoser, Florian M; Screen, Steven; Bagga, Savita; Hu, Gang; St Leger, Raymond J

2003-01-01

Expressed sequence tag (EST) libraries for Metarhizium anisopliae, the causative agent of green muscardine disease, were developed from the broad host-range pathogen Metarhizium anisopliae sf. anisopliae and the specific grasshopper pathogen, M. anisopliae sf. acridum. Approximately 1,700 5' end sequences from each subspecies were generated from cDNA libraries representing fungi grown under conditions that maximize secretion of cuticle-degrading enzymes. Both subspecies had ESTs for virtually all pathogenicity-related genes cloned to date from M. anisopliae, but many novel genes encoding potential virulence factors were also tagged. Enzymes with potential targets in the insect host included proteases, chitinases, phospholipases, lipases, esterases, phosphatases and enzymes producing toxic secondary metabolites. A diverse array of proteases composed 36 % of all M. anisopliae sf. anisopliae ESTs. Eighty percent of the ESTs that could be clustered into functional groups had significant matches (Ehistory of this clade.
Analysis of SSR information in EST resources of sugarcane

Science.gov (United States)

Expressed sequence tags (ESTs) offer the opportunity to exploit single, low-copy, conserved sequence motifs for the development of simple sequence repeats (SSRs). A total of 262,113 ESTs of sugarcane (Saccharum officinarum) in the NCBI database were downloaded and analyzed, which resulted in...
Survey of transposable elements in sugarcane expressed sequence tags (ESTs)

Directory of Open Access Journals (Sweden)

Rossi Magdalena

2001-01-01

Full Text Available The sugarcane expressed sequence tag (SUCEST) project has produced a large number of cDNA sequences from several plant tissues submitted or not to different conditions of stress. In this paper we report the result of a search for transposable elements (TEs), revealing a surprising amount of expressed TE homologues. Of the 260,781 sequences grouped in 81,223 fragment assembly program (Phrap) clusters, a total of 276 clones showed homology to previously reported TEs using a stringent cut-off value of e-50 or better. Homologous clones to Copia/Ty1 and Gypsy/Ty3 groups of long terminal repeat (LTR) retrotransposons were found but no non-LTR retroelements were identified. All major transposon families were represented in sugarcane including Activator (Ac), Mutator (MuDR), Suppressor-mutator (En/Spm) and Mariner. In order to compare the TE diversity in grass genomes, we carried out a search for TEs described in the sugarcane-related species O. sativa, Z. mays and S. bicolor. We also present preliminary results showing the potential use of TE insertion pattern polymorphism as molecular markers for cultivar identification.
Ethics of large-scale change

OpenAIRE

Arler, Finn

2006-01-01

The subject of this paper is long-term large-scale changes in human society. Some very significant examples of large-scale change are presented: human population growth, human appropriation of land and primary production, the human use of fossil fuels, and climate change. The question is posed, which kind of attitude is appropriate when dealing with large-scale changes like these from an ethical point of view. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, th...
Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

Science.gov (United States)

2011-01-01

Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar
Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

Directory of Open Access Journals (Sweden)

Wynhoven Brian

2011-03-01

Full Text Available Abstract Background: Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results: To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

Directory of Open Access Journals (Sweden)

Anjani Ragothaman

2014-01-01

Full Text Available While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as those of prokaryotes is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, and is particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Exploiting the transcriptome of Euphrates Poplar, Populus euphratica (Salicaceae) to develop and characterize new EST-SSR markers and construct an EST-SSR database.

Directory of Open Access Journals (Sweden)

Fang K Du

Full Text Available BACKGROUND: Microsatellite markers or Simple Sequence Repeats (SSRs) are the most popular markers in population/conservation genetics. However, the development of novel microsatellite markers has been impeded by high costs, a lack of available sequence data and technical difficulties. New species-specific microsatellite markers were required to investigate the evolutionary history of the Euphratica tree, Populus euphratica, the only tree species found in the desert regions of Western China and adjacent Central Asian countries. METHODOLOGY/PRINCIPAL FINDINGS: A total of 94,090 non-redundant Expressed Sequence Tags (ESTs) from P. euphratica comprising around 63 Mb of sequence data were searched for SSRs. 4,202 SSRs were found in 3,839 ESTs, with 311 ESTs containing multiple SSRs. The most common motif types were trinucleotide (37%) and hexanucleotide (33%) repeats. We developed primer pairs for all of the identified EST-SSRs (eSSRs) and selected 673 of these pairs at random for further validation. 575 pairs (85%) gave successful amplification, of which 464 (80.7%) were polymorphic in six to 24 individuals from natural populations across Northern China. We also tested the transferability of the polymorphic eSSRs to nine other Populus species. In addition, to facilitate the use of these new eSSR markers by other researchers, we mapped them onto Populus trichocarpa scaffolds in silico and compiled our data into a web-based database (http://202.205.131.253:8080/poplar/resources/static_page/index.html). CONCLUSIONS: The large set of validated eSSRs identified in this work will have many potential applications in studies on P. euphratica and other poplar species, in fields such as population genetics, comparative genomics, linkage mapping, QTL, and marker-assisted breeding. Their use will be facilitated by their incorporation into a user-friendly web-based database.
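The record above describes searching ESTs for SSRs but does not name the search tool or its thresholds. As a rough illustration only, a perfect-repeat scan can be sketched in Python with a regular expression. The function name find_ssrs and the minimum repeat counts in MIN_REPEATS are assumptions made for this sketch, not parameters reported by the study.

```python
import re

# Minimum number of repeat units per motif length.
# These thresholds are assumed for illustration; the study does not state its own.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    # Hypothetical EST fragment with one dinucleotide and one trinucleotide repeat.
    est = "CCGT" + "AG" * 7 + "CCGT" + "AAG" * 5 + "CC"
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
```

The sketch reports only perfect repeats of 2 to 6 bp; compound or interrupted SSRs, which dedicated tools also handle, are out of scope here.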
New insights about enzyme evolution from large scale studies of sequence and structure relationships.

Science.gov (United States)

Brown, Shoshana D; Babbitt, Patricia C

2014-10-31

Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

Directory of Open Access Journals (Sweden)

B. Jayashree

2007-01-01

Full Text Available The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.
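The abstract above mentions identifying restriction sites so that SNPs can be assayed as CAPS markers, without listing the tools involved. The underlying idea is that a SNP is CAPS-convertible when it creates or destroys an enzyme recognition site. The Python sketch below illustrates that check under stated assumptions: the helper names site_positions and caps_candidates and the small four-enzyme table are chosen for this example and are not part of the ICRISAT pipeline.

```python
# Recognition sequences of four common restriction enzymes (all palindromic,
# so scanning one strand is sufficient). The selection is illustrative only.
ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "TaqI": "TCGA",
}

def site_positions(seq, site):
    """Return every start index at which site occurs in seq."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def caps_candidates(allele_a, allele_b, enzymes=ENZYMES):
    """Map enzyme name -> (sites in allele A, sites in allele B) where they differ."""
    a, b = allele_a.upper(), allele_b.upper()
    result = {}
    for name, site in enzymes.items():
        pos_a, pos_b = site_positions(a, site), site_positions(b, site)
        if pos_a != pos_b:
            result[name] = (pos_a, pos_b)
    return result

if __name__ == "__main__":
    # Hypothetical alleles differing by a single T/C SNP at index 8.
    allele_a = "ACGTGAATTCAGGT"
    allele_b = "ACGTGAATCCAGGT"
    print(caps_candidates(allele_a, allele_b))
```

In this toy case only EcoRI distinguishes the two alleles, because the SNP falls inside its GAATTC site in the first allele.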
Low rank approximation methods for MR fingerprinting with large scale dictionaries.

Science.gov (United States)

Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra

2018-04-01

This work proposes new low rank approximation approaches with significant memory savings for large scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low rank approximation of large sized MRF dictionaries. We further relax this requirement by exploiting the structures of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2, and off-resonance maps are in good agreement with those of the standard MRF approach. Moreover, the memory savings are up to 1000 times for the MRF-fast imaging with steady-state precession sequence and more than 15 times for the MRF-balanced, steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory efficient low rank approximation methods, which can benefit the usage of MRF in clinical settings. They also have great potential in large scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
The characterization of a new set of EST-derived simple sequence repeat (SSR) markers as a resource for the genetic analysis of Phaseolus vulgaris

Directory of Open Access Journals (Sweden)

Borba Tereza CO

2011-05-01

Full Text Available Abstract Background Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested. Results From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM. Conclusions A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. Their diversity

Genomic analysis of expressed sequence tags in American black bear Ursus americanus

Science.gov (United States)

2010-01-01

Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065
Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

Science.gov (United States)

Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

2010-03-26

Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.
An elm EST database for identifying leaf beetle egg-induced defense genes

Directory of Open Access Journals (Sweden)

Büchel Kerstin

2012-06-01

Full Text Available Abstract Background Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle (Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Results Here we present the first large scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 expressed sequence tags (ESTs) were identified which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction
An elm EST database for identifying leaf beetle egg-induced defense genes.

Science.gov (United States)

Büchel, Kerstin; McDowell, Eric; Nelson, Will; Descour, Anne; Gershenzon, Jonathan; Hilker, Monika; Soderlund, Carol; Gang, David R; Fenning, Trevor; Meiners, Torsten

2012-06-15

Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle (Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Here we present the first large scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 expressed sequence tags (ESTs) were identified which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction, transport and primary metabolism
Political consultation and large-scale research

International Nuclear Information System (INIS)

Bechmann, G.; Folkers, H.

1977-01-01

Large-scale research and policy consulting occupy an intermediary position between sociological sub-systems. While large-scale research coordinates science, policy, and production, policy consulting coordinates science, policy and political spheres. In this position, large-scale research and policy consulting lack the institutional guarantees and the guaranteed rational background that are characteristic of their sociological environment. Thus large-scale research can neither deal with the production of innovative goods with regard to profitability, nor can it hope for full recognition by the basic-research-oriented scientific community. Policy consulting neither has the decision-making competence assigned to the political system, nor can it be judged successfully by the critical standards of established social science, at least as far as the present situation is concerned. This intermediary position of large-scale research and policy consulting has consequences in three respects, supporting the thesis that this is a new form of institutionalization of science: 1) external control, 2) the form of organization, 3) the theoretical conception of large-scale research and policy consulting. (orig.) [de
Development and Validation of EST-SSR Markers from the Transcriptome of Adzuki Bean (Vigna angularis).

Science.gov (United States)

Chen, Honglin; Liu, Liping; Wang, Lixia; Wang, Suhua; Somta, Prakit; Cheng, Xuzhen

2015-01-01

The adzuki bean (Vigna angularis (Ohwi) Ohwi and Ohashi) is an important grain legume of Asia. It is cultivated mainly in China, Japan and Korea. Despite its importance, few genomic resources are available for molecular genetic research of adzuki bean. In this study, we developed EST-SSR markers for the adzuki bean through next-generation sequencing. More than 112 million high-quality cDNA sequence reads were obtained from adzuki bean using Illumina paired-end sequencing technology, and the sequences were de novo assembled into 65,950 unigenes. The average length of the unigenes was 1,213 bp. Among the unigenes, 14,547 sequences contained a unique simple sequence repeat (SSR) and 3,350 sequences contained more than one SSR. A total of 7,947 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats (99.0%) as the most abundant motif class, followed by AG/CT (68.4%), AAG/CTT (30.0%), AAAG/CTTT (26.2%), AAAAG/CTTTT (16.1%), and AACGGG/CCCGTT (6.0%). A total of 500 SSR markers were randomly selected for validation, of which 296 markers produced reproducible amplicons with 38 polymorphic markers among the 32 adzuki bean genotypes selected from diverse geographical locations across China. The large number of SSR-containing sequences and EST-SSR markers will be valuable for genetic analysis of the adzuki bean and related Vigna species.
Large-scale multimedia modeling applications

International Nuclear Information System (INIS)

Droppo, J.G. Jr.; Buck, J.W.; Whelan, G.; Strenge, D.L.; Castleton, K.J.; Gelston, G.M.

1995-08-01

Over the past decade, the US Department of Energy (DOE) and other agencies have faced increasing scrutiny for a wide range of environmental issues related to past and current practices. A number of large-scale applications have been undertaken that required analysis of large numbers of potential environmental issues over a wide range of environmental conditions and contaminants. Several of these applications, referred to here as large-scale applications, have addressed long-term public health risks using a holistic approach for assessing impacts from potential waterborne and airborne transport pathways. Multimedia models such as the Multimedia Environmental Pollutant Assessment System (MEPAS) were designed for use in such applications. MEPAS integrates radioactive and hazardous contaminants impact computations for major exposure routes via air, surface water, ground water, and overland flow transport. A number of large-scale applications of MEPAS have been conducted to assess various endpoints for environmental and human health impacts. These applications are described in terms of lessons learned in the development of an effective approach for large-scale applications
SWORDS: A statistical tool for analysing large DNA sequences

Indian Academy of Sciences (India)

Unknown

These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in ... tions with the cellular processes like recombination, replication .... in DNA sequences using certain specific probability laws. (Pevzner et al ...
Decentralized Large-Scale Power Balancing

DEFF Research Database (Denmark)

Halvgaard, Rasmus; Jørgensen, John Bagterp; Poulsen, Niels Kjølstad

2013-01-01

problem is formulated as a centralized large-scale optimization problem but is then decomposed into smaller subproblems that are solved locally by each unit connected to an aggregator. For large-scale systems the method is faster than solving the full problem and can be distributed to include an arbitrary...
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Science.gov (United States)

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
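For readers unfamiliar with the Smith-Waterman recurrence that the record above parallelizes, a plain scalar version can be sketched in Python. This is not the authors' vectorized Xeon Phi code: the function name smith_waterman_score, the linear gap penalty and the scoring values are assumptions chosen to keep the example short. Each inner-loop iteration is one "cell update", the unit counted by the GCUPS figure quoted in the abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.

    Keeps only two rows of the dynamic programming matrix, so memory is
    O(len(b)) while the work is len(a) * len(b) cell updates.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    # Hypothetical query and subject sharing the local match "ACGT".
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A database scan repeats this computation for one query against every subject sequence, which is why the independent query-subject pairs lend themselves to the cluster-level and thread-level parallelism described in the abstract.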
Automating large-scale reactor systems

International Nuclear Information System (INIS)

Kisner, R.A.

1985-01-01

This paper conveys a philosophy for developing automated large-scale control systems that behave in an integrated, intelligent, flexible manner. Methods for operating large-scale systems under varying degrees of equipment degradation are discussed, and a design approach that separates the effort into phases is suggested. 5 refs., 1 fig
Compressing DNA sequence databases with coil

Directory of Open Access Journals (Sweden)

Hendy Michael D

2008-05-01

Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

DEFF Research Database (Denmark)

Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H

2006-01-01

This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci...... with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST...
Deep sequencing of ESTs from nacreous and prismatic layer producing tissues and a screen for novel shell formation-related genes in the pearl oyster.

Directory of Open Access Journals (Sweden)

Shigeharu Kinoshita

Full Text Available BACKGROUND: Despite its economic importance, we have a limited understanding of the molecular mechanisms underlying shell formation in pearl oysters, wherein the calcium carbonate crystals, nacre and prism, are formed in a highly controlled manner. We constructed comprehensive expressed gene profiles in the shell-forming tissues of the pearl oyster Pinctada fucata and identified novel shell formation-related gene candidates. PRINCIPAL FINDINGS: We employed the GS FLX 454 system and constructed transcriptome data sets from pallial mantle and pearl sac, which form the nacreous layer, and from the mantle edge, which forms the prismatic layer in P. fucata. We sequenced 260477 reads and obtained 29682 unique sequences. We also screened novel nacreous and prismatic gene candidates by a combined analysis of sequence and expression data sets, and identified various genes encoding lectin, protease, protease inhibitors, lysine-rich matrix protein, and secreting calcium-binding proteins. We also examined the expression of known nacreous and prismatic genes in our EST library and identified novel isoforms with tissue-specific expressions. CONCLUSIONS: We constructed EST data sets from the nacre- and prism-producing tissues in P. fucata and found 29682 unique sequences containing novel gene candidates for nacreous and prismatic layer formation. This is the first report of deep sequencing of ESTs in the shell-forming tissues of P. fucata and our data provide a powerful tool for a comprehensive understanding of the molecular mechanisms of molluscan biomineralization.
A large-scale chromosome-specific SNP discovery guideline.

Science.gov (United States)

Akpinar, Bala Ani; Lucas, Stuart; Budak, Hikmet

2017-01-01

Single-nucleotide polymorphisms (SNPs) are the most prevalent type of variation in genomes and are increasingly being used as molecular markers in diversity analyses, mapping and cloning of genes, and germplasm characterization. However, only a few studies have reported large-scale SNP discovery in Aegilops tauschii, restricting their potential use as markers for the low-polymorphic D genome. Here, we report 68,592 SNPs found on the gene-related sequences of the 5D chromosome of Ae. tauschii genotype MvGB589 using genomic and transcriptomic sequences from seven Ae. tauschii accessions, including AL8/78, the only genotype for which a draft genome sequence is available at present. We also suggest a workflow to compare SNP positions in homologous regions on the 5D chromosome of Triticum aestivum, bread wheat, to mark single nucleotide variations between these closely related species. Overall, the identified SNPs define a density of 4.49 SNPs per kilobase, among the highest reported for the genic regions of Ae. tauschii so far. To our knowledge, this study also presents the first chromosome-specific SNP catalog in Ae. tauschii that should facilitate the association of these SNPs with morphological traits on chromosome 5D to be ultimately targeted for wheat improvement.
The Software Reliability of Large Scale Integration Circuit and Very Large Scale Integration Circuit

OpenAIRE

Artem Ganiyev; Jan Vitasek

2010-01-01

This article describes a method for evaluating the fault-free operation of large scale integration (LSI) circuits and very large scale integration (VLSI) circuits. It gives a comparative analysis of the factors that determine the fault-free operation of integrated circuits, and an analysis of existing methods and models for evaluating the fault-free operation of LSI and VLSI circuits. The main part describes a proposed algorithm and program for analysing the fault rate in LSI and VLSI circuits.
Testing Scaling Relations for Solar-like Oscillations from the Main Sequence to Red Giants Using Kepler Data

DEFF Research Database (Denmark)

Huber, D.; Bedding, T.R.; Stello, D.

2011-01-01

We have analyzed solar-like oscillations in ~1700 stars observed by the Kepler Mission, spanning from the main sequence to the red clump. Using evolutionary models, we test asteroseismic scaling relations for the frequency of maximum power (νmax), the large frequency separation (Δν), and oscillation amplitudes. We show that the difference of the Δν-νmax relation for unevolved and evolved stars can be explained by different distributions in effective temperature and stellar mass, in agreement with what is expected from scaling relations. For oscillation amplitudes, we show that neither (L/M)^s scaling nor the revised scaling relation by Kjeldsen & Bedding is accurate for red-giant stars, and demonstrate that a revised scaling relation with a separate luminosity-mass dependence can be used to calculate amplitudes from the main sequence to red giants to a precision of ~25%. The residuals show...
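The scaling relations tested in this record can be illustrated with a minimal sketch. It assumes the widely used forms νmax ∝ M R^-2 Teff^-1/2 and Δν ∝ (M/R^3)^1/2; the solar reference constants and the function names are assumptions chosen for illustration and are not taken from the abstract.

```python
import math

# Solar reference values; assumed here for illustration (commonly quoted
# figures, not taken from the abstract above).
NU_MAX_SUN = 3090.0   # microhertz
DELTA_NU_SUN = 135.1  # microhertz
TEFF_SUN = 5777.0     # kelvin


def nu_max(mass, radius, teff):
    """Frequency of maximum power in microhertz.

    mass and radius are in solar units, teff in kelvin.
    Scaling: nu_max ~ M * R**-2 * Teff**-0.5.
    """
    return NU_MAX_SUN * mass / radius**2 / math.sqrt(teff / TEFF_SUN)


def delta_nu(mass, radius):
    """Large frequency separation in microhertz.

    Scaling: delta_nu ~ sqrt(M / R**3), the square root of the mean density.
    """
    return DELTA_NU_SUN * math.sqrt(mass / radius**3)
```

Calling `nu_max(1.2, 10.0, 4800.0)` and `delta_nu(1.2, 10.0)` gives values for a hypothetical red giant; both are far below the solar values, which is the qualitative trend from the main sequence to red giants.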
Managing large-scale models: DBS

International Nuclear Information System (INIS)

1981-05-01

A set of fundamental management tools for developing and operating a large scale model and data base system is presented. Based on experience in operating and developing a large scale computerized system, the only reasonable way to gain strong management control of such a system is to implement appropriate controls and procedures. Chapter I discusses the purpose of the book. Chapter II classifies a broad range of generic management problems into three groups: documentation, operations, and maintenance. First, system problems are identified; then solutions for gaining management control are discussed. Chapters III, IV, and V present practical methods for dealing with these problems. These methods were developed for managing SEAS but have general application for large scale models and data bases.
Large Scale Self-Organizing Information Distribution System

National Research Council Canada - National Science Library

Low, Steven

2005-01-01

This project investigates issues in "large-scale" networks. Here "large-scale" refers to networks with large number of high capacity nodes and transmission links, and shared by a large number of users...
Large scale structure and baryogenesis

International Nuclear Information System (INIS)

Kirilova, D.P.; Chizhov, M.V.

2001-08-01

We discuss a possible connection between large scale structure formation and baryogenesis in the universe. An updated review of the observational indications for the presence of a very large scale of $120\,h^{-1}$ Mpc in the distribution of the visible matter of the universe is provided. The possibility of generating a periodic distribution with the characteristic scale $120\,h^{-1}$ Mpc through a mechanism producing quasi-periodic baryon density perturbations during the inflationary stage is discussed. The evolution of the baryon charge density distribution is explored in the framework of a low temperature boson condensate baryogenesis scenario. Both the observed very large scale of the visible matter distribution in the universe and the observed baryon asymmetry value could naturally appear as a result of the evolution of a complex scalar field condensate formed at the inflationary stage. Moreover, for some values of the model's parameters a natural separation of matter superclusters from antimatter ones can be achieved. (author)

Automatic management software for large-scale cluster system

International Nuclear Information System (INIS)

Weng Yunjian; Chinese Academy of Sciences, Beijing; Sun Gongxing

2007-01-01

At present, large-scale cluster systems are difficult to manage. Administrators carry a heavy workload and must spend much time on the management and maintenance of the system. The nodes of a large-scale cluster easily fall into disorder: thousands of nodes are housed in large rooms, so administrators can easily confuse one machine with another. How can accurate management be carried out effectively in a large-scale cluster system? The article introduces ELFms in the large-scale cluster system and proposes a way to realize automatic management of such a system. (authors)
New Sequences with Low Correlation and Large Family Size

Science.gov (United States)

Zeng, Fanxin

In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length $p^n-1$, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is $\varphi(p^n-1)/n \cdot \varphi(p^m-1)/m$, where $p$ is a prime number, $n$ is an even number, and $n=2m$, and these sequences can be binary or polyphase depending upon choice of the parameter $p$. In Construction I, there are $p^n$ distinct sequences within each family and the new sequences have at most $d+2$ nontrivial periodic correlation values $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, dp^m-1\}$. In Construction II, the new sequences have large family size $p^{2n}$ and possibly take the nontrivial correlation values in $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (3d-4)p^m-1\}$. In Construction III, the new sequences possess the largest family size $p^{(d-1)n}$ and have at most $2d$ correlation levels $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (2d-2)p^m-1\}$. The three constructions are near-optimal with respect to the Welch bound because the values of their Welch-Ratios are moderate, $WR \approx d$, $WR \approx 3d-4$ and $WR \approx 2d-2$, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases in Constructions I and III, and their restriction conditions to the integers $m$ and $n$, $p^m \neq 2 \pmod 3$ and $n \equiv 0 \pmod 4$, respectively, are removed in our sequences. Our sequences in Construction III include the sequences with Niho type decimation $3\cdot 2^m-2$, too. Finally, some open questions are pointed out and an example that illustrates the performance of these sequences is given.
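To make the near-optimality claim concrete, the following sketch computes the standard Welch lower bound on the maximum periodic correlation and the ratio of an achieved maximum to that bound. The function names and the parameter choice (p = 2, m = 5, d = 3) are hypothetical and picked only for illustration; whether this particular combination is admissible in the paper's constructions is not stated in the abstract.

```python
import math


def welch_bound(num_seqs, period):
    """Welch lower bound on the maximum periodic correlation magnitude
    for a family of `num_seqs` sequences of period `period`."""
    return period * math.sqrt((num_seqs - 1) / (num_seqs * period - 1))


def welch_ratio(max_correlation, num_seqs, period):
    """Ratio of an achieved maximum correlation magnitude to the Welch bound."""
    return max_correlation / welch_bound(num_seqs, period)


# Illustrative parameters (hypothetical, chosen only for the example),
# combined with the Construction I figures quoted in the abstract.
p, m, d = 2, 5, 3
n = 2 * m
period = p**n - 1          # sequence length p^n - 1
num_seqs = p**n            # p^n sequences per family
max_corr = d * p**m - 1    # largest magnitude in the listed value set
ratio = welch_ratio(max_corr, num_seqs, period)
```

With these numbers the ratio comes out close to d, which is the sense in which the abstract calls the Welch-Ratio moderate.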
BFAST: an alignment tool for large scale genome resequencing.

Directory of Open Access Journals (Sweden)

Nils Homer

2009-11-01

The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
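The gapped local alignment step mentioned in this record can be sketched with a textbook Smith-Waterman scoring routine. This is a generic illustration, not BFAST's implementation; the function name and the scoring values (match, mismatch, linear gap penalty) are assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, since the
    score alone does not require a traceback.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Aligning a read against a candidate reference window with one deleted base, for example `smith_waterman_score("ACGT", "ACT")`, still yields a positive score because the gap is absorbed by the surrounding matches; this is how small indels remain detectable.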
Large scale network-centric distributed systems

CERN Document Server

Sarbazi-Azad, Hamid

2014-01-01

A highly accessible reference offering a broad range of topics and insights on large scale network-centric distributed systems. Evolving from the fields of high-performance computing and networking, large scale network-centric distributed systems continues to grow as one of the most important topics in computing and communication and many interdisciplinary areas. Dealing with both wired and wireless networks, this book focuses on the design and performance issues of such systems. Large Scale Network-Centric Distributed Systems provides in-depth coverage ranging from ground-level hardware issu
Large-Scale Outflows in Seyfert Galaxies

Science.gov (United States)

Colbert, E. J. M.; Baum, S. A.

1995-12-01

Highly collimated outflows extend out to Mpc scales in many radio-loud active galaxies. In Seyfert galaxies, which are radio-quiet, the outflows extend out to kpc scales and do not appear to be as highly collimated. In order to study the nature of large-scale (≳1 kpc) outflows in Seyferts, we have conducted optical, radio and X-ray surveys of a distance-limited sample of 22 edge-on Seyfert galaxies. Results of the optical emission-line imaging and spectroscopic survey imply that large-scale outflows are present in ≳1/4 of all Seyferts. The radio (VLA) and X-ray (ROSAT) surveys show that large-scale radio and X-ray emission is present at about the same frequency. Kinetic luminosities of the outflows in Seyferts are comparable to those in starburst-driven superwinds. Large-scale radio sources in Seyferts appear diffuse, but do not resemble radio halos found in some edge-on starburst galaxies (e.g. M82). We discuss the feasibility of the outflows being powered by the active nucleus (e.g. a jet) or a circumnuclear starburst.
Trimming and clustering sugarcane ESTs

Directory of Open Access Journals (Sweden)

Guilherme P. Telles

2001-12-01

The original clustering procedure adopted in the Sugarcane Expressed Sequence Tag project (SUCEST) had many problems, for instance too many clusters, the presence of ribosomal sequences, etc. We therefore redesigned the clustering procedure entirely, including a much more careful initial trimming of the reads. In this paper the new trimming and clustering strategies are described in detail and we give the new official figures for the project, 237,954 expressed sequence tags and 43,141 clusters.
Comprehensive large-scale assessment of intrinsic protein disorder.

Science.gov (United States)

Walsh, Ian; Giollo, Manuel; Di Domenico, Tomás; Ferrari, Carlo; Zimmermann, Olav; Tosatto, Silvio C E

2015-01-15

Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. The datasets are available from Web site at URL: http://mobidb.bio.unipd.it/lsd. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Analysis of cassava (Manihot esculenta) ESTs: A tool for the discovery of genes

International Nuclear Information System (INIS)

Zapata, Andres; Neme, Rafik; Sanabria, Carolina; Lopez, Camilo

2011-01-01

Cassava (Manihot esculenta) is the main source of calories for more than 1,000 million people around the world and has been consolidated as the fourth most important crop after rice, corn and wheat. Cassava is considered tolerant to abiotic and biotic stress conditions; nevertheless, these characteristics are mainly present in non-commercial varieties. Genetic breeding strategies represent an alternative for introducing the desirable characteristics into commercial varieties. A fundamental step for accelerating the genetic breeding process in cassava is the identification of genes associated with these characteristics. One rapid strategy for identifying genes is to use a large collection of ESTs (expressed sequence tags). In this study, a complete analysis of cassava ESTs was done. The cassava ESTs comprise 80,459 sequences, which were assembled into a set of 29,231 unique genes (unigenes), comprising 10,945 contigs and 18,286 singletons. These 29,231 unique genes represent about 80% of the genes of the cassava genome. Between 5% and 10% of the cassava unigenes do not show similarity to any sequence present in the NCBI database and could be considered cassava-specific genes. A functional category was assigned to a group of sequences of the unigene set (29%) following the Gene Ontology vocabulary. The molecular function component was the best represented, with 43% of the sequences, followed by the biological process component (38%) and finally the cellular component, with 19%. In the cassava EST collection, 3,709 microsatellites were identified, and they could be used as molecular markers. This study represents an important contribution to the knowledge of the functional genomic structure of cassava and constitutes an important tool for the identification of genes associated with agricultural characteristics of interest that could be employed in cassava breeding programs.
SCALE INTERACTION IN A MIXING LAYER. THE ROLE OF THE LARGE-SCALE GRADIENTS

KAUST Repository

Fiscaletti, Daniele

2015-08-23

The interaction between scales is investigated in a turbulent mixing layer. The large-scale amplitude modulation of the small scales already observed in other works depends on the crosswise location. Large-scale positive fluctuations correlate with a stronger activity of the small scales on the low speed-side of the mixing layer, and a reduced activity on the high speed-side. However, from physical considerations we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, instead of the large-scale fluctuations, the large-scale gradients modulation of the small scales has been additionally investigated.
Dissecting the large-scale galactic conformity

Science.gov (United States)

Seo, Seongu

2018-01-01

Galactic conformity is an observed phenomenon in which galaxies located in the same region have similar properties such as star formation rate, color, gas fraction, and so on. The conformity was first observed among galaxies within the same halos ("one-halo conformity"). The one-halo conformity can be readily explained by mutual interactions among galaxies within a halo. Recent observations, however, further revealed a puzzling connection among galaxies with no direct interaction. In particular, galaxies located within a sphere of ~5 Mpc radius tend to show similarities, even though the galaxies do not share common halos with each other ("two-halo conformity" or "large-scale conformity"). Using a cosmological hydrodynamic simulation, Illustris, we investigate the physical origin of the two-halo conformity and put forward two scenarios. First, back-splash galaxies are likely responsible for the large-scale conformity. They have evolved into red galaxies due to ram-pressure stripping in a given galaxy cluster and happen to reside now within a ~5 Mpc sphere. Second, galaxies in a strong tidal field induced by large-scale structure also seem to give rise to the large-scale conformity. The strong tides suppress star formation in the galaxies. We discuss the importance of the large-scale conformity in the context of galaxy evolution.
Large-scale parallel genome assembler over cloud computing environment.

Science.gov (United States)

Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

2017-06-01

The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
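The de Bruijn graph that this assembler is built around can be illustrated on a single machine with a short sketch. This is not GiGA's Hadoop/Giraph implementation; the function name and the adjacency-list representation are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it.

    Every k-mer found in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

For example, `build_de_bruijn(["ACGTC"], 3)` links `AC` to `CG`, `CG` to `GT` and `GT` to `TC`; an assembler then looks for paths through such a graph to reconstruct contigs. Distributed frameworks partition these nodes and edges across machines, which is where data locality becomes important.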
EST and transcriptome analysis of cephalochordate amphioxus--past, present and future.

Science.gov (United States)

Wang, Yu-Bin; Chen, Shu-Hwa; Lin, Chun-Yen; Yu, Jr-Kai

2012-03-01

The cephalochordates, commonly known as amphioxus or lancelets, are now considered the most basal chordate group, and the studies of these organisms therefore offer important insights into various levels of evolutionary biology. In the past two decades, the investigation of amphioxus developmental biology has provided key knowledge for understanding the basic patterning mechanisms of chordates. Comparative genome studies of vertebrates and amphioxus have uncovered clear evidence supporting the hypothesis of two-round whole-genome duplication thought to have occurred early in vertebrate evolution and have shed light on the evolution of morphological novelties in the complex vertebrate body plan. Complementary to the amphioxus genome-sequencing project, a large collection of expressed sequence tags (ESTs) has been generated for amphioxus in recent years; this valuable collection represents a rich resource for gene discovery, expression profiling and molecular developmental studies in the amphioxus model. Here, we review previous EST analyses and available cDNA resources in amphioxus and discuss their value for use in evolutionary and developmental studies. We also discuss the potential advantages of applying high-throughput, next-generation sequencing (NGS) technologies to the field of amphioxus research.
Novel and Stress Relevant EST Derived SSR Markers Developed and Validated in Peanut

Science.gov (United States)

Bosamia, Tejas C.; Mishra, Gyan P.; Thankappan, Radhakrishnan; Dobaria, Jentilal R.

2015-01-01

With the aim of increasing the number of functional markers in a resource-poor crop like cultivated peanut (Arachis hypogaea), the large number of expressed sequence tags (ESTs) available in public databases was employed for the development of novel EST-derived simple sequence repeat (SSR) markers. From 16424 unigenes, 2784 (16.95%) SSR-containing unigenes having 3373 SSR motifs were identified. Of these, 2027 (72.81%) sequences were annotated and 4124 gene ontology terms were assigned. Among different SSR motif-classes, tri-nucleotide repeats (33.86%) were the most abundant followed by di-nucleotide repeats (27.51%), while AG/CT (20.7%) and AAG/CTT (13.25%) were the most abundant repeat-motifs. A total of 2456 novel EST-SSR primer pairs were designed, of which 366 unigenes having relevance to various stresses and other functions were PCR validated using a set of 11 diverse peanut genotypes. Of these, 340 (92.62%) primer pairs yielded clear and scorable PCR products and 39 (10.66%) primer pairs exhibited polymorphisms. Overall, the number of alleles per marker ranged from 1-12 with an average of 3.77 and the PIC ranged from 0.028 to 0.375 with an average of 0.325. The identified EST-SSRs not only enrich the existing pool of molecular markers, but would also facilitate targeted research in marker-trait association for various stresses, inter-specific studies and genetic diversity analysis in peanut. PMID:26046991
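Identifying SSR motifs in EST or unigene sequences, as described in this record, can be sketched with a regular-expression scan for perfect tandem repeats. The minimum repeat thresholds and the function names below are assumptions for illustration; the abstract does not state the criteria or the software the authors used.

```python
import re

# Minimum repeat counts per motif length; assumed thresholds for
# illustration, not the criteria used in the study above.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # One copy of a k-base unit followed by at least n-1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units such as "AA" or "ATAT" that are really a
            # shorter motif repeated.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

Running `find_ssrs` over every unigene and tallying motif lengths would give class frequencies of the kind reported above (di-, tri-nucleotide and so on). Note that this sketch reports a motif in the rotation where the repeat first starts and does not merge reverse complements such as AG/CT.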
Large-scale analysis of phosphorylation site occupancy in eukaryotic proteins

DEFF Research Database (Denmark)

Rao, R Shyama Prasad; Møller, Ian Max

2012-01-01

Many recent high throughput technologies have enabled large-scale discoveries of new phosphorylation sites and phosphoproteins. Although they have provided a number of insights into protein phosphorylation and the related processes, an inclusive analysis on the nature of phosphorylated sites in proteins is currently lacking. We have therefore analyzed the occurrence and occupancy of phosphorylated sites (~ 100,281) in a large set of eukaryotic proteins (~ 22,995). Phosphorylation probability was found to be much higher in both the termini of protein sequences and this is much pronounced...... maximum randomness. An analysis of phosphorylation motifs indicated that just 40 motifs and a much lower number of associated kinases might account for nearly 50% of the known phosphorylations in eukaryotic proteins. Our results provide a broad picture of the phosphorylation sites in eukaryotic proteins.
Large-scale perspective as a challenge

NARCIS (Netherlands)

Plomp, M.G.A.

2012-01-01

1. Scale forms a challenge for chain researchers: when exactly is something ‘large-scale’? What are the underlying factors (e.g. number of parties, data, objects in the chain, complexity) that determine this? It appears to be a continuum between small- and large-scale, where positioning on that
Algorithm 896: LSA: Algorithms for Large-Scale Optimization

Czech Academy of Sciences Publication Activity Database

Lukšan, Ladislav; Matonoha, Ctirad; Vlček, Jan

2009-01-01

Vol. 36, No. 3 (2009), 16-1-16-29 ISSN 0098-3500 R&D Projects: GA AV ČR IAA1030405; GA ČR GP201/06/P397 Institutional research plan: CEZ:AV0Z10300504 Keywords: algorithms * design * large-scale optimization * large-scale nonsmooth optimization * large-scale nonlinear least squares * large-scale nonlinear minimax * large-scale systems of nonlinear equations * sparse problems * partially separable problems * limited-memory methods * discrete Newton methods * quasi-Newton methods * primal interior-point methods Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.904, year: 2009
Scale interactions in a mixing layer – the role of the large-scale gradients

KAUST Repository

Fiscaletti, D.

2016-02-15

© 2016 Cambridge University Press. The interaction between the large and the small scales of turbulence is investigated in a mixing layer, at a Reynolds number based on the Taylor microscale of , via direct numerical simulations. The analysis is performed in physical space, and the local vorticity root-mean-square (r.m.s.) is taken as a measure of the small-scale activity. It is found that positive large-scale velocity fluctuations correspond to large vorticity r.m.s. on the low-speed side of the mixing layer, whereas they correspond to low vorticity r.m.s. on the high-speed side. The relationship between large and small scales thus depends on position if the vorticity r.m.s. is correlated with the large-scale velocity fluctuations. On the contrary, the correlation coefficient is nearly constant throughout the mixing layer and close to unity if the vorticity r.m.s. is correlated with the large-scale velocity gradients. Therefore, the small-scale activity appears closely related to large-scale gradients, while the correlation between the small-scale activity and the large-scale velocity fluctuations is shown to reflect a property of the large scales. Furthermore, the vorticity from unfiltered (small scales) and from low pass filtered (large scales) velocity fields tend to be aligned when examined within vortical tubes. These results provide evidence for the so-called 'scale invariance' (Meneveau & Katz, Annu. Rev. Fluid Mech., vol. 32, 2000, pp. 1-32), and suggest that some of the large-scale characteristics are not lost at the small scales, at least at the Reynolds number achieved in the present simulation.
Large-scale matrix-handling subroutines 'ATLAS'

International Nuclear Information System (INIS)

Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

1978-03-01

Subroutine package ''ATLAS'' has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines, i.e., basic arithmetic routines, routines for solving linear simultaneous equations and for solving general eigenvalue problems and utility routines. The subroutines are useful in large scale plasma-fluid simulations. (auth.)
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

Directory of Open Access Journals (Sweden)

Shahin Arwa

2012-11-01

Background: Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results: Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions:
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

Science.gov (United States)

Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

2012-11-20

Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Two transcriptome sets were built that are valuable

Large-scale solar heat

Energy Technology Data Exchange (ETDEWEB)

Tolonen, J.; Konttinen, P.; Lund, P. [Helsinki Univ. of Technology, Otaniemi (Finland). Dept. of Engineering Physics and Mathematics

1998-12-31

In this project a large domestic solar heating system was built and a solar district heating system was modelled and simulated. Objectives were to improve the performance and reduce costs of a large-scale solar heating system. As a result of the project the benefit/cost ratio can be increased by 40 % through dimensioning and optimising the system at the designing stage. (orig.)
Characterization and development of EST-derived SSR markers in cultivated sweetpotato (Ipomoea batatas)

Directory of Open Access Journals (Sweden)

Li Yujun

2011-10-01

Background: Currently there exists a limited availability of genetic marker resources in sweetpotato (Ipomoea batatas), which is hindering genetic research in this species. It is necessary to develop more molecular markers for potential use in sweetpotato genetic research. With the newly developed next generation sequencing technology, large amount of transcribed sequences of sweetpotato have been generated and are available for identifying SSR markers by data mining. Results: In this study, we investigated 181,615 ESTs for the identification and development of SSR markers. In total, 8,294 SSRs were identified from 7,163 SSR-containing unique ESTs. On an average, one SSR was found per 7.1 kb of EST sequence with tri-nucleotide motifs (42.9%) being the most abundant followed by di- (41.2%), tetra- (9.2%), penta- (3.7%) and hexa-nucleotide (3.1%) repeat types. The top five motifs included AG/CT (26.9%), AAG/CTT (13.5%), AT/TA (10.6%), CCG/CGG (5.8%) and AAT/ATT (4.5%). After removing possible duplicate of published EST-SSRs of sweetpotato, a total of non-repeat 7,958 SSR motifs were identified. Based on these SSR-containing sequences, 1,060 pairs of high-quality SSR primers were designed and used for validation of the amplification and assessment of the polymorphism between two parents of one mapping population (E Shu 3 Hao and Guang 2k-30) and eight accessions of cultivated sweetpotatoes. The results showed that 816 primer pairs could yield reproducible and strong amplification products, of which 195 (23.9%) and 342 (41.9%) primer pairs exhibited polymorphism between E Shu 3 Hao and Guang 2k-30 and among the 8 cultivated sweetpotatoes, respectively. Conclusion: This study gives an insight into the frequency, type and distribution of sweetpotato EST-SSRs and demonstrates successful development of EST-SSR markers in cultivated sweetpotato. These EST-SSR markers could enrich the current resource of molecular markers for the sweetpotato community and would
Large scale identification and categorization of protein sequences using structured logistic regression

DEFF Research Database (Denmark)

Pedersen, Bjørn Panella; Ifrim, Georgiana; Liboriussen, Poul

2014-01-01

Abstract Background Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well...... problem. Results Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known...... for further biochemical characterization and structural analysis....
Probes of large-scale structure in the Universe

International Nuclear Information System (INIS)

Suto, Yasushi; Gorski, K.; Juszkiewicz, R.; Silk, J.

1988-01-01

Recent progress in observational techniques has made it possible to confront quantitatively various models for the large-scale structure of the Universe with detailed observational data. We develop a general formalism to show that the gravitational instability theory for the origin of large-scale structure is now capable of critically confronting observational results on cosmic microwave background radiation angular anisotropies, large-scale bulk motions and large-scale clumpiness in the galaxy counts. (author)
Large-scale grid management; Storskala Nettforvaltning

Energy Technology Data Exchange (ETDEWEB)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-07-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) broader focus (Broad Band, Multi Utility,...) and (2) bigger units with large networks and more customers. Research done by SINTEF Energy Research shows so far that the approaches within large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Genetic variation patterns of American chestnut populations at EST-SSRs

Science.gov (United States)

Oliver Gailing; C. Dana Nelson

2017-01-01

The objective of this study is to analyze patterns of genetic variation at genic expressed sequence tag - simple sequence repeats (EST-SSRs) and at chloroplast DNA markers in populations of American chestnut (Castanea dentata Borkh.) to assist in conservation and breeding efforts. Allelic diversity at EST-SSRs decreased significantly from southwest to northeast along...
Japanese large-scale interferometers

CERN Document Server

Kuroda, K; Miyoki, S; Ishizuka, H; Taylor, C T; Yamamoto, K; Miyakawa, O; Fujimoto, M K; Kawamura, S; Takahashi, R; Yamazaki, T; Arai, K; Tatsumi, D; Ueda, A; Fukushima, M; Sato, S; Shintomi, T; Yamamoto, A; Suzuki, T; Saitô, Y; Haruyama, T; Sato, N; Higashi, Y; Uchiyama, T; Tomaru, T; Tsubono, K; Ando, M; Takamori, A; Numata, K; Ueda, K I; Yoneda, H; Nakagawa, K; Musha, M; Mio, N; Moriwaki, S; Somiya, K; Araya, A; Kanda, N; Telada, S; Sasaki, M; Tagoshi, H; Nakamura, T; Tanaka, T; Ohara, K

2002-01-01

The objective of the TAMA 300 interferometer was to develop advanced technologies for kilometre scale interferometers and to observe gravitational wave events in nearby galaxies. It was designed as a power-recycled Fabry-Perot-Michelson interferometer and was intended as a step towards a final interferometer in Japan. The present successful status of TAMA is presented. TAMA forms a basis for LCGT (large-scale cryogenic gravitational wave telescope), a 3 km scale cryogenic interferometer to be built in the Kamioka mine in Japan, implementing cryogenic mirror techniques. The plan of LCGT is schematically described along with its associated R and D.
Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

KAUST Repository

Thind, Anupriya Kaur

2018-02-08

Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.
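The haploblock result in this record rests on comparing SNP density across windows of a chromosome. A minimal Python sketch of that idea follows; the window size, the use of the median window count as the background, and the function names are assumptions for illustration, not the method used in the study.

```python
from statistics import median


def snp_counts_per_window(positions, chrom_len, window):
    """Count SNPs in consecutive fixed-size windows (1-based positions)."""
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window] += 1
    return counts


def dense_windows(counts, fold):
    """Indices of windows whose SNP count is at least fold x the median count."""
    background = median(counts)
    if background == 0:
        # With a zero background every window containing a SNP would qualify,
        # so return nothing rather than flag windows on an undefined ratio.
        return []
    return [i for i, c in enumerate(counts) if c >= fold * background]
```

Runs of adjacent flagged windows could then be merged into candidate blocks. The last window may be shorter than the others, which a real analysis would normalise for.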
Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

Energy Technology Data Exchange (ETDEWEB)

Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

2015-03-16

Dothideomycetes is the largest and most diverse class of ascomycete fungi, with 23 orders, 110 families, 1300 genera and over 19,000 known species. We present a comparative analysis of 70 Dothideomycete genomes, including over 50 that we sequenced and that are as yet unpublished. This extensive sampling has almost quadrupled the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families, including proteases, transporters and small secreted proteins, and show that major differences in gene content are influenced by speciation.
Transcriptome sequencing of two phenotypic mosaic Eucalyptus trees reveals large scale transcriptome re-modelling.

Directory of Open Access Journals (Sweden)

Amanda Padovan

Full Text Available Phenotypic mosaic trees offer an ideal system for studying differential gene expression. We have investigated two mosaic eucalypt trees from two closely related species (Eucalyptus melliodora and E. sideroxylon), which each support two types of leaves: one part of the canopy is resistant to insect herbivory and the remaining leaves are susceptible. Driving this ecological distinction are differences in plant secondary metabolites. We used these phenotypic mosaics to investigate genome-wide patterns of foliar gene expression with the aim of identifying patterns of differential gene expression and the somatic mutation(s) that lead to this phenotypic mosaicism. We sequenced the mRNA pool from leaves of the resistant and susceptible ecotypes from both mosaic eucalypts using the Illumina HiSeq 2000 platform. We found large differences in pathway regulation and gene expression between the ecotypes of each mosaic. The expression of the genes in the MVA and MEP pathways is reflected by variation in leaf chemistry; however, this is not the case for the terpene synthases. Apart from the terpene biosynthetic pathway, there are several other metabolic pathways that are differentially regulated between the two ecotypes, suggesting there is much more phenotypic diversity than has been described. Despite the close relationship between the two species, they show large differences in the global patterns of gene and pathway regulation.
Large scale model testing

International Nuclear Information System (INIS)

Brumovsky, M.; Filip, R.; Polachova, H.; Stepanek, S.

1989-01-01

Fracture mechanics and fatigue calculations for WWER reactor pressure vessels were checked by large-scale model testing on the large ZZ 8000 testing machine (maximum load 80 MN) at the SKODA WORKS. Results are described from tests of the materials' resistance to non-ductile fracture. The testing covered base materials and welded joints. The rated specimen thickness was 150 mm, with defects between 15 and 100 mm deep. Results are also presented for 1:3 scale models of nozzles of 850 mm inner diameter; static, cyclic, and dynamic tests were performed without and with surface defects (15, 30 and 45 mm deep). During cyclic tests the crack growth rate in the elastic-plastic region was also determined. (author). 6 figs., 2 tabs., 5 refs
Why small-scale cannabis growers stay small: five mechanisms that prevent small-scale growers from going large scale.

Science.gov (United States)

Hammersvik, Eirik; Sandberg, Sveinung; Pedersen, Willy

2012-11-01

Over the past 15-20 years, domestic cultivation of cannabis has been established in a number of European countries. New techniques have made such cultivation easier; however, the bulk of growers remain small-scale. In this study, we explore the factors that prevent small-scale growers from increasing their production. The study is based on 1 year of ethnographic fieldwork and qualitative interviews conducted with 45 Norwegian cannabis growers, 10 of whom were growing on a large scale and 35 on a small scale. The study identifies five mechanisms that prevent small-scale indoor growers from going large-scale. First, large-scale operations involve a number of people, large sums of money, a high workload and a high risk of detection, and thus demand a higher level of organizational skills than small growing operations. Second, financial assets are needed to start a large 'grow-site'. Housing rent, electricity, equipment and nutrients are expensive. Third, to be able to sell large quantities of cannabis, growers need access to an illegal distribution network and knowledge of how to act according to black market norms and structures. Fourth, large-scale operations require advanced horticultural skills to maximize yield and quality, which demands greater skills and knowledge than does small-scale cultivation. Fifth, small-scale growers are often embedded in the 'cannabis culture', which emphasizes anti-commercialism, anti-violence and ecological and community values. Hence, starting up large-scale production will imply having to renegotiate or abandon these values. Going from small- to large-scale cannabis production is a demanding task: ideologically, technically, economically and personally. The many obstacles that small-scale growers face and the lack of interest and motivation for going large-scale suggest that the risk of a 'slippery slope' from small-scale to large-scale growing is limited. Possible political implications of the findings are discussed.
Distributed large-scale dimensional metrology new insights

CERN Document Server

Franceschini, Fiorenzo; Maisano, Domenico

2011-01-01

Focuses on the latest insights into and challenges of distributed large-scale dimensional metrology. Enables practitioners to study distributed large-scale dimensional metrology independently. Includes specific examples of the development of new system prototypes.
ESTIMA, a tool for EST management in a multi-project environment.

Science.gov (United States)

Kumar, Charu G; LeDuc, Richard; Gong, George; Roinishivili, Levan; Lewin, Harris A; Liu, Lei

2004-11-04

Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users. A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects. The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http
Expressed sequence tags (ESTs) and single nucleotide ...

African Journals Online (AJOL)

SERVER

2008-02-19

Feb 19, 2008 ... the discovery of the DNA, a new area of modern plant biotechnology begun. In plant ... Marker Assisted Breeding and Sequence Tagged Sites. (STS) are all in use in modern ...... and behaviour in the honey bee. Genome Res.
Identification of candidate genes for human pituitary development by EST analysis

Directory of Open Access Journals (Sweden)

Xiao Huasheng

2009-03-01

Full Text Available Abstract Background The pituitary is a critical neuroendocrine gland comprising five hormone-secreting cell types, which develop in tandem during the embryonic stage. Some essential genes have been identified in the early stage of adenohypophysial development, such as PITX1, FGF8, BMP4 and SF-1. However, it is likely that a large number of signaling molecules and transcription factors essential for determination and terminal differentiation of specific cell types remain unidentified. High-throughput methods such as microarray analysis may facilitate the measurement of gene transcriptional levels, while expressed sequence tag (EST) sequencing, an efficient method for gene discovery and expression level analysis, may help to understand gene expression patterns during development in a non-redundant way. Results A total of 9,271 ESTs were generated from both fetal and adult pituitaries, and assigned into 961 gene/EST clusters in fetal and 2,747 in adult pituitary by homology analysis. The transcription maps derived from these data indicated that developmentally relevant genes, such as Sox4, ST13 and ZNF185, were dominant in the cDNA library of fetal pituitary, while hormones and hormone-associated genes, such as GH1, GH2, POMC, LHβ, CHGA and CHGB, were dominant in adult pituitary. Furthermore, by using RT-PCR and in situ hybridization, Sox4 was found for the first time to be one of the main transcription factors expressed in fetal pituitary. It was expressed at least at E12.5, but decreased after E17.5. In addition, 40 novel ESTs were identified specifically in this tissue. Conclusion The significant changes in gene expression in both tissues suggest a distinct and dynamic switch between embryonic and adult pituitaries. All these data along with Sox4 should be confirmed to further understand the community of multiple signaling pathways that act as a cooperative network that regulates maturation of the pituitary. It was also suggested that EST
SCALE INTERACTION IN A MIXING LAYER. THE ROLE OF THE LARGE-SCALE GRADIENTS

KAUST Repository

Fiscaletti, Daniele; Attili, Antonio; Bisetti, Fabrizio; Elsinga, Gerrit E.

2015-01-01

from physical considerations we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, instead of the large-scale fluctuations, the large-scale gradients modulation of the small scales has been additionally investigated.
DNA repair-related genes in sugarcane expressed sequence tags (ESTs)

Directory of Open Access Journals (Sweden)

R.M.A. Costa

2001-12-01

...species. The mechanisms related to damage removal by DNA repair, as well as their biological consequences, are already well known in bacteria, yeast and animals. However, with respect to plants, much remains to be investigated. In the present work, we report the identification of genes involved in the main DNA repair pathways in sugarcane, through a similarity analysis of the database of the Brazilian Sugarcane Expressed Sequence Tag (SUCEST) project against known protein sequences available in other public databases (National Center of Biotechnology Information (NCBI) and Munich Information Center for Protein Sequences (MIPS) Arabidopsis thaliana). This search revealed that the range of proteins involved in DNA repair in sugarcane is similar to that of other eukaryotes. Even so, it was possible to identify some interesting features found only in plants, probably owing to their independent evolutionary process. The DNA repair pathways represented here include photoreactivation, base excision repair, nucleotide excision repair, mismatch repair, non-homologous end joining, homologous recombination repair and lesion tolerance. This work describes the main differences found in the DNA repair machinery of plant cells relative to that of organisms in which it is well described. These differences draw attention to potentially distinct mechanisms in plants, which deserve future investigation.
Several Families of Sequences with Low Correlation and Large Linear Span

Science.gov (United States)

Zeng, Fanxin; Zhang, Zhenyu

In DS-CDMA systems and DS-UWB radios, low correlation of spreading sequences can greatly help to minimize multiple access interference (MAI), and large linear span of spreading sequences can reduce their predictability. In this letter, new sequence sets with low correlation and large linear span are proposed. Based on the construction $\mathrm{Tr}_1^m\left[\mathrm{Tr}_m^n\left(\alpha^{bt}+\gamma_i\alpha^{dt}\right)\right]^r$ for generating $p$-ary sequences of period $p^n-1$, where $n=2m$, $d=up^m\pm v$, $b=u\pm v$, $\gamma_i\in GF(p^n)$, and $p$ is an arbitrary prime number, several methods to choose the parameter $d$ are provided. The obtained sequences with family size $p^n$ have four-valued, five-valued, six-valued or seven-valued correlation, and the maximum nontrivial correlation value is $(u+v-1)p^m-1$. Computer simulation shows that the linear span of the new sequences is larger than that of the sequences with Niho-type and Welch-type decimations, and similar to that of [10].
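The linear span mentioned in this record is the length of the shortest linear feedback shift register that generates a sequence, and it is commonly computed with the Berlekamp-Massey algorithm. The sketch below is a simplification that handles only binary (GF(2)) sequences, whereas the letter concerns p-ary sequences; the function name `linear_span` is an assumption for illustration.

```python
def linear_span(bits):
    """Linear span (linear complexity) of a binary sequence over GF(2),
    computed with the Berlekamp-Massey algorithm."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial C(x)
    b = [0] * (n + 1)  # copy of C(x) from before the last length change
    c[0] = b[0] = 1
    span, last_change = 0, -1
    for i in range(n):
        # Discrepancy between the bit predicted by the current LFSR and bits[i].
        d = bits[i]
        for j in range(1, span + 1):
            d ^= c[j] & bits[i - j]
        if d == 0:
            continue
        previous_c = c[:]
        shift = i - last_change
        for j in range(n + 1 - shift):
            c[j + shift] ^= b[j]  # C(x) <- C(x) + x^shift * B(x)
        if 2 * span <= i:
            span = i + 1 - span
            last_change = i
            b = previous_c
    return span


# Two periods of the length-7 m-sequence obeying s[k+3] = s[k+1] XOR s[k].
m_sequence = [1, 0, 0, 1, 0, 1, 1] * 2
```

For this example a three-stage register reproduces the whole sequence, so `linear_span(m_sequence)` returns 3. A sequence of the same period with a larger span needs more observed symbols before it can be predicted, which is the sense in which a large linear span reduces predictability.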
A re-entrant flowshop heuristic for online scheduling of the paper path in a large scale printer

NARCIS (Netherlands)

Waqas, U.; Geilen, M.C.W.; Kandelaars, J.; Somers, L.J.A.M.; Basten, T.; Stuijk, S.; Vestjens, P.G.H.; Corporaal, H.

2015-01-01

A Large Scale Printer (LSP) is a Cyber Physical System (CPS) printing thousands of sheets per day with high quality. The print requests arrive at run-time requiring online scheduling. We capture the LSP scheduling problem as online scheduling of re-entrant flowshops with sequence dependent setup

annot8r: GO, EC and KEGG annotation of EST datasets

Directory of Open Access Journals (Sweden)

Schmid Ralf

2008-04-01

Full Text Available Abstract Background The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non
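The step in which BLAST results are parsed and annotations are looked up in a reference database can be sketched in a few lines of Python. This is not annot8r's own code: it assumes BLAST output in the 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the e-value cutoff, function names and toy identifiers are illustrative assumptions.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Map each query to its highest-scoring subject below an e-value cutoff."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def annotate(hits, reference):
    """Transfer annotation terms from each best-hit subject to its query."""
    return {query: reference.get(subject, []) for query, subject in hits.items()}


# Toy input with made-up query names, accessions and scores.
example_lines = [
    "est1\tP12345\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "est1\tQ99999\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t90",
    "est2\tQ99999\t50.0\t50\t25\t0\t1\t50\t1\t50\t0.5\t20",
]
example_reference = {"P12345": ["GO:0005524"], "Q99999": ["GO:0003677"]}
```

With the toy data, `annotate(best_hits(example_lines), example_reference)` assigns the terms of `P12345` to `est1` and leaves `est2` unannotated because its only hit fails the e-value cutoff. Taking only the single best hit is one design choice; a tool could equally keep several hits per query and record the score with each transferred term.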
Trends in large-scale testing of reactor structures

International Nuclear Information System (INIS)

Blejwas, T.E.

2003-01-01

Large-scale tests of reactor structures have been conducted at Sandia National Laboratories since the late 1970s. This paper describes a number of different large-scale impact tests, pressurization tests of models of containment structures, and thermal-pressure tests of models of reactor pressure vessels. The advantages of large-scale testing are evident, but cost, in particular, limits its use. As computer models have grown in size, for example in the number of degrees of freedom, the advent of computer graphics has made possible very realistic representation of results, including results that may not accurately represent reality. A necessary condition for avoiding this pitfall is the validation of the analytical methods and underlying physical representations. Ironically, the immensely larger computer models sometimes increase the need for large-scale testing, because the modeling is applied to increasingly complex structural systems and/or more complex physical phenomena. Unfortunately, the cost of large-scale tests is a disadvantage that will likely severely limit similar testing in the future. International collaborations may provide the best mechanism for funding future programs with large-scale tests. (author)
Temporal sequencing of throughfall drop generation as revealed by use of a large-scale rainfall simulator

Science.gov (United States)

Nanko, K.; Levia, D. F., Jr.; Iida, S.; SUN, X.; Shinohara, Y.; Sakai, N.

2017-12-01

Scientists have been interested in throughfall drop size and its distribution because of its importance to soil erosion and the forest water balance. An indoor experiment was employed to deepen our understanding of throughfall drop generation processes and to promote better management of forested ecosystems. The indoor experiment provides a unique opportunity to examine an array of constant rainfall intensities, which are ideal for isolating the effect of changing intensity and are not found in the field. Throughfall drop generation was examined for three species, Cryptomeria japonica D. Don (Japanese cedar), Chamaecyparis obtusa (Siebold & Zucc.) Endl. (Japanese cypress), and Zelkova serrata Thunb. (Japanese zelkova), under both leafed and leafless conditions in the large-scale rainfall simulator at the National Research Institute for Earth Science and Disaster Resilience (Tsukuba, Japan) at rainfall intensities ranging from 15 to 100 mm h⁻¹. Drop size distributions of the applied rainfall and throughfall were measured simultaneously by 20 laser disdrometers. Utilizing the drop size dataset, throughfall was separated into three components: free throughfall, canopy drip, and splash throughfall. The temporal sequencing of the throughfall components was analyzed at 1-min intervals during each experimental run. The throughfall component percentage and the drop size of canopy drip differed among tree species and rainfall intensities and with elapsed time from the beginning of the rainfall event. Preliminary analysis revealed that the time differences in producing branch drip as compared to leaf (or needle) drip were partly due to differential canopy wet-up processes and the disappearance of branch drips due to canopy saturation, leading to dissimilar throughfall drop size distributions beneath the various tree species examined. This research was supported by JSPS Invitation Fellowship for Research in Japan (Grant No.: S16088) and JSPS KAKENHI (Grant No.: JP15H05626).
Gene mining a marama bean expressed sequence tags (ESTs ...

African Journals Online (AJOL)

The authors reported the identification of genes associated with embryonic development and microsatellite sequences. The future direction will entail characterization of these genes using gene over-expression and mutant assays. Key words: Namibia, simple sequence repeats (SSR), data mining, homology searches, ...
Large Scale Computations in Air Pollution Modelling

DEFF Research Database (Denmark)

Zlatev, Z.; Brandt, J.; Builtjes, P. J. H.

Proceedings of the NATO Advanced Research Workshop on Large Scale Computations in Air Pollution Modelling, Sofia, Bulgaria, 6-10 July 1998
Large-Scale 3D Printing: The Way Forward

Science.gov (United States)

Jassmi, Hamad Al; Najjar, Fady Al; Ismail Mourad, Abdel-Hamid

2018-03-01

Research on small-scale 3D printing has rapidly evolved, and numerous industrial products have been tested and successfully applied. Nonetheless, research on large-scale 3D printing, directed to large-scale applications such as construction and automotive manufacturing, still demands a great deal of effort. Large-scale 3D printing is considered an interdisciplinary topic and requires establishing a blended knowledge base from numerous research fields including structural engineering, materials science, mechatronics, software engineering, artificial intelligence and architectural engineering. This review article summarizes key topics of relevance to new research trends on large-scale 3D printing, particularly pertaining to (1) technological solutions of additive construction (i.e. the 3D printers themselves), (2) materials science challenges, and (3) new design opportunities.
Large-scale deletions of the ABCA1 gene in patients with hypoalphalipoproteinemia.

Science.gov (United States)

Dron, Jacqueline S; Wang, Jian; Berberich, Amanda J; Iacocca, Michael A; Cao, Henian; Yang, Ping; Knoll, Joan; Tremblay, Karine; Brisson, Diane; Netzer, Christian; Gouni-Berthold, Ioanna; Gaudet, Daniel; Hegele, Robert A

2018-06-04

Copy-number variations (CNVs) have been studied in the context of familial hypercholesterolemia but have not yet been evaluated in patients with extremes of high-density lipoprotein (HDL) cholesterol levels. We evaluated targeted next-generation sequencing data from patients with very low HDL cholesterol (i.e. hypoalphalipoproteinemia) using the VarSeq-CNV caller algorithm to screen for CNVs disrupting the ABCA1, LCAT or APOA1 genes. In four individuals, we found three unique deletions in ABCA1: a heterozygous deletion of exon 4, a heterozygous deletion spanning exons 8 to 31, and a heterozygous deletion of the entire ABCA1 gene. Breakpoints were identified using Sanger sequencing, and the full-gene deletion was also confirmed using exome sequencing and the Affymetrix CytoScan™ HD Array. Before now, large-scale deletions in candidate HDL genes have not been associated with hypoalphalipoproteinemia; our findings indicate that CNVs in ABCA1 may be a previously unappreciated genetic determinant of low HDL cholesterol levels. By coupling bioinformatic analyses with next-generation sequencing data, we can successfully assess the spectrum of genetic determinants of many dyslipidemias, now including hypoalphalipoproteinemia. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
Growth Limits in Large Scale Networks

DEFF Research Database (Denmark)

Knudsen, Thomas Phillip

limitations. The rising complexity of network management with the convergence of communications platforms is shown as problematic for both automatic management feasibility and for manpower resource management. In the fourth step the scope is extended to include the present society with the DDN project as its......The Subject of large scale networks is approached from the perspective of the network planner. An analysis of the long term planning problems is presented with the main focus on the changing requirements for large scale networks and the potential problems in meeting these requirements. The problems...... the fundamental technological resources in network technologies are analysed for scalability. Here several technological limits to continued growth are presented. The third step involves a survey of major problems in managing large scale networks given the growth of user requirements and the technological...
Accelerating sustainability in large-scale facilities

CERN Multimedia

Marina Giampietro

2011-01-01

Scientific research centres and large-scale facilities are intrinsically energy intensive, but how can big science improve its energy management and eventually contribute to the environmental cause with new cleantech? CERN’s commitment to providing tangible answers to these questions was sealed in the first workshop on energy management for large scale scientific infrastructures held in Lund, Sweden, on the 13-14 October. Participants at the energy management for large scale scientific infrastructures workshop. The workshop, co-organised with the European Spallation Source (ESS) and the European Association of National Research Facilities (ERF), tackled a recognised need for addressing energy issues in relation with science and technology policies. It brought together more than 150 representatives of Research Infrastrutures (RIs) and energy experts from Europe and North America. “Without compromising our scientific projects, we can ...
Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome

Directory of Open Access Journals (Sweden)

Kim Woo Taek

2008-10-01

Full Text Available Abstract Background There is no dedicated database available for Expressed Sequence Tags (ESTs) of the chili pepper (Capsicum annuum), although international interest in a chili pepper EST database is increasing due to the nutritional, economic, and pharmaceutical value of the plant. Recent advances in high-throughput sequencing of the ESTs of chili pepper cv. Bukang have produced hundreds of thousands of complementary DNA (cDNA) sequences. Therefore, a chili pepper EST database was designed and constructed to enable comprehensive analysis of chili pepper gene expression in response to biotic and abiotic stresses. Results We built the Pepper EST database to mine the complexity of chili pepper ESTs. The database was built on 122,582 sequenced ESTs and 116,412 refined ESTs from 21 pepper EST libraries. The ESTs were clustered and assembled into virtual consensus cDNAs, and the cDNAs were assigned to metabolic pathway, Gene Ontology (GO), and MIPS Functional Catalogue (FunCat) categories. The Pepper EST database is designed to provide a workbench for (i) identifying unigenes in pepper plants, (ii) analyzing expression patterns in different developmental tissues and under conditions of stress, and (iii) comparing the ESTs with those of other members of the Solanaceae family. The Pepper EST database is freely available at http://genepool.kribb.re.kr/pepper/. Conclusion The Pepper EST database is expected to provide a high-quality resource, which will contribute to gaining a systemic understanding of plant diseases and facilitate genetics-based population studies. The database is also expected to contribute to analysis of gene synteny as part of the chili pepper sequencing project by mapping ESTs to the genome.
Large scale reflood test

International Nuclear Information System (INIS)

Hirano, Kemmei; Murao, Yoshio

1980-01-01

The large-scale reflood test with a view to ensuring the safety of light water reactors was started in fiscal 1976 based on the special account act for power source development promotion measures by the entrustment from the Science and Technology Agency. Thereafter, to establish the safety of PWRs in loss-of-coolant accidents by joint international efforts, the Japan-West Germany-U.S. research cooperation program was started in April, 1980. Thereupon, the large-scale reflood test is now included in this program. It consists of two tests using a cylindrical core testing apparatus for examining the overall system effect and a plate core testing apparatus for testing individual effects. Each apparatus is composed of the mock-ups of pressure vessel, primary loop, containment vessel and ECCS. The testing method, the test results and the research cooperation program are described. (J.P.N.)
Development and production of an oligonucleotide MuscleChip: use for validation of ambiguous ESTs

Directory of Open Access Journals (Sweden)

Lanfranchi Gerolamo

2002-10-01

Full Text Available Abstract Background: We describe the development, validation, and use of a highly redundant 120,000 oligonucleotide microarray (MuscleChip) containing 4,601 probe sets representing 1,150 known genes expressed in muscle and 2,075 EST clusters from a non-normalized subtracted muscle EST sequencing project (28,074 EST sequences). This set included 369 novel EST clusters showing no match to previously characterized proteins in any database. Each probe set was designed to contain 20–32 25-mer oligonucleotides (10–16 paired perfect match and mismatch probe pairs per gene), with each probe evaluated for hybridization kinetics (Tm) and similarity to other sequences. The 120,000 oligonucleotides were synthesized by photolithography and light-activated chemistry on each microarray. Results: Hybridization of human muscle cRNAs to this MuscleChip (33 samples) showed a correlation of 0.6 between the number of ESTs sequenced in each cluster and hybridization intensity. Out of 369 novel EST clusters not showing any similarity to previously characterized proteins, we focused on 250 EST clusters that were represented by robust probe sets on the MuscleChip fulfilling all stringent rules. 102 (41%) were found to be consistently "present" by analysis of hybridization to human muscle RNA, of which 40 ESTs (39%) could be genome anchored to potential transcription units in the human genome sequence. 19 ESTs of the 40 ESTs were furthermore computer-predicted as exons by one or more than three gene identification algorithms. Conclusion: Our analysis found 40 transcriptionally validated, genome-anchored novel EST clusters to be expressed in human muscle. As most of these ESTs were low copy clusters (duplex and triplex) in the original 28,000 EST project, the identification of these as significantly expressed is a robust validation of the transcript units that permits subsequent focus on the novel proteins encoded by these genes.
ESTIMA, a tool for EST management in a multi-project environment

Directory of Open Access Journals (Sweden)

Lewin Harris A

2004-11-01

Full Text Available Abstract Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users. Results: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects. Conclusions: The scripts used to create the ESTIMA interface are freely available to academic users in
Large Scale Cosmological Anomalies and Inhomogeneous Dark Energy

Directory of Open Access Journals (Sweden)

Leandros Perivolaropoulos

2014-01-01

Full Text Available A wide range of large scale observations hint towards possible modifications of the standard cosmological model, which is based on a homogeneous and isotropic universe with a small cosmological constant and matter. These observations, also known as “cosmic anomalies”, include unexpected Cosmic Microwave Background perturbations on large angular scales, large dipolar peculiar velocity flows of galaxies (“bulk flows”), the measurement of inhomogeneous values of the fine structure constant on cosmological scales (“alpha dipole”) and other effects. The presence of the observational anomalies could either be a large statistical fluctuation in the context of ΛCDM or it could indicate a non-trivial departure from the cosmological principle on Hubble scales. Such a departure is very much constrained by cosmological observations for matter. For dark energy, however, there are no significant observational constraints for Hubble scale inhomogeneities. In this brief review I discuss some of the theoretical models that can naturally lead to inhomogeneous dark energy, their observational constraints and their potential to explain the large scale cosmic anomalies.
Large-scale patterns in Rayleigh-Benard convection

International Nuclear Information System (INIS)

Hardenberg, J. von; Parodi, A.; Passoni, G.; Provenzale, A.; Spiegel, E.A.

2008-01-01

Rayleigh-Benard convection at large Rayleigh number is characterized by the presence of intense, vertically moving plumes. Both laboratory and numerical experiments reveal that the rising and descending plumes aggregate into separate clusters so as to produce large-scale updrafts and downdrafts. The horizontal scales of the aggregates reported so far have been comparable to the horizontal extent of the containers, but it has not been clear whether that represents a limitation imposed by domain size. In this work, we present numerical simulations of convection at sufficiently large aspect ratio to ascertain whether there is an intrinsic saturation scale for the clustering process when that ratio is large enough. From a series of simulations of Rayleigh-Benard convection with Rayleigh numbers between 10^5 and 10^8 and with aspect ratios up to 12π, we conclude that the clustering process has a finite horizontal saturation scale with at most a weak dependence on Rayleigh number in the range studied.
The genetic etiology of Tourette Syndrome: Large-scale collaborative efforts on the precipice of discovery

Directory of Open Access Journals (Sweden)

Marianthi Georgitsi

2016-08-01

Full Text Available Gilles de la Tourette Syndrome (TS) is a childhood-onset neurodevelopmental disorder that is characterized by multiple motor and phonic tics. It has a complex etiology with multiple genes likely interacting with environmental factors to lead to the onset of symptoms. The genetic basis of the disorder remains elusive; however, multiple resources and large-scale projects are coming together, launching a new era in the field and bringing us on the verge of discovery. The large-scale efforts outlined in this report are complementary and represent a range of different approaches to the study of disorders with complex inheritance. The Tourette Syndrome Association International Consortium for Genetics (TSAICG) has focused on large families, parent-proband trios and cases for large case-control designs such as genomewide association studies (GWAS), copy number variation (CNV) scans and exome/genome sequencing. TIC Genetics targets rare, large effect size mutations in simplex trios and multigenerational families. The European Multicentre Tics in Children Study (EMTICS) seeks to elucidate gene-environment interactions including the involvement of infection and immune mechanisms in TS etiology. Finally, TS-EUROTRAIN, a Marie Curie Initial Training Network, aims to act as a platform to unify large-scale projects in the field and to educate the next generation of experts. Importantly, these complementary large-scale efforts are joining forces to uncover the full range of genetic variation and environmental risk factors for TS, holding great promise for identifying definitive TS susceptibility genes and shedding light into the complex pathophysiology of this disorder.
The Genetic Etiology of Tourette Syndrome: Large-Scale Collaborative Efforts on the Precipice of Discovery

Science.gov (United States)

Georgitsi, Marianthi; Willsey, A. Jeremy; Mathews, Carol A.; State, Matthew; Scharf, Jeremiah M.; Paschou, Peristera

2016-01-01

Gilles de la Tourette Syndrome (TS) is a childhood-onset neurodevelopmental disorder that is characterized by multiple motor and phonic tics. It has a complex etiology with multiple genes likely interacting with environmental factors to lead to the onset of symptoms. The genetic basis of the disorder remains elusive. However, multiple resources and large-scale projects are coming together, launching a new era in the field and bringing us on the verge of discovery. The large-scale efforts outlined in this report are complementary and represent a range of different approaches to the study of disorders with complex inheritance. The Tourette Syndrome Association International Consortium for Genetics (TSAICG) has focused on large families, parent-proband trios and cases for large case-control designs such as genomewide association studies (GWAS), copy number variation (CNV) scans, and exome/genome sequencing. TIC Genetics targets rare, large effect size mutations in simplex trios, and multigenerational families. The European Multicentre Tics in Children Study (EMTICS) seeks to elucidate gene-environment interactions including the involvement of infection and immune mechanisms in TS etiology. Finally, TS-EUROTRAIN, a Marie Curie Initial Training Network, aims to act as a platform to unify large-scale projects in the field and to educate the next generation of experts. Importantly, these complementary large-scale efforts are joining forces to uncover the full range of genetic variation and environmental risk factors for TS, holding great promise for identifying definitive TS susceptibility genes and shedding light into the complex pathophysiology of this disorder. PMID:27536211
Manufacturing test of large scale hollow capsule and long length cladding in the large scale oxide dispersion strengthened (ODS) martensitic steel

International Nuclear Information System (INIS)

Narita, Takeshi; Ukai, Shigeharu; Kaito, Takeji; Ohtsuka, Satoshi; Fujiwara, Masayuki

2004-04-01

Mass production capability of oxide dispersion strengthened (ODS) martensitic steel cladding (9Cr) has been evaluated in Phase II of the Feasibility Studies on Commercialized Fast Reactor Cycle System. The cost for manufacturing the mother tube (raw materials powder production, mechanical alloying (MA) by ball mill, canning, hot extrusion, and machining) is a dominant factor in the total cost for manufacturing ODS ferritic steel cladding. In this study, a large-scale 9Cr-ODS martensitic steel mother tube made with a large-scale hollow capsule, and long length claddings, were manufactured, and the applicability of these processes was evaluated. The following results were obtained. (1) A large scale mother tube with dimensions of 32 mm OD, 21 mm ID, and 2 m length was successfully manufactured using a large scale hollow capsule. This mother tube has a high degree of dimensional accuracy. (2) The chemical composition and the microstructure of the manufactured mother tube are similar to those of the existing mother tube manufactured with a small scale can, and no remarkable difference between the bottom and top sides of the manufactured mother tube was observed. (3) The long length cladding was successfully manufactured from the large scale mother tube which was made using a large scale hollow capsule. (4) For reducing the manufacturing cost of ODS steel claddings, manufacturing the mother tubes using large scale hollow capsules is promising. (author)
Amplification of large-scale magnetic field in nonhelical magnetohydrodynamics

KAUST Repository

Kumar, Rohit

2017-08-11

It is typically assumed that the kinetic and magnetic helicities play a crucial role in the growth of large-scale dynamo. In this paper, we demonstrate that helicity is not essential for the amplification of large-scale magnetic field. For this purpose, we perform nonhelical magnetohydrodynamic (MHD) simulation, and show that the large-scale magnetic field can grow in nonhelical MHD when random external forcing is employed at scale 1/10 the box size. The energy fluxes and shell-to-shell transfer rates computed using the numerical data show that the large-scale magnetic energy grows due to the energy transfers from the velocity field at the forcing scales.
Validation of SCALE-4 criticality sequences using ENDF/B-V data

International Nuclear Information System (INIS)

Bowman, S.M.; Wright, R.Q.; DeHart, M.D.; Taniuchi, H.

1993-01-01

The SCALE code system developed at Oak Ridge National Laboratory contains criticality safety analysis sequences that include the KENO V.a Monte Carlo code for calculation of the effective multiplication factor. These sequences are widely used for criticality safety analyses performed both in the United States and abroad. The purpose of the current work is to validate the SCALE-4 criticality sequences with an ENDF/B-V cross-section library for future distribution with SCALE-4. The library used for this validation is a broad-group library (44 groups) collapsed from the 238-group SCALE library. Extensive data testing of both the 238-group and the 44-group libraries included 10 fast and 18 thermal CSEWG benchmarks and 5 other fast benchmarks. Both libraries contain approximately 300 nuclides and are, therefore, capable of modeling most systems, including those containing spent fuel or radioactive waste. The validation of the broad-group library used 93 critical experiments as benchmarks. The range of experiments included 60 light-water-reactor fuel rod lattices, 13 mixed-oxide fuel rod lattices, and 15 other low- and high-enriched uranium critical assemblies.

Energetics and Structural Characterization of the large-scale Functional Motion of Adenylate Kinase

Science.gov (United States)

Formoso, Elena; Limongelli, Vittorio; Parrinello, Michele

2015-02-01

Adenylate Kinase (AK) is a signal transducing protein that regulates cellular energy homeostasis balancing between different conformations. An alteration of its activity can lead to severe pathologies such as heart failure, cancer and neurodegenerative diseases. A comprehensive elucidation of the large-scale conformational motions that rule the functional mechanism of this enzyme is of great value to guide rationally the development of new medications. Here using a metadynamics-based computational protocol we elucidate the thermodynamics and structural properties underlying the AK functional transitions. The free energy estimation of the conformational motions of the enzyme allows characterizing the sequence of events that regulate its action. We reveal the atomistic details of the most relevant enzyme states, identifying residues such as Arg119 and Lys13, which play a key role during the conformational transitions and represent druggable spots to design enzyme inhibitors. Our study offers tools that open new areas of investigation on large-scale motion in proteins.
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Directory of Open Access Journals (Sweden)

Parichit Sharma

Full Text Available The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Science.gov (United States)

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design
Hydrometeorological variability on a large French catchment and its relation to large-scale circulation across temporal scales

Science.gov (United States)

Massei, Nicolas; Dieppois, Bastien; Fritier, Nicolas; Laignel, Benoit; Debret, Maxime; Lavers, David; Hannah, David

2015-04-01

In the present context of global changes, considerable efforts have been deployed by the hydrological scientific community to improve our understanding of the impacts of climate fluctuations on water resources. Both observational and modeling studies have been extensively employed to characterize hydrological changes and trends, assess the impact of climate variability or provide future scenarios of water resources. With the aim of a better understanding of hydrological changes, it is of crucial importance to determine how and to what extent trends and long-term oscillations detectable in hydrological variables are linked to global climate oscillations. In this work, we develop an approach associating large-scale/local-scale correlation, empirical statistical downscaling and wavelet multiresolution decomposition of monthly precipitation and streamflow over the Seine river watershed, and the North Atlantic sea level pressure (SLP) in order to gain additional insights on the atmospheric patterns associated with the regional hydrology. We hypothesized that: i) atmospheric patterns may change according to the different temporal wavelengths defining the variability of the signals; and ii) definition of those hydrological/circulation relationships for each temporal wavelength may improve the determination of large-scale predictors of local variations. The results showed that the large-scale/local-scale links were not necessarily constant according to time-scale (i.e. for the different frequencies characterizing the signals), resulting in changing spatial patterns across scales. This was then taken into account by developing an empirical statistical downscaling (ESD) modeling approach which integrated discrete wavelet multiresolution analysis for reconstructing local hydrometeorological processes (predictand: precipitation and streamflow on the Seine river catchment) based on a large-scale predictor (SLP over the Euro-Atlantic sector) on a monthly time-step. This approach
Superconducting materials for large scale applications

International Nuclear Information System (INIS)

Dew-Hughes, D.

1975-01-01

Applications of superconductors capable of carrying large current densities in large-scale electrical devices are examined. Discussions are included on critical current density, superconducting materials available, and future prospects for improved superconducting materials. (JRD)
Large-scale influences in near-wall turbulence.

Science.gov (United States)

Hutchins, Nicholas; Marusic, Ivan

2007-03-15

Hot-wire data acquired in a high Reynolds number facility are used to illustrate the need for adequate scale separation when considering the coherent structure in wall-bounded turbulence. It is found that a large-scale motion in the log region becomes increasingly comparable in energy to the near-wall cycle as the Reynolds number increases. Through decomposition of fluctuating velocity signals, it is shown that this large-scale motion has a distinct modulating influence on the small-scale energy (akin to amplitude modulation). Reassessment of DNS data, in light of these results, shows similar trends, with the rate and intensity of production due to the near-wall cycle subject to a modulating influence from the largest-scale motions.
Techniques for Large-Scale Bacterial Genome Manipulation and Characterization of the Mutants with Respect to In Silico Metabolic Reconstructions.

Science.gov (United States)

diCenzo, George C; Finan, Turlough M

2018-01-01

The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.
Construction of 12 EST libraries and characterization of a 12,226 EST dataset for chicory (Cichorium intybus) root, leaves and nodules in the context of carbohydrate metabolism investigation

Directory of Open Access Journals (Sweden)

Boutry Marc

2009-01-01

Full Text Available Abstract Background: The industrial chicory, Cichorium intybus, is a member of the Asteraceae family that accumulates fructan of the inulin type in its root. Inulin is a low-calorie sweetener, a texture agent and a health promoting ingredient due to its prebiotic properties. Average inulin chain length is a critical parameter that is genotype and temperature dependent. In the context of the study of carbohydrate metabolism, and to gain insight into the transcriptome of chicory root and visualize temporal changes of gene expression during the growing season, we obtained and characterized 10 cDNA libraries from chicory roots regularly sampled in the field during a growing season. A leaf library and a nodule library were also obtained for comparison. Results: Approximately 1,000 Expressed Sequence Tags (ESTs) were obtained from each of twelve cDNA libraries, resulting in a 12,226 EST dataset. Clustering of these ESTs returned 1,922 contigs and 4,869 singlets for a total of 6,791 putative unigenes. All ESTs were compared to public sequence databases and functionally classified. Data were specifically searched for sequences related to carbohydrate metabolism. Season wide evolution of functional classes was evaluated by comparing libraries at the level of functional categories and unigenes distribution. Conclusion: This chicory EST dataset provides a season wide outlook of the genes expressed in the root and to a minor extent in leaves and nodules. The dataset contains more than 200 sequences related to carbohydrate metabolism and 3,500 new ESTs when compared to other recently released chicory EST datasets, probably because of the season wide coverage of the root samples. We believe that these sequences will contribute to accelerate research and breeding of the industrial chicory as well as of closely related species.
Large-scale simulations of plastic neural networks on neuromorphic hardware

Directory of Open Access Journals (Sweden)

James Courtney Knight

2016-04-01

Full Text Available SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network which we simulate at scales of up to 20,000 neurons and 51,200,000 plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the supercomputer system uses approximately more power. This suggests that cheaper, more power efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models.
PKI security in large-scale healthcare networks.

Science.gov (United States)

Mantas, Georgios; Lymberopoulos, Dimitrios; Komninos, Nikos

2012-06-01

During the past few years, many public key infrastructures (PKIs) have been proposed for healthcare networks in order to ensure secure communication services and exchange of data among healthcare professionals. However, these healthcare PKIs face a plethora of challenges, especially when deployed over large-scale healthcare networks. In this paper, we propose a PKI to ensure security in a large-scale Internet-based healthcare network connecting a wide spectrum of healthcare units geographically distributed within a wide region. Furthermore, the proposed PKI addresses the trust issues that arise in a large-scale healthcare network including multi-domain PKIs.
DNA Data Bank of Japan at work on genome sequence data.

Science.gov (United States)

Tateno, Y; Fukami-Kobayashi, K; Miyazaki, S; Sugawara, H; Gojobori, T

1998-01-01

We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data.
Emerging large-scale solar heating applications

International Nuclear Information System (INIS)

Wong, W.P.; McClung, J.L.

2009-01-01

Currently the market for solar heating applications in Canada is dominated by outdoor swimming pool heating, make-up air pre-heating and domestic water heating in homes, commercial and institutional buildings. All of these involve relatively small systems, except for a few air pre-heating systems on very large buildings. Together these applications make up well over 90% of the solar thermal collectors installed in Canada during 2007. These three applications, along with the recent re-emergence of large-scale concentrated solar thermal for generating electricity, also dominate the world markets. This paper examines some emerging markets for large scale solar heating applications, with a focus on the Canadian climate and market. (author)
Emerging large-scale solar heating applications

Energy Technology Data Exchange (ETDEWEB)

Wong, W.P.; McClung, J.L. [Science Applications International Corporation (SAIC Canada), Ottawa, Ontario (Canada)

2009-07-01

Currently the market for solar heating applications in Canada is dominated by outdoor swimming pool heating, make-up air pre-heating and domestic water heating in homes, commercial and institutional buildings. All of these involve relatively small systems, except for a few air pre-heating systems on very large buildings. Together these applications make up well over 90% of the solar thermal collectors installed in Canada during 2007. These three applications, along with the recent re-emergence of large-scale concentrated solar thermal for generating electricity, also dominate the world markets. This paper examines some emerging markets for large scale solar heating applications, with a focus on the Canadian climate and market. (author)
Developmental and Subcellular Organization of Single-Cell C₄ Photosynthesis in Bienertia sinuspersici Determined by Large-Scale Proteomics and cDNA Assembly from 454 DNA Sequencing.

Science.gov (United States)

Offermann, Sascha; Friso, Giulia; Doroshenk, Kelly A; Sun, Qi; Sharpe, Richard M; Okita, Thomas W; Wimmer, Diana; Edwards, Gerald E; van Wijk, Klaas J

2015-05-01

Kranz C4 species strictly depend on separation of primary and secondary carbon fixation reactions in different cell types. In contrast, the single-cell C4 (SCC4) species Bienertia sinuspersici utilizes intracellular compartmentation including two physiologically and biochemically different chloroplast types; however, information on identity, localization, and induction of proteins required for this SCC4 system is currently very limited. In this study, we determined the distribution of photosynthesis-related proteins and the induction of the C4 system during development by label-free proteomics of subcellular fractions and leaves of different developmental stages. This was enabled by inferring a protein sequence database from 454 sequencing of Bienertia cDNAs. Large-scale proteome rearrangements were observed as C4 photosynthesis developed during leaf maturation. The proteomes of the two chloroplasts are different with differential accumulation of linear and cyclic electron transport components, primary and secondary carbon fixation reactions, and a triose-phosphate shuttle that is shared between the two chloroplast types. This differential protein distribution pattern suggests the presence of an mRNA or protein-sorting mechanism for nuclear-encoded, chloroplast-targeted proteins in SCC4 species. The combined information was used to provide a comprehensive model for NAD-ME type carbon fixation in SCC4 species.
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

Science.gov (United States)

Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

2016-10-06

With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most
Large-scale regions of antimatter

International Nuclear Information System (INIS)

Grobov, A. V.; Rubin, S. G.

2015-01-01

A modified mechanism of the formation of large-scale antimatter regions is proposed. Antimatter appears owing to fluctuations of a complex scalar field that carries a baryon charge in the inflation era.
Large-scale regions of antimatter

Energy Technology Data Exchange (ETDEWEB)

Grobov, A. V., E-mail: alexey.grobov@gmail.com; Rubin, S. G., E-mail: sgrubin@mephi.ru [National Research Nuclear University MEPhI (Russian Federation)

2015-07-15

A modified mechanism of the formation of large-scale antimatter regions is proposed. Antimatter appears owing to fluctuations of a complex scalar field that carries a baryon charge in the inflation era.
Preserving biological diversity in the face of large-scale demands for biofuels

International Nuclear Information System (INIS)

Cook, J.J.; Beyea, J.; Keeler, K.H.

1991-01-01

Large-scale production and harvesting of biomass to replace fossil fuels could reduce biological diversity by eliminating habitat for native species. Forests would be managed and harvested more intensively, and virtually all arable land unsuitable for high-value agriculture or silviculture might be used to grow crops dedicated to energy. Given the prospects for a potentially large increase in biofuel production, it is time now to develop strategies for mitigating the loss of biodiversity that might ensue. Planning at micro to macro scales will be crucial to minimize the ecological impacts of producing biofuels. In particular, cropping and harvesting systems will need to provide the biological, spatial, and temporal diversity characteristics of natural ecosystems and successional sequences, if we are to have this technology support the environmental health of the world rather than compromise it. Incorporation of these ecological values will be necessary to forestall costly environmental restoration, even at the cost of submaximal biomass productivity. It is therefore doubtful that all managers will take the longer view. Since the costs of biodiversity loss are largely external to economic markets, society cannot rely on the market to protect biodiversity, and some sort of intervention will be necessary. 116 refs., 1 tab
Large-Scale Analysis of Art Proportions

DEFF Research Database (Denmark)

Jensen, Karl Kristoffer

2014-01-01

While literature often tries to impute mathematical constants into art, this large-scale study (11 databases of paintings and photos, around 200.000 items) shows a different truth. The analysis, consisting of the width/height proportions, shows a value of rarely if ever one (square) and with majo...
The Expanded Large Scale Gap Test

Science.gov (United States)

1987-03-01

NSWC TR 86-32 DTIC THE EXPANDED LARGE SCALE GAP TEST BY T. P. LIDDIARD D. PRICE RESEARCH AND TECHNOLOGY DEPARTMENT MARCH 1987 Approved for public...arises, to reduce the spread in the LSGT 50% gap value.) The worst charges, such as those with the highest or lowest densities, the largest re-pressed...Arlington, VA 22217 PE 62314N INS3A 1 RJ14E31 7R4TBK 11 TITLE (Include Security Classification) The Expanded Large Scale Gap Test. 12. PERSONAL AUTHOR(S) T

Exploiting Illumina Sequencing for the Development of 95 Novel Polymorphic EST-SSR Markers in Common Vetch (Vicia sativa subsp. sativa)

Directory of Open Access Journals (Sweden)

Zhipeng Liu

2014-05-01

The common vetch (Vicia sativa subsp. sativa), a self-pollinating and diploid species, is one of the most important annual legumes in the world due to its short growth period, high nutritional value, and multiple usages as hay, grain, silage, and green manure. The available simple sequence repeat (SSR) markers for common vetch, however, are insufficient to meet the developing demand for genetic and molecular research on this important species. Here, we aimed to develop and characterise several polymorphic EST-SSR markers from the vetch Illumina transcriptome. A total of 1,071 potential EST-SSR markers were identified from 1,025 unigenes whose lengths were greater than 1,000 bp, and 450 primer pairs were then designed and synthesized. Finally, 95 polymorphic primer pairs were developed for the 10 common vetch accessions, which included 50 individuals. Among the 95 EST-SSR markers, the number of alleles ranged from three to 13, and the polymorphism information content values ranged from 0.09 to 0.98. The observed heterozygosity values ranged from 0.00 to 1.00, and the expected heterozygosity values ranged from 0.11 to 0.98. These 95 EST-SSR markers developed from the vetch Illumina transcriptome could greatly promote the development of genetic and molecular breeding studies pertaining to this species.
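The marker statistics quoted in the abstract above (polymorphism information content, expected and observed heterozygosity) are usually computed from allele frequencies at each locus. The block below is a minimal sketch using the common textbook definitions: expected heterozygosity as one minus the sum of squared allele frequencies (the simple, uncorrected form) and PIC in the form given by Botstein et al. (1980). The abstract does not say which estimators or software the authors used, so the function names and the choice of formulas here are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each distinct allele in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


def observed_heterozygosity(genotypes):
    """Fraction of diploid genotypes (allele pairs) that are heterozygous."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)
```

For a locus with two equally frequent alleles these definitions give an expected heterozygosity of 0.5 and a PIC of 0.375, which shows why PIC is always somewhat lower than expected heterozygosity at the same locus.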
Large scale and big data processing and management

CERN Document Server

Sakr, Sherif

2014-01-01

Large Scale and Big Data: Processing and Management provides readers with a central source of reference on the data management techniques currently available for large-scale data processing. Presenting chapters written by leading researchers, academics, and practitioners, it addresses the fundamental challenges associated with Big Data processing tools and techniques across a range of computing environments. The book begins by discussing the basic concepts and tools of large-scale Big Data processing and cloud computing. It also provides an overview of different programming models and cloud-bas
Scaling Relations of Local Magnitude versus Moment Magnitude for Sequences of Similar Earthquakes in Switzerland

KAUST Repository

Bethmann, F.

2011-03-22

Theoretical considerations and empirical regressions show that, in the magnitude range between 3 and 5, local magnitude, ML, and moment magnitude, Mw, scale 1:1. Previous studies suggest that for smaller magnitudes this 1:1 scaling breaks down. However, the scatter between ML and Mw at small magnitudes is usually large and the resulting scaling relations are therefore uncertain. In an attempt to reduce these uncertainties, we first analyze the ML versus Mw relation based on 195 events, induced by the stimulation of a geothermal reservoir below the city of Basel, Switzerland. Values of ML range from 0.7 to 3.4. From these data we derive a scaling of ML ~ 1.5 Mw over the given magnitude range. We then compare peak Wood-Anderson amplitudes to the low-frequency plateau of the displacement spectra for six sequences of similar earthquakes in Switzerland in the range of 0.5 ≤ ML ≤ 4.1. Because effects due to the radiation pattern and to the propagation path between source and receiver are nearly identical at a particular station for all events in a given sequence, the scatter in the data is substantially reduced. Again we obtain a scaling equivalent to ML ~ 1.5 Mw. Based on simulations using synthetic source time functions for different magnitudes and Q values estimated from spectral ratios between downhole and surface recordings, we conclude that the observed scaling can be explained by attenuation and scattering along the path. Other effects that could explain the observed magnitude scaling, such as a possible systematic increase of stress drop or rupture velocity with moment magnitude, are masked by attenuation along the path.
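The scaling reported in the abstract above is the slope of a straight-line relation between ML and Mw. As a rough illustration of how such a slope can be estimated from paired magnitude values, the block below sketches an ordinary least-squares fit in plain Python. The abstract does not state which regression method the authors used (an orthogonal regression, for example, would treat errors in both magnitudes), so `fit_line` is a hypothetical helper and ordinary least squares is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    x and y are equal-length sequences of floats, for example moment
    magnitudes (x) and local magnitudes (y) of the same events.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Calling `fit_line(mw_values, ml_values)` on a catalogue of paired magnitudes returns the slope to compare against a 1:1 relation; a slope clearly above 1 at small magnitudes would correspond to the breakdown of 1:1 scaling that the abstract describes.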
Large scale cluster computing workshop

International Nuclear Information System (INIS)

Dane Skow; Alan Silverman

2002-01-01

Recent revolutions in computer hardware and software technologies have paved the way for the large-scale deployment of clusters of commodity computers to address problems heretofore the domain of tightly coupled SMP processors. Near term projects within High Energy Physics and other computing communities will deploy clusters of scale 1000s of processors and be used by 100s to 1000s of independent users. This will expand the reach in both dimensions by an order of magnitude from the current successful production facilities. The goals of this workshop were: (1) to determine what tools exist which can scale up to the cluster sizes foreseen for the next generation of HENP experiments (several thousand nodes) and by implication to identify areas where some investment of money or effort is likely to be needed. (2) To compare and record experiences gained with such tools. (3) To produce a practical guide to all stages of planning, installing, building and operating a large computing cluster in HENP. (4) To identify and connect groups with similar interest within HENP and the larger clustering community
Signaling pathways in a Citrus EST database

Directory of Open Access Journals (Sweden)

Angela Mehta

2007-01-01

Citrus spp. are economically important crops, which in Brazil are grown mainly in the State of São Paulo. Citrus cultures are attacked by several pathogens, causing severe yield losses. In order to better understand this culture, the Millenium Project (IAC Cordeirópolis) was launched in order to sequence Citrus ESTs (expressed sequence tags) from different tissues, including leaf, bark, fruit, root and flower. Plants were submitted to biotic and abiotic stresses and investigated under different development stages (adult vs. juvenile). Several cDNA libraries were constructed and the sequences obtained formed the Citrus ESTs database with almost 200,000 sequences. Searches were performed in the Citrus database to investigate the presence of different signaling pathway components. Several of the genes involved in the signaling of sugar, calcium, cytokinin, plant hormones, inositol phosphate, MAPKinase and COP9 were found in the citrus genome and are discussed in this paper. The results obtained may indicate that similar mechanisms described in other plants, such as Arabidopsis, occur in citrus. Further experimental studies must be conducted in order to understand the different signaling pathways present.
Large-Scale Agriculture and Outgrower Schemes in Ethiopia

DEFF Research Database (Denmark)

Wendimu, Mengistu Assefa

, the impact of large-scale agriculture and outgrower schemes on productivity, household welfare and wages in developing countries is highly contentious. Chapter 1 of this thesis provides an introduction to the study, while also reviewing the key debate in the contemporary land ‘grabbing’ and historical large...... sugarcane outgrower scheme on household income and asset stocks. Chapter 5 examines the wages and working conditions in ‘formal’ large-scale and ‘informal’ small-scale irrigated agriculture. The results in Chapter 2 show that moisture stress, the use of untested planting materials, and conflict over land...... commands a higher wage than ‘formal’ large-scale agriculture, while rather different wage determination mechanisms exist in the two sectors. Human capital characteristics (education and experience) partly explain the differences in wages within the formal sector, but play no significant role...
OFFSCALE: PC input processor for SCALE-4 criticality sequences

International Nuclear Information System (INIS)

Bowman, S.M.

1991-01-01

OFFSCALE is a personal computer program that serves as a user-friendly interface for the Criticality Safety Analysis Sequences (CSAS) available in SCALE-4. It is designed to assist a SCALE-4 user in preparing an input file for execution of criticality safety problems. Output from OFFSCALE is a card-image input file that may be uploaded to a mainframe computer to execute the CSAS4 control module in SCALE-4. OFFSCALE features a pulldown menu system that accesses sophisticated data entry screens. The program allows the user to quickly set up a CSAS4 input file and perform data checking
Economically viable large-scale hydrogen liquefaction

Science.gov (United States)

Cardella, U.; Decker, L.; Klein, H.

2017-02-01

The liquid hydrogen demand, particularly driven by clean energy applications, will rise in the near future. As industrial large scale liquefiers will play a major role within the hydrogen supply chain, production capacity will have to increase by a multiple of today’s typical sizes. The main goal is to reduce the total cost of ownership for these plants by increasing energy efficiency with innovative and simple process designs, optimized in capital expenditure. New concepts must ensure a manageable plant complexity and flexible operability. In the phase of process development and selection, a dimensioning of key equipment for large scale liquefiers, such as turbines and compressors as well as heat exchangers, must be performed iteratively to ensure technological feasibility and maturity. Further critical aspects related to hydrogen liquefaction, e.g. fluid properties, ortho-para hydrogen conversion, and coldbox configuration, must be analysed in detail. This paper provides an overview on the approach, challenges and preliminary results in the development of efficient as well as economically viable concepts for large-scale hydrogen liquefaction.
Large-scale phylogenomic analysis resolves a backbone phylogeny in ferns

Science.gov (United States)

Shen, Hui; Jin, Dongmei; Shu, Jiang-Ping; Zhou, Xi-Le; Lei, Ming; Wei, Ran; Shang, Hui; Wei, Hong-Jin; Zhang, Rui; Liu, Li; Gu, Yu-Feng; Zhang, Xian-Chun; Yan, Yue-Hong

2018-01-01

Background: Ferns, which originated about 360 million years ago, are the sister group of seed plants. Despite the remarkable progress in our understanding of fern phylogeny, with conflicting molecular evidence and different morphological interpretations, relationships among major fern lineages remain controversial. Results: With the aim to obtain a robust fern phylogeny, we carried out a large-scale phylogenomic analysis using high-quality transcriptome sequencing data, which covered 69 fern species from 38 families and 11 orders. Both coalescent-based and concatenation-based methods were applied to both nucleotide and amino acid sequences in species tree estimation. The resulting topologies are largely congruent with each other, except for the placement of Angiopteris fokiensis, Cheiropleuria bicuspis, Diplaziopsis brunoniana, Matteuccia struthiopteris, Elaphoglossum mcclurei, and Tectaria subpedata. Conclusions: Our result confirmed that Equisetales is sister to the rest of ferns, and Dennstaedtiaceae is sister to eupolypods. Moreover, our result strongly supported some relationships different from the current view of fern phylogeny, including that Marattiaceae may be sister to the monophyletic clade of Psilotaceae and Ophioglossaceae; that Gleicheniaceae and Hymenophyllaceae form a monophyletic clade sister to Dipteridaceae; and that Aspleniaceae is sister to the rest of the groups in eupolypods II. These results were interpreted with morphological traits, especially sporangia characters, and a new evolutionary route of sporangial annulus in ferns was suggested. This backbone phylogeny in ferns sets a foundation for further studies in biology and evolution in ferns, and therefore in plants. PMID:29186447
Large-scale phylogenomic analysis resolves a backbone phylogeny in ferns.

Science.gov (United States)

Shen, Hui; Jin, Dongmei; Shu, Jiang-Ping; Zhou, Xi-Le; Lei, Ming; Wei, Ran; Shang, Hui; Wei, Hong-Jin; Zhang, Rui; Liu, Li; Gu, Yu-Feng; Zhang, Xian-Chun; Yan, Yue-Hong

2018-02-01

Ferns, originated about 360 million years ago, are the sister group of seed plants. Despite the remarkable progress in our understanding of fern phylogeny, with conflicting molecular evidence and different morphological interpretations, relationships among major fern lineages remain controversial. With the aim to obtain a robust fern phylogeny, we carried out a large-scale phylogenomic analysis using high-quality transcriptome sequencing data, which covered 69 fern species from 38 families and 11 orders. Both coalescent-based and concatenation-based methods were applied to both nucleotide and amino acid sequences in species tree estimation. The resulting topologies are largely congruent with each other, except for the placement of Angiopteris fokiensis, Cheiropleuria bicuspis, Diplaziopsis brunoniana, Matteuccia struthiopteris, Elaphoglossum mcclurei, and Tectaria subpedata. Our result confirmed that Equisetales is sister to the rest of ferns, and Dennstaedtiaceae is sister to eupolypods. Moreover, our result strongly supported some relationships different from the current view of fern phylogeny, including that Marattiaceae may be sister to the monophyletic clade of Psilotaceae and Ophioglossaceae; that Gleicheniaceae and Hymenophyllaceae form a monophyletic clade sister to Dipteridaceae; and that Aspleniaceae is sister to the rest of the groups in eupolypods II. These results were interpreted with morphological traits, especially sporangia characters, and a new evolutionary route of sporangial annulus in ferns was suggested. This backbone phylogeny in ferns sets a foundation for further studies in biology and evolution in ferns, and therefore in plants. © The Authors 2017. Published by Oxford University Press.
Large scale chromatographic separations using continuous displacement chromatography (CDC)

International Nuclear Information System (INIS)

Taniguchi, V.T.; Doty, A.W.; Byers, C.H.

1988-01-01

A process for large scale chromatographic separations using a continuous chromatography technique is described. The process combines the advantages of large scale batch fixed column displacement chromatography with conventional analytical or elution continuous annular chromatography (CAC) to enable large scale displacement chromatography to be performed on a continuous basis (CDC). Such large scale, continuous displacement chromatography separations have not been reported in the literature. The process is demonstrated with the ion exchange separation of a binary lanthanide (Nd/Pr) mixture. The process is, however, applicable to any displacement chromatography separation that can be performed using conventional batch, fixed column chromatography
Large Scale Processes and Extreme Floods in Brazil

Science.gov (United States)

Ribeiro Lima, C. H.; AghaKouchak, A.; Lall, U.

2016-12-01

Persistent large scale anomalies in the atmospheric circulation and ocean state have been associated with heavy rainfall and extreme floods in water basins of different sizes across the world. Such studies have emerged in the last years as a new tool to improve the traditional, stationary based approach in flood frequency analysis and flood prediction. Here we seek to advance previous studies by evaluating the dominance of large scale processes (e.g. atmospheric rivers/moisture transport) over local processes (e.g. local convection) in producing floods. We consider flood-prone regions in Brazil as case studies and the role of large scale climate processes in generating extreme floods in such regions is explored by means of observed streamflow, reanalysis data and machine learning methods. The dynamics of the large scale atmospheric circulation in the days prior to the flood events are evaluated based on the vertically integrated moisture flux and its divergence field, which are interpreted in a low-dimensional space as obtained by machine learning techniques, particularly supervised kernel principal component analysis. In such reduced dimensional space, clusters are obtained in order to better understand the role of regional moisture recycling or teleconnected moisture in producing floods of a given magnitude. The convective available potential energy (CAPE) is also used as a measure of local convection activities. We investigate for individual sites the exceedance probability in which large scale atmospheric fluxes dominate the flood process. Finally, we analyze regional patterns of floods and how the scaling law of floods with drainage area responds to changes in the climate forcing mechanisms (e.g. local vs large scale).
Computing in Large-Scale Dynamic Systems

NARCIS (Netherlands)

Pruteanu, A.S.

2013-01-01

Software applications developed for large-scale systems have always been difficult to develop due to problems caused by the large number of computing devices involved. Above a certain network size (roughly one hundred), necessary services such as code updating, topology discovery and data
Fires in large scale ventilation systems

International Nuclear Information System (INIS)

Gregory, W.S.; Martin, R.A.; White, B.W.; Nichols, B.D.; Smith, P.R.; Leslie, I.H.; Fenton, D.L.; Gunaji, M.V.; Blythe, J.P.

1991-01-01

This paper summarizes the experience gained simulating fires in large scale ventilation systems patterned after ventilation systems found in nuclear fuel cycle facilities. The series of experiments discussed included: (1) combustion aerosol loading of 0.61x0.61 m HEPA filters with the combustion products of two organic fuels, polystyrene and polymethylmethacrylate; (2) gas dynamic and heat transport through a large scale ventilation system consisting of a 0.61x0.61 m duct 90 m in length, with dampers, HEPA filters, blowers, etc.; (3) gas dynamic and simultaneous transport of heat and solid particulate (consisting of glass beads with a mean aerodynamic diameter of 10μ) through the large scale ventilation system; and (4) the transport of heat and soot, generated by kerosene pool fires, through the large scale ventilation system. The FIRAC computer code, designed to predict fire-induced transients in nuclear fuel cycle facility ventilation systems, was used to predict the results of experiments (2) through (4). In general, the results of the predictions were satisfactory. The code predictions for the gas dynamics, heat transport, and particulate transport and deposition were within 10% of the experimentally measured values. However, the code was less successful in predicting the amount of soot generation from kerosene pool fires, probably due to the fire module of the code being a one-dimensional zone model. The experiments revealed a complicated three-dimensional combustion pattern within the fire room of the ventilation system. Further refinement of the fire module within FIRAC is needed. (orig.)
Pms2 suppresses large expansions of the (GAA·TTC)n sequence in neuronal tissues.

Directory of Open Access Journals (Sweden)

Rebecka L Bourn

Expanded trinucleotide repeat sequences are the cause of several inherited neurodegenerative diseases. Disease pathogenesis is correlated with several features of somatic instability of these sequences, including further large expansions in postmitotic tissues. The presence of somatic expansions in postmitotic tissues is consistent with DNA repair being a major determinant of somatic instability. Indeed, proteins in the mismatch repair (MMR) pathway are required for instability of the expanded (CAG·CTG)n sequence, likely via recognition of intrastrand hairpins by MutSβ. It is not clear if or how MMR would affect instability of disease-causing expanded trinucleotide repeat sequences that adopt secondary structures other than hairpins, such as the triplex/R-loop forming (GAA·TTC)n sequence that causes Friedreich ataxia. We analyzed somatic instability in transgenic mice that carry an expanded (GAA·TTC)n sequence in the context of the human FXN locus and lack the individual MMR proteins Msh2, Msh6 or Pms2. The absence of Msh2 or Msh6 resulted in a dramatic reduction in somatic mutations, indicating that mammalian MMR promotes instability of the (GAA·TTC)n sequence via MutSα. The absence of Pms2 resulted in increased accumulation of large expansions in the nervous system (cerebellum, cerebrum, and dorsal root ganglia) but not in non-neuronal tissues (heart and kidney), without affecting the prevalence of contractions. Pms2 suppressed large expansions specifically in tissues showing MutSα-dependent somatic instability, suggesting that they may act on the same lesion or structure associated with the expanded (GAA·TTC)n sequence. We conclude that Pms2 specifically suppresses large expansions of a pathogenic trinucleotide repeat sequence in neuronal tissues, possibly acting independently of the canonical MMR pathway.
Large-scale Complex IT Systems

OpenAIRE

Sommerville, Ian; Cliff, Dave; Calinescu, Radu; Keen, Justin; Kelly, Tim; Kwiatkowska, Marta; McDermid, John; Paige, Richard

2011-01-01

This paper explores the issues around the construction of large-scale complex systems which are built as 'systems of systems' and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that identifies the major challen...
Large-scale complex IT systems

OpenAIRE

Sommerville, Ian; Cliff, Dave; Calinescu, Radu; Keen, Justin; Kelly, Tim; Kwiatkowska, Marta; McDermid, John; Paige, Richard

2012-01-01

12 pages, 2 figures. This paper explores the issues around the construction of large-scale complex systems which are built as 'systems of systems' and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that ident...
First Mile Challenges for Large-Scale IoT

KAUST Repository

Bader, Ahmed; Elsawy, Hesham; Gharbieh, Mohammad; Alouini, Mohamed-Slim; Adinoyi, Abdulkareem; Alshaalan, Furaih

2017-01-01

The Internet of Things is large-scale by nature. This is not only manifested by the large number of connected devices, but also by the sheer scale of spatial traffic intensity that must be accommodated, primarily in the uplink direction. To that end
PCR-Based EST Mapping in Wheat (Triticum aestivum L.)

Directory of Open Access Journals (Sweden)

J. PERRY GUSTAFSON

2009-04-01

Full Text Available Mapping expressed sequence tags (ESTs) to hexaploid wheat aims to reveal the structure and function of the hexaploid wheat genome. Sixty-eight ESTs representing 26 genes were mapped into all seven homologous chromosome groups of wheat (Triticum aestivum L.) using a polymerase chain reaction technique. The majority of the ESTs were mapped to homologous chromosome group 2, and the fewest were mapped to homologous chromosome group 6. Comparative analysis between the EST map from this study and the EST map based on RFLPs showed that the 14 genes mapped by both approaches were located on the same arm of the same homologous chromosome, which indicated that PCR-based EST mapping is a reliable approach for mapping ESTs in hexaploid wheat.
QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

Directory of Open Access Journals (Sweden)

Voorrips Roeland E

2006-10-01

Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only. Results We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score, based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans. Conclusion QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems. The program, test data, and user manual are available at

Large-Scale Constraint-Based Pattern Mining

Science.gov (United States)

Zhu, Feida

2009-01-01

We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…
Prospects for large scale electricity storage in Denmark

DEFF Research Database (Denmark)

Krog Ekman, Claus; Jensen, Søren Højgaard

2010-01-01

In a future power systems with additional wind power capacity there will be an increased need for large scale power management as well as reliable balancing and reserve capabilities. Different technologies for large scale electricity storage provide solutions to the different challenges arising w...
Evolution of scaling emergence in large-scale spatial epidemic spreading.

Science.gov (United States)

Wang, Lin; Li, Xiang; Zhang, Yi-Qing; Zhang, Yan; Zhang, Kan

2011-01-01

Zipf's law and Heaps' law are two representatives of the scaling concepts, which play a significant role in the study of complexity science. The coexistence of the Zipf's law and the Heaps' law motivates different understandings on the dependence between these two scalings, which has still hardly been clarified. In this article, we observe an evolution process of the scalings: the Zipf's law and the Heaps' law are naturally shaped to coexist at the initial time, while the crossover comes with the emergence of their inconsistency at the larger time before reaching a stable state, where the Heaps' law still exists with the disappearance of strict Zipf's law. Such findings are illustrated with a scenario of large-scale spatial epidemic spreading, and the empirical results of pandemic disease support a universal analysis of the relation between the two laws regardless of the biological details of disease. Employing the United States domestic air transportation and demographic data to construct a metapopulation model for simulating the pandemic spread at the U.S. country level, we uncover that the broad heterogeneity of the infrastructure plays a key role in the evolution of scaling emergence. The analyses of large-scale spatial epidemic spreading help understand the temporal evolution of scalings, indicating the coexistence of the Zipf's law and the Heaps' law depends on the collective dynamics of epidemic processes, and the heterogeneity of epidemic spread indicates the significance of performing targeted containment strategies at the early time of a pandemic disease.
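The two scaling laws named in this abstract can be measured from any ordered stream of categorical observations. The following is a minimal sketch, not the model used in the record: it assumes a hypothetical list `tokens` (for an epidemic, these could be the locations of successive confirmed cases) and computes the rank-frequency list examined for Zipf's law and the distinct-item growth curve examined for Heaps' law.

```python
from collections import Counter


def zipf_rank_frequency(tokens):
    """Return item frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def heaps_curve(tokens):
    """Return the number of distinct items seen after each observation."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


# Hypothetical toy stream; real data would be far longer.
tokens = ["a", "b", "a", "c", "a", "b", "d"]
freqs = zipf_rank_frequency(tokens)
growth = heaps_curve(tokens)
```

Plotting `freqs` against rank and `growth` against stream length on log-log axes is the usual way to check whether either curve follows a power law.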
Large-Scale Structure and Hyperuniformity of Amorphous Ices

Science.gov (United States)

Martelli, Fausto; Torquato, Salvatore; Giovambattista, Nicolas; Car, Roberto

2017-09-01

We investigate the large-scale structure of amorphous ices and transitions between their different forms by quantifying their large-scale density fluctuations. Specifically, we simulate the isothermal compression of low-density amorphous ice (LDA) and hexagonal ice to produce high-density amorphous ice (HDA). Both HDA and LDA are nearly hyperuniform; i.e., they are characterized by an anomalous suppression of large-scale density fluctuations. By contrast, in correspondence with the nonequilibrium phase transitions to HDA, the presence of structural heterogeneities strongly suppresses the hyperuniformity and the system becomes hyposurficial (devoid of "surface-area fluctuations"). Our investigation challenges the largely accepted "frozen-liquid" picture, which views glasses as structurally arrested liquids. Beyond implications for water, our findings enrich our understanding of pressure-induced structural transformations in glasses.
Towards a Database System for Large-scale Analytics on Strings

KAUST Repository

Sahli, Majed A.

2015-07-23

Recent technological advances are causing an explosion in the production of sequential data. Biological sequences, web logs and time series are represented as strings. Currently, strings are stored, managed and queried in an ad-hoc fashion because they lack a standardized data model and query language. String queries are computationally demanding, especially when strings are long and numerous. Existing approaches cannot handle the growing number of strings produced by environmental, healthcare, bioinformatic, and space applications. There is a trade-off between performing analytics efficiently and scaling to thousands of cores to finish in reasonable times. In this thesis, we introduce a data model that unifies the input and output representations of core string operations. We define a declarative query language for strings where operators can be pipelined to form complex queries. A rich set of core string operators is described to support string analytics. We then demonstrate a database system for string analytics based on our model and query language. In particular, we propose the use of a novel data structure augmented by efficient parallel computation to strike a balance between preprocessing overheads and query execution times. Next, we delve into repeated motifs extraction as a core string operation for large-scale string analytics. Motifs are frequent patterns used, for example, to identify biological functionality, periodic trends, or malicious activities. Statistical approaches are fast but inexact while combinatorial methods are sound but slow. We introduce ACME, a combinatorial repeated motifs extractor. We study the spatial and temporal locality of motif extraction and devise a cache-aware search space traversal technique. ACME is the only method that scales to gigabyte-long strings, handles large alphabets, and supports interesting motif types with minimal overhead. While ACME is cache-efficient, it is limited by being serial. We devise a lightweight
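To make "repeated motif extraction" concrete, here is a deliberately naive sketch. It is not ACME and has none of its cache-aware traversal: it only finds exact repeats of one fixed length by indexing every substring, and the function name and parameters are illustrative assumptions.

```python
from collections import defaultdict


def repeated_motifs(text, length, min_count):
    """Map each exact substring of the given length that occurs at least
    min_count times to the list of its start positions."""
    positions = defaultdict(list)
    for start in range(len(text) - length + 1):
        positions[text[start:start + length]].append(start)
    return {motif: where for motif, where in positions.items()
            if len(where) >= min_count}


# "ACG" occurs at positions 0, 4 and 9; no other 3-mer occurs three times.
hits = repeated_motifs("ACGTACGTTACG", length=3, min_count=3)
```

This approach stores every substring position, so memory grows with the input length; that cost is one reason large-scale extractors rely on specialised index structures instead.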
Transcriptome analysis of blueberry using 454 EST sequencing

Science.gov (United States)

Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...
Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

Science.gov (United States)

Zhao, Shanrong; Prenger, Kurt; Smith, Lance

2013-01-01

RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets.
Plasmonic nanoparticle lithography: Fast resist-free laser technique for large-scale sub-50 nm hole array fabrication

Science.gov (United States)

Pan, Zhenying; Yu, Ye Feng; Valuckas, Vytautas; Yap, Sherry L. K.; Vienne, Guillaume G.; Kuznetsov, Arseniy I.

2018-05-01

Cheap large-scale fabrication of ordered nanostructures is important for multiple applications in photonics and biomedicine including optical filters, solar cells, plasmonic biosensors, and DNA sequencing. Existing methods are either expensive or have strict limitations on the feature size and fabrication complexity. Here, we present a laser-based technique, plasmonic nanoparticle lithography, which is capable of rapid fabrication of large-scale arrays of sub-50 nm holes on various substrates. It is based on near-field enhancement and melting induced under ordered arrays of plasmonic nanoparticles, which are brought into contact or in close proximity to a desired material and acting as optical near-field lenses. The nanoparticles are arranged in ordered patterns on a flexible substrate and can be attached and removed from the patterned sample surface. At optimized laser fluence, the nanohole patterning process does not create any observable changes to the nanoparticles and they have been applied multiple times as reusable near-field masks. This resist-free nanolithography technique provides a simple and cheap solution for large-scale nanofabrication.
Application of large-scale sequencing to marker discovery in plants

Indian Academy of Sciences (India)

2012-10-15

Oct 15, 2012 ... mate-pair libraries (large insert libraries), RNA-Seq data, reduced ... range of different applications for SGS have been developed and applied to marker ..... duced by human selection for desirable grain qualities. A total of 399 ...
Double inflation: A possible resolution of the large-scale structure problem

International Nuclear Information System (INIS)

Turner, M.S.; Villumsen, J.V.; Vittorio, N.; Silk, J.; Juszkiewicz, R.

1986-11-01

A model is presented for the large-scale structure of the universe in which two successive inflationary phases resulted in large small-scale and small large-scale density fluctuations. This bimodal density fluctuation spectrum in an Ω = 1 universe dominated by hot dark matter leads to large-scale structure of the galaxy distribution that is consistent with recent observational results. In particular, large, nearly empty voids and significant large-scale peculiar velocity fields are produced over scales of ∼100 Mpc, while the small-scale structure over ≤ 10 Mpc resembles that in a low density universe, as observed. Detailed analytical calculations and numerical simulations are given of the spatial and velocity correlations. 38 refs., 6 figs
Large-scale fracture mechanics testing -- requirements and possibilities

International Nuclear Information System (INIS)

Brumovsky, M.

1993-01-01

Application of fracture mechanics to very important and/or complicated structures, like reactor pressure vessels, brings also some questions about the reliability and precision of such calculations. These problems become more pronounced in cases of elastic-plastic conditions of loading and/or in parts with non-homogeneous materials (base metal and austenitic cladding, property gradient changes through material thickness) or with non-homogeneous stress fields (nozzles, bolt threads, residual stresses etc.). For such special cases some verification by large-scale testing is necessary and valuable. This paper discusses problems connected with planning of such experiments with respect to their limitations, requirements to a good transfer of received results to an actual vessel. At the same time, an analysis of possibilities of small-scale model experiments is also shown, mostly in connection with application of results between standard, small-scale and large-scale experiments. Experience from 30 years of large-scale testing in SKODA is used as an example to support this analysis. 1 fig
Ethics of large-scale change

DEFF Research Database (Denmark)

Arler, Finn

2006-01-01

, which kind of attitude is appropriate when dealing with large-scale changes like these from an ethical point of view. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, the neoclassical economists' approach, and finally the so-called Concentric Circle Theories approach...
Comparison Between Overtopping Discharge in Small and Large Scale Models

DEFF Research Database (Denmark)

Helgason, Einar; Burcharth, Hans F.

2006-01-01

The present paper presents overtopping measurements from small scale model tests performed at the Hydraulic & Coastal Engineering Laboratory, Aalborg University, Denmark and large scale model tests performed at the Large Wave Channel, Hannover, Germany. Comparison between results obtained from...... small and large scale model tests show no clear evidence of scale effects for overtopping above a threshold value. In the large scale model no overtopping was measured for wave heights below Hs = 0.5m as the water sunk into the voids between the stones on the crest. For low overtopping scale effects...
Evaluating Unmanned Aerial Platforms for Cultural Heritage Large Scale Mapping

Science.gov (United States)

Georgopoulos, A.; Oikonomou, C.; Adamopoulos, E.; Stathopoulou, E. K.

2016-06-01

When it comes to large scale mapping of limited areas especially for cultural heritage sites, things become critical. Optical and non-optical sensors are developed to such sizes and weights that can be lifted by such platforms, like e.g. LiDAR units. At the same time there is an increase in emphasis on solutions that enable users to get access to 3D information faster and cheaper. Considering the multitude of platforms, cameras and the advancement of algorithms in conjunction with the increase of available computing power this challenge should and indeed is further investigated. In this paper a short review of the UAS technologies today is attempted. A discussion follows as to their applicability and advantages, depending on their specifications, which vary immensely. The on-board cameras available are also compared and evaluated for large scale mapping. Furthermore a thorough analysis, review and experimentation with different software implementations of Structure from Motion and Multiple View Stereo algorithms, able to process such dense and mostly unordered sequence of digital images is also conducted and presented. As test data set, we use a rich optical and thermal data set from both fixed wing and multi-rotor platforms over an archaeological excavation with adverse height variations and using different cameras. Dense 3D point clouds, digital terrain models and orthophotos have been produced and evaluated for their radiometric as well as metric qualities.
Needs, opportunities, and options for large scale systems research

Energy Technology Data Exchange (ETDEWEB)

Thompson, G.L.

1984-10-01

The Office of Energy Research was recently asked to perform a study of Large Scale Systems in order to facilitate the development of a true large systems theory. It was decided to ask experts in the fields of electrical engineering, chemical engineering and manufacturing/operations research for their ideas concerning large scale systems research. The author was asked to distribute a questionnaire among these experts to find out their opinions concerning recent accomplishments and future research directions in large scale systems research. He was also requested to convene a conference which included three experts in each area as panel members to discuss the general area of large scale systems research. The conference was held on March 26--27, 1984 in Pittsburgh with nine panel members, and 15 other attendees. The present report is a summary of the ideas presented and the recommendations proposed by the attendees.
All 5' EST - KOME | LSDB Archive [Life Science Database Archive metadata]

Lifescience Database Archive (English)

Full Text Available ...n of data contents: 5' EST sequences. Data file: File name: CSV: kome_est_5end_all.zip; File URL: ftp://ftp.biosciencedbc.jp/archiv...fasta.zip; File URL: ftp://ftp.biosciencedbc.jp/archive/kome/LATEST/kome_est_5end_...
All 3' EST - KOME | LSDB Archive [Life Science Database Archive metadata]

Lifescience Database Archive (English)

Full Text Available ...n of data contents: 3' EST sequences. Data file: File name: CSV: kome_est_3end_all.zip; File URL: ftp://ftp.biosciencedbc.jp/archiv...fasta.zip; File URL: ftp://ftp.biosciencedbc.jp/archive/kome/LATEST/kome_est_3end_...
Large-scale structure of the Universe

International Nuclear Information System (INIS)

Doroshkevich, A.G.

1978-01-01

The problems discussed at the "Large-scale Structure of the Universe" symposium are considered at a popular level. The cell structure of the galaxy distribution in the Universe and the principles of mathematical modelling of the galaxy distribution are described. Images of cell structures obtained after computer reprocessing are given. Three hypotheses (vortical, entropic and adiabatic), which suggest different processes for the origin of galaxies and galaxy clusters, are discussed. The adiabatic hypothesis is recognized as having a considerable advantage. The relict radiation is considered as a means of directly studying the processes taking place in the Universe. The large-scale peculiarities and small-scale fluctuations of the relict radiation temperature make it possible to estimate the perturbation properties at the pre-galaxy stage. The discussion of problems pertaining to the hot gas contained in galaxy clusters, and to the interactions within galaxy clusters and with the intergalactic medium, is recognized as a notable contribution to the development of theoretical and observational cosmology.
Seismic safety in conducting large-scale blasts

Science.gov (United States)

Mashukov, I. V.; Chaplygin, V. V.; Domanov, V. P.; Semin, A. A.; Klimkin, M. A.

2017-09-01

In mining enterprises to prepare hard rocks for excavation a drilling and blasting method is used. With the approach of mining operations to settlements the negative effect of large-scale blasts increases. To assess the level of seismic impact of large-scale blasts the scientific staff of Siberian State Industrial University carried out expertise for coal mines and iron ore enterprises. Determination of the magnitude of surface seismic vibrations caused by mass explosions was performed using seismic receivers, an analog-digital converter with recording on a laptop. The registration results of surface seismic vibrations during production of more than 280 large-scale blasts at 17 mining enterprises in 22 settlements are presented. The maximum velocity values of the Earth’s surface vibrations are determined. The safety evaluation of seismic effect was carried out according to the permissible value of vibration velocity. For cases with exceedance of permissible values recommendations were developed to reduce the level of seismic impact.
Image-based Exploration of Large-Scale Pathline Fields

KAUST Repository

Nagoor, Omniah H.

2014-05-27

While real-time applications are nowadays routinely used in visualizing large numerical simulations and volumes, handling these large-scale datasets requires high-end graphics clusters or supercomputers to process and visualize them. However, not all users have access to powerful clusters. Therefore, it is challenging to come up with a visualization approach that provides insight to large-scale datasets on a single computer. Explorable images (EI) is one of the methods that allows users to handle large data on a single workstation. Although it is a view-dependent method, it combines both exploration and modification of visual aspects without re-accessing the original huge data. In this thesis, we propose a novel image-based method that applies the concept of EI in visualizing large flow-field pathlines data. The goal of our work is to provide an optimized image-based method, which scales well with the dataset size. Our approach is based on constructing a per-pixel linked list data structure in which each pixel contains a list of pathlines segments. With this view-dependent method it is possible to filter, color-code and explore large-scale flow data in real-time. In addition, optimization techniques such as early-ray termination and deferred shading are applied, which further improves the performance and scalability of our approach.
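The per-pixel list idea in this abstract can be sketched in a few lines. This is an assumption-laden CPU illustration, not the thesis implementation, which would normally live in GPU buffers: the `Segment` fields (`pathline_id`, `depth`, `speed`) and the class name `PixelLists` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    pathline_id: int   # which pathline this fragment belongs to
    depth: float       # distance from the camera along the view ray
    speed: float       # example attribute used for filtering or colouring


class PixelLists:
    """Store, for each pixel, every pathline segment that projects onto it."""

    def __init__(self):
        self._lists = defaultdict(list)

    def add(self, x, y, segment):
        self._lists[(x, y)].append(segment)

    def query(self, x, y, keep):
        """Return the segments at pixel (x, y) accepted by keep, nearest first."""
        candidates = self._lists.get((x, y), [])
        return sorted((s for s in candidates if keep(s)), key=lambda s: s.depth)


image = PixelLists()
image.add(3, 4, Segment(pathline_id=7, depth=2.0, speed=0.1))
image.add(3, 4, Segment(pathline_id=9, depth=1.0, speed=0.9))
fast = image.query(3, 4, keep=lambda s: s.speed > 0.5)
```

Because filtering happens on the stored lists, the filter or colour mapping can change without touching the original flow data, which is the property the abstract highlights.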

Ontology and diversity of transcript-associated microsatellites mined from a globe artichoke EST database

Science.gov (United States)

Scaglione, Davide; Acquadro, Alberto; Portis, Ezio; Taylor, Christopher A; Lanteri, Sergio; Knapp, Steven J

2009-01-01

Background The globe artichoke (Cynara cardunculus var. scolymus L.) is a significant crop in the Mediterranean basin. Despite its commercial importance and its both dietary and pharmaceutical value, knowledge of its genetics and genomics remains scant. Microsatellite markers have become a key tool in genetic and genomic analysis, and we have exploited recently acquired EST (expressed sequence tag) sequence data (Composite Genome Project - CGP) to develop an extensive set of microsatellite markers. Results A unigene assembly was created from over 36,000 globe artichoke EST sequences, containing 6,621 contigs and 12,434 singletons. Over 12,000 of these unigenes were functionally assigned on the basis of homology with Arabidopsis thaliana reference proteins. A total of 4,219 perfect repeats, located within 3,308 unigenes was identified and the gene ontology (GO) analysis highlighted some GO term's enrichments among different classes of microsatellites with respect to their position. Sufficient flanking sequence was available to enable the design of primers to amplify 2,311 of these microsatellites, and a set of 300 was tested against a DNA panel derived from 28 C. cardunculus genotypes. Consistent amplification and polymorphism was obtained from 236 of these assays. Their polymorphic information content (PIC) ranged from 0.04 to 0.90 (mean 0.66). Between 176 and 198 of the assays were informative in at least one of the three available mapping populations. Conclusion EST-based microsatellites have provided a large set of de novo genetic markers, which show significant amounts of polymorphism both between and within the three taxa of C. cardunculus. They are thus well suited as assays for phylogenetic analysis, the construction of genetic maps, marker-assisted breeding, transcript mapping and other genomic applications in the species. PMID:19785740
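The abstract reports polymorphic information content (PIC) values without stating the formula. A common definition, assumed here, is the one from Botstein et al. (1980): PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i are allele frequencies at a marker. The sketch below computes it; the function name is illustrative.

```python
from itertools import combinations


def polymorphic_information_content(allele_freqs):
    """Compute PIC from allele frequencies that sum to 1 (assumed formula)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings between identical heterozygotes.
    correction = sum(2.0 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


# Two equally frequent alleles: 1 - 0.5 - 0.125 = 0.375
two_allele_pic = polymorphic_information_content([0.5, 0.5])
```

A monomorphic marker (one allele at frequency 1) gives a PIC of 0, and the value rises towards 1 as the number of evenly distributed alleles grows.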
Large scale analysis of signal reachability.

Science.gov (United States)

Todor, Andrei; Gabr, Haitham; Dobra, Alin; Kahveci, Tamer

2014-06-15

Major disorders, such as leukemia, have been shown to alter the transcription of genes. Understanding how gene regulation is affected by such aberrations is of utmost importance. One promising strategy toward this objective is to compute whether signals can reach the transcription factors through the transcription regulatory network (TRN). Due to the uncertainty of the regulatory interactions, this is a #P-complete problem and thus solving it for very large TRNs remains a challenge. We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can arrive at any given set of target genes (i.e., transcription factors) when the topology of the underlying signaling network is uncertain. Our method tackles this problem for large networks while providing a provably accurate result. Our method follows a divide-and-conquer strategy. We break down the given network into a sequence of non-overlapping subnetworks such that reachability can be computed autonomously and sequentially on each subnetwork. We represent each interaction using a small polynomial. The product of these polynomials expresses the different scenarios in which a signal can or cannot reach the target genes from the source genes. We introduce polynomial collapsing operators for each subnetwork. These operators reduce the size of the resulting polynomial and thus the computational complexity dramatically. We show that our method scales to entire human regulatory networks in only seconds, while the existing methods fail beyond a few tens of genes and interactions. We demonstrate that our method can successfully characterize key reachability characteristics of the entire transcription regulatory networks of patients affected by eight different subtypes of leukemia, as well as those from healthy control samples. All the datasets and code used in this article are available at bioinformatics.cise.ufl.edu/PReach/scalable.htm. © The Author 2014
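The quantity described in this abstract can be illustrated on a tiny network by brute force. The sketch below is not the polynomial-collapsing method of the record; it enumerates every combination of present and absent interactions, which is exponential in the number of edges and shows why a scalable method is needed. It assumes that edges are independent and that "reaching" means arriving at at least one target.

```python
from itertools import product


def reach_probability(edges, sources, targets):
    """edges maps a directed pair (u, v) to the probability the interaction exists."""
    items = list(edges.items())
    targets = set(targets)
    total = 0.0
    for present in product([False, True], repeat=len(items)):
        weight = 1.0
        adjacency = {}
        for ((u, v), p), on in zip(items, present):
            weight *= p if on else 1.0 - p
            if on:
                adjacency.setdefault(u, []).append(v)
        # Depth-first search from all sources in this realised network.
        seen = set(sources)
        stack = list(sources)
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen & targets:
            total += weight
    return total


# Hypothetical 3-gene network: a direct edge s->t and a two-step route via a.
edges = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
prob = reach_probability(edges, sources=["s"], targets=["t"])
```

For this toy network the result can be checked by hand: the signal fails only if the direct edge is absent (0.5) and the two-step route is broken (0.75), so the probability is 1 - 0.375 = 0.625.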
Scaling properties of paleomagnetic reversal sequence

Directory of Open Access Journals (Sweden)

S. S. Ivanov

1996-01-01

Full Text Available The history of reversals of main geomagnetic field during last 160 My is analyzed as a sequence of events, presented as a point set on the time axis. Different techniques were applied including the method of boxcounting, dispersion counter-scaling, multifractal analysis and examination of attractor behaviour in multidimensional phase space. The existence of a crossover point at time interval 0.5-1.0 My was clearly identified, dividing the whole time range into two subranges with different scaling properties. The long-term subrange is characterized by monofractal dimension 0.88 and by an attractor, whose correlation dimension converges to 1.0, that provides evidence of a deterministic dynamical system in this subrange, similar to most existing dynamo models. In the short-term subrange the fractal dimension estimated by different methods varies from 0.47 to 0.88 and the dimensionality of the attractor is obtained to be about 3.7. These results are discussed in terms of non-linear superposition of processes in the Earth's geospheres.
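The box-counting method mentioned in this abstract can be sketched for a set of event times on a line. This is a generic illustration under stated assumptions, not the analysis of the record: the function names are hypothetical, and the dimension is estimated as the least-squares slope of log N(s) against log(1/s) over a user-chosen list of box sizes.

```python
import math


def box_count(times, box_size, origin):
    """Count the boxes of width box_size that contain at least one event."""
    return len({math.floor((t - origin) / box_size) for t in times})


def box_counting_dimension(times, box_sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    origin = min(times)
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(times, s, origin)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Evenly spaced events fill the line, so the estimate should be close to 1.
uniform_events = list(range(1024))
dimension = box_counting_dimension(uniform_events, [2, 4, 8, 16, 32])
```

A clustered point set leaves many boxes empty at small sizes, which lowers the slope below 1. Fitting the slope separately over short and long box sizes is one way to look for a crossover between scaling ranges.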
Large-scale image-based profiling of single-cell phenotypes in arrayed CRISPR-Cas9 gene perturbation screens.

Science.gov (United States)

de Groot, Reinoud; Lüthi, Joel; Lindsay, Helen; Holtackers, René; Pelkmans, Lucas

2018-01-23

High-content imaging using automated microscopy and computer vision allows multivariate profiling of single-cell phenotypes. Here, we present methods for the application of the CRISPR-Cas9 system in large-scale, image-based, gene perturbation experiments. We show that CRISPR-Cas9-mediated gene perturbation can be achieved in human tissue culture cells in a timeframe that is compatible with image-based phenotyping. We developed a pipeline to construct a large-scale arrayed library of 2,281 sequence-verified CRISPR-Cas9 targeting plasmids and profiled this library for genes affecting cellular morphology and the subcellular localization of components of the nuclear pore complex (NPC). We conceived a machine-learning method that harnesses genetic heterogeneity to score gene perturbations and identify phenotypically perturbed cells for in-depth characterization of gene perturbation effects. This approach enables genome-scale image-based multivariate gene perturbation profiling using CRISPR-Cas9. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.
Homogenization of Large-Scale Movement Models in Ecology

Science.gov (United States)

Garlick, M.J.; Powell, J.A.; Hooten, M.B.; McFarlane, L.R.

2011-01-01

A difficulty in using diffusion models to predict large scale animal population dispersal is that individuals move differently based on local information (as opposed to gradients) in differing habitat types. This can be accommodated by using ecological diffusion. However, real environments are often spatially complex, limiting application of a direct approach. Homogenization for partial differential equations has long been applied to Fickian diffusion (in which average individual movement is organized along gradients of habitat and population density). We derive a homogenization procedure for ecological diffusion and apply it to a simple model for chronic wasting disease in mule deer. Homogenization allows us to determine the impact of small scale (10-100 m) habitat variability on large scale (10-100 km) movement. The procedure generates asymptotic equations for solutions on the large scale with parameters defined by small-scale variation. The simplicity of this homogenization procedure is striking when compared to the multi-dimensional homogenization procedure for Fickian diffusion, and the method will be equally straightforward for more complex models. © 2010 Society for Mathematical Biology.
The role of large-scale, extratropical dynamics in climate change

Energy Technology Data Exchange (ETDEWEB)

Shepherd, T.G. [ed.

1994-02-01

The climate modeling community has focused recently on improving our understanding of certain processes, such as cloud feedbacks and ocean circulation, that are deemed critical to climate-change prediction. Although attention to such processes is warranted, emphasis on these areas has diminished a general appreciation of the role played by the large-scale dynamics of the extratropical atmosphere. Lack of interest in extratropical dynamics may reflect the assumption that these dynamical processes are a non-problem as far as climate modeling is concerned, since general circulation models (GCMs) calculate motions on this scale from first principles. Nevertheless, serious shortcomings in our ability to understand and simulate large-scale dynamics exist. Partly due to a paucity of standard GCM diagnostic calculations of large-scale motions and their transports of heat, momentum, potential vorticity, and moisture, a comprehensive understanding of the role of large-scale dynamics in GCM climate simulations has not been developed. Uncertainties remain in our understanding and simulation of large-scale extratropical dynamics and their interaction with other climatic processes, such as cloud feedbacks, large-scale ocean circulation, moist convection, air-sea interaction and land-surface processes. To address some of these issues, the 17th Stanstead Seminar was convened at Bishop's University in Lennoxville, Quebec. The purpose of the Seminar was to promote discussion of the role of large-scale extratropical dynamics in global climate change. Abstracts of the talks are included in this volume. On the basis of these talks, several key issues emerged concerning large-scale extratropical dynamics and their climatic role. Individual records are indexed separately for the database.
The role of large-scale, extratropical dynamics in climate change

International Nuclear Information System (INIS)

Shepherd, T.G.

1994-02-01

The climate modeling community has focused recently on improving our understanding of certain processes, such as cloud feedbacks and ocean circulation, that are deemed critical to climate-change prediction. Although attention to such processes is warranted, emphasis on these areas has diminished a general appreciation of the role played by the large-scale dynamics of the extratropical atmosphere. Lack of interest in extratropical dynamics may reflect the assumption that these dynamical processes are a non-problem as far as climate modeling is concerned, since general circulation models (GCMs) calculate motions on this scale from first principles. Nevertheless, serious shortcomings in our ability to understand and simulate large-scale dynamics exist. Partly due to a paucity of standard GCM diagnostic calculations of large-scale motions and their transports of heat, momentum, potential vorticity, and moisture, a comprehensive understanding of the role of large-scale dynamics in GCM climate simulations has not been developed. Uncertainties remain in our understanding and simulation of large-scale extratropical dynamics and their interaction with other climatic processes, such as cloud feedbacks, large-scale ocean circulation, moist convection, air-sea interaction and land-surface processes. To address some of these issues, the 17th Stanstead Seminar was convened at Bishop's University in Lennoxville, Quebec. The purpose of the Seminar was to promote discussion of the role of large-scale extratropical dynamics in global climate change. Abstracts of the talks are included in this volume. On the basis of these talks, several key issues emerged concerning large-scale extratropical dynamics and their climatic role. Individual records are indexed separately for the database
Pms2 suppresses large expansions of the (GAA·TTC)n sequence in neuronal tissues.

Science.gov (United States)

Bourn, Rebecka L; De Biase, Irene; Pinto, Ricardo Mouro; Sandi, Chiranjeevi; Al-Mahdawi, Sahar; Pook, Mark A; Bidichandani, Sanjay I

2012-01-01

Expanded trinucleotide repeat sequences are the cause of several inherited neurodegenerative diseases. Disease pathogenesis is correlated with several features of somatic instability of these sequences, including further large expansions in postmitotic tissues. The presence of somatic expansions in postmitotic tissues is consistent with DNA repair being a major determinant of somatic instability. Indeed, proteins in the mismatch repair (MMR) pathway are required for instability of the expanded (CAG·CTG)(n) sequence, likely via recognition of intrastrand hairpins by MutSβ. It is not clear if or how MMR would affect instability of disease-causing expanded trinucleotide repeat sequences that adopt secondary structures other than hairpins, such as the triplex/R-loop forming (GAA·TTC)(n) sequence that causes Friedreich ataxia. We analyzed somatic instability in transgenic mice that carry an expanded (GAA·TTC)(n) sequence in the context of the human FXN locus and lack the individual MMR proteins Msh2, Msh6 or Pms2. The absence of Msh2 or Msh6 resulted in a dramatic reduction in somatic mutations, indicating that mammalian MMR promotes instability of the (GAA·TTC)(n) sequence via MutSα. The absence of Pms2 resulted in increased accumulation of large expansions in the nervous system (cerebellum, cerebrum, and dorsal root ganglia) but not in non-neuronal tissues (heart and kidney), without affecting the prevalence of contractions. Pms2 suppressed large expansions specifically in tissues showing MutSα-dependent somatic instability, suggesting that they may act on the same lesion or structure associated with the expanded (GAA·TTC)(n) sequence. We conclude that Pms2 specifically suppresses large expansions of a pathogenic trinucleotide repeat sequence in neuronal tissues, possibly acting independently of the canonical MMR pathway.
Status: Large-scale subatmospheric cryogenic systems

International Nuclear Information System (INIS)

Peterson, T.

1989-01-01

In the late 1960's and early 1970's an interest in testing and operating RF cavities at 1.8K motivated the development and construction of four large (300 Watt) 1.8K refrigeration systems. In the past decade, development of successful superconducting RF cavities and interest in obtaining higher magnetic fields with the improved Niobium-Titanium superconductors has once again created interest in large-scale 1.8K refrigeration systems. The L'Air Liquide plant for Tore Supra is a recently commissioned 300 Watt 1.8K system which incorporates a new technology, cold compressors, to obtain the low vapor pressure for low temperature cooling. CEBAF proposes to use cold compressors to obtain 5KW at 2.0K. Magnetic refrigerators of 10 Watt capacity or higher at 1.8K are now being developed. The state of the art of large-scale refrigeration in the range under 4K will be reviewed. 28 refs., 4 figs., 7 tabs
Large-scale weakly supervised object localization via latent category learning.

Science.gov (United States)

Chong Wang; Kaiqi Huang; Weiqiang Ren; Junge Zhang; Maybank, Steve

2015-04-01

Localizing objects in cluttered backgrounds is challenging under large-scale weakly supervised conditions. Because of the cluttered image conditions, objects are often highly ambiguous with respect to their backgrounds. In addition, effective algorithms for large-scale weakly supervised localization in cluttered backgrounds are lacking. However, backgrounds contain useful latent information, e.g., the sky in the aeroplane class. If this latent information can be learned, object-background ambiguity can be largely reduced and the background can be suppressed effectively. In this paper, we propose latent category learning (LCL) for large-scale cluttered conditions. LCL is an unsupervised learning method which requires only image-level class labels. First, we use latent semantic analysis with a semantic object representation to learn the latent categories, which represent objects, object parts or backgrounds. Second, to determine which category contains the target object, we propose a category selection strategy that evaluates each category's discrimination. Finally, we propose online LCL for use in large-scale conditions. Evaluation on the challenging PASCAL Visual Object Class (VOC) 2007 and the ImageNet Large-Scale Visual Recognition Challenge 2013 detection data sets shows that the method can improve the annotation precision by 10% over previous methods. More importantly, we achieve a detection precision which outperforms previous results by a large margin and is competitive with the supervised deformable part model 5.0 baseline on both data sets.
Large-scale networks in engineering and life sciences

CERN Document Server

Findeisen, Rolf; Flockerzi, Dietrich; Reichl, Udo; Sundmacher, Kai

2014-01-01

This edited volume provides insights into and tools for the modeling, analysis, optimization, and control of large-scale networks in the life sciences and in engineering. Large-scale systems are often the result of networked interactions between a large number of subsystems, and their analysis and control are becoming increasingly important. The chapters of this book present the basic concepts and theoretical foundations of network theory and discuss its applications in different scientific areas such as biochemical reactions, chemical production processes, systems biology, electrical circuits, and mobile agents. The aim is to identify common concepts, to understand the underlying mathematical ideas, and to inspire discussions across the borders of the various disciplines. The book originates from the interdisciplinary summer school “Large Scale Networks in Engineering and Life Sciences” hosted by the International Max Planck Research School Magdeburg, September 26-30, 2011, and will therefore be of int...
Development and Characterization of 1,906 EST-SSR Markers from Unigenes in Jute (Corchorus spp.).

Directory of Open Access Journals (Sweden)

Liwu Zhang

Jute, comprising white and dark jute, is the second most important natural fiber crop worldwide after cotton. However, the lack of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers has resulted in a large gap in the improvement of jute. Previously, 48,914 unigenes from white jute were assembled de novo. In this study, 1,906 EST-SSRs were identified from these assembled unigenes. Among these markers, di-, tri- and tetra-nucleotide repeat types were the most abundant (12.0%, 56.9% and 21.6%, respectively). AG-rich or GA-rich nucleotide repeats were predominant. Subsequently, a sample of 116 SSRs, located in genes encoding transcription factors and cellulose synthases, was selected to survey polymorphisms among 12 diverse jute accessions. Of these, 83.6% successfully amplified at least one fragment and detected polymorphism among the 12 diverse genotypes, indicating that the newly developed SSRs are of good quality. Furthermore, the genetic similarity coefficients of all 12 accessions were evaluated using 97 polymorphic SSRs. The cluster analysis divided the jute accessions into two main groups at a genetic similarity coefficient of 0.61. These EST-SSR markers not only enrich the molecular markers available for the jute genome, but also facilitate genetic and genomic research in jute.
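The SSR identification step described in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: the minimum repeat counts in `MIN_REPEATS`, the function names and the example sequence are all assumptions made for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# These values are illustrative and are not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AGAG, AA)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def motif_type_counts(hits):
    """Count hits by motif length (2 = di-, 3 = tri-, 4 = tetra-nucleotide)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Made-up example sequence with one (AG)8 and one (GAA)5 repeat.
example = "TTGC" + "AG" * 8 + "CCGT" + "GAA" * 5 + "TTCA"
ssrs = find_ssrs(example)
```

Dividing the per-type counts from `motif_type_counts` by the total number of hits gives repeat-type percentages of the kind quoted in the abstract.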
TESTING SCALING RELATIONS FOR SOLAR-LIKE OSCILLATIONS FROM THE MAIN SEQUENCE TO RED GIANTS USING KEPLER DATA

Energy Technology Data Exchange (ETDEWEB)

Huber, D.; Bedding, T. R.; Stello, D. [Sydney Institute for Astronomy (SIfA), School of Physics, University of Sydney, NSW 2006 (Australia)]; Hekker, S. [Astronomical Institute 'Anton Pannekoek', University of Amsterdam, Science Park 904, 1098 XH Amsterdam (Netherlands)]; Mathur, S. [High Altitude Observatory, NCAR, P.O. Box 3000, Boulder, CO 80307 (United States)]; Mosser, B. [LESIA, CNRS, Universite Pierre et Marie Curie, Universite Denis Diderot, Observatoire de Paris, 92195 Meudon cedex (France)]; Verner, G. A.; Elsworth, Y. P.; Hale, S. J.; Chaplin, W. J. [School of Physics and Astronomy, University of Birmingham, Birmingham B15 2TT (United Kingdom)]; Bonanno, A. [INAF Osservatorio Astrofisico di Catania (Italy)]; Buzasi, D. L. [Eureka Scientific, 2452 Delmer Street Suite 100, Oakland, CA 94602-3017 (United States)]; Campante, T. L. [Centro de Astrofisica da Universidade do Porto, Rua das Estrelas, 4150-762 Porto (Portugal)]; Kallinger, T. [Department of Physics and Astronomy, University of British Columbia, Vancouver (Canada)]; Silva Aguirre, V. [Max-Planck-Institut fuer Astrophysik, Karl-Schwarzschild-Str. 1, 85748 Garching (Germany)]; De Ridder, J. [Instituut voor Sterrenkunde, K.U.Leuven (Belgium)]; Garcia, R. A. [Laboratoire AIM, CEA/DSM-CNRS, Universite Paris 7 Diderot, IRFU/SAp, Centre de Saclay, 91191, Gif-sur-Yvette (France)]; Appourchaux, T. [Institut d'Astrophysique Spatiale, UMR 8617, Universite Paris Sud, 91405 Orsay Cedex (France)]; Frandsen, S. [Danish AsteroSeismology Centre (DASC), Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C (Denmark)]; Houdek, G., E-mail: dhuber@physics.usyd.edu.au [Institute of Astronomy, University of Vienna, 1180 Vienna (Austria)]; and others

2011-12-20

We have analyzed solar-like oscillations in ~1700 stars observed by the Kepler Mission, spanning from the main sequence to the red clump. Using evolutionary models, we test asteroseismic scaling relations for the frequency of maximum power (ν_max), the large frequency separation (Δν), and oscillation amplitudes. We show that the difference of the Δν-ν_max relation for unevolved and evolved stars can be explained by different distributions in effective temperature and stellar mass, in agreement with what is expected from scaling relations. For oscillation amplitudes, we show that neither (L/M)^s scaling nor the revised scaling relation by Kjeldsen and Bedding is accurate for red-giant stars, and demonstrate that a revised scaling relation with a separate luminosity-mass dependence can be used to calculate amplitudes from the main sequence to red giants to a precision of ~25%. The residuals show an offset particularly for unevolved stars, suggesting that an additional physical dependency is necessary to fully reproduce the observed amplitudes. We investigate correlations between amplitudes and stellar activity, and find evidence that the effect of amplitude suppression is most pronounced for subgiant stars. Finally, we test the location of the cool edge of the instability strip in the Hertzsprung-Russell diagram using solar-like oscillations and find the detections in the hottest stars compatible with a domain of hybrid stochastically excited and opacity driven pulsation.
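For readers unfamiliar with the scaling relations tested in this record, the following sketch evaluates the standard forms, in which the frequency of maximum power scales as M R^-2 Teff^-1/2 and the large frequency separation as the square root of the mean density. The solar reference values below are commonly quoted numbers assumed here for illustration, the function names are hypothetical, and the amplitude relations discussed in the abstract are not covered.

```python
import math

# Assumed solar reference values (illustrative; calibrations differ between studies).
NU_MAX_SUN = 3090.0    # frequency of maximum power, microhertz
DELTA_NU_SUN = 135.1   # large frequency separation, microhertz
TEFF_SUN = 5777.0      # effective temperature, kelvin


def nu_max(mass, radius, teff):
    """Frequency of maximum power (microhertz); mass and radius in solar units."""
    return NU_MAX_SUN * mass / radius**2 / math.sqrt(teff / TEFF_SUN)


def delta_nu(mass, radius):
    """Large frequency separation (microhertz), proportional to sqrt(mean density)."""
    return DELTA_NU_SUN * math.sqrt(mass / radius**3)


def mass_radius(nu_max_obs, delta_nu_obs, teff):
    """Invert the two relations above to estimate mass and radius in solar units."""
    n = nu_max_obs / NU_MAX_SUN
    d = delta_nu_obs / DELTA_NU_SUN
    t = teff / TEFF_SUN
    mass = n**3 * d**-4 * t**1.5
    radius = n * d**-2 * t**0.5
    return mass, radius


# Round trip for a hypothetical red giant of 1.2 solar masses and 10 solar radii.
giant_nu_max = nu_max(1.2, 10.0, 4800.0)
giant_delta_nu = delta_nu(1.2, 10.0)
giant_mass, giant_radius = mass_radius(giant_nu_max, giant_delta_nu, 4800.0)
```

Because both relations are simple power laws, the inversion is algebraic, which is why measured ν_max, Δν and effective temperature are enough to estimate mass and radius.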
Development and characterization of EST-SSR markers in Bombax ceiba (Malvaceae).

Science.gov (United States)

Ju, Miao-Miao; Ma, Huan-Cheng; Xin, Pei-Yao; Zhou, Zhi-Li; Tian, Bin

2015-04-01

Bombax ceiba (Malvaceae), commonly known as silk cotton tree, is a multipurpose tree species of tropical forests. Novel expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed and characterized for the species using transcriptome analysis. A total of 33 new EST-SSR markers were developed for B. ceiba, of which 13 showed polymorphisms across the 24 individuals from four distant populations tested in the study. The results showed that the number of alleles per polymorphic locus ranged from two to four, and the expected heterozygosity and observed heterozygosity per locus varied from 0.043 to 0.654 and from 0 to 0.609, respectively. These newly developed EST-SSR markers can be used in phylogeographic and population genetic studies to investigate the origin of B. ceiba populations. Furthermore, these EST-SSR markers could also greatly promote the development of molecular breeding studies pertaining to silk cotton tree.
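The per-locus statistics reported in this record (observed and expected heterozygosity) can be computed from diploid genotypes as in the sketch below. The genotype data and the function name are made up for illustration, and the small-sample correction factor 2n/(2n-1) that some software applies to expected heterozygosity is deliberately omitted.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes is a list of (allele_a, allele_b) pairs, one per diploid individual.
    Expected heterozygosity is 1 minus the sum of squared allele frequencies.
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical fragment sizes (bp) for four individuals at one SSR locus.
locus = [(150, 150), (150, 154), (154, 154), (150, 154)]
h_obs, h_exp = heterozygosity(locus)
```

A locus at which every individual is homozygous gives an observed heterozygosity of 0, matching the lower bound quoted in the abstract.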
SWANS: A Prototypic SCALE Criticality Sequence for Automated Optimization Using the SWAN Methodology

International Nuclear Information System (INIS)

Greenspan, E.

2001-01-01

SWANS is a new prototypic analysis sequence that provides an intelligent, semi-automatic search for the maximum k_eff of a given amount of specified fissile material, or of the minimum critical mass. It combines the optimization strategy of the SWAN code with the composition-dependent resonance self-shielded cross sections of the SCALE package. For a given system composition arrived at during the iterative optimization process, the value of k_eff is as accurate and reliable as obtained using the CSAS1X Sequence of SCALE-4.4. This report describes how SWAN is integrated within the SCALE system to form the new prototypic optimization sequence, describes the optimization procedure, provides a user guide for SWANS, and illustrates its application to five different types of problems. In addition, the report illustrates that resonance self-shielding might have a significant effect on the maximum k_eff value a given fissile material mass can have
SWANS: A Prototypic SCALE Criticality Sequence for Automated Optimization Using the SWAN Methodology

Energy Technology Data Exchange (ETDEWEB)

Greenspan, E.

2001-01-11

SWANS is a new prototypic analysis sequence that provides an intelligent, semi-automatic search for the maximum k_eff of a given amount of specified fissile material, or of the minimum critical mass. It combines the optimization strategy of the SWAN code with the composition-dependent resonance self-shielded cross sections of the SCALE package. For a given system composition arrived at during the iterative optimization process, the value of k_eff is as accurate and reliable as obtained using the CSAS1X Sequence of SCALE-4.4. This report describes how SWAN is integrated within the SCALE system to form the new prototypic optimization sequence, describes the optimization procedure, provides a user guide for SWANS, and illustrates its application to five different types of problems. In addition, the report illustrates that resonance self-shielding might have a significant effect on the maximum k_eff value a given fissile material mass can have.
Development of Novel Polymorphic EST-SSR Markers in Bailinggu (Pleurotus tuoliensis) for Crossbreeding

Directory of Open Access Journals (Sweden)

Yueting Dai

2017-11-01

Identification of monokaryons and their mating types and discrimination of hybrid offspring are key steps for the crossbreeding of Pleurotus tuoliensis (Bailinggu). However, conventional crossbreeding methods are troublesome and time consuming. Using RNA-seq technology, we developed new expressed sequence tag-simple sequence repeat (EST-SSR) markers for Bailinggu to easily and rapidly identify monokaryons and their mating types, genetic diversity and hybrid offspring. We identified 1110 potential EST-based SSR loci from a newly-sequenced Bailinggu transcriptome and then randomly selected 100 EST-SSRs for further validation. Results showed that 39, 43 and 34 novel EST-SSR markers successfully identified monokaryons from their parent dikaryons, differentiated two different mating types and discriminated F1 and F2 hybrid offspring, respectively. Furthermore, a total of 86 alleles were detected in 37 monokaryons using 18 highly informative EST-SSRs. The observed number of alleles per locus ranged from three to seven. Cluster analysis revealed that these monokaryons have a relatively high level of genetic diversity. Transfer rates of the EST-SSRs in the monokaryons of closely-related species Pleurotus eryngii var. ferulae and Pleurotus ostreatus were 72% and 64%, respectively. Therefore, our study provides new SSR markers and an efficient method to enhance the crossbreeding of Bailinggu and closely-related species.
Development of Novel Polymorphic EST-SSR Markers in Bailinggu (Pleurotus tuoliensis) for Crossbreeding

Science.gov (United States)

Dai, Yueting; Su, Wenying; Song, Bing; Li, Yu; Fu, Yongping

2017-01-01

Identification of monokaryons and their mating types and discrimination of hybrid offspring are key steps for the crossbreeding of Pleurotus tuoliensis (Bailinggu). However, conventional crossbreeding methods are troublesome and time consuming. Using RNA-seq technology, we developed new expressed sequence tag-simple sequence repeat (EST-SSR) markers for Bailinggu to easily and rapidly identify monokaryons and their mating types, genetic diversity and hybrid offspring. We identified 1110 potential EST-based SSR loci from a newly-sequenced Bailinggu transcriptome and then randomly selected 100 EST-SSRs for further validation. Results showed that 39, 43 and 34 novel EST-SSR markers successfully identified monokaryons from their parent dikaryons, differentiated two different mating types and discriminated F1 and F2 hybrid offspring, respectively. Furthermore, a total of 86 alleles were detected in 37 monokaryons using 18 highly informative EST-SSRs. The observed number of alleles per locus ranged from three to seven. Cluster analysis revealed that these monokaryons have a relatively high level of genetic diversity. Transfer rates of the EST-SSRs in the monokaryons of closely-related species Pleurotus eryngii var. ferulae and Pleurotus ostreatus were 72% and 64%, respectively. Therefore, our study provides new SSR markers and an efficient method to enhance the crossbreeding of Bailinggu and closely-related species. PMID:29149037
An Novel Architecture of Large-scale Communication in IOT

Science.gov (United States)

Ma, Wubin; Deng, Su; Huang, Hongbin

2018-03-01

In recent years, many scholars have done a great deal of research on the development of Internet of Things and networked physical systems. However, few people have made the detailed visualization of the large-scale communications architecture in the IOT. In fact, the non-uniform technology between IPv6 and access points has led to a lack of broad principles of large-scale communications architectures. Therefore, this paper presents the Uni-IPv6 Access and Information Exchange Method (UAIEM), a new architecture and algorithm that addresses large-scale communications in the IOT.
A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis

OpenAIRE

Yang, Yilong; Davis, Thomas M

2017-01-01

The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Ac...

Benefits of transactive memory systems in large-scale development

OpenAIRE

Aivars, Sablis

2016-01-01

Context. Large-scale software development projects are those consisting of a large number of teams, maybe even spread across multiple locations, and working on large and complex software tasks. That means that neither a team member individually nor an entire team holds all the knowledge about the software being developed and teams have to communicate and coordinate their knowledge. Therefore, teams and team members in large-scale software development projects must acquire and manage expertise...
Study of a large scale neutron measurement channel

International Nuclear Information System (INIS)

Amarouayache, Anissa; Ben Hadid, Hayet.

1982-12-01

A large scale measurement channel processes the signal coming from a single neutron sensor in three different operating modes: pulse, fluctuation and current. The study described in this note has three parts. First, a theoretical study of the large scale channel and a brief description of it are given, and the results obtained so far in that domain are presented. Second, the fluctuation mode is thoroughly studied and the improvements to be made are defined; a linear fluctuation channel with automatic range switching is described and the results of the tests are given. In this large scale channel, the data processing method is analog. Third, to avoid the problems caused by analog processing of the fluctuation signal, a digital method of data processing is tested. The validity of that method is improved. The results obtained on a test system built according to this method are given and a preliminary plan for further research is defined [fr]
Using SQL Databases for Sequence Similarity Searching and Analysis.

Science.gov (United States)

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
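The idea of loading similarity search results into a relational database and summarizing them by taxon can be sketched with Python's built-in sqlite3 module. This is only an illustration: the two-table schema, the column names, the accessions and the E-values below are hypothetical and are not the actual seqdb_demo or search_demo schemas described in the unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: one table of proteins, one of search hits.
conn.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT);
CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL, bits REAL);
""")

# Made-up accessions and scores, for illustration only.
conn.executemany("INSERT INTO protein VALUES (?, ?)", [
    ("Q1", "E. coli"), ("Q2", "E. coli"),
    ("S1", "H. sapiens"), ("S2", "H. sapiens"), ("S3", "S. cerevisiae"),
])
conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", [
    ("Q1", "S1", 1e-40, 150.0),
    ("Q1", "S2", 1e-3, 35.0),
    ("Q2", "S1", 1e-10, 60.0),
    ("Q2", "S3", 1e-20, 90.0),
])


def queries_with_homologs_by_taxon(conn, max_evalue=1e-6):
    """Count distinct query proteins with a significant hit in each subject taxon."""
    rows = conn.execute(
        """
        SELECT p.taxon, COUNT(DISTINCT h.query_acc)
        FROM hit AS h
        JOIN protein AS p ON p.acc = h.subject_acc
        WHERE h.evalue <= ?
        GROUP BY p.taxon
        ORDER BY p.taxon
        """,
        (max_evalue,),
    ).fetchall()
    return dict(rows)


summary = queries_with_homologs_by_taxon(conn)
```

Keeping hits and taxonomy in separate tables means that changing the significance threshold or the taxonomic grouping only changes the query, not the stored search results.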
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

Science.gov (United States)

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization. 25,495 EST sequences were examined and assembled to obtain full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs code for glutamic acid (GAA), while AT/TA were the most frequent dinucleotide SSR repeats. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of the SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function. PMID:22368382
Capabilities of the Large-Scale Sediment Transport Facility

Science.gov (United States)

2016-04-01

pump flow meters, sediment trap weigh tanks, and beach profiling lidar. A detailed discussion of the original LSTF features and capabilities can be...ERDC/CHL CHETN-I-88 April 2016 Approved for public release; distribution is unlimited. Capabilities of the Large-Scale Sediment Transport...describes the Large-Scale Sediment Transport Facility (LSTF) and recent upgrades to the measurement systems. The purpose of these upgrades was to increase
Spatiotemporal property and predictability of large-scale human mobility

Science.gov (United States)

Zhang, Hai-Tao; Zhu, Tao; Fu, Dongfei; Xu, Bowen; Han, Xiao-Pu; Chen, Duxin

2018-04-01

Spatiotemporal characteristics of human mobility emerging from complexity at the individual scale have been extensively studied because of their potential applications in human behavior prediction and recommendation and in the control of epidemic spreading. We collect and investigate a comprehensive data set of human activities on large geographical scales, including both website browsing and mobile tower visits. Numerical results show that the degree of activity decays as a power law, indicating that human behaviors are reminiscent of the scale-free random walks known as Lévy flights. More significantly, this study suggests that human activities on large geographical scales have specific non-Markovian characteristics, such as a two-segment power-law distribution of dwelling time and a high possibility for prediction. Furthermore, a scale-free mobility model with two essential ingredients, i.e., preferential return and exploration, and a Gaussian distribution assumption on the exploration tendency parameter is proposed, which outperforms existing human mobility models under scenarios of large geographical scales.
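The two ingredients named in this record, exploration and preferential return, can be illustrated with a generic sketch. It is not the authors' model: the exploration probability rho * S^(-gamma) follows a widely used formulation from the mobility literature, the parameter values are arbitrary assumptions, locations are abstract integer labels without geography, and the Gaussian assumption on the exploration tendency mentioned in the abstract is not implemented.

```python
import random


def simulate_epr(steps, rho=0.6, gamma=0.2, seed=0):
    """Simulate one individual under a generic exploration/preferential-return rule.

    With probability rho * S**(-gamma), where S is the number of distinct
    locations visited so far, the individual explores a new location. Otherwise
    it returns to a known location chosen in proportion to past visit counts.
    """
    rng = random.Random(seed)
    visits = {0: 1}          # location id -> number of visits; start at location 0
    trajectory = [0]
    for _ in range(steps):
        distinct = len(visits)
        p_explore = min(1.0, rho * distinct ** (-gamma))
        if rng.random() < p_explore:
            loc = distinct   # label the new location with the next unused id
            visits[loc] = 1
        else:
            known = list(visits)
            weights = [visits[k] for k in known]
            loc = rng.choices(known, weights=weights)[0]
            visits[loc] += 1
        trajectory.append(loc)
    return trajectory, visits


trajectory, visits = simulate_epr(steps=1000)
```

Because the exploration probability falls as more locations are discovered, visits concentrate on a few frequently revisited places, which is the qualitative behavior such models are meant to capture.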
Multiscale properties of DNA primary structure: cross-scale correlations

International Nuclear Information System (INIS)

Altajskij, M.V.; Ivanov, V.V.; Polozov, R.V.

2000-01-01

Cross-scale correlations of wavelet coefficients of DNA coding sequences are calculated and compared to those of a generated random sequence of the same length. The coding sequences are shown to have strong correlation between large and small scale structures, while random sequences do not
Problems of large-scale vertically-integrated aquaculture

Energy Technology Data Exchange (ETDEWEB)

Webber, H H; Riordan, P F

1976-01-01

The problems of vertically-integrated aquaculture are outlined; they are concerned with: species limitations (in the market, biological and technological); site selection, feed, manpower needs, and legal, institutional and financial requirements. The gaps in understanding of, and the constraints limiting, large-scale aquaculture are listed. Future action is recommended with respect to: types and diversity of species to be cultivated, marketing, biotechnology (seed supply, disease control, water quality and concerted effort), siting, feed, manpower, legal and institutional aids (granting of water rights, grants, tax breaks, duty-free imports, etc.), and adequate financing. The lack of hard data based on experience suggests that large-scale vertically-integrated aquaculture is a high risk enterprise, and with the high capital investment required, banks and funding institutions are wary of supporting it. Investment in pilot projects is suggested to demonstrate that large-scale aquaculture can be a fully functional and successful business. Construction and operation of such pilot farms are judged to be in the interests of both the public and private sectors.
Large-scale computing with Quantum Espresso

International Nuclear Information System (INIS)

Giannozzi, P.; Cavazzoni, C.

2009-01-01

This paper gives a short introduction to Quantum Espresso: a distribution of software for atomistic simulations in condensed-matter physics, chemical physics, materials science, and to its usage in large-scale parallel computing.
Construction of new EST-SSRs for Fusarium resistant wheat breeding.

Science.gov (United States)

Yumurtaci, Aysen; Sipahi, Hulya; Al-Abdallat, Ayed; Jighly, Abdulqader; Baum, Michael

2017-06-01

Surveying Fusarium resistance in wheat with easily applicable molecular markers such as simple sequence repeats (SSRs) is a prerequisite for molecular breeding. Expressed sequence tags (ESTs) are one of the main sources for development of new SSR candidates. Therefore, 18,292 publicly available wheat ESTs were mined, and genotyping with 55 newly developed EST-SSR derived primer pairs produced clear fragments in ten wheat cultivars carrying different levels of Fusarium resistance. Among the validated markers, 23 polymorphic EST-SSRs were obtained, and the related alleles were mostly found on the B and D genomes. Based on fragment profiling and similarity analysis, a 327bp amplicon, which was a product of Contig 1207 (chromosome 5BL), was detected only in Fusarium head blight (FHB) resistant cultivars (CM82036 and Sumai), and its amino acid sequence showed similarity to pathogen related proteins. Another FHB resistance related EST-SSR, Contig 556 (chromosome 1BL), produced a 151bp fragment in Sumai and was associated with a wax2-like protein. A polymorphic 204bp fragment, derived from Contig 578 (chromosome 1DL), was generated from root rot (FRR) resistant cultivars (2-49, Altay2000 and Sunco). A total of 98 alleles were detected, with an average of 1.8 alleles per locus, and the polymorphic information content (PIC) ranged from 0.11 to 0.78. A dendrogram with two main groups and five sub-groups showed the closest genetic relationships between the FRR resistant cultivars (2-49 and Altay2000), between the FRR sensitive cultivars (Seri82 and Scout66) and between the FHB resistant cultivars (CM82036 and Sumai). Thus, exploitation of these candidate EST-SSRs may help to genotype other wheat sources for Fusarium resistance. Copyright © 2017 Elsevier Ltd. All rights reserved.
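The polymorphic information content (PIC) quoted in this record is computed from allele frequencies. The abstract does not state which formula was used, so the sketch below assumes the classical definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the function name and the example frequencies are illustrative only.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for one locus from allele frequencies.

    Assumes the classical definition:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles give PIC = 0.375; a monomorphic locus gives 0.
pic_two_alleles = pic([0.5, 0.5])
pic_monomorphic = pic([1.0])
```

Some marker studies instead report the simpler quantity 1 - sum(p_i^2) under the same name, so PIC values should be compared across papers with care.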
School version of ESTE EU

International Nuclear Information System (INIS)

Carny, P.; Suchon, D.; Chyly, M.; Smejkalova, E.; Fabova, V.

2008-01-01

ESTE EU is an information system and software for assessing the radiological impacts on the territory of a country in case of a radiation accident inside or outside the country. The program can model the dispersion of radioactive clouds at small scale and meso-scale. The system enables the user to estimate a prediction of the source term (release to the atmosphere) for any point of a radiation/nuclear accident in Europe (for any point of release, but especially for the sites of European power reactors). The system can use the results of real radiological monitoring in the process of source term estimation. Radiological impacts of a release to the atmosphere are modelled and calculated across Europe and displayed in the geographical information system (GIS). The school version of ESTE EU is intended for university students who are interested in or could work in the fields of emergency response, radiological and nuclear accidents, dispersion modelling, radiological impact calculation and the implementation of urgent or preventive protective measures. The school version of ESTE EU is planned to be donated to specialized departments of faculties in Slovakia, the Czech Republic, etc. The system can be fully operated in Slovak, Czech or English. (authors)
School version of ESTE EU

International Nuclear Information System (INIS)

Carny, P.; Suchon, D.; Chyly, M.; Smejkalova, E.; Fabova, V.

2009-01-01

ESTE EU is an information system and software for assessing radiological impacts on the territory of a country in the event of a radiation accident inside or outside that country. The program can model the dispersion of radioactive clouds at small scale and meso-scale. The system enables the user to estimate the source term (release to the atmosphere) for any point of a radiation or nuclear accident in Europe (for any release point, but especially for the sites of European power reactors). The system can use the results of real radiological monitoring in the source term estimation. Radiological impacts of a release to the atmosphere are modelled and calculated across Europe and displayed in a geographical information system (GIS). The school version of ESTE EU is intended for university students who are interested in, or could work in, the fields of emergency response, radiological and nuclear accidents, dispersion modelling, radiological impact calculation, and the implementation of urgent or preventive protective measures. The school version of ESTE EU is planned to be donated to specialized faculty departments in Slovakia, the Czech Republic, etc. The system can be fully operated in Slovak, Czech or English. (authors)
Application of SCALE 6.1 MAVRIC Sequence for Activation Calculation in Reactor Primary Shield Concrete

International Nuclear Information System (INIS)

Kim, Yong IL

2014-01-01

Activation calculation requires flux information at the desired location and reaction cross sections for the constituent elements to obtain the production rate of activation products. For deep penetration problems, it is generally not easy to obtain fluxes or reaction rates with low uncertainties in a reasonable time using standard Monte Carlo methods. The MAVRIC (Monaco with Automated Variance Reduction using Importance Calculations) sequence in the SCALE 6.1 code package is intended to perform radiation transport on problems that are too challenging for standard, unbiased Monte Carlo methods. The SCALE code system also provides enough ENDF reaction types to cover almost all activation reactions in nuclear reactor materials. To evaluate the activation of the important isotopes in the primary shield, the SCALE 6.1 MAVRIC sequence has been applied to the KSNP reactor model, and the calculated results are compared to the isotopic activity concentrations of the related standard. With regard to decommissioning planning, activation products in the concrete primary shield such as Fe-55, Co-60, Ba-133, Eu-152, and Eu-154 are identified as important according to comparisons with the related standard for exemption. In this study, reference data are used for the concrete compositions in the activation calculation to assess the applicability of the MAVRIC code to the evaluation of the activation inventory in the concrete primary shield. The composition data of trace elements shown in Table 1 were obtained from various US power plant sites and accordingly vary widely in quantity because of the characteristics of concrete composition. In a practical estimation of activation radioactivity for a specific plant undergoing decommissioning, rigorous chemical analysis of concrete samples from the plant would first have to be performed to obtain exact information on the concrete composition. Considering the capability of solving deep penetration transport problems and richness
Emergence of good conduct, scaling and zipf laws in human behavioral sequences in an online world.

Directory of Open Access Journals (Sweden)

Stefan Thurner

Full Text Available We study behavioral action sequences of players in a massive multiplayer online game. In their virtual life players use eight basic actions which allow them to interact with each other. These actions are communication, trade, establishing or breaking friendships and enmities, attack, and punishment. We measure the probabilities for these actions conditional on previously taken and received actions and find a dramatic increase of negative behavior immediately after receiving negative actions. Similarly, positive behavior is intensified by receiving positive actions. We observe a tendency towards antipersistence in communication sequences. Classifying actions as positive (good) and negative (bad) allows us to define binary 'world lines' of lives of individuals. Positive and negative actions are persistent and occur in clusters, indicated by large scaling exponents α ~ 0.87 of the mean square displacement of the world lines. For all eight action types we find strong signs of high levels of repetitiveness, especially for negative actions. We partition behavioral sequences into segments of length n (behavioral 'words' and 'motifs') and study their statistical properties. We find two approximate power laws in the word ranking distribution, one with an exponent of κ ~ -1 for the ranks up to 100, and another with a lower exponent for higher ranks. The Shannon n-tuple redundancy yields large values and increases in terms of word length, further underscoring the non-trivial statistical properties of behavioral sequences. On the collective, societal level the timeseries of particular actions per day can be understood by a simple mean-reverting log-normal model.
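The word-ranking analysis described in this abstract can be illustrated with a short sketch. It assumes that behavioral 'words' are taken as overlapping windows of n consecutive actions; the abstract only says the sequences are partitioned into segments of length n, so the windowing scheme, the function name `word_ranking` and the single-letter action codes are all hypothetical.

```python
from collections import Counter


def word_ranking(actions, n):
    """Rank behavioral 'words' (n consecutive actions) by frequency.

    Assumption: overlapping sliding windows; the original study may
    have segmented the sequences differently.
    """
    words = [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]
    # most_common() returns (word, count) pairs, most frequent first
    return Counter(words).most_common()


# Hypothetical action codes: C = communication, T = trade, A = attack.
sequence = list("CCTCCTCCA")
for rank, (word, count) in enumerate(word_ranking(sequence, 2), start=1):
    print(rank, "".join(word), count)
```

Plotting count against rank on log-log axes is the usual way to look for the approximate power laws the abstract mentions; a Zipf-like exponent near -1 appears as a straight line of slope -1.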
RESTRUCTURING OF THE LARGE-SCALE SPRINKLERS

Directory of Open Access Journals (Sweden)

Paweł Kozaczyk

2016-09-01

Full Text Available One of the best ways for agriculture to become independent of precipitation shortages is irrigation. In the 1970s and 1980s a number of large-scale sprinkler systems were built in Wielkopolska. At the end of the 1970s, 67 sprinkler systems with a total area of 6400 ha were installed in the Poznan province. The average size of a sprinkler system reached 95 ha. In 1989 there were 98 sprinkler systems, covering an area of more than 10 130 ha. The study was conducted on 7 large sprinkler systems with areas ranging from 230 to 520 hectares in 1986÷1998. After the introduction of the market economy in the early 1990s and ownership changes in agriculture, large-scale sprinkler systems suffered significant or total devastation. Land on the State Farms of the State Agricultural Property Agency has been leased or sold, and the new owners used the existing sprinklers only to a very small extent. This involved a change in crop structure and demand structure and an increase in operating costs. There has also been a threefold increase in electricity prices. In practice, the operation of large-scale irrigation encountered all kinds of barriers: limitations of system solutions, supply difficulties, and high levels of equipment failure, none of which encouraged rational use of the available sprinklers. A field inspection of the local area was carried out to show the current status of the remaining irrigation infrastructure. The adopted scheme for the restructuring of Polish agriculture was not the best solution, causing massive destruction of assets previously invested in the sprinkler systems.
Large-scale synthesis of YSZ nanopowder by Pechini method

Indian Academy of Sciences (India)

Administrator

structure and chemical purity of 99.1% by inductively coupled plasma optical emission spectroscopy on a large scale. Keywords. Sol–gel; yttria-stabilized zirconia; large scale; nanopowder; Pechini method. 1. Introduction. Zirconia has attracted the attention of many scientists because of its tremendous thermal, mechanical ...
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

Science.gov (United States)

Kamyar, Reza

In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems --- in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs) --- whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grow exponentially with the progress of the sequence - meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. The existing optimization algorithms and software are only designed to use desktop computers or small cluster computers --- machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This in fact is the reason we seek parallel algorithms for setting-up and solving large SDPs on large cluster- and/or super-computers. We propose parallel algorithms for stability analysis of two classes of systems: 1) Linear systems with a large number of uncertain parameters; 2) Nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to
The Phoenix series large scale LNG pool fire experiments.

Energy Technology Data Exchange (ETDEWEB)

Simpson, Richard B.; Jensen, Richard Pearson; Demosthenous, Byron; Luketa, Anay Josephine; Ricks, Allen Joseph; Hightower, Marion Michael; Blanchat, Thomas K.; Helmick, Paul H.; Tieszen, Sheldon Robert; Deola, Regina Anne; Mercier, Jeffrey Alan; Suo-Anttila, Jill Marie; Miller, Timothy J.

2010-12-01

The increasing demand for natural gas could increase the number and frequency of Liquefied Natural Gas (LNG) tanker deliveries to ports across the United States. Because of the increasing number of shipments and the number of possible new facilities, concerns about the safety of the public and property in the event of accidental, and even more importantly intentional, spills have increased. While improvements have been made over the past decade in assessing hazards from LNG spills, the existing experimental data is much smaller in size and scale than many postulated large accidental and intentional spills. Since the physics and hazards from a fire change with fire size, there are concerns about the adequacy of current hazard prediction techniques for large LNG spills and fires. To address these concerns, Congress funded the Department of Energy (DOE) in 2008 to conduct a series of laboratory and large-scale LNG pool fire experiments at Sandia National Laboratories (Sandia) in Albuquerque, New Mexico. This report presents the test data and results of both sets of fire experiments. A series of five reduced-scale (gas burner) tests (yielding 27 sets of data) were conducted in 2007 and 2008 at Sandia's Thermal Test Complex (TTC) to assess flame height to fire diameter ratios as a function of nondimensional heat release rates for extrapolation to large-scale LNG fires. The large-scale LNG pool fire experiments were conducted in a 120 m diameter pond specially designed and constructed in Sandia's Area III large-scale test complex. Two fire tests of LNG spills of 21 and 81 m in diameter were conducted in 2009 to improve the understanding of flame height, smoke production, and burn rate and therefore the physics and hazards of large LNG spills and fires.
Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don

Directory of Open Access Journals (Sweden)

Dillon Shannon K

2009-01-01

Full Text Available Abstract Background Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. Results Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of
Minimization of Linear Functionals Defined on Solutions of Large-Scale Discrete Ill-Posed Problems

DEFF Research Database (Denmark)

Elden, Lars; Hansen, Per Christian; Rojas, Marielba

2003-01-01

The minimization of linear functionals defined on the solutions of discrete ill-posed problems arises, e.g., in the computation of confidence intervals for these solutions. In 1990, Elden proposed an algorithm for this minimization problem based on a parametric-programming reformulation involving...... the solution of a sequence of trust-region problems, and using matrix factorizations. In this paper, we describe MLFIP, a large-scale version of this algorithm where a limited-memory trust-region solver is used on the subproblems. We illustrate the use of our algorithm in connection with an inverse heat...

Study of Large-Scale Wave Structure and Development of Equatorial Plasma Bubbles Using the C/NOFS Satellite

Science.gov (United States)

2012-10-31

scientific journals. The papers are listed below in chronological order. Kelley, M.C., F.S. Rodrigues, J.J. Makela, R. Tsunoda, P.A. Roddy, D.E. Hunton...source region be located on the dip equator. To illustrate, Figure 6 presents a sequence of satellite OLR maps, which were taken over Peru on 19-20...to large-scale wave structure and equatorial spread F, presented at the International Symposium for Equatorial Aeronomy, Paracas, Peru, March 2012
Geospatial Optimization of Siting Large-Scale Solar Projects

Energy Technology Data Exchange (ETDEWEB)

Macknick, Jordan [National Renewable Energy Lab. (NREL), Golden, CO (United States); Quinby, Ted [National Renewable Energy Lab. (NREL), Golden, CO (United States); Caulfield, Emmet [Stanford Univ., CA (United States); Gerritsen, Margot [Stanford Univ., CA (United States); Diffendorfer, Jay [U.S. Geological Survey, Boulder, CO (United States); Haines, Seth [U.S. Geological Survey, Boulder, CO (United States)

2014-03-01

Recent policy and economic conditions have encouraged a renewed interest in developing large-scale solar projects in the U.S. Southwest. However, siting large-scale solar projects is complex. In addition to the quality of the solar resource, solar developers must take into consideration many environmental, social, and economic factors when evaluating a potential site. This report describes a proof-of-concept, Web-based Geographical Information Systems (GIS) tool that evaluates multiple user-defined criteria in an optimization algorithm to inform discussions and decisions regarding the locations of utility-scale solar projects. Existing siting recommendations for large-scale solar projects from governmental and non-governmental organizations are not consistent with each other, are often not transparent in methods, and do not take into consideration the differing priorities of stakeholders. The siting assistance GIS tool we have developed improves upon the existing siting guidelines by being user-driven, transparent, interactive, capable of incorporating multiple criteria, and flexible. This work provides the foundation for a dynamic siting assistance tool that can greatly facilitate siting decisions among multiple stakeholders.
Large-scale Agricultural Land Acquisitions in West Africa | IDRC ...

International Development Research Centre (IDRC) Digital Library (Canada)

This project will examine large-scale agricultural land acquisitions in nine West African countries -Burkina Faso, Guinea-Bissau, Guinea, Benin, Mali, Togo, Senegal, Niger, and Côte d'Ivoire. ... They will use the results to increase public awareness and knowledge about the consequences of large-scale land acquisitions.
Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

Science.gov (United States)

Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

2016-12-01

response to B. ostreae through massive sequencing and has helped to improve our knowledge of the immune mechanisms of flat oyster. The validated oligo-microarray and the establishment of a reference transcriptome will be useful for large-scale gene expression studies in this species. Copyright © 2016 Elsevier Ltd. All rights reserved.
Large-scale motions in the universe: a review

International Nuclear Information System (INIS)

Burstein, D.

1990-01-01

The expansion of the universe can be retarded in localised regions within the universe both by the presence of gravity and by non-gravitational motions generated in the post-recombination universe. The motions of galaxies thus generated are called 'peculiar motions', and the amplitudes, size scales and coherence of these peculiar motions are among the most direct records of the structure of the universe. As such, measurements of these properties of the present-day universe provide some of the severest tests of cosmological theories. This is a review of the current evidence for large-scale motions of galaxies out to a distance of ∼5000 km s⁻¹ (in an expanding universe, distance is proportional to radial velocity). 'Large-scale' in this context refers to motions that are correlated over size scales larger than the typical sizes of groups of galaxies, up to and including the size of the volume surveyed. To orient the reader in this relatively new field of study, a short modern history is given together with an explanation of the terminology. Careful consideration is given to the data used to measure the distances, and hence the peculiar motions, of galaxies. The evidence for large-scale motions is presented in a graphical fashion, using only the most reliable data for galaxies spanning a wide range in optical properties and over the complete range of galactic environments. The kinds of systematic errors that can affect this analysis are discussed, and the reliability of these motions is assessed. The predictions of two models of large-scale motion are compared to the observations, and special emphasis is placed on those motions in which our own Galaxy directly partakes. (author)
State of the Art in Large-Scale Soil Moisture Monitoring

Science.gov (United States)

Ochsner, Tyson E.; Cosh, Michael Harold; Cuenca, Richard H.; Dorigo, Wouter; Draper, Clara S.; Hagimoto, Yutaka; Kerr, Yan H.; Larson, Kristine M.; Njoku, Eni Gerald; Small, Eric E.;

2013-01-01

Soil moisture is an essential climate variable influencing land atmosphere interactions, an essential hydrologic variable impacting rainfall runoff processes, an essential ecological variable regulating net ecosystem exchange, and an essential agricultural variable constraining food security. Large-scale soil moisture monitoring has advanced in recent years creating opportunities to transform scientific understanding of soil moisture and related processes. These advances are being driven by researchers from a broad range of disciplines, but this complicates collaboration and communication. For some applications, the science required to utilize large-scale soil moisture data is poorly developed. In this review, we describe the state of the art in large-scale soil moisture monitoring and identify some critical needs for research to optimize the use of increasingly available soil moisture data. We review representative examples of 1) emerging in situ and proximal sensing techniques, 2) dedicated soil moisture remote sensing missions, 3) soil moisture monitoring networks, and 4) applications of large-scale soil moisture measurements. Significant near-term progress seems possible in the use of large-scale soil moisture data for drought monitoring. Assimilation of soil moisture data for meteorological or hydrologic forecasting also shows promise, but significant challenges related to model structures and model errors remain. Little progress has been made yet in the use of large-scale soil moisture observations within the context of ecological or agricultural modeling. Opportunities abound to advance the science and practice of large-scale soil moisture monitoring for the sake of improved Earth system monitoring, modeling, and forecasting.

Power spectral density and scaling exponent of high frequency global solar radiation sequences

Science.gov (United States)

Calif, Rudy; Schmitt, François G.; Huang, Yongxiang

2013-04-01

The share of solar power production from photovoltaic systems in electric grids is constantly increasing. Solar energy conversion devices such as photovoltaic cells are very sensitive to instantaneous fluctuations in solar radiation. Thus rapid variations of solar radiation due to changes in local meteorological conditions can induce large-amplitude fluctuations of the produced electrical power and reduce the overall efficiency of the system. When a large amount of photovoltaic electricity is sent into a weak or small electricity network such as an island network, the security of the electric grid can be jeopardized by these power fluctuations. The integration of this energy into the electrical network remains a major challenge because of the high variability of solar radiation in time and space. To mitigate these difficulties, it is essential to identify the characteristics of these fluctuations in order to anticipate a possible power shortage or power surge. The objective of this study is to present an approach based on Empirical Mode Decomposition (EMD) and the Hilbert-Huang Transform (HHT) to highlight the scaling properties of global solar irradiance data G(t). Scale invariance is detected in this dataset using EMD in association with arbitrary-order Hilbert spectral analysis, a generalization of the HHT or Hilbert Spectral Analysis (HSA). The first step, the EMD, consists in decomposing the normalized global solar radiation data G'(t) into several Intrinsic Mode Functions (IMFs) C_i(t) without specifying an a priori basis. Consequently, the normalized original solar radiation sequence G'(t) can be written as a sum of the C_i(t) plus a residual r_n. From all IMF modes, a joint PDF P(f,A) of the local, instantaneous frequency f and amplitude A is estimated. To characterize the scaling behavior in amplitude-frequency space, an arbitrary-order Hilbert marginal spectrum is defined as $I_q(f) = \int_0^{\infty} P(f,A)\,A^{q}\,dA$ (1), with $q \geq 0$. In case of scale
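The arbitrary-order Hilbert marginal spectrum named in this abstract is the q-th amplitude moment of the joint PDF P(f,A) at each frequency, i.e. the integral of P(f,A) A^q over A for q >= 0. A minimal numerical sketch follows, assuming the joint PDF has already been estimated on a regular grid (one row per frequency bin, one column per amplitude bin). The function name, the grid layout and the sample values are illustrative, and the EMD and Hilbert transform steps that would produce the PDF are not shown.

```python
def hilbert_marginal_spectrum(joint_pdf, amplitudes, q):
    """Approximate I_q(f) = integral of P(f, A) * A**q dA by a Riemann sum.

    joint_pdf[i][j] is the estimated density at frequency bin i and
    amplitude bin j; amplitudes must be uniformly spaced (assumption).
    """
    d_a = amplitudes[1] - amplitudes[0]          # amplitude bin width
    return [
        sum(p * a ** q for p, a in zip(row, amplitudes)) * d_a
        for row in joint_pdf
    ]


# Hypothetical 2 x 2 joint PDF, for illustration only.
pdf = [[1.0, 1.0],
       [0.0, 2.0]]
amps = [1.0, 2.0]
print(hilbert_marginal_spectrum(pdf, amps, q=2))
```

For q = 0 this reduces to the marginal distribution of frequency, and q = 2 corresponds to an energy-like spectrum; scaling is then examined through how the result varies with f for each q.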
A route to explosive large-scale magnetic reconnection in a super-ion-scale current sheet

Directory of Open Access Journals (Sweden)

K. G. Tanaka

2009-01-01

Full Text Available How to trigger magnetic reconnection is one of the most interesting and important problems in space plasma physics. Recently, electron temperature anisotropy (αeo=Te⊥/Te||) at the center of a current sheet and the non-local effect of the lower-hybrid drift instability (LHDI) that develops at the current sheet edges have attracted attention in this context. In addition to these effects, here we also study the effects of ion temperature anisotropy (αio=Ti⊥/Ti||). Electron anisotropy effects are known to be ineffective in a current sheet whose thickness is of ion scale. In this range of current sheet thickness, the LHDI effects are shown to weaken substantially with a small increase in thickness, and the obtained saturation level is too low for large-scale reconnection to be achieved. We then investigate whether introducing electron and ion temperature anisotropies in the initial stage would couple with the LHDI effects to revive quick triggering of large-scale reconnection in a super-ion-scale current sheet. The results are as follows. (1) The initial electron temperature anisotropy is consumed very quickly when a number of minuscule magnetic islands (each with a lateral length of 1.5~3 times the ion inertial length) form. These minuscule islands do not coalesce into a large-scale island to enable large-scale reconnection. (2) The subsequent LHDI effects disturb the current sheet filled with the small islands. This substantially shortens the triggering time scale but does not enhance the saturation level of reconnected flux. (3) When the ion temperature anisotropy is added, it survives through the small island formation stage and leads to even quicker triggering when the LHDI effects set in. Furthermore, the saturation level is seen to be elevated by a factor of ~2, and large-scale reconnection is achieved only in this case. Comparison with two-dimensional simulations that exclude the LHDI effects confirms that the saturation level
Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

Science.gov (United States)

Maskey, M.; Ramachandran, R.; Miller, J.

2017-12-01

Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.
Two EST-derived marker systems for cultivar identification in tree peony.

Science.gov (United States)

Zhang, J J; Shu, Q Y; Liu, Z A; Ren, H X; Wang, L S; De Keyser, E

2012-02-01

Tree peony (Paeonia suffruticosa Andrews), a woody deciduous shrub, belongs to the section Moutan DC. in the genus of Paeonia of the Paeoniaceae family. To increase the efficiency of breeding, two EST-derived marker systems were developed based on a tree peony expressed sequence tag (EST) database. Using target region amplification polymorphism (TRAP), 19 of 39 primer pairs showed good amplification for 56 accessions with amplicons ranging from 120 to 3,000 bp long, among which 99.3% were polymorphic. In contrast, 7 of 21 primer pairs demonstrated adequate amplification with clear bands for simple sequence repeats (SSRs) developed from ESTs, and a total of 33 alleles were found in 56 accessions. The similarity matrices generated by TRAP and EST-SSR markers were compared, and the Mantel test (r = 0.57778, P = 0.0020) showed a moderate correlation between the two types of molecular markers. TRAP markers were suitable for DNA fingerprinting and EST-SSR markers were more appropriate for discriminating synonyms (the same cultivars with different names due to limited information exchanged among different geographic areas). The two sets of EST-derived markers will be used further for genetic linkage map construction and quantitative trait locus detection in tree peony.
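The Mantel test reported here compares two similarity matrices by correlating their off-diagonal entries and judging significance by permuting the accessions. A minimal sketch is given below, assuming symmetric matrices, a Pearson statistic and a one-sided permutation test. The abstract does not state which software or settings were used, so the function names, the number of permutations and the toy matrices are hypothetical.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle entries of a symmetric matrix after reordering."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p): Mantel correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    x = _upper(a, identity)
    r_obs = _pearson(x, _upper(b, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)          # permute rows and columns of b together
        if _pearson(x, _upper(b, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy 4 x 4 matrices standing in for TRAP and EST-SSR similarities.
trap = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
ssr = [[0, 2, 2, 4], [2, 0, 3, 5], [2, 3, 0, 7], [4, 5, 7, 0]]
print(mantel(trap, ssr))
```

Permuting whole accessions rather than individual matrix cells keeps the dependence structure within each matrix intact, which is the reason an ordinary correlation test is not used for this comparison.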
The mining of toxin-like polypeptides from EST database by single residue distribution analysis.

Science.gov (United States)

Kozlov, Sergey; Grishin, Eugene

2011-01-31

Novel high throughput sequencing technologies require permanent development of bioinformatics data processing methods. Among them, rapid and reliable identification of encoded proteins plays a pivotal role. To search for particular protein families, the amino acid sequence motifs suitable for selective screening of nucleotide sequence databases may be used. In this work, we suggest a novel method for simplified representation of protein amino acid sequences named Single Residue Distribution Analysis, which is applicable both for homology search and database screening. Using the procedure developed, a search for amino acid sequence motifs in sea anemone polypeptides was performed, and 14 different motifs with broad and low specificity were discriminated. The adequacy of motifs for mining toxin-like sequences was confirmed by their ability to identify 100% toxin-like anemone polypeptides in the reference polypeptide database. The employment of novel motifs for the search of polypeptide toxins in Anemonia viridis EST dataset allowed us to identify 89 putative toxin precursors. The translated and modified ESTs were scanned using a special algorithm. In addition to direct comparison with the motifs developed, the putative signal peptides were predicted and homology with known structures was examined. The suggested method may be used to retrieve structures of interest from the EST databases using simple amino acid sequence motifs as templates. The efficiency of the procedure for directed search of polypeptides is higher than that of most currently used methods. Analysis of 39939 ESTs of sea anemone Anemonia viridis resulted in identification of five protein precursors of earlier described toxins, discovery of 43 novel polypeptide toxins, and prediction of 39 putative polypeptide toxin sequences. In addition, two precursors of novel peptides presumably displaying neuronal function were disclosed.
Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

Science.gov (United States)

Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

2011-09-21

Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half a Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database, and around two-thirds of them matched proteins of cucumber, the most closely related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression
Large-scale structure observables in general relativity

International Nuclear Information System (INIS)

Jeong, Donghui; Schmidt, Fabian

2015-01-01

We review recent studies that rigorously define several key observables of the large-scale structure of the Universe in a general relativistic context. Specifically, we consider (i) redshift perturbation of cosmic clock events; (ii) distortion of cosmic rulers, including weak lensing shear and magnification; and (iii) observed number density of tracers of the large-scale structure. We provide covariant and gauge-invariant expressions of these observables. Our expressions are given for a linearly perturbed flat Friedmann–Robertson–Walker metric including scalar, vector, and tensor metric perturbations. While we restrict ourselves to linear order in perturbation theory, the approach can be straightforwardly generalized to higher order. (paper)
Fatigue Analysis of Large-scale Wind turbine

Directory of Open Access Journals (Sweden)

Zhu Yongli

2017-01-01

Full Text Available The paper studies fatigue damage in the top flange of a large-scale wind turbine generator. It establishes a finite element model of the top flange connection system with the finite element analysis software MSC.Marc/Mentat and analyzes its fatigue strain, simulates the loads for the flange fatigue working condition with the Bladed software, and obtains the flange fatigue load spectrum with the rain-flow counting method. Finally, it carries out the fatigue analysis of the top flange with the fatigue analysis software MSC.Fatigue and the Palmgren-Miner linear cumulative damage theory. The results offer a new approach to flange fatigue analysis for large-scale wind turbine generators and have some practical engineering value.
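The Palmgren-Miner rule named in this abstract sums the ratio of applied cycles to allowable cycles over every bin of a load spectrum. The following is a minimal sketch of that rule only, not the MSC.Fatigue workflow; the S-N curve form, its constants, and the load spectrum are hypothetical values chosen for illustration.

```python
# Sketch of Palmgren-Miner linear cumulative damage.
# The S-N constants and the load spectrum are hypothetical, for illustration only.

def cycles_to_failure(stress_range, sn_constant=1.0e12, sn_exponent=3.0):
    """Basquin-type S-N curve N = C / S**m (C and m are assumed values)."""
    return sn_constant / stress_range ** sn_exponent

def miner_damage(load_spectrum, sn_constant=1.0e12, sn_exponent=3.0):
    """Sum n_i / N_i over the (stress_range, cycle_count) bins of a load spectrum."""
    return sum(
        count / cycles_to_failure(stress, sn_constant, sn_exponent)
        for stress, count in load_spectrum
    )

# Hypothetical rain-flow-counted spectrum: (stress range in MPa, number of cycles).
spectrum = [(100.0, 2.0e5), (50.0, 1.0e6), (20.0, 5.0e6)]
damage = miner_damage(spectrum)
print(f"cumulative damage D = {damage:.3f}")  # failure is predicted when D reaches 1
```

With these assumed numbers the three bins contribute 0.2, 0.125 and 0.04, so the cumulative damage is 0.365.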
Real-time simulation of large-scale floods

Science.gov (United States)

Liu, Q.; Qin, Y.; Li, G. D.; Liu, Z.; Cheng, D. J.; Zhao, Y. H.

2016-08-01

Given the complexity of real-time water conditions, real-time simulation of large-scale floods is very important for flood prevention practice. Model robustness and running efficiency are two critical factors in successful real-time flood simulation. This paper proposes a robust, two-dimensional shallow water model based on the unstructured Godunov-type finite volume method. A robust wet/dry front method is used to enhance the numerical stability. An adaptive method is proposed to improve the running efficiency. The proposed model is used for large-scale flood simulation on real topography. Results compared to those of MIKE21 show the strong performance of the proposed model.
Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

Directory of Open Access Journals (Sweden)

Rodrigues NB

2002-01-01

Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compound). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
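To illustrate what a "perfect" microsatellite is, the sketch below scans a DNA string for a short motif repeated back to back. It is a simple regular-expression stand-in, not RepeatMasker; the function name, the minimum repeat count and the example sequence are assumptions made for illustration.

```python
import re

# Perfect repeats only: a 1-6 bp motif repeated back to back.
# min_repeats is a hypothetical threshold, not the one used in the study.
def find_perfect_microsatellites(sequence, min_repeats=5, max_motif_len=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Made-up sequence containing one CA repeat and one AT repeat.
example = "GGTT" + "CA" * 7 + "TTGC" + "AT" * 6 + "GGC"
print(find_perfect_microsatellites(example))
```

The lazy motif group makes the pattern report the shortest repeating unit (for instance "CA" rather than "CACA"). Imperfect and compound repeats would need additional logic that this sketch does not attempt.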
Composite Binary Sequences with a Large Ensemble and Zero Correlation Zone

Directory of Open Access Journals (Sweden)

S. S. Yudachev

2015-01-01

Full Text Available The article considers a proposed class of derived signals, composite binary sequences, for use in advanced spread spectrum radio systems of various purposes that employ direct-sequence spectrum spreading. The composite sequences considered, which have a representative set of lengths and unique correlation properties, compare favorably with the large ensembles formed on a single algorithmic basis that are widely used at present. To evaluate the properties of composite sequences generated from two components, the Barker code and Kerdock sequences, expressions for the periodic and aperiodic correlation functions are given. An algorithm for generating practical ensembles of composite sequences is presented. On the basis of the algorithm and its software implementation in C#, samples of sequence ensembles of various lengths were obtained, and their periodic and aperiodic correlation functions and statistical characteristics were studied in detail. As an illustration, some of the most typical correlation functions are presented. The most notable characteristics for assessing the feasibility of using this type of sequence in the design of specific types of radio systems are considered. On the basis of the proposed program and the calculations performed, conclusions can be drawn about the possibility of using sequences of these classes to reduce intra-system interference in planned spread spectrum CDMA systems.
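The periodic and aperiodic correlation functions mentioned in this abstract can be sketched for one of the two components, the Barker code. The example below uses the standard length-13 Barker code; it does not reproduce the article's composite construction with Kerdock sequences or its C# implementation, and the function names are illustrative.

```python
# Barker code of length 13 (a standard, well-known sequence).
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def aperiodic_autocorrelation(seq):
    """C(k) = sum over i of s[i] * s[i+k], with no wrap-around, for k = 0..N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]

def periodic_autocorrelation(seq):
    """R(k) = sum over i of s[i] * s[(i+k) mod N], for k = 0..N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[(i + k) % n] for i in range(n)) for k in range(n)]

aperiodic = aperiodic_autocorrelation(BARKER_13)
periodic = periodic_autocorrelation(BARKER_13)
print("aperiodic:", aperiodic)
print("periodic: ", periodic)
```

The defining Barker property is that every aperiodic sidelobe (shift k > 0) has magnitude at most 1, while the zero-shift peak equals the sequence length.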
Large-scale numerical simulations of plasmas

International Nuclear Information System (INIS)

Hamaguchi, Satoshi

2004-01-01

The recent trend in large-scale simulations of fusion plasmas and processing plasmas is briefly summarized. Many advanced simulation techniques have been developed for fusion plasmas and some of these techniques are now applied to analyses of processing plasmas. (author)
Nearly incompressible fluids: Hydrodynamics and large scale inhomogeneity

International Nuclear Information System (INIS)

Hunana, P.; Zank, G. P.; Shaikh, D.

2006-01-01

A system of hydrodynamic equations in the presence of large-scale inhomogeneities for a high plasma beta solar wind is derived. The theory is derived under the assumption of low turbulent Mach number and is developed for the flows where the usual incompressible description is not satisfactory and a full compressible treatment is too complex for any analytical studies. When the effects of compressibility are incorporated only weakly, a new description, referred to as 'nearly incompressible hydrodynamics', is obtained. The nearly incompressible theory was originally applied to homogeneous flows. However, large-scale gradients in density, pressure, temperature, etc., are typical in the solar wind and it was unclear how inhomogeneities would affect the usual incompressible and nearly incompressible descriptions. In the homogeneous case, the lowest order expansion of the fully compressible equations leads to the usual incompressible equations, followed at higher orders by the nearly incompressible equations, as introduced by Zank and Matthaeus. With this work we show that the inclusion of large-scale inhomogeneities (in this case time-independent and radially symmetric background solar wind) modifies the leading-order incompressible description of solar wind flow. We find, for example, that the divergence of velocity fluctuations is nonsolenoidal and that density fluctuations can be described to leading order as a passive scalar. Locally (for small lengthscales), this system of equations converges to the usual incompressible equations and we therefore use the term 'locally incompressible' to describe the equations. This term should be distinguished from the term 'nearly incompressible', which is reserved for higher-order corrections. Furthermore, we find that density fluctuations scale with Mach number linearly, in contrast to the original homogeneous nearly incompressible theory, in which density fluctuations scale with the square of Mach number. Inhomogeneous nearly
Performance Health Monitoring of Large-Scale Systems

Energy Technology Data Exchange (ETDEWEB)

Rajamony, Ram [IBM Research, Austin, TX (United States)

2014-11-20

This report details the progress made on the ASCR funded project Performance Health Monitoring for Large Scale Systems. A large-scale application may not achieve its full performance potential due to degraded performance of even a single subsystem. Detecting performance faults, isolating them, and taking remedial action is critical for the scale of systems on the horizon. PHM aims to develop techniques and tools that can be used to identify and mitigate such performance problems. We accomplish this through two main aspects. The PHM framework encompasses diagnostics, system monitoring, fault isolation, and performance evaluation capabilities that indicate when a performance fault has been detected, either due to an anomaly present in the system itself or due to contention for shared resources between concurrently executing jobs. Software components called the PHM Control system then build upon the capabilities provided by the PHM framework to mitigate degradation caused by performance problems.

Visual management of large scale data mining projects.

Science.gov (United States)

Shah, I; Hunter, L

2000-01-01

This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.
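The abstract mentions information-theoretic decision tree induction. One common criterion in that family is information gain, the reduction in label entropy achieved by splitting on a feature. The sketch below shows that criterion only and assumes it as a representative choice, since the abstract does not name the exact measure used. The feature vectors and class labels are made-up toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: does a (hypothetical) sequence domain occur in each protein?
functions = ["class_A", "class_A", "class_B", "class_B"]
domain_x_present = [1, 1, 0, 0]  # separates the two classes perfectly
domain_y_present = [1, 0, 1, 0]  # carries no information about the class
print(information_gain(domain_x_present, functions))
print(information_gain(domain_y_present, functions))
```

A tree learner of this kind would split first on the feature with the larger gain, here the hypothetical domain X.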
Identification and Validation of EST-Derived Molecular Markers, TRAP and VNTRs, for Banana Research

NARCIS (Netherlands)

Garcia, S.A.L.; Talebi, R.; Ferreira, C.F.; Vroh, B.I.; Paiva, L.V.; Kema, G.H.J.; Souza, M.T.

2011-01-01

The advent of high-throughput sequencing technology has generated abundant information on DNA sequences for the genomes of many plant species. Expressed Sequence Tags (ESTs), which are unique DNA sequences derived from a cDNA library and therefore representing genes transcribed in specific tissues
Large-scale functional purification of recombinant HIV-1 capsid.

Directory of Open Access Journals (Sweden)

Magdeleine Hung

Full Text Available During human immunodeficiency virus type-1 (HIV-1) virion maturation, capsid proteins undergo a major rearrangement to form a conical core that protects the viral nucleoprotein complexes. Mutations in the capsid sequence that alter the stability of the capsid core are deleterious to viral infectivity and replication. Recently, capsid assembly has become an attractive target for the development of a new generation of anti-retroviral agents. Drug screening efforts and subsequent structural and mechanistic studies require gram quantities of active, homogeneous and pure protein. Conventional means of laboratory purification of Escherichia coli expressed recombinant capsid protein rely on column chromatography steps that are not amenable to large-scale production. Here we present a function-based purification of wild-type and quadruple mutant capsid proteins, which relies on the inherent propensity of capsid protein to polymerize and depolymerize. This method does not require the packing of sizable chromatography columns and can generate double-digit gram quantities of functionally and biochemically well-behaved proteins with greater than 98% purity. We have used the purified capsid protein to characterize two known assembly inhibitors in our in-house developed polymerization assay and to measure their binding affinities. Our capsid purification procedure provides a robust method for purifying large quantities of a key protein in the HIV-1 life cycle, facilitating identification of the next generation anti-HIV agents.
Differential transferability of EST-SSR primers developed from diploid species Pseudoroegneria spicata, Thinopyrum bessarabicum, and Th. elongatum

Science.gov (United States)

Simple sequence repeat technology based on expressed sequence tag (EST-SSR) is a useful genomic tool for genome mapping, characterizing plant species relationships, elucidating genome evolution, and tracing genes on alien chromosome segments. EST-SSR primers developed from three perennial diploid T...
Learning from large scale neural simulations

DEFF Research Database (Denmark)

Serban, Maria

2017-01-01

Large-scale neural simulations have the marks of a distinct methodology which can be fruitfully deployed to advance scientific understanding of the human brain. Computer simulation studies can be used to produce surrogate observational data for better conceptual models and new how...
Phenomenology of two-dimensional stably stratified turbulence under large-scale forcing

KAUST Repository

Kumar, Abhishek; Verma, Mahendra K.; Sukhatme, Jai

2017-01-01

In this paper, we characterise the scaling of energy spectra, and the interscale transfer of energy and enstrophy, for strongly, moderately and weakly stably stratified two-dimensional (2D) turbulence, restricted to a vertical plane, under large-scale random forcing. In the strongly stratified case, a large-scale vertically sheared horizontal flow (VSHF) coexists with small scale turbulence. The VSHF consists of internal gravity waves and the turbulent flow has a kinetic energy (KE) spectrum that follows an approximate k^{-3} scaling with zero KE flux and a robust positive enstrophy flux. The spectrum of the turbulent potential energy (PE) also approximately follows a k^{-3} power-law and its flux is directed to small scales. For moderate stratification, there is no VSHF and the KE of the turbulent flow exhibits Bolgiano–Obukhov scaling that transitions from a shallow k^{-11/5} form at large scales, to a steeper approximate k^{-3} scaling at small scales. The entire range of scales shows a strong forward enstrophy flux, and interestingly, large (small) scales show an inverse (forward) KE flux. The PE flux in this regime is directed to small scales, and the PE spectrum is characterised by an approximate k^{-1.64} scaling. Finally, for weak stratification, KE is transferred upscale and its spectrum closely follows a k^{-2.5} scaling, while PE exhibits a forward transfer and its spectrum shows an approximate k^{-1.6} power-law. For all stratification strengths, the total energy always flows from large to small scales and almost all the spectral indices are well explained by accounting for the scale-dependent nature of the corresponding flux.
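Power-law exponents of the kind quoted in this abstract are commonly estimated as the slope of a straight-line fit to the spectrum in log-log coordinates. The sketch below shows that generic procedure and is not the authors' analysis code; the function name and the synthetic spectrum are illustrative assumptions.

```python
import math

def fit_spectral_exponent(wavenumbers, spectrum):
    """Least-squares slope of log E(k) against log k, i.e. n in E(k) ~ k**n."""
    xs = [math.log(k) for k in wavenumbers]
    ys = [math.log(e) for e in spectrum]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Synthetic spectrum with an exact k**-3 law (illustrative data, not from the paper).
ks = [float(k) for k in range(4, 65)]
synthetic = [2.5 * k ** -3 for k in ks]
slope = fit_spectral_exponent(ks, synthetic)
print(round(slope, 6))
```

On real data the fit would be restricted to the wavenumber range where a single scaling regime is expected, since the abstract describes spectra that change slope between large and small scales.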
Phenomenology of two-dimensional stably stratified turbulence under large-scale forcing

KAUST Repository

Kumar, Abhishek

2017-01-11

In this paper, we characterise the scaling of energy spectra, and the interscale transfer of energy and enstrophy, for strongly, moderately and weakly stably stratified two-dimensional (2D) turbulence, restricted to a vertical plane, under large-scale random forcing. In the strongly stratified case, a large-scale vertically sheared horizontal flow (VSHF) coexists with small scale turbulence. The VSHF consists of internal gravity waves and the turbulent flow has a kinetic energy (KE) spectrum that follows an approximate k^{-3} scaling with zero KE flux and a robust positive enstrophy flux. The spectrum of the turbulent potential energy (PE) also approximately follows a k^{-3} power-law and its flux is directed to small scales. For moderate stratification, there is no VSHF and the KE of the turbulent flow exhibits Bolgiano–Obukhov scaling that transitions from a shallow k^{-11/5} form at large scales, to a steeper approximate k^{-3} scaling at small scales. The entire range of scales shows a strong forward enstrophy flux, and interestingly, large (small) scales show an inverse (forward) KE flux. The PE flux in this regime is directed to small scales, and the PE spectrum is characterised by an approximate k^{-1.64} scaling. Finally, for weak stratification, KE is transferred upscale and its spectrum closely follows a k^{-2.5} scaling, while PE exhibits a forward transfer and its spectrum shows an approximate k^{-1.6} power-law. For all stratification strengths, the total energy always flows from large to small scales and almost all the spectral indices are well explained by accounting for the scale-dependent nature of the corresponding flux.
Exploring the large-scale structure of Taylor–Couette turbulence through Large-Eddy Simulations

Science.gov (United States)

Ostilla-Mónico, Rodolfo; Zhu, Xiaojue; Verzicco, Roberto

2018-04-01

Large eddy simulations (LES) of Taylor-Couette (TC) flow, the flow between two co-axial and independently rotating cylinders, are performed in an attempt to explore the large-scale axially-pinned structures seen in experiments and simulations. Both static and dynamic LES models are used. The Reynolds number is kept fixed at Re = 3.4 × 10^4, and the radius ratio η = r_i/r_o is set to η = 0.909, limiting the effects of curvature and resulting in frictional Reynolds numbers of around Re_τ ≈ 500. Four rotation ratios from Rot = −0.0909 to Rot = 0.3 are simulated. First, the LES of TC is benchmarked for different rotation ratios. Both the Smagorinsky model with a constant of c_s = 0.1 and the dynamic model are found to produce reasonable results for no mean rotation and cyclonic rotation, but deviations increase for increasing rotation. This is attributed to the increasingly anisotropic character of the fluctuations. Second, “over-damped” LES, i.e. LES with a large Smagorinsky constant, is performed and is shown to reproduce some features of the large-scale structures, even when the near-wall region is not adequately modeled. This shows the potential for using over-damped LES for fast explorations of the parameter space where large-scale structures are found.
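The static Smagorinsky model referred to above sets the subgrid eddy viscosity from the resolved strain rate and the filter width. A minimal sketch of that textbook closure follows; it is not the authors' solver, and the velocity gradient and filter width are hypothetical inputs. Only the constant 0.1 is taken from the abstract.

```python
import math

def smagorinsky_viscosity(velocity_gradient, filter_width, c_s=0.1):
    """nu_t = (c_s * Delta)**2 * |S|, with |S| = sqrt(2 * S_ij * S_ij)."""
    dims = range(3)
    # Symmetric part of the velocity gradient tensor (resolved strain rate).
    strain = [
        [0.5 * (velocity_gradient[i][j] + velocity_gradient[j][i]) for j in dims]
        for i in dims
    ]
    strain_norm = math.sqrt(2.0 * sum(strain[i][j] ** 2 for i in dims for j in dims))
    return (c_s * filter_width) ** 2 * strain_norm

# Hypothetical pure shear du/dy = 10 (arbitrary units) with filter width Delta = 0.01.
grad_u = [[0.0, 10.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu_t = smagorinsky_viscosity(grad_u, filter_width=0.01)
print(nu_t)
```

Because the viscosity grows with the square of the constant, raising c_s well above 0.1 gives the strongly dissipative "over-damped" behaviour the abstract describes.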
Large-scale preparation of hollow graphitic carbon nanospheres

International Nuclear Information System (INIS)

Feng, Jun; Li, Fu; Bai, Yu-Jun; Han, Fu-Dong; Qi, Yong-Xin; Lun, Ning; Lu, Xi-Feng

2013-01-01

Hollow graphitic carbon nanospheres (HGCNSs) were synthesized on a large scale by a simple reaction between glucose and Mg at 550 °C in an autoclave. Characterization by X-ray diffraction, Raman spectroscopy and transmission electron microscopy demonstrates the formation of HGCNSs with an average diameter of about 10 nm and a wall thickness of a few graphene layers. The HGCNSs exhibit a reversible capacity of 391 mAh g^{-1} after 60 cycles when used as anode materials for Li-ion batteries. -- Graphical abstract: Hollow graphitic carbon nanospheres could be prepared on a large scale by the simple reaction between glucose and Mg at 550 °C, and exhibit superior electrochemical performance to graphite. Highlights: ► Hollow graphitic carbon nanospheres (HGCNSs) were prepared on a large scale at 550 °C. ► The preparation is simple, effective and eco-friendly. ► The in situ yielded MgO nanocrystals promote the graphitization. ► The HGCNSs exhibit superior electrochemical performance to graphite.
Accelerating large-scale phase-field simulations with GPU

Directory of Open Access Journals (Sweden)

Xiaoming Shi

2017-10-01

Full Text Available A new package for accelerating large-scale phase-field simulations on GPUs was developed, based on the semi-implicit Fourier method. The package can solve a variety of equilibrium equations with different inhomogeneity including long-range elastic, magnetostatic, and electrostatic interactions. Using a specific algorithm in Compute Unified Device Architecture (CUDA), the Fourier spectral iterative perturbation method was integrated into the GPU package. The Allen-Cahn equation, the Cahn-Hilliard equation, and a phase-field model with long-range interaction were each solved with the algorithm running on GPU to test the performance of the package. Comparing the calculation results of the solver executed on a single CPU with those of the solver on GPU showed that the GPU version runs about 50 times faster. The present study therefore contributes to the acceleration of large-scale phase-field simulations and provides guidance for experiments to design large-scale functional devices.
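The semi-implicit Fourier method named in this abstract treats the gradient term implicitly in Fourier space and the nonlinear bulk term explicitly. The sketch below applies that idea to a one-dimensional Allen-Cahn equation on the CPU. It is a toy illustration, not the GPU package: the double-well free energy, all parameter values and the function names are assumptions, and a naive DFT stands in for an FFT library to keep the example dependency-free.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(values[m] * cmath.exp(sign * 2j * math.pi * q * m / n) for m in range(n))
        for q in range(n)
    ]

def allen_cahn_step(phi, dt, mobility, kappa, length):
    """One semi-implicit step: gradient term implicit, double-well term explicit."""
    n = len(phi)
    phi_hat = dft(phi)
    # Bulk driving force for the assumed free energy f = phi^4/4 - phi^2/2.
    bulk_hat = dft([p ** 3 - p for p in phi])
    new_hat = []
    for q in range(n):
        freq = q if q < n // 2 else q - n  # signed mode number
        k_sq = (2.0 * math.pi * freq / length) ** 2
        new_hat.append(
            (phi_hat[q] - dt * mobility * bulk_hat[q]) / (1.0 + dt * mobility * kappa * k_sq)
        )
    return [z.real for z in dft(new_hat, inverse=True)]

# Small cosine perturbation on a periodic domain (hypothetical parameters).
n_points, length = 32, 2.0 * math.pi
phi = [0.1 * math.cos(2.0 * math.pi * i / n_points) for i in range(n_points)]
for _ in range(100):
    phi = allen_cahn_step(phi, dt=0.1, mobility=1.0, kappa=0.1, length=length)
print(f"max phi = {max(phi):.3f}, min phi = {min(phi):.3f}")
```

Starting from the small perturbation, the field separates into two domains close to the two wells at +1 and -1. Each step costs only a few transforms plus pointwise arithmetic, which is the structure that maps well onto GPU FFT libraries.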
First Mile Challenges for Large-Scale IoT

KAUST Repository

Bader, Ahmed

2017-03-16

The Internet of Things is large-scale by nature. This is not only manifested by the large number of connected devices, but also by the sheer scale of spatial traffic intensity that must be accommodated, primarily in the uplink direction. To that end, cellular networks are indeed a strong first mile candidate to accommodate the data tsunami to be generated by the IoT. However, IoT devices are required in the cellular paradigm to undergo random access procedures as a precursor to resource allocation. Such procedures impose a major bottleneck that hinders cellular networks' ability to support large-scale IoT. In this article, we shed light on the random access dilemma and present a case study based on experimental data as well as system-level simulations. Accordingly, a case is built for the latent need to revisit random access procedures. A call for action is motivated by listing a few potential remedies and recommendations.
Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies.

Science.gov (United States)

Furnham, Nicholas; Dawson, Natalie L; Rahman, Syed A; Thornton, Janet M; Orengo, Christine A

2016-01-29

Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, mean that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and to exemplify these by detailed examples. Additionally, we use structural information on the enzyme's active site to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and the design of new synthetic enzymes. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
The mining of toxin-like polypeptides from EST database by single residue distribution analysis

Directory of Open Access Journals (Sweden)

Grishin Eugene

2011-01-01

Full Text Available Background: Novel high throughput sequencing technologies require permanent development of bioinformatics data processing methods. Among them, rapid and reliable identification of encoded proteins plays a pivotal role. To search for particular protein families, the amino acid sequence motifs suitable for selective screening of nucleotide sequence databases may be used. In this work, we suggest a novel method for simplified representation of protein amino acid sequences named Single Residue Distribution Analysis, which is applicable both for homology search and database screening. Results: Using the procedure developed, a search for amino acid sequence motifs in sea anemone polypeptides was performed, and 14 different motifs with broad and low specificity were discriminated. The adequacy of motifs for mining toxin-like sequences was confirmed by their ability to identify 100% toxin-like anemone polypeptides in the reference polypeptide database. The employment of novel motifs for the search of polypeptide toxins in Anemonia viridis EST dataset allowed us to identify 89 putative toxin precursors. The translated and modified ESTs were scanned using a special algorithm. In addition to direct comparison with the motifs developed, the putative signal peptides were predicted and homology with known structures was examined. Conclusions: The suggested method may be used to retrieve structures of interest from the EST databases using simple amino acid sequence motifs as templates. The efficiency of the procedure for directed search of polypeptides is higher than that of most currently used methods. Analysis of 39,939 ESTs of sea anemone Anemonia viridis resulted in identification of five protein precursors of earlier described toxins, discovery of 43 novel polypeptide toxins, and prediction of 39 putative polypeptide toxin sequences. In addition, two precursors of novel peptides presumably displaying neuronal function were disclosed.
Thermal power generation projects "Large Scale Solar Heating" (EU-Thermie-Projekte "Large Scale Solar Heating")

Energy Technology Data Exchange (ETDEWEB)

Kuebler, R.; Fisch, M.N. [Steinbeis-Transferzentrum Energie-, Gebaeude- und Solartechnik, Stuttgart (Germany)]

1998-12-31

The aim of this project is to prepare a priority programme, "Large Scale Solar Heating", intended to advance the technology across Europe. The demonstration programme developed from it was rated positively by the reviewers but was not accepted for funding at the first attempt (1996). In November 1997 the EU Commission approved, at short notice, 1.5 million ECU in funding, with which an updated project proposal can be realised. A smaller project had already been approved in mid-1997; it was applied for under the lead of Chalmers Industriteknik (CIT) in Sweden and mainly serves technology transfer. (orig.)
Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs.

Directory of Open Access Journals (Sweden)

Nicholas J Schurch

Full Text Available The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.
Large-scale retrieval for medical image analytics: A comprehensive review.

Science.gov (United States)

Li, Zhongyu; Zhang, Xiaofan; Müller, Henning; Zhang, Shaoting

2018-01-01

Over the past decades, medical image analytics was greatly facilitated by the explosion of digital imaging techniques, where huge amounts of medical images were produced with ever-increasing quality and diversity. However, conventional methods for analyzing medical images have achieved limited success, as they are not capable of tackling the huge amount of image data. In this paper, we review state-of-the-art approaches for large-scale medical image analysis, which are mainly based on recent advances in computer vision, machine learning and information retrieval. Specifically, we first present the general pipeline of large-scale retrieval and summarize the challenges and opportunities of medical image analytics at large scale. Then, we provide a comprehensive review of algorithms and techniques relevant to major processes in the pipeline, including feature representation, feature indexing, searching, etc. On the basis of existing work, we introduce the evaluation protocols and multiple applications of large-scale medical image retrieval, with a variety of exploratory and diagnostic scenarios. Finally, we discuss future directions of large-scale retrieval, which can further improve the performance of medical image analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
Photorealistic large-scale urban city model reconstruction.

Science.gov (United States)

Poullis, Charalambos; You, Suya

2009-01-01

The rapid and efficient creation of virtual environments has become a crucial part of virtual reality applications. In particular, civil and defense applications often require and employ detailed models of operations areas for training, simulations of different scenarios, planning for natural or man-made events, monitoring, surveillance, games, and films. A realistic representation of large-scale environments is therefore imperative for the success of such applications, since it increases the immersive experience of users and helps reduce the difference between physical and virtual reality. However, creating such large-scale virtual environments remains a time-consuming, manual task. In this work, we propose a novel method for the rapid reconstruction of photorealistic large-scale virtual environments. First, a novel, extendible, parameterized geometric primitive is presented for the automatic identification and reconstruction of building structures. In addition, buildings with complex roofs containing linear and nonlinear surfaces are reconstructed interactively using a linear polygonal and a nonlinear primitive, respectively. Second, we present a rendering pipeline for the composition of photorealistic textures which, unlike existing techniques, can recover missing or occluded texture information by integrating information captured from different optical sensors (ground, aerial, and satellite).
Prototype Vector Machine for Large Scale Semi-Supervised Learning

Energy Technology Data Exchange (ETDEWEB)

Zhang, Kai; Kwok, James T.; Parvin, Bahram

2009-04-29

Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
Polymerase-endonuclease amplification reaction (PEAR) for large-scale enzymatic production of antisense oligonucleotides.

Directory of Open Access Journals (Sweden)

Xiaolong Wang

Full Text Available Antisense oligonucleotides targeting microRNAs or their mRNA targets prove to be powerful tools for molecular biology research and may eventually emerge as new therapeutic agents. Synthetic oligonucleotides are often contaminated with highly homologous failure sequences. Synthesis of a given oligonucleotide is difficult to scale up because it requires expensive equipment, hazardous chemicals and a tedious purification process. Here we report a novel thermocyclic reaction, polymerase-endonuclease amplification reaction (PEAR), for the amplification of oligonucleotides. A target oligonucleotide and a tandem repeated antisense probe are subjected to repeated cycles of denaturing, annealing, elongation and cleaving, in which thermostable DNA polymerase elongation and strand slipping generate duplex tandem repeats, and thermostable endonuclease (PspGI) cleavage releases monomeric duplex oligonucleotides. Each round of PEAR achieves over 100-fold amplification. The product can be used in one more round of PEAR directly, and the process can be further repeated. In addition to avoiding dangerous materials and improving product purity, this reaction is easy to scale up and amenable to full automation. PEAR has the potential to be a useful tool for large-scale production of antisense oligonucleotide drugs.
Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data

Directory of Open Access Journals (Sweden)

Zhou Sun

2012-05-01

Full Text Available Abstract Background Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction. Results After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. With a close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned

From biomedicine to natural history research: EST resources for ambystomatid salamanders

Directory of Open Access Journals (Sweden)

Bryant Susan V

2004-08-01

Full Text Available Abstract Background Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. Results Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human – Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. Conclusions Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research.
From biomedicine to natural history research: EST resources for ambystomatid salamanders

Science.gov (United States)

Putta, Srikrishna; Smith, Jeramiah J; Walker, John A; Rondet, Mathieu; Weisrock, David W; Monaghan, James; Samuels, Amy K; Kump, Kevin; King, David C; Maness, Nicholas J; Habermann, Bianca; Tanaka, Elly; Bryant, Susan V; Gardiner, David M; Parichy, David M; Voss, S Randal

2004-01-01

Background Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. Results Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human – Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. Conclusions Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research. PMID:15310388
Status of large-scale analysis of post-translational modifications by mass spectrometry

DEFF Research Database (Denmark)

Olsen, Jesper V; Mann, Matthias

2013-01-01

Cellular function can be controlled through the gene expression program, but protein post-translational modifications (PTMs) often provide a more precise and elegant mechanism. Key functional roles of specific modification events, for instance during the cell cycle, have been known for decades...... of protein modifications. For many PTMs, including phosphorylation, ubiquitination, glycosylation and acetylation, tens of thousands of sites can now be confidently identified and localized in the sequence of the protein. Quantitation of PTM levels between different cellular states is likewise established......, with label-free methods showing particular promise. It is also becoming possible to determine the absolute occupancy or stoichiometry of PTM sites on a large scale. Powerful software for the bioinformatic analysis of thousands of PTM sites has been developed. However, a complete inventory of sites has...
EST-derived SNP discovery and selective pressure analysis in Pacific white shrimp (Litopenaeus vannamei)

Science.gov (United States)

Liu, Chengzhang; Wang, Xia; Xiang, Jianhai; Li, Fuhua

2012-09-01

Pacific white shrimp has become a major aquaculture and fishery species worldwide. Although a large scale EST resource has been publicly available since 2008, the data have not yet been widely used for SNP discovery or transcriptome-wide assessment of selective pressure. In this study, a set of 155 411 expressed sequence tags (ESTs) from the NCBI database were computationally analyzed and 17 225 single nucleotide polymorphisms (SNPs) were predicted, including 9 546 transitions, 5 124 transversions and 2 481 indels. Among the 7 298 SNP substitutions located in functionally annotated contigs, 58.4% (4 262) are non-synonymous SNPs capable of introducing amino acid mutations. Two hundred and fifty nonsynonymous SNPs in genes associated with economic traits have been identified as candidates for markers in selective breeding. Diversity estimates among the synonymous nucleotides were on average 3.49 times greater than those in non-synonymous, suggesting negative selection. Distribution of non-synonymous to synonymous substitutions (Ka/Ks) ratio ranges from 0 to 4.01, (average 0.42, median 0.26), suggesting that the majority of the affected genes are under purifying selection. Enrichment analysis identified multiple gene ontology categories under positive or negative selection. Categories involved in innate immune response and male gamete generation are rich in positively selected genes, which is similar to reports in Drosophila and primates. This work is the first transcriptome-wide assessment of selective pressure in a Penaeid shrimp species. The functionally annotated SNPs provide a valuable resource of potential molecular markers for selective breeding.
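The substitution classes and the non-synonymous percentage quoted in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the only figures taken from the record are the 4 262 non-synonymous SNPs out of 7 298 substitutions. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); any other single-base change is a transversion.

```python
# Illustrative sketch only; function names are assumptions, not from the paper.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


if __name__ == "__main__":
    print(classify_substitution("A", "G"))  # transition
    print(classify_substitution("A", "C"))  # transversion
    # Counts reported in the abstract: 4 262 non-synonymous of 7 298 substitutions.
    print(percent(4262, 7298))  # 58.4
```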
454 sequencing of pooled BAC clones on chromosome 3H of barley

Directory of Open Access Journals (Sweden)

Yamaji Nami

2011-05-01

Full Text Available Abstract Background Genome sequencing of barley has been delayed due to its large genome size (ca. 5,000 Mbp). Among the fast sequencing systems, 454 liquid phase pyrosequencing provides the longest reads and is the most promising method for BAC clones. Here we report the results of pooled sequencing of BAC clones selected with ESTs genetically mapped to chromosome 3H. Results We sequenced pooled barley BAC clones using a 454 parallel genome sequencer. A PCR screening system based on primer sets derived from genetically mapped ESTs on chromosome 3H was used for clone selection in a BAC library developed from cultivar "Haruna Nijo". The DNA samples of 10 or 20 BAC clones were pooled and used for shotgun library development. The homology between contig sequences generated in each pooled library and mapped EST sequences was studied. The number of contigs assigned on chromosome 3H was 372. Their lengths ranged from 1,230 bp to 58,322 bp with an average of 14,891 bp. Of these contigs, 240 showed homology and colinearity with the genome sequence of rice chromosome 1. A contig annotation browser supplemented with query search by unique sequence or genetic map position was developed. The identified contigs can be annotated with barley cDNAs and reference sequences on the browser. Homology analysis of these contigs with rice genes indicated that 1,239 rice genes can be assigned to barley contigs by the simple comparison of sequence lengths in both species. Of these genes, 492 are assigned to rice chromosome 1. Conclusions We demonstrate the efficiency of sequencing gene rich regions from barley chromosome 3H, with special reference to syntenic relationships with rice chromosome 1.
Database Description - AcEST | LSDB Archive [Life Science Database Archive metadata]

Lifescience Database Archive (English)

Full Text Available abase Description General information of database Database name AcEST Alternative n...hi, Tokyo-to 192-0397 Tel: +81-42-677-1111(ext.3654) E-mail: Database classificat...eneris Taxonomy ID: 13818 Database description This is a database of EST sequences of Adiantum capillus-vene...(3): 223-227. External Links: Original website information Database maintenance site Plant Environmental Res...base Database Description Download License Update History of This Database Site Policy | Contact Us Database Description - AcEST | LSDB Archive ...
Accelerating Relevance Vector Machine for Large-Scale Data on Spark

Directory of Open Access Journals (Sweden)

Liu Fang

2017-01-01

Full Text Available Relevance vector machine (RVM) is a machine learning algorithm based on a sparse Bayesian framework, which performs well when running classification and regression tasks on small-scale datasets. However, RVM also has certain drawbacks which restrict its practical applications, such as (1) a slow training process and (2) poor performance when training on large-scale datasets. In order to solve these problems, we first propose Discrete AdaBoost RVM (DAB-RVM), which incorporates ensemble learning in RVM. This method performs well with large-scale low-dimensional datasets. However, as the number of features increases, the training time of DAB-RVM increases as well. To avoid this phenomenon, we utilize the sufficient training samples of large-scale datasets and propose all features boosting RVM (AFB-RVM), which modifies the way of obtaining weak classifiers. In our experiments we study the differences between various boosting techniques with RVM, demonstrating the performance of the proposed approaches on Spark. As a result of this paper, two proposed approaches on Spark for different types of large-scale datasets are available.
Development and Characterization of 37 Novel EST-SSR Markers in Pisum sativum (Fabaceae)

Directory of Open Access Journals (Sweden)

Xiaofeng Zhuang

2013-01-01

Full Text Available Premise of the study: Simple sequence repeat markers were developed based on expressed sequence tags (EST-SSR) and screened for polymorphism among 23 Pisum sativum individuals to assist development and refinement of pea linkage maps. In particular, the SSR markers were developed to assist in mapping of white mold disease resistance quantitative trait loci. Methods and Results: Primer pairs were designed for 46 SSRs identified in EST contiguous sequences assembled from a 454 pyrosequenced transcriptome of the pea cultivar ‘LIFTER’. Thirty-seven SSR markers amplified PCR products, of which 11 (30%) SSR markers produced polymorphism in 23 individuals, including parents of recombinant inbred lines, with two to four alleles. The observed and expected heterozygosities ranged from 0 to 0.43 and from 0.31 to 0.83, respectively. Conclusions: These EST-SSR markers for pea will be useful for refinement of pea linkage maps, and will likely be useful for comparative mapping of pea and as tools for marker-based pea breeding.
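The observed and expected heterozygosities reported above can be sketched in a few lines of Python. This is a hedged illustration with made-up genotypes, not the study's data or software: observed heterozygosity is taken as the fraction of heterozygous individuals, and expected heterozygosity as 1 minus the sum of squared allele frequencies. The abstract does not say which estimator was used (for example, a small-sample corrected one), so the plain formula here is an assumption.

```python
# Illustrative sketch only; the genotypes below are hypothetical allele sizes.
from collections import Counter
from typing import Hashable, Sequence, Tuple

Genotype = Tuple[Hashable, Hashable]  # two alleles per diploid individual


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """1 - sum(p_i^2), where p_i is the frequency of allele i (uncorrected)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Hypothetical SSR genotypes (allele sizes in bp) for four individuals.
    locus = [(150, 150), (150, 154), (154, 154), (150, 154)]
    print(observed_heterozygosity(locus))  # 0.5
    print(expected_heterozygosity(locus))  # 0.5
```

A locus where every individual is homozygous but several alleles segregate gives an observed value of 0 with a positive expected value, which matches the kind of range quoted in the abstract.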
The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

Directory of Open Access Journals (Sweden)

Jason W. Sahl

2014-04-01

Full Text Available Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27–57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated
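The matrix described in this record can be pictured with a short Python sketch. It assumes the usual definition of a BLAST score ratio, namely the bit score of a CDS aligned against a genome divided by the bit score of that CDS aligned against itself, so values near 1 suggest a highly conserved CDS and values near 0 suggest absence. The function names, the data layout and the bit scores are hypothetical, and this is not the LS-BSR code itself.

```python
# Illustrative sketch only; names and bit scores are hypothetical.
from typing import Dict


def blast_score_ratio(query_bits: float, self_bits: float) -> float:
    """Ratio of a CDS's bit score in a genome to its self-alignment bit score."""
    if self_bits <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return query_bits / self_bits


def bsr_matrix(
    self_scores: Dict[str, float],
    genome_scores: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Build {cds: {genome: BSR}}; a CDS with no hit in a genome scores 0.0."""
    return {
        cds: {
            genome: blast_score_ratio(hits.get(cds, 0.0), self_bits)
            for genome, hits in genome_scores.items()
        }
        for cds, self_bits in self_scores.items()
    }


if __name__ == "__main__":
    self_scores = {"cdsA": 200.0, "cdsB": 400.0}
    genome_scores = {
        "genome1": {"cdsA": 200.0, "cdsB": 100.0},
        "genome2": {"cdsA": 100.0},  # cdsB has no hit in genome2
    }
    for cds, row in bsr_matrix(self_scores, genome_scores).items():
        print(cds, row)
```

Keeping the matrix as a nested dictionary makes it easy to filter for CDSs that are near 1 in one user-defined group and near 0 in another, which is the kind of marker search the abstract mentions.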
Bayesian hierarchical model for large-scale covariance matrix estimation.

Science.gov (United States)

Zhu, Dongxiao; Hero, Alfred O

2007-12-01

Many bioinformatics problems implicitly depend on estimating a large-scale covariance matrix. The traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework, and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.
Creating Large Scale Database Servers

International Nuclear Information System (INIS)

Becla, Jacek

2001-01-01

The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of the B-meson produced from electron-positron interactions. The experiment, started in May 1999, will generate approximately 300TB/year of data for 10 years. All of the data will reside in Objectivity databases accessible via the Advanced Multi-threaded Server (AMS). To date, over 70TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. This paper will describe the design of the database and the changes that we needed to make in the AMS for scalability reasons and how the lessons we learned would be applicable to virtually any kind of database server seeking to operate in the Petabyte region
Creating Large Scale Database Servers

Energy Technology Data Exchange (ETDEWEB)

Becla, Jacek

2001-12-14

The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of the B-meson produced from electron-positron interactions. The experiment, started in May 1999, will generate approximately 300TB/year of data for 10 years. All of the data will reside in Objectivity databases accessible via the Advanced Multi-threaded Server (AMS). To date, over 70TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. This paper will describe the design of the database and the changes that we needed to make in the AMS for scalability reasons and how the lessons we learned would be applicable to virtually any kind of database server seeking to operate in the Petabyte region.
Large-scale pool fires

Directory of Open Access Journals (Sweden)

Steinhaus Thomas

2007-01-01

Full Text Available A review of research into the burning behavior of large pool fires and fuel spill fires is presented. The features which distinguish such fires from smaller pool fires are mainly associated with the fire dynamics at low source Froude numbers and the radiative interaction with the fire source. In hydrocarbon fires, higher soot levels at increased diameters result in radiation blockage effects around the perimeter of large fire plumes; this yields lower emissive powers and a drastic reduction in the radiative loss fraction. Whilst there are simplifying factors with these phenomena, arising from the fact that soot yield can saturate, there are other complications deriving from the intermittency of the behavior, with luminous regions of efficient combustion appearing randomly in the outer surface of the fire according to the turbulent fluctuations in the fire plume. Knowledge of the fluid flow instabilities, which lead to the formation of large eddies, is also key to understanding the behavior of large-scale fires. Here modeling tools can be effectively exploited in order to investigate the fluid flow phenomena, including RANS- and LES-based computational fluid dynamics codes. The latter are well-suited to representation of the turbulent motions, but a number of challenges remain with their practical application. Massively-parallel computational resources are likely to be necessary in order to be able to adequately address the complex coupled phenomena to the level of detail that is necessary.
Imprints of the large-scale structure on AGN formation and evolution

Science.gov (United States)

Porqueres, Natàlia; Jasche, Jens; Enßlin, Torsten A.; Lavaux, Guilhem

2018-04-01

Black hole masses are found to correlate with several global properties of their host galaxies, suggesting that black holes and galaxies have an intertwined evolution and that active galactic nuclei (AGN) have a significant impact on galaxy evolution. Since the large-scale environment can also affect AGN, this work studies how their formation and properties depend on the environment. We have used a reconstructed three-dimensional high-resolution density field obtained from a Bayesian large-scale structure reconstruction method applied to the 2M++ galaxy sample. A web-type classification relying on the shear tensor is used to identify different structures on the cosmic web, defining voids, sheets, filaments, and clusters. We confirm that the environmental density affects the AGN formation and their properties. We found that the AGN abundance is equivalent to the galaxy abundance, indicating that active and inactive galaxies reside in similar dark matter halos. However, occurrence rates are different for each spectral type and accretion rate. These differences are consistent with the AGN evolutionary sequence suggested by previous authors, Seyferts and Transition objects transforming into low-ionization nuclear emission line regions (LINERs), the weaker counterpart of Seyferts. We conclude that AGN properties depend on the environmental density more than on the web-type. More powerful starbursts and younger stellar populations are found in high densities, where interactions and mergers are more likely. AGN hosts show smaller masses in clusters for Seyferts and Transition objects, which might be due to gas stripping. In voids, the AGN population is dominated by the most massive galaxy hosts.
Decentralised stabilising controllers for a class of large-scale linear ...

Indian Academy of Sciences (India)

subsystems resulting from a new aggregation-decomposition technique. The method has been illustrated through a numerical example of a large-scale linear system consisting of three subsystems each of the fourth order. Keywords. Decentralised stabilisation; large-scale linear systems; optimal feedback control; algebraic ...
Large Scale Survey Data in Career Development Research

Science.gov (United States)

Diemer, Matthew A.

2008-01-01

Large scale survey datasets have been underutilized but offer numerous advantages for career development scholars, as they contain numerous career development constructs with large and diverse samples that are followed longitudinally. Constructs such as work salience, vocational expectations, educational expectations, work satisfaction, and…
Similitude and scaling of large structural elements: Case study

Directory of Open Access Journals (Sweden)

M. Shehadeh

2015-06-01

Full Text Available Scaled-down models are widely used for experimental investigations of large structures because of the limited capacity of testing facilities and the expense of experimentation. The modeling accuracy depends upon the model material properties, fabrication accuracy and loading techniques. In the present work the Buckingham π theorem is used to develop the relations (i.e. geometry, loading and properties) between the model and a large structural element such as those found in existing large petroleum drilling rigs. The model is to be designed, loaded and treated according to a set of similitude requirements that relate the model to the large structural element. Three independent scale factors, which represent the three fundamental dimensions of mass, length and time, need to be selected for designing the scaled-down model. Numerical prediction of the stress distribution within the model and its elastic deformation under steady loading is to be made. The results are compared with those obtained from numerical computations of the full-scale structure. The effect of scaled-down model size and material on the accuracy of the modeling technique is thoroughly examined.
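How three independent scale factors for mass, length and time fix the scale factor of any derived quantity can be shown with a small Python sketch. The numbers are hypothetical and are not taken from the paper: a quantity with dimensions M^a L^b T^c scales by s_M^a · s_L^b · s_T^c, where each s is the model-to-prototype ratio for that fundamental dimension.

```python
# Illustrative sketch only; the scale factors below are hypothetical choices.
from fractions import Fraction
from typing import Tuple

Exponents = Tuple[int, int, int]  # exponents of (mass, length, time)

# Dimensions of some derived quantities in the M-L-T system.
FORCE: Exponents = (1, 1, -2)    # M L T^-2
STRESS: Exponents = (1, -1, -2)  # M L^-1 T^-2
DENSITY: Exponents = (1, -3, 0)  # M L^-3


def derived_scale(s_mass: Fraction, s_length: Fraction, s_time: Fraction,
                  exponents: Exponents) -> Fraction:
    """Model/prototype scale factor of a quantity with the given M, L, T exponents."""
    a, b, c = exponents
    return (s_mass ** a) * (s_length ** b) * (s_time ** c)


if __name__ == "__main__":
    # Hypothetical model: 1/10 length scale, 1/1000 mass scale, unchanged time scale.
    s_m, s_l, s_t = Fraction(1, 1000), Fraction(1, 10), Fraction(1)
    print(derived_scale(s_m, s_l, s_t, FORCE))    # 1/10000
    print(derived_scale(s_m, s_l, s_t, STRESS))   # 1/100
    print(derived_scale(s_m, s_l, s_t, DENSITY))  # 1
```

Exact fractions are used so the printed ratios are not blurred by floating-point rounding; with this particular choice the density scale is 1, meaning the model would use a material of the same density as the full-scale element.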
Multi-scale coding of genomic information: From DNA sequence to genome structure and function

International Nuclear Information System (INIS)

Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

2011-01-01

Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.
Large-scale preparation of hollow graphitic carbon nanospheres

Energy Technology Data Exchange (ETDEWEB)

Feng, Jun; Li, Fu [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); Bai, Yu-Jun, E-mail: byj97@126.com [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); State Key laboratory of Crystal Materials, Shandong University, Jinan 250100 (China); Han, Fu-Dong; Qi, Yong-Xin; Lun, Ning [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); Lu, Xi-Feng [Lunan Institute of Coal Chemical Engineering, Jining 272000 (China)

2013-01-15

Hollow graphitic carbon nanospheres (HGCNSs) were synthesized on a large scale by a simple reaction between glucose and Mg at 550 °C in an autoclave. Characterization by X-ray diffraction, Raman spectroscopy and transmission electron microscopy demonstrates the formation of HGCNSs with an average diameter of about 10 nm and a wall thickness of a few graphene layers. The HGCNSs exhibit a reversible capacity of 391 mAh g⁻¹ after 60 cycles when used as anode materials for Li-ion batteries. Graphical abstract: Hollow graphitic carbon nanospheres could be prepared on a large scale by the simple reaction between glucose and Mg at 550 °C, and they exhibit superior electrochemical performance to graphite. Highlights: ► Hollow graphitic carbon nanospheres (HGCNSs) were prepared on a large scale at 550 °C. ► The preparation is simple, effective and eco-friendly. ► The in situ yielded MgO nanocrystals promote the graphitization. ► The HGCNSs exhibit superior electrochemical performance to graphite.
Large-scale impact cratering on the terrestrial planets

International Nuclear Information System (INIS)

Grieve, R.A.F.

1982-01-01

The crater densities on the earth and moon form the basis for a standard flux-time curve that can be used in dating unsampled planetary surfaces and constraining the temporal history of endogenic geologic processes. Abundant evidence is seen not only that impact cratering was an important surface process in planetary history but also that large impact events produced effects that were crucial in scale. By way of example, it is noted that the formation of multiring basins on the early moon was as important in defining the planetary tectonic framework as plate tectonics is on the earth. Evidence from several planets suggests that the effects of very-large-scale impacts go beyond the simple formation of an impact structure and serve to localize increased endogenic activity over an extended period of geologic time. Even though no longer occurring with the frequency and magnitude of early solar system history, it is noted that large-scale impact events continue to affect the local geology of the planets. 92 references

Optical interconnect for large-scale systems

Science.gov (United States)

Dress, William

2013-02-01

This paper presents a switchless, optical interconnect module that serves as a node in a network of identical distribution modules for large-scale systems. Thousands to millions of hosts or endpoints may be interconnected by a network of such modules, avoiding the need for multi-level switches. Several common network topologies are reviewed and their scaling properties assessed. The concept of message-flow routing is discussed in conjunction with the unique properties enabled by the optical distribution module where it is shown how top-down software control (global routing tables, spanning-tree algorithms) may be avoided.
Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

DEFF Research Database (Denmark)

Liu, Siyang; Huang, Shujia; Rao, Junhua

2015-01-01

present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...
Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

Science.gov (United States)

Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

2014-01-01

The increasing number of dengue virus (DENV) genome sequences available allows the factors contributing to DENV evolution to be identified. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
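The positional GC content (GC1, GC2, GC3) and RSCU statistics named in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the input is a toy sequence, and the synonymous-codon table is deliberately reduced to two amino acids.

```python
from collections import Counter

def split_codons(cds):
    """Split a coding sequence into complete codons (trailing bases are dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def positional_gc(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence has no complete codon")
    return tuple(
        sum(codon[pos] in "GC" for codon in codons) / len(codons)
        for pos in range(3)
    )

def rscu(cds, synonymous_families):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter(split_codons(cds))
    values = {}
    for family in synonymous_families:
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values

# Hypothetical toy input: only the Lys and Phe codon families are listed.
FAMILIES = [("AAA", "AAG"), ("TTT", "TTC")]
seq = "AAAAAGAAATTTTTC"  # codons: AAA AAG AAA TTT TTC

print(positional_gc(seq))
print(rscu(seq, FAMILIES))
```

An RSCU value above 1 marks a codon used more often than expected under equal synonymous usage; a real analysis would supply the full genetic code table.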
[A large-scale accident in Alpine terrain].

Science.gov (United States)

Wildner, M; Paal, P

2015-02-01

Due to the geographical conditions, large-scale accidents amounting to mass casualty incidents (MCI) in Alpine terrain regularly present rescue teams with huge challenges. Using an example incident, specific conditions and typical problems associated with such a situation are presented. The first rescue team members to arrive have the elementary tasks of qualified triage and communication to the control room, which is required to dispatch the necessary additional support. Only with a clear "concept", to which all have to adhere, can the subsequent chaos phase be limited. In this respect, a time factor confounded by adverse weather conditions or darkness represents enormous pressure. Additional hazards are frostbite and hypothermia. If priorities can be established in terms of urgency, then treatment and procedure algorithms have proven successful. For evacuation of casualties, helicopter transport should be sought. Due to the low density of hospitals in Alpine regions, it is often necessary to distribute the patients over a wide area. Rescue operations in Alpine terrain have to be performed according to the particular conditions and require rescue teams to have specific knowledge and expertise. The possibility of a large-scale accident should be considered when planning events. To optimize rescue measures, regular training and exercises are sensible, as is the analysis of previous large-scale Alpine accidents.
Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

Science.gov (United States)

Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

2017-05-19

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
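The abstract does not describe the layout of KGGSeq's bit-block genotype format, so the following is only a generic sketch of why bit-level encoding saves space: four 2-bit genotype codes are packed into each byte and read back by index. The code assignment and function names are assumptions made for illustration.

```python
# Generic 2-bit genotype packing; NOT KGGSeq's actual format.
# Assumed codes: 0 = hom-ref, 1 = het, 2 = hom-alt, 3 = missing.

def pack_genotypes(genotypes):
    """Pack a list of 2-bit genotype codes into a bytearray, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, code in enumerate(genotypes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        packed[i // 4] |= code << (2 * (i % 4))
    return packed

def get_genotype(packed, i):
    """Read the i-th genotype code without unpacking the whole block."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11

genotypes = [0, 1, 2, 3, 2, 0]
block = pack_genotypes(genotypes)
print(len(block), [get_genotype(block, i) for i in range(len(genotypes))])
```

Six genotypes occupy two bytes here instead of six or more characters in a text file, and the packed form also lends itself to bitwise operations across many subjects at once.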
Hierarchical Cantor set in the large scale structure with torus geometry

Energy Technology Data Exchange (ETDEWEB)

Murdzek, R. [Physics Department, ' Al. I. Cuza' University, Blvd. Carol I, Nr. 11, Iassy 700506 (Romania)], E-mail: rmurdzek@yahoo.com

2008-12-15

The formation of large scale structures is considered within a model with string on toroidal space-time. Firstly, the space-time geometry is presented. In this geometry, the Universe is represented by a string describing a torus surface. Thereafter, the large scale structure of the Universe is derived from the string oscillations. The results are in agreement with the cellular structure of the large scale distribution and with the theory of a Cantorian space-time.
Integrating large-scale data and RNA technology to protect crops from fungal pathogens

Directory of Open Access Journals (Sweden)

Ian Joseph Girard

2016-05-01

Full Text Available With a rapidly growing human population it is expected that plant science researchers and the agricultural community will need to increase food productivity using less arable land. This challenge is complicated by fungal pathogens and diseases, many of which can severely impact crop yield. Current measures to control fungal pathogens are either ineffective or have adverse effects on the agricultural enterprise. Thus, developing new strategies through research innovation to protect plants from pathogenic fungi is necessary to overcome these hurdles. RNA sequencing technologies are increasing our understanding of the underlying genes and gene regulatory networks mediating disease outcomes. The application of invigorating next generation sequencing strategies to study plant-pathogen interactions has and will provide unprecedented insight into the complex patterns of gene activity responsible for crop protection. However, questions remain about how biological processes in both the pathogen and the host are specified in space directly at the site of infection and over the infection period. The integration of cutting edge molecular and computational tools will provide plant scientists with the arsenal required to identify genes and molecules that play a role in plant protection. Large scale RNA sequence data can then be used to protect plants by targeting genes essential for pathogen viability in the production of stably transformed lines expressing RNA interference molecules, or through foliar applications of double stranded RNA.
Crystallization and preliminary X-ray crystallographic analysis of EstE1, a new and thermostable esterase cloned from a metagenomic library

Energy Technology Data Exchange (ETDEWEB)

Byun, Jung-Sue [Department of Biology, Yonsei University, Seoul 120-749 (Korea, Republic of); Protein Network Research Center, Yonsei University, Seoul 120-749 (Korea, Republic of)]; Rhee, Jin-Kyu [Department of Biotechnology, Yonsei University, Seoul 120-749 (Korea, Republic of)]; Kim, Dong-Uk [Department of Biology, Yonsei University, Seoul 120-749 (Korea, Republic of)]; Oh, Jong-Won [Department of Biotechnology, Yonsei University, Seoul 120-749 (Korea, Republic of)]; Cho, Hyun-Soo, E-mail: hscho8@yonsei.ac.kr [Department of Biology, Yonsei University, Seoul 120-749 (Korea, Republic of); Protein Network Research Center, Yonsei University, Seoul 120-749 (Korea, Republic of)]

2006-02-01

EstE1, a new thermostable esterase, was isolated by functional screening of a metagenomic DNA library from thermal environment samples. This enzyme showed activity towards short-chain acyl derivatives of length C4–C6 at a temperature of 303–363 K and displayed a high thermostability above 353 K. EstE1 has 64 and 57% amino-acid sequence similarity to est_pc-encoded carboxylesterase from Pyrobaculum calidifontis and AFEST from Archaeoglobus fulgidus, respectively. The recombinant protein with a histidine tag at the C-terminus was overexpressed in Escherichia coli strain BL21(DE3) and then purified by affinity chromatography. The protein was crystallized at 290 K by the hanging-drop vapour-diffusion method. X-ray diffraction data were collected to 2.3 Å resolution from an EstE1 crystal; the crystal belongs to space group P4₁2₁2, with unit-cell parameters a = b = 73.71, c = 234.23 Å. Assuming the presence of four molecules in the asymmetric unit, the Matthews coefficient V_M is calculated to be 2.2 Å³ Da⁻¹ and the solvent content is 44.1%.
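The link between the reported Matthews coefficient and the solvent content can be checked with a short worked calculation. The sketch below uses the standard Matthews estimate (solvent fraction = 1 − 1.23 / V_M, with V_M in Å³ per Da) and the tetragonal cell volume a²c; the function names are illustrative only, and the molecular mass is not given in the abstract, so it is left out.

```python
def tetragonal_cell_volume(a, c):
    """Unit-cell volume in cubic angstroms for a tetragonal cell (a = b, all angles 90 degrees)."""
    return a * a * c

def solvent_fraction(matthews_coefficient):
    """Matthews estimate: solvent fraction = 1 - 1.23 / V_M (V_M in cubic angstroms per Da)."""
    return 1.0 - 1.23 / matthews_coefficient

volume = tetragonal_cell_volume(73.71, 234.23)
print(f"cell volume: {volume:.0f} cubic angstroms")
print(f"solvent content: {100 * solvent_fraction(2.2):.1f}%")
```

With V_M = 2.2 the estimate gives 44.1%, which agrees with the solvent content stated in the abstract.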
Large-scale Motion of Solar Filaments

Indian Academy of Sciences (India)

tribpo

Large-scale Motion of Solar Filaments. Pavel Ambrož, Astronomical Institute of the Acad. Sci. of the Czech Republic, CZ-25165. Ondrejov, The Czech Republic. e-mail: pambroz@asu.cas.cz. Alfred Schroll, Kanzelhöehe Solar Observatory of the University of Graz, A-9521 Treffen,. Austria. e-mail: schroll@solobskh.ac.at.
Sensitivity analysis for large-scale problems

Science.gov (United States)

Noor, Ahmed K.; Whitworth, Sandra L.

1987-01-01

The development of efficient techniques for calculating sensitivity derivatives is studied. The objective is to present a computational procedure for calculating sensitivity derivatives as part of performing structural reanalysis for large-scale problems. The scope is limited to framed type structures. Both linear static analysis and free-vibration eigenvalue problems are considered.
Topology Optimization of Large Scale Stokes Flow Problems

DEFF Research Database (Denmark)

Aage, Niels; Poulsen, Thomas Harpsøe; Gersborg-Hansen, Allan

2008-01-01

This note considers topology optimization of large scale 2D and 3D Stokes flow problems using parallel computations. We solve problems with up to 1.125.000 elements in 2D and 128.000 elements in 3D on a shared memory computer consisting of Sun UltraSparc IV CPUs....
The Cosmology Large Angular Scale Surveyor

Science.gov (United States)

Harrington, Kathleen; Marriage, Tobias; Ali, Aamir; Appel, John; Bennett, Charles; Boone, Fletcher; Brewer, Michael; Chan, Manwei; Chuss, David T.; Colazo, Felipe;

2016-01-01

The Cosmology Large Angular Scale Surveyor (CLASS) is a four-telescope array designed to characterize relic primordial gravitational waves from inflation and the optical depth to reionization through a measurement of the polarized cosmic microwave background (CMB) on the largest angular scales. The frequencies of the four CLASS telescopes, one at 38 GHz, two at 93 GHz, and one dichroic system at 145/217 GHz, are chosen to avoid spectral regions of high atmospheric emission and span the minimum of the polarized Galactic foregrounds: synchrotron emission at lower frequencies and dust emission at higher frequencies. Low-noise transition edge sensor detectors and a rapid front-end polarization modulator provide a unique combination of high sensitivity, stability, and control of systematics. The CLASS site, at 5200 m in the Chilean Atacama desert, allows for daily mapping of up to 70% of the sky and enables the characterization of CMB polarization at the largest angular scales. Using this combination of a broad frequency range, large sky coverage, control over systematics, and high sensitivity, CLASS will observe the reionization and recombination peaks of the CMB E- and B-mode power spectra. CLASS will make a cosmic variance limited measurement of the optical depth to reionization and will measure or place upper limits on the tensor-to-scalar ratio, r, down to a level of 0.01 (95% C.L.).

Prehospital Acute Stroke Severity Scale to Predict Large Artery Occlusion: Design and Comparison With Other Scales.

Science.gov (United States)

Hastrup, Sidsel; Damgaard, Dorte; Johnsen, Søren Paaske; Andersen, Grethe

2016-07-01

We designed and validated a simple prehospital stroke scale to identify emergent large vessel occlusion (ELVO) in patients with acute ischemic stroke and compared the scale to other published scales for prediction of ELVO. A national historical test cohort of 3127 patients with information on intracranial vessel status (angiography) before reperfusion therapy was identified. National Institutes of Health Stroke Scale (NIHSS) items with the highest predictive value of occlusion of a large intracranial artery were identified, and the optimal combination meeting predefined criteria to ensure usefulness in the prehospital phase was determined. The predictive performance of the Prehospital Acute Stroke Severity (PASS) scale was compared with other published scales for ELVO. The PASS scale was composed of 3 NIHSS scores: level of consciousness (month/age), gaze palsy/deviation, and arm weakness. Two-thirds of the test cohort was used to derive PASS, which showed an accuracy (area under the curve) of 0.76 for detecting large arterial occlusion. The optimal cut point of ≥2 abnormal scores showed: sensitivity=0.66 (95% CI, 0.62-0.69), specificity=0.83 (0.81-0.85), and area under the curve=0.74 (0.72-0.76). Validation on the remaining third of the test cohort showed similar performance. Patients with a large artery occlusion on angiography with PASS ≥2 had a median NIHSS score of 17 (interquartile range=6) as opposed to PASS <2 with a median NIHSS score of 6 (interquartile range=5). Although simpler, the PASS scale performed as well as other scales predicting ELVO. The PASS scale is simple and has promising accuracy for prediction of ELVO in the field. © 2016 American Heart Association, Inc.
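The scoring rule described in this abstract (three items, ELVO suspected at two or more abnormal items) can be sketched as follows. The function names and the toy cohort are hypothetical; the cohort is not data from the study and serves only to show how sensitivity and specificity follow from the cut point.

```python
def pass_score(loc_abnormal, gaze_abnormal, arm_abnormal):
    """Number of abnormal PASS items (0-3): consciousness (month/age), gaze, arm weakness."""
    return int(bool(loc_abnormal)) + int(bool(gaze_abnormal)) + int(bool(arm_abnormal))

def suspects_elvo(score, cut_point=2):
    """Apply the cut point reported in the abstract (>= 2 abnormal items)."""
    return score >= cut_point

def sensitivity_specificity(records, cut_point=2):
    """records: iterable of (score, occlusion_on_angiography) pairs.

    Assumes at least one patient with and one without occlusion.
    """
    tp = fn = tn = fp = 0
    for score, occluded in records:
        flagged = suspects_elvo(score, cut_point)
        if occluded and flagged:
            tp += 1
        elif occluded:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy cohort, not data from the study.
cohort = [(3, True), (2, True), (1, True), (0, False), (1, False), (2, False)]
print(pass_score(True, False, True), sensitivity_specificity(cohort))
```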
Comparison of relative efficiency of genomic SSR and EST-SSR markers in estimating genetic diversity in sugarcane.

Science.gov (United States)

Parthiban, S; Govindaraj, P; Senthilkumar, S

2018-03-01

Twenty-five primer pairs developed from genomic simple sequence repeats (SSR) were compared with 25 expressed sequence tag (EST) SSRs to evaluate the efficiency of these two sets of primers using 59 sugarcane genetic stocks. The mean polymorphism information content (PIC) of the genomic SSRs was higher (0.72) than the PIC value recorded for the EST-SSR markers (0.62). The relatively low level of polymorphism in EST-SSR markers may be due to the location of these markers in more conserved and expressed sequences compared to genomic sequences, which are spread throughout the genome. Dendrograms based on the genomic SSR and EST-SSR marker data showed differences in grouping of genotypes. A total of 59 sugarcane accessions were grouped into 6 and 4 clusters using genomic SSR and EST-SSR, respectively. The highly efficient genomic SSRs could subcluster the genotypes of some of the clusters formed by EST-SSR markers. The difference between the dendrograms was probably due to the variation in the number of markers produced by genomic SSR and EST-SSR and the different portions of the genome amplified by the two marker types. The combined dendrogram (genomic SSR and EST-SSR) more clearly showed the genetic relationship among the sugarcane genotypes by forming four clusters. The mean genetic similarity (GS) value obtained using EST-SSR among 59 sugarcane accessions was 0.70, whereas the mean GS obtained using genomic SSR was 0.63. Although the EST-SSR markers displayed a relatively lower level of polymorphism, the genetic diversity they revealed was found to be promising because they are functional markers. The high PIC and low genetic similarity values of genomic SSRs may be more useful in DNA fingerprinting, selection of true hybrids, identification of variety-specific markers and genetic diversity analysis. Identification of diverse parents based on cluster analysis can be effectively done with EST-SSR as the genetic similarity estimates are based on functional attributes related to
Cell-free translational screening of an expression sequence tag library of Clonorchis sinensis for novel antigen discovery.

Science.gov (United States)

Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung

2017-05-01

The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:832-837, 2017.
Analysis using large-scale ringing data

Directory of Open Access Journals (Sweden)

Baillie, S. R.

2004-06-01

Full Text Available Birds are highly mobile organisms and there is increasing evidence that studies at large spatial scales are needed if we are to properly understand their population dynamics. While classical metapopulation models have rarely proved useful for birds, more general metapopulation ideas involving collections of populations interacting within spatially structured landscapes are highly relevant (Harrison, 1994). There is increasing interest in understanding patterns of synchrony, or lack of synchrony, between populations and the environmental and dispersal mechanisms that bring about these patterns (Paradis et al., 2000). To investigate these processes we need to measure abundance, demographic rates and dispersal at large spatial scales, in addition to gathering data on relevant environmental variables. There is an increasing realisation that conservation needs to address rapid declines of common and widespread species (they will not remain so if such trends continue) as well as the management of small populations that are at risk of extinction. While the knowledge needed to support the management of small populations can often be obtained from intensive studies in a few restricted areas, conservation of widespread species often requires information on population trends and processes measured at regional, national and continental scales (Baillie, 2001). While management prescriptions for widespread populations may initially be developed from a small number of local studies or experiments, there is an increasing need to understand how such results will scale up when applied across wider areas. There is also a vital role for monitoring at large spatial scales both in identifying such population declines and in assessing population recovery. Gathering data on avian abundance and demography at large spatial scales usually relies on the efforts of large numbers of skilled volunteers. Volunteer studies based on ringing (for example Constant Effort Sites [CES
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

OpenAIRE

Qiang Liu; Yi Qin; Guodong Li

2018-01-01

Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal...
Managing Risk and Uncertainty in Large-Scale University Research Projects

Science.gov (United States)

Moore, Sharlissa; Shangraw, R. F., Jr.

2011-01-01

Both publicly and privately funded research projects managed by universities are growing in size and scope. Complex, large-scale projects (over $50 million) pose new management challenges and risks for universities. This paper explores the relationship between project success and a variety of factors in large-scale university projects. First, we…
Parallel clustering algorithm for large-scale biological data sets.

Science.gov (United States)

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate method of data partition and reduction is designed in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
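The row-wise data partition behind a parallel similarity-matrix construction can be sketched as below. The abstract does not state the similarity measure or the partition scheme, so two assumptions are made: similarity is the negative squared Euclidean distance commonly used with affinity propagation, and rows are split into contiguous near-equal chunks. The chunks are processed sequentially here; in a parallel setting each chunk would go to its own worker.

```python
def similarity_rows(points, row_start, row_end):
    """Rows [row_start, row_end) of the similarity matrix.

    Similarity is the negative squared Euclidean distance (an assumption).
    """
    block = []
    for i in range(row_start, row_end):
        row = [-sum((a - b) ** 2 for a, b in zip(points[i], q)) for q in points]
        block.append(row)
    return block

def partition(n_rows, n_workers):
    """Split row indices into contiguous, near-equal (start, end) chunks."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for worker in range(n_workers):
        end = start + base + (1 if worker < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

# Hypothetical toy data: three 2-D points, two workers.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
chunks = partition(len(points), 2)
matrix = [row for start, end in chunks for row in similarity_rows(points, start, end)]
print(chunks)
print(matrix)
```

Because each chunk only reads the shared point list and writes its own rows, the work units are independent, which is what makes this step a natural fit for a shared-memory machine.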
Large-scale analysis of in Vivo phosphorylated membrane proteins by immobilized metal ion affinity chromatography and mass spectrometry

DEFF Research Database (Denmark)

Nühse, Thomas S; Stensballe, Allan; Jensen, Ole N

2003-01-01

specificity. We investigated the potential of IMAC in combination with capillary liquid chromatography coupled to tandem mass spectrometry for the identification of plasma membrane phosphoproteins of Arabidopsis. Without chemical modification of peptides, over 75% pure phosphopeptides were isolated from...... plasma membrane digests and detected and sequenced by mass spectrometry. We present a scheme for two-dimensional peptide separation using strong anion exchange chromatography prior to IMAC that both decreases the complexity of IMAC-purified phosphopeptides and yields a far greater coverage...... of monophosphorylated peptides. Among the identified sequences, six originated from different isoforms of the plasma membrane H(+)-ATPase and defined two previously unknown phosphorylation sites at the regulatory C terminus. The potential for large-scale identification of phosphorylation sites on plasma membrane...

An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Directory of Open Access Journals (Sweden)

Md. Rezaul Karim

2012-03-01

Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
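To make the notion of a maximal contiguous frequent pattern concrete, the sketch below uses a naive level-wise enumeration. It is not the efficient approach proposed in the paper; the function names are hypothetical, and support is assumed to mean the number of sequences that contain the pattern. A frequent pattern is kept as maximal when no one-base extension on either side is also frequent, which suffices because every substring of a frequent contiguous pattern is itself frequent.

```python
def frequent_kmers(sequences, k, min_support):
    """Contiguous length-k patterns found in at least min_support sequences."""
    support = {}
    for seq in sequences:
        # A set, so each sequence counts once per pattern.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            support[kmer] = support.get(kmer, 0) + 1
    return {kmer for kmer, count in support.items() if count >= min_support}

def maximal_contiguous_patterns(sequences, min_support):
    """Frequent contiguous patterns with no frequent one-base extension."""
    levels = []
    k = 1
    while True:
        current = frequent_kmers(sequences, k, min_support)
        if not current:
            break
        levels.append(current)
        k += 1
    maximal = set()
    for idx, level in enumerate(levels):
        longer = levels[idx + 1] if idx + 1 < len(levels) else set()
        for pattern in level:
            extended = any(pattern == s[:-1] or pattern == s[1:] for s in longer)
            if not extended:
                maximal.add(pattern)
    return maximal

# Hypothetical toy sequences.
print(maximal_contiguous_patterns(["ACGTAC", "TACGTT", "GGACGT"], min_support=3))
```

No pattern length is specified in advance: the loop simply grows k until no frequent pattern remains, which mirrors the requirement stated in the abstract, though at a cost that a practical algorithm would need to reduce.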
Adaptive visualization for large-scale graph

International Nuclear Information System (INIS)

Nakamura, Hiroko; Shinano, Yuji; Ohzahata, Satoshi

2010-01-01

We propose an adaptive visualization technique for representing a large-scale hierarchical dataset within limited display space. A hierarchical dataset has nodes and links showing the parent-child relationship between the nodes. These nodes and links are described using graphics primitives. When the number of these primitives is large, it is difficult to recognize the structure of the hierarchical data because many primitives overlap within a limited region. To overcome this difficulty, the proposed technique selects an appropriate graph style according to the nodal density in each area. (author)
Stabilization Algorithms for Large-Scale Problems

DEFF Research Database (Denmark)

Jensen, Toke Koldborg

2006-01-01

The focus of the project is on stabilization of large-scale inverse problems where structured models and iterative algorithms are necessary for computing approximate solutions. For this purpose, we study various iterative Krylov methods and their abilities to produce regularized solutions. Some......-curve. This heuristic is implemented as a part of a larger algorithm which is developed in collaboration with G. Rodriguez and P. C. Hansen. Last, but not least, a large part of the project has, in different ways, revolved around the object-oriented Matlab toolbox MOORe Tools developed by PhD Michael Jacobsen. New...
Development and Testing of New Gene-Homologous EST-SSRs for Eucalyptus gomphocephala (Myrtaceae)

Directory of Open Access Journals (Sweden)

Donna Bradbury

2013-07-01

Full Text Available Premise of the study: New microsatellite (simple sequence repeat [SSR]) primers were developed from Eucalyptus expressed sequence tags (ESTs) and optimized for genetic studies of the southwestern Australian tree E. gomphocephala, which is severely impacted by tree health decline and habitat fragmentation. Methods and Results: A total of 133 gene-homologous EST-SSR primer pairs were designed for Eucalyptus, and 44 were screened in E. gomphocephala. Of these, 17 produced reliable amplification products and 11 were polymorphic. Between two and 13 alleles were observed per locus, and observed heterozygosities ranged from 0.172 to 0.867. All 17 EST-SSRs that amplified E. gomphocephala cross-amplified to at least one of E. marginata, E. camaldulensis, and E. victrix. Conclusions: This set of EST-SSR primer pairs will be valuable tools for future population genetic studies of E. gomphocephala and other eucalypts, particularly for studying gene-linked variation and informing seed-sourcing strategies for ecological restoration.
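The two per-locus summary statistics reported above, number of alleles and observed heterozygosity, are simple to compute from diploid genotypes. The sketch below is an illustration with hypothetical function names and made-up allele sizes in base pairs; it is not the software used in the study.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called individuals carrying two different alleles at one locus.

    genotypes: list of (allele_a, allele_b) pairs; None marks a failed amplification.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    return sum(a != b for a, b in called) / len(called)

def allele_count(genotypes):
    """Number of distinct alleles observed at one locus."""
    return len({allele for g in genotypes if g is not None for allele in g})

# Hypothetical genotypes at one SSR locus (fragment sizes in bp).
locus = [(180, 184), (180, 180), None, (184, 188), (188, 188)]
print(allele_count(locus), observed_heterozygosity(locus))
```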
Global comparative analysis of ESTs from the southern cattle tick, Rhipicephalus (Boophilus) microplus

Directory of Open Access Journals (Sweden)

Pertea Geo

2007-10-01

Full Text Available Background: The southern cattle tick, Rhipicephalus (Boophilus) microplus, is an economically important parasite of cattle and can transmit several pathogenic microorganisms to its cattle host during the feeding process. Understanding the biology and genomics of R. microplus is critical to developing novel methods for controlling these ticks. Results: We present a global comparative genomic analysis of a gene index of R. microplus comprised of 13,643 unique transcripts assembled from 42,512 expressed sequence tags (ESTs), a significant fraction of the complement of R. microplus genes. The source material for these ESTs consisted of polyA RNA from various tissues, lifestages, and strains of R. microplus, including larvae exposed to heat, cold, host odor, and acaricide. Functional annotation using RPS-Blast analysis identified conserved protein domains in the conceptually translated gene index and assigned GO terms to those database transcripts which had informative BlastX hits. Blast Score Ratio and SimiTri analysis compared the conceptual transcriptome of the R. microplus database to other eukaryotic proteomes and EST databases, including those from 3 ticks. The most abundant protein domains in BmiGI were also analyzed by SimiTri methodology. Conclusion: These results indicate that a large fraction of BmiGI entries have no homologs in other sequenced genomes. Analysis with the PartiGene annotation pipeline showed 64% of the members of BmiGI could not be assigned GO annotation, thus minimal information is available about a significant fraction of the tick genome. This highlights the important insights in tick biology which are likely to result from a tick genome sequencing project. Global comparative analysis identified some tick genes with unexpected phylogenetic relationships which detailed analysis attributed to gene losses in some members of the animal kingdom. Some tick genes were identified which had close orthologues to mammalian genes
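For readers unfamiliar with the Blast Score Ratio mentioned in this abstract, the sketch below shows the normalization as it is commonly defined: the BLAST score of a sequence against its best hit in another database, divided by the score of the same sequence aligned against itself. The abstract does not spell out the exact procedure used, so this definition, the function name, and the scores are assumptions for illustration only.

```python
def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Normalize a BLAST score by the query's self-alignment score.

    A value near 1 suggests a near-identical match in the other database;
    a value near 0 suggests no meaningful homolog.
    """
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


# Hypothetical scores for one transcript compared against two other databases.
self_score = 500.0
scores = {"database_A": 450.0, "database_B": 60.0}
ratios = {name: blast_score_ratio(s, self_score) for name, s in scores.items()}
print(ratios)
```

Plotting the two ratios of each transcript against each other is one way such scores are typically compared across two databases.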
Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere.

Science.gov (United States)

Mashiyama, Susan T; Malabanan, M Merced; Akiva, Eyal; Bhosle, Rahul; Branch, Megan C; Hillerich, Brandan; Jagessar, Kevin; Kim, Jungwook; Patskovsky, Yury; Seidel, Ronald D; Stead, Mark; Toro, Rafael; Vetting, Matthew W; Almo, Steven C; Armstrong, Richard N; Babbitt, Patricia C

2014-04-01

The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common for many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this unusual reaction
Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere.

Directory of Open Access Journals (Sweden)

Susan T Mashiyama

2014-04-01

Full Text Available The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common for many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this
Design study on sodium cooled large-scale reactor

International Nuclear Information System (INIS)

Murakami, Tsutomu; Hishida, Masahiko; Kisohara, Naoyuki

2004-07-01

In Phase 1 of the 'Feasibility Studies on Commercialized Fast Reactor Cycle Systems (F/S)', an advanced loop type reactor has been selected as a promising concept of sodium-cooled large-scale reactor, which has a possibility to fulfill the design requirements of the F/S. In Phase 2, design improvement for further cost reduction of establishment of the plant concept has been performed. This report summarizes the results of the design study on the sodium-cooled large-scale reactor performed in JFY2003, which is the third year of Phase 2. In the JFY2003 design study, critical subjects related to safety, structural integrity and thermal hydraulics which were found in the last fiscal year have been examined and the plant concept has been modified. Furthermore, fundamental specifications of main systems and components have been set and economy has been evaluated. In addition, as the interim evaluation of the candidate concept of the FBR fuel cycle is to be conducted, cost effectiveness and achievability for the development goal were evaluated and the data of the three large-scale reactor candidate concepts were prepared. As a result of this study, the plant concept of the sodium-cooled large-scale reactor has been constructed, which has a prospect to satisfy the economic goal (construction cost: less than 200,000 yen/kWe, etc.) and has a prospect to solve the critical subjects. From now on, reflecting the results of elemental experiments, the preliminary conceptual design of this plant will proceed toward the selection for narrowing down candidate concepts at the end of Phase 2. (author)
Design study on sodium-cooled large-scale reactor

International Nuclear Information System (INIS)

Shimakawa, Yoshio; Nibe, Nobuaki; Hori, Toru

2002-05-01

In Phase 1 of the 'Feasibility Study on Commercialized Fast Reactor Cycle Systems (F/S)', an advanced loop type reactor has been selected as a promising concept of sodium-cooled large-scale reactor, which has a possibility to fulfill the design requirements of the F/S. In Phase 2 of the F/S, it is planned to proceed with a preliminary conceptual design of a sodium-cooled large-scale reactor based on the design of the advanced loop type reactor. Through the design study, it is intended to construct such a plant concept that can show its attraction and competitiveness as a commercialized reactor. This report summarizes the results of the design study on the sodium-cooled large-scale reactor performed in JFY2001, which is the first year of Phase 2. In the JFY2001 design study, a plant concept has been constructed based on the design of the advanced loop type reactor, and fundamental specifications of main systems and components have been set. Furthermore, critical subjects related to safety, structural integrity, thermal hydraulics, operability, maintainability and economy have been examined and evaluated. As a result of this study, the plant concept of the sodium-cooled large-scale reactor has been constructed, which has a prospect to satisfy the economic goal (construction cost: less than 200,000 yen/kWe, etc.) and has a prospect to solve the critical subjects. From now on, reflecting the results of elemental experiments, the preliminary conceptual design of this plant will proceed toward the selection for narrowing down candidate concepts at the end of Phase 2. (author)
Large scale CMB anomalies from thawing cosmic strings

Energy Technology Data Exchange (ETDEWEB)

Ringeval, Christophe [Centre for Cosmology, Particle Physics and Phenomenology, Institute of Mathematics and Physics, Louvain University, 2 Chemin du Cyclotron, 1348 Louvain-la-Neuve (Belgium); Yamauchi, Daisuke; Yokoyama, Jun'ichi [Research Center for the Early Universe (RESCEU), Graduate School of Science, The University of Tokyo, Tokyo 113-0033 (Japan); Bouchet, François R., E-mail: christophe.ringeval@uclouvain.be, E-mail: yamauchi@resceu.s.u-tokyo.ac.jp, E-mail: yokoyama@resceu.s.u-tokyo.ac.jp, E-mail: bouchet@iap.fr [Institut d'Astrophysique de Paris, UMR 7095-CNRS, Université Pierre et Marie Curie, 98bis boulevard Arago, 75014 Paris (France)

2016-02-01

Cosmic strings formed during inflation are expected to be either diluted over super-Hubble distances, i.e., invisible today, or to have crossed our past light cone very recently. We discuss the latter situation in which a few strings imprint their signature in the Cosmic Microwave Background (CMB) Anisotropies after recombination. Being almost frozen in the Hubble flow, these strings are quasi static and evade almost all of the previously derived constraints on their tension while being able to source large scale anisotropies in the CMB sky. Using a local variance estimator on thousands of numerically simulated Nambu-Goto all sky maps, we compute the expected signal and show that it can mimic a dipole modulation at large angular scales while being negligible at small angles. Interestingly, such a scenario generically produces one cold spot from the thawing of a cosmic string loop. Mixed with anisotropies of inflationary origin, we find that a few strings of tension GU = O(1) × 10^−6 match the amplitude of the dipole modulation reported in the Planck satellite measurements and could be at the origin of other large scale anomalies.
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

International Nuclear Information System (INIS)

Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B

2013-01-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ∼10^6 cores and sustained performance over ∼2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios. (paper)
Balancing modern Power System with large scale of wind power

DEFF Research Database (Denmark)

Basit, Abdul; Altin, Müfit; Hansen, Anca Daniela

2014-01-01

Power system operators must ensure robust, secure and reliable power system operation even with a large scale integration of wind power. Electricity generated from the intermittent wind in large proportion may impact on the control of power system balance and thus deviations in the power system...... frequency in small or islanded power systems or tie line power flows in interconnected power systems. Therefore, the large scale integration of wind power into the power system strongly concerns the secure and stable grid operation. To ensure the stable power system operation, the evolving power system has...... to be analysed with improved analytical tools and techniques. This paper proposes techniques for the active power balance control in future power systems with the large scale wind power integration, where power balancing model provides the hour-ahead dispatch plan with reduced planning horizon and the real time...
Large-Scale Graph Processing Using Apache Giraph

KAUST Repository

Sakr, Sherif

2017-01-07

This book takes its reader on a journey through Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming models and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms.
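Apache Giraph follows the vertex-centric, bulk-synchronous model popularized by Pregel: computation proceeds in supersteps, and in each superstep a vertex reads the messages sent to it in the previous one, updates its value, and sends messages along its edges. The sketch below simulates that model in plain Python for a maximum-value propagation example. It does not use the Giraph API (which is Java); all names here are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def propagate_max(values: Dict[str, int], edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Simulate supersteps until no messages remain in flight.

    Assumes every neighbour listed in `edges` is also a key of `values`.
    """
    values = dict(values)  # work on a copy
    # Superstep 0: every vertex sends its value along its outgoing edges.
    inbox: Dict[str, List[int]] = {v: [] for v in values}
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)
    # Later supersteps: a vertex that learns a larger value adopts and forwards it.
    while any(inbox.values()):
        outbox: Dict[str, List[int]] = {v: [] for v in values}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                values[vertex] = max(messages)
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(values[vertex])
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return values


# A small path graph a - b - c with hypothetical initial values.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate_max({"a": 3, "b": 6, "c": 2}, graph)
print(result)
```

Vertices that receive no larger value stay silent, which is the simulated counterpart of a vertex voting to halt; the loop ends when no messages are pending.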
Large-Scale Graph Processing Using Apache Giraph

KAUST Repository

Sakr, Sherif; Orakzai, Faisal Moeen; Abdelaziz, Ibrahim; Khayyat, Zuhair

2017-01-01

This book takes its reader on a journey through Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming models and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms.
An interactive display system for large-scale 3D models

Science.gov (United States)

Liu, Zijian; Sun, Kun; Tao, Wenbing; Liu, Liman

2018-04-01

With the improvement of 3D reconstruction theory and the rapid development of computer hardware technology, the reconstructed 3D models are enlarging in scale and increasing in complexity. Models with tens of thousands of 3D points or triangular meshes are common in practical applications. Due to storage and computing power limitations, it is difficult to achieve real-time display and interaction with large scale 3D models for some common 3D display software, such as MeshLab. In this paper, we propose a display system for large-scale 3D scene models. We construct the LOD (Levels of Detail) model of the reconstructed 3D scene in advance, and then use an out-of-core view-dependent multi-resolution rendering scheme to realize the real-time display of the large-scale 3D model. With the proposed method, our display system is able to render in real time while roaming in the reconstructed scene and 3D camera poses can also be displayed. Furthermore, the memory consumption can be significantly decreased via an internal and external memory exchange mechanism, so that it is possible to display a large scale reconstructed scene with millions of 3D points or triangular meshes on a regular PC with only 4GB RAM.
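A view-dependent level-of-detail scheme has to decide, per scene block, how coarse a representation is acceptable from the current viewpoint. The abstract does not give the selection rule, so the sketch below uses one common and simple assumption: level 0 is the finest, and each doubling of the viewing distance beyond a base distance moves one level coarser. The function name and parameters are hypothetical.

```python
import math


def select_lod(distance: float, base_distance: float, max_level: int) -> int:
    """Pick a level of detail from the camera-to-block distance.

    Level 0 is the finest representation; max_level is the coarsest available.
    """
    if base_distance <= 0:
        raise ValueError("base_distance must be positive")
    if distance <= base_distance:
        return 0
    # One level coarser for every doubling of distance beyond base_distance.
    level = int(math.log2(distance / base_distance))
    return min(max_level, level)


# Hypothetical distances (in scene units) with a base distance of 1.0.
for d in (0.5, 2.0, 8.0, 1000.0):
    print(d, select_lod(d, base_distance=1.0, max_level=4))
```

In an out-of-core renderer, only the blocks at the selected levels would need to be resident in memory, which is what keeps the footprint bounded as the camera moves.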
Large-scale hydrology in Europe : observed patterns and model performance

Energy Technology Data Exchange (ETDEWEB)

Gudmundsson, Lukas

2011-06-15

In a changing climate, terrestrial water storages are of great interest as water availability impacts key aspects of ecosystem functioning. Thus, a better understanding of the variations of wet and dry periods will contribute to fully grasping processes of the earth system such as nutrient cycling and vegetation dynamics. Currently, river runoff from small, nearly natural, catchments is one of the few variables of the terrestrial water balance that is regularly monitored with detailed spatial and temporal coverage on large scales. River runoff, therefore, provides a foundation to approach European hydrology with respect to observed patterns on large scales, with regard to the ability of models to capture these. The analysis of observed river flow from small catchments focused on the identification and description of spatial patterns of simultaneous temporal variations of runoff. These are dominated by large-scale variations of climatic variables but also altered by catchment processes. It was shown that time series of annual low, mean and high flows follow the same atmospheric drivers. The observation that high flows are more closely coupled to large scale atmospheric drivers than low flows indicates the increasing influence of catchment properties on runoff under dry conditions. Further, it was shown that the low-frequency variability of European runoff is dominated by two opposing centres of simultaneous variations, such that dry years in the north are accompanied by wet years in the south. Large-scale hydrological models are simplified representations of our current perception of the terrestrial water balance on large scales. Quantification of the models' strengths and weaknesses is the prerequisite for a reliable interpretation of simulation results. Model evaluations may also make it possible to detect shortcomings in model assumptions and thus enable a refinement of the current perception of hydrological systems. The ability of a multi model ensemble of nine large-scale
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

Science.gov (United States)

de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

2000-01-01

Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084
Large-scale perturbations from the waterfall field in hybrid inflation

International Nuclear Information System (INIS)

Fonseca, José; Wands, David; Sasaki, Misao

2010-01-01

We estimate large-scale curvature perturbations from isocurvature fluctuations in the waterfall field during hybrid inflation, in addition to the usual inflaton field perturbations. The tachyonic instability at the end of inflation leads to an explosive growth of super-Hubble scale perturbations, but they retain the steep blue spectrum characteristic of vacuum fluctuations in a massive field during inflation. The power spectrum thus peaks around the Hubble-horizon scale at the end of inflation. We extend the usual δN formalism to include the essential role of these small fluctuations when estimating the large-scale curvature perturbation. The resulting curvature perturbation due to fluctuations in the waterfall field is second-order and the spectrum is expected to be of order 10^−54 on cosmological scales.
Decoupling local mechanics from large-scale structure in modular metamaterials

Science.gov (United States)

Yang, Nan; Silverberg, Jesse L.

2017-04-01

A defining feature of mechanical metamaterials is that their properties are determined by the organization of internal structure instead of the raw fabrication materials. This shift of attention to engineering internal degrees of freedom has coaxed relatively simple materials into exhibiting a wide range of remarkable mechanical properties. For practical applications to be realized, however, this nascent understanding of metamaterial design must be translated into a capacity for engineering large-scale structures with prescribed mechanical functionality. Thus, the challenge is to systematically map desired functionality of large-scale structures backward into a design scheme while using finite parameter domains. Such “inverse design” is often complicated by the deep coupling between large-scale structure and local mechanical function, which limits the available design space. Here, we introduce a design strategy for constructing 1D, 2D, and 3D mechanical metamaterials inspired by modular origami and kirigami. Our approach is to assemble a number of modules into a voxelized large-scale structure, where the module’s design has a greater number of mechanical design parameters than the number of constraints imposed by bulk assembly. This inequality allows each voxel in the bulk structure to be uniquely assigned mechanical properties independent from its ability to connect and deform with its neighbors. In studying specific examples of large-scale metamaterial structures we show that a decoupling of global structure from local mechanical function allows for a variety of mechanically and topologically complex designs.
The origin of large scale cosmic structure

International Nuclear Information System (INIS)

Jones, B.J.T.; Palmer, P.L.

1985-01-01

The paper concerns the origin of large scale cosmic structure. The evolution of density perturbations, the nonlinear regime (Zel'dovich's solution and others), the Gott and Rees clustering hierarchy, the spectrum of condensations, and biassed galaxy formation, are all discussed. (UK)

A practical process for light-water detritiation at large scales

Energy Technology Data Exchange (ETDEWEB)

Boniface, H.A. [Atomic Energy of Canada Limited, Chalk River, ON (Canada); Robinson, J., E-mail: jr@tyne-engineering.com [Tyne Engineering, Burlington, ON (Canada); Gnanapragasam, N.V.; Castillo, I.; Suppiah, S. [Atomic Energy of Canada Limited, Chalk River, ON (Canada)

2014-07-01

AECL and Tyne Engineering have recently completed a preliminary engineering design for a modest-scale tritium removal plant for light water, intended for installation at AECL's Chalk River Laboratories (CRL). This plant design was based on the Combined Electrolysis and Catalytic Exchange (CECE) technology developed at CRL over many years and demonstrated there and elsewhere. The general features and capabilities of this design have been reported as well as the versatility of the design for separating any pair of the three hydrogen isotopes. The same CECE technology could be applied directly to very large-scale wastewater detritiation, such as the case at Fukushima Daiichi Nuclear Power Station. However, since the CECE process scales linearly with throughput, the required capital and operating costs are substantial for such large-scale applications. This paper discusses some options for reducing the costs of very large-scale detritiation. Options include: Reducing tritium removal effectiveness; Energy recovery; Improving the tolerance of impurities; Use of less expensive or more efficient equipment. A brief comparison with alternative processes is also presented. (author)
OffshoreDC DC grids for integration of large scale wind power

DEFF Research Database (Denmark)

Zeni, Lorenzo; Endegnanew, Atsede Gualu; Stamatiou, Georgios

The present report summarizes the main findings of the Nordic Energy Research project “DC grids for large scale integration of offshore wind power – OffshoreDC”. The project was funded by Nordic Energy Research through the TFI programme and was active between 2011 and 2016. The overall...... objective of the project was to drive the development of the VSC based HVDC technology for future large scale offshore grids, supporting a standardised and commercial development of the technology, and improving the opportunities for the technology to support power system integration of large scale offshore...
Low-Complexity Transmit Antenna Selection and Beamforming for Large-Scale MIMO Communications

Directory of Open Access Journals (Sweden)

Kun Qian

2014-01-01

Full Text Available Transmit antenna selection plays an important role in large-scale multiple-input multiple-output (MIMO) communications, but optimal large-scale MIMO antenna selection is a technical challenge. Exhaustive search is often employed in antenna selection, but it cannot be efficiently implemented in large-scale MIMO communication systems due to its prohibitively high computational complexity. This paper proposes a low-complexity interactive multiple-parameter optimization method for joint transmit antenna selection and beamforming in large-scale MIMO communication systems. The objective is to jointly maximize the channel outage capacity and signal-to-noise ratio (SNR) performance and minimize the mean square error in transmit antenna selection and minimum variance distortionless response (MVDR) beamforming without exhaustive search. The effectiveness of all the proposed methods is verified by extensive simulation results. It is shown that the required antenna selection processing time of the proposed method does not increase with the number of selected antennas, whereas the computational complexity of the conventional exhaustive search method increases significantly when large-scale antenna arrays are employed in the system. This is particularly useful in antenna selection for large-scale MIMO communication systems.
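To see why exhaustive search becomes impractical, note that choosing k of N transmit antennas means scoring every one of the C(N, k) subsets. The Python sketch below enumerates subsets for a toy case and prints how the subset count grows. The per-antenna gains and the sum-of-gains score are hypothetical stand-ins, not the capacity or SNR criteria used in the paper.

```python
from itertools import combinations
from math import comb
from typing import Callable, Tuple


def exhaustive_select(n_antennas: int, k: int,
                      score: Callable[[Tuple[int, ...]], float]) -> Tuple[int, ...]:
    """Score every k-subset of antenna indices and return the best one."""
    return max(combinations(range(n_antennas), k), key=score)


# Hypothetical per-antenna channel gains; the toy score is simply their sum.
gains = [0.9, 0.2, 1.4, 0.7, 1.1, 0.3]
best = exhaustive_select(len(gains), 3, lambda subset: sum(gains[i] for i in subset))
print("selected antenna indices:", best)

# Number of subsets an exhaustive search must evaluate as the array grows.
for n, k in [(8, 4), (32, 8), (64, 8)]:
    print(f"N={n}, k={k}: {comb(n, k)} subsets")
```

With a realistic joint criterion each subset evaluation is itself costly, so the combinatorial growth in the number of subsets is what motivates low-complexity alternatives.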
Gene Expression Profiling and Identification of Resistance Genes to Aspergillus flavus Infection in Peanut through EST and Microarray Strategies

Directory of Open Access Journals (Sweden)

Baozhu Guo

2011-06-01

Full Text Available Aspergillus flavus and A. parasiticus infect peanut seeds and produce aflatoxins, which are associated with various diseases in domestic animals and humans throughout the world. The most cost-effective strategy to minimize aflatoxin contamination involves the development of peanut cultivars that are resistant to fungal infection and/or aflatoxin production. To identify peanut Aspergillus-interactive and peanut Aspergillus-resistance genes, we carried out a large scale peanut Expressed Sequence Tag (EST) project which we used to construct a peanut glass slide oligonucleotide microarray. The fabricated microarray represents over 40% of the protein coding genes in the peanut genome. For expression profiling, resistant and susceptible peanut cultivars were infected with a mixture of Aspergillus flavus and parasiticus spores. The subsequent microarray analysis identified 62 genes in resistant cultivars that were up-expressed in response to Aspergillus infection. In addition, we identified 22 putative Aspergillus-resistance genes that were constitutively up-expressed in the resistant cultivar in comparison to the susceptible cultivar. Some of these genes were homologous to peanut, corn, and soybean genes that were previously shown to confer resistance to fungal infection. This study is a first step towards a comprehensive genome-scale platform for developing Aspergillus-resistant peanut cultivars through targeted marker-assisted breeding and genetic engineering.
The effective field theory of cosmological large scale structures

Energy Technology Data Exchange (ETDEWEB)

Carrasco, John Joseph M. [Stanford Univ., Stanford, CA (United States); Hertzberg, Mark P. [Stanford Univ., Stanford, CA (United States); SLAC National Accelerator Lab., Menlo Park, CA (United States); Senatore, Leonardo [Stanford Univ., Stanford, CA (United States); SLAC National Accelerator Lab., Menlo Park, CA (United States)

2012-09-20

Large scale structure surveys will likely become the next leading cosmological probe. In our universe, matter perturbations are large on short distances and small at long scales, i.e. strongly coupled in the UV and weakly coupled in the IR. To make precise analytical predictions on large scales, we develop an effective field theory formulated in terms of an IR effective fluid characterized by several parameters, such as speed of sound and viscosity. These parameters, determined by the UV physics described by the Boltzmann equation, are measured from N-body simulations. We find that the speed of sound of the effective fluid is $c_s^2 \approx 10^{-6} c^2$ and that the viscosity contributions are of the same order. The fluid describes all the relevant physics at long scales k and permits a manifestly convergent perturbative expansion in the size of the matter perturbations $\delta(k)$ for all the observables. As an example, we calculate the correction to the power spectrum at order $\delta(k)^4$. As a result, the predictions of the effective field theory are found to be in much better agreement with observation than standard cosmological perturbation theory, already reaching percent precision at this order up to a relatively short scale $k \simeq 0.24\,h\,\mathrm{Mpc}^{-1}$.
Temporal flexibility and careers: The role of large-scale organizations for physicians

OpenAIRE

Forrest Briscoe

2006-01-01

This study investigates how employment in large-scale organizations affects the work lives of practicing physicians. Well-established theory associates larger organizations with bureaucratic constraint, loss of workplace control, and dissatisfaction, but this author finds that large scale is also associated with greater schedule and career flexibility. Ironically, the bureaucratic p...
The role of large scale motions on passive scalar transport

Science.gov (United States)

Dharmarathne, Suranga; Araya, Guillermo; Tutkun, Murat; Leonardi, Stefano; Castillo, Luciano

2014-11-01

We study direct numerical simulation (DNS) of turbulent channel flow at Re_τ = 394 to investigate the effect of large scale motions on the fluctuating temperature field, which forms a passive scalar field. A statistical description of the large scale features of the turbulent channel flow is obtained using two-point correlations of velocity components. Two-point correlations of the fluctuating temperature field are also examined in order to identify possible similarities between velocity and temperature fields. The two-point cross-correlations between the velocity and temperature fluctuations are further analyzed to establish connections between these two fields. In addition, we use proper orthogonal decomposition (POD) to extract the most dominant modes of the fields and discuss the coupling of large scale features of turbulence and the temperature field.
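Note: the two-point correlation mentioned in this record can be illustrated with a minimal sketch. The function name `two_point_correlation`, the one-dimensional periodic grid, and the synthetic sine signals standing in for velocity and temperature fluctuations are all assumptions made for illustration; this is not the authors' DNS post-processing code.

```python
import math

def two_point_correlation(a, b, shift):
    """Normalized correlation between fluctuations of a at point i and of b
    at point i + shift, averaged over a periodic 1-D grid (hypothetical helper)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Fluctuations about the spatial mean.
    fa = [x - mean_a for x in a]
    fb = [x - mean_b for x in b]
    # Covariance at the given separation, wrapping around the periodic domain.
    cov = sum(fa[i] * fb[(i + shift) % n] for i in range(n)) / n
    rms_a = math.sqrt(sum(x * x for x in fa) / n)
    rms_b = math.sqrt(sum(x * x for x in fb) / n)
    return cov / (rms_a * rms_b)

# Synthetic stand-ins: "theta" is the same wave as "u", displaced by 8 grid points.
n = 64
u = [math.sin(2 * math.pi * i / n) for i in range(n)]
theta = [math.sin(2 * math.pi * (i - 8) / n) for i in range(n)]

r_auto = two_point_correlation(u, u, 0)            # autocorrelation at zero separation
r_cross_zero = two_point_correlation(u, theta, 0)  # cross-correlation at zero separation
r_cross_peak = two_point_correlation(u, theta, 8)  # cross-correlation at the displacement
```

In this toy case the cross-correlation peaks at the separation equal to the imposed displacement, which is the kind of signature one would look for when comparing velocity and scalar fields.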
Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

Science.gov (United States)

Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

2015-03-15

The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated with this type of library, which makes the accurate annotation of SVs challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved.
Signatures of non-universal large scales in conditional structure functions from various turbulent flows

International Nuclear Information System (INIS)

Blum, Daniel B; Voth, Greg A; Bewley, Gregory P; Bodenschatz, Eberhard; Gibert, Mathieu; Xu Haitao; Gylfason, Ármann; Mydlarski, Laurent; Yeung, P K

2011-01-01

We present a systematic comparison of conditional structure functions in nine turbulent flows. The flows studied include forced isotropic turbulence simulated on a periodic domain, passive grid wind tunnel turbulence in air and in pressurized SF6, active grid wind tunnel turbulence (in both synchronous and random driving modes), the flow between counter-rotating discs, oscillating grid turbulence and the flow in the Lagrangian exploration module (in both constant and random driving modes). We compare longitudinal Eulerian second-order structure functions conditioned on the instantaneous large-scale velocity in each flow to assess the ways in which the large scales affect the small scales in a variety of turbulent flows. Structure functions are shown to have larger values when the large-scale velocity significantly deviates from the mean in most flows, suggesting that dependence on the large scales is typical in many turbulent flows. The effects of the large-scale velocity on the structure functions can be quite strong, with the structure function varying by up to a factor of 2 when the large-scale velocity deviates from the mean by ±2 standard deviations. In several flows, the effects of the large-scale velocity are similar at all the length scales we measured, indicating that the large-scale effects are scale independent. In a few flows, the effects of the large-scale velocity are larger on the smallest length scales. (paper)
Cytology of DNA Replication Reveals Dynamic Plasticity of Large-Scale Chromatin Fibers.

Science.gov (United States)

Deng, Xiang; Zhironkina, Oxana A; Cherepanynets, Varvara D; Strelkova, Olga S; Kireev, Igor I; Belmont, Andrew S

2016-09-26

In higher eukaryotic interphase nuclei, the 100- to >1,000-fold linear compaction of chromatin is difficult to reconcile with its function as a template for transcription, replication, and repair. It is challenging to imagine how DNA and RNA polymerases with their associated molecular machinery would move along the DNA template without transient decondensation of observed large-scale chromatin "chromonema" fibers [1]. Transcription or "replication factory" models [2], in which polymerases remain fixed while DNA is reeled through, are similarly difficult to conceptualize without transient decondensation of these chromonema fibers. Here, we show how a dynamic plasticity of chromatin folding within large-scale chromatin fibers allows DNA replication to take place without significant changes in the global large-scale chromatin compaction or shape of these large-scale chromatin fibers. Time-lapse imaging of lac-operator-tagged chromosome regions shows no major change in the overall compaction of these chromosome regions during their DNA replication. Improved pulse-chase labeling of endogenous interphase chromosomes yields a model in which the global compaction and shape of large-Mbp chromatin domains remains largely invariant during DNA replication, with DNA within these domains undergoing significant movements and redistribution as they move into and then out of adjacent replication foci. In contrast to hierarchical folding models, this dynamic plasticity of large-scale chromatin organization explains how localized changes in DNA topology allow DNA replication to take place without an accompanying global unfolding of large-scale chromatin fibers while suggesting a possible mechanism for maintaining epigenetic programming of large-scale chromatin domains throughout DNA replication. Copyright © 2016 Elsevier Ltd. All rights reserved.
Evaluation of drought propagation in an ensemble mean of large-scale hydrological models

NARCIS (Netherlands)

Loon, van A.F.; Huijgevoort, van M.H.J.; Lanen, van H.A.J.

2012-01-01

Hydrological drought is increasingly studied using large-scale models. It is, however, not certain whether large-scale models reproduce the development of hydrological drought correctly. The pressing question is how well large-scale models simulate the propagation from meteorological to hydrological
Large-scale chromatin remodeling at the immunoglobulin heavy chain locus: a paradigm for multigene regulation.

Science.gov (United States)

Bolland, Daniel J; Wood, Andrew L; Corcoran, Anne E

2009-01-01

V(D)J recombination in lymphocytes is the cutting and pasting together of antigen receptor genes in cis to generate the enormous variety of coding sequences required to produce diverse antigen receptor proteins. It is the key role of the adaptive immune response, which must potentially combat millions of different foreign antigens. Most antigen receptor loci have evolved to be extremely large and contain multiple individual V, D and J genes. The immunoglobulin heavy chain (Igh) and immunoglobulin kappa light chain (Igk) loci are the largest multigene loci in the mammalian genome and V(D)J recombination is one of the most complicated genetic processes in the nucleus. The challenge for the appropriate lymphocyte is one of macro-management: to make all of the antigen receptor genes in a particular locus available for recombination at the appropriate developmental time-point. Conversely, these large loci must be kept closed in lymphocytes in which they do not normally recombine, to guard against genomic instability generated by the DNA double-strand breaks inherent to the V(D)J recombination process. To manage all of these demanding criteria, V(D)J recombination is regulated at numerous levels. It is restricted to lymphocytes since the Rag genes which control the DNA double-strand break step of recombination are only expressed in these cells. Within the lymphocyte lineage, immunoglobulin recombination is restricted to B-lymphocytes and TCR recombination to T-lymphocytes by regulation of locus accessibility, which occurs at multiple levels. Accessibility of recombination signal sequences (RSSs) flanking individual V, D and J genes at the nucleosomal level is the key micro-management mechanism, which is discussed in greater detail in other chapters. This chapter will explore how the antigen receptor loci are regulated as a whole, focussing on the Igh locus as a paradigm for the mechanisms involved. Numerous recent studies have begun to unravel the complex and
Configuration management in large scale infrastructure development

NARCIS (Netherlands)

Rijn, T.P.J. van; Belt, H. van de; Los, R.H.

2000-01-01

Large Scale Infrastructure (LSI) development projects such as the construction of roads, railways and other civil engineering (water)works are tendered differently today than a decade ago. Traditional workflow requested quotes from construction companies for construction works where the works to be
Dual Decomposition for Large-Scale Power Balancing

DEFF Research Database (Denmark)

Halvgaard, Rasmus; Jørgensen, John Bagterp; Vandenberghe, Lieven

2013-01-01

Dual decomposition is applied to power balancing of flexible thermal storage units. The centralized large-scale problem is decomposed into smaller subproblems and solved locally by each unit in the Smart Grid. Convergence is achieved by coordinating the units' consumption through a negotiation...
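Note: a minimal sketch of dual decomposition for a power-balancing problem, under stated assumptions. Each unit is assumed to have a quadratic cost 0.5 * a * (p - r)^2 with box limits, and the coordinator is assumed to update a single price by a plain subgradient step; the names `unit_response` and `dual_decomposition`, the cost model, and all numbers are hypothetical and are not taken from the paper.

```python
def unit_response(price, a, r, lo, hi):
    """Local subproblem of one unit: minimize 0.5*a*(p - r)**2 + price*p
    over lo <= p <= hi. The unconstrained optimum is r - price/a; clip to limits."""
    p = r - price / a
    return min(max(p, lo), hi)

def dual_decomposition(units, target, step=0.5, iters=200):
    """Coordinate the units through a price (the dual variable of the balance
    constraint sum(p) == target). Assumed subgradient update, fixed step size."""
    price = 0.0
    powers = []
    for _ in range(iters):
        # Each unit solves its own small problem given the current price.
        powers = [unit_response(price, *u) for u in units]
        # The coordinator only sees the aggregate imbalance.
        imbalance = sum(powers) - target
        # Raise the price when consumption is too high, lower it otherwise.
        price += step * imbalance
    return price, powers

# Hypothetical units: (a, r, lo, hi) = cost curvature, preferred power, limits.
units = [(1.0, 3.0, 0.0, 5.0), (2.0, 2.0, 0.0, 5.0), (4.0, 1.0, 0.0, 5.0)]
price, powers = dual_decomposition(units, target=4.0)
```

The design point illustrated here is that the coordinator never needs the units' cost functions, only their responses to a price signal. The fixed step of 0.5 is chosen small enough for this toy problem; a real scheme would need a step-size rule suited to the actual unit models.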
Generation of large-scale vortices in compressible helical turbulence

International Nuclear Information System (INIS)

Chkhetiani, O.G.; Gvaramadze, V.V.

1989-01-01

We consider the generation of large-scale vortices in a compressible self-gravitating turbulent medium. A closed equation describing the evolution of the large-scale vortices in helical turbulence with finite correlation time is obtained. This equation has a form similar to the hydromagnetic dynamo equation, which allows us to call the vortex generation effect the vortex dynamo. It is possible that essentially the same mechanism is responsible both for amplification and maintenance of density waves and magnetic fields in gaseous disks of spiral galaxies. (author). 29 refs
Analysis of newly established EST databases reveals similarities between heart regeneration in newt and fish

Directory of Open Access Journals (Sweden)

Weis Patrick

2010-01-01

Full Text Available Abstract Background: The newt Notophthalmus viridescens possesses the remarkable ability to respond to cardiac damage by formation of new myocardial tissue. Surprisingly little is known about changes in gene activities that occur during the course of regeneration. To begin to decipher the molecular processes that underlie restoration of functional cardiac tissue, we generated an EST database from regenerating newt hearts and compared the transcriptional profile of selected candidates with genes deregulated during zebrafish heart regeneration. Results: A cDNA library of 100,000 cDNA clones was generated from newt hearts 14 days after ventricular injury. Sequencing of 11,520 cDNA clones resulted in 2,894 assembled contigs. BLAST searches revealed 1,695 sequences with potential homology to sequences from the NCBI database. BLAST searches against the TrEMBL and Swiss-Prot databases assigned 1,116 proteins to Gene Ontology terms. We also identified a relatively large set of 174 ORFs, which are likely to be unique for urodele amphibians. Expression analysis of newt-zebrafish homologues confirmed the deregulation of selected genes during heart regeneration. Sequences, BLAST results and GO annotations were visualized in a relational web based database followed by grouping of identified proteins into clusters of GO Terms. Comparison of data from regenerating zebrafish hearts identified biological processes which were uniformly overrepresented during cardiac regeneration in newt and zebrafish. Conclusion: We concluded that heart regeneration in newts and zebrafish led to the activation of similar sets of genes, which suggests that heart regeneration in both species might follow similar principles. The design of the newly established newt EST database allows identification of molecular pathways important for heart regeneration.
Dipolar modulation of Large-Scale Structure

Science.gov (United States)

Yoon, Mijin

For the last two decades, we have seen a drastic development of modern cosmology based on various observations such as the cosmic microwave background (CMB), type Ia supernovae, and baryonic acoustic oscillations (BAO). This observational evidence has led to broad consensus on the so-called LambdaCDM cosmological model and to tight constraints on the cosmological parameters that make up the model. On the other hand, the advancement in cosmology relies on the cosmological principle: the universe is isotropic and homogeneous on large scales. Testing these fundamental assumptions is crucial and will soon become possible given the planned observations ahead. Dipolar modulation is the largest angular anisotropy of the sky, which is quantified by its direction and amplitude. We measured a huge dipolar modulation in the CMB, which mainly originates from our solar system's motion relative to the CMB rest frame. However, we have not yet acquired consistent measurements of dipolar modulations in large-scale structure (LSS), as they require large sky coverage and a number of well-identified objects. In this thesis, we explore the measurement of dipolar modulation in number counts of LSS objects as a test of statistical isotropy. This thesis is based on two papers that were published in peer-reviewed journals. In Chapter 2 [Yoon et al., 2014], we measured a dipolar modulation in number counts of WISE sources matched with 2MASS sources. In Chapter 3 [Yoon & Huterer, 2015], we investigated requirements for detection of the kinematic dipole in future surveys.
Impact of large-scale tides on cosmological distortions via redshift-space power spectrum

Science.gov (United States)

Akitsu, Kazuyuki; Takada, Masahiro

2018-03-01

Although large-scale perturbations beyond a finite-volume survey region are not direct observables, these affect measurements of clustering statistics of small-scale (subsurvey) perturbations in large-scale structure, compared with the ensemble average, via the mode-coupling effect. In this paper we show that a large-scale tide induced by scalar perturbations causes apparent anisotropic distortions in the redshift-space power spectrum of galaxies in a way depending on an alignment between the tide, wave vector of small-scale modes and line-of-sight direction. Using the perturbation theory of structure formation, we derive a response function of the redshift-space power spectrum to large-scale tide. We then investigate the impact of large-scale tide on estimation of cosmological distances and the redshift-space distortion parameter via the measured redshift-space power spectrum for a hypothetical large-volume survey, based on the Fisher matrix formalism. To do this, we treat the large-scale tide as a signal, rather than an additional source of the statistical errors, and show that a degradation in the parameter is restored if we can employ the prior on the rms amplitude expected for the standard cold dark matter (CDM) model. We also discuss whether the large-scale tide can be constrained at an accuracy better than the CDM prediction, if the effects up to a larger wave number in the nonlinear regime can be included.
Large-scale Intelligent Transportation Systems simulation

Energy Technology Data Exchange (ETDEWEB)

Ewing, T.; Canfield, T.; Hannebutte, U.; Levine, D.; Tentner, A.

1995-06-01

A prototype computer system has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS) capable of running on massively parallel computers and distributed (networked) computer systems. The prototype includes the modelling of instrumented "smart" vehicles with in-vehicle navigation units capable of optimal route planning and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide 2-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on ANL's IBM SP-X parallel computer system for large scale problems. A novel feature of our design is that vehicles will be represented by autonomous computer processes, each with a behavior model which performs independent route selection and reacts to external traffic events much like real vehicles. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.
Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.

Science.gov (United States)

Badisco, Liesbeth; Huybrechts, Jurgen; Simonet, Gert; Verlinden, Heleen; Marchal, Elisabeth; Huybrechts, Roger; Schoofs, Liliane; De Loof, Arnold; Vanden Broeck, Jozef

2011-03-21

The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular principles of phase polyphenism, in further establishing

Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.

Directory of Open Access Journals (Sweden)

Liesbeth Badisco

Full Text Available BACKGROUND: The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. METHODOLOGY: We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. CONCLUSIONS: In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular
Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: From protozoa to human genomes

Science.gov (United States)

Rybalko, S.; Larionov, S.; Poptsova, M.; Loskutov, A.

2011-10-01

Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.
The Hamburg large scale geostrophic ocean general circulation model. Cycle 1

International Nuclear Information System (INIS)

Maier-Reimer, E.; Mikolajewicz, U.

1992-02-01

The rationale for the Large Scale Geostrophic ocean circulation model (LSG-OGCM) is based on the observations that for a large scale ocean circulation model designed for climate studies, the relevant characteristic spatial scales are large compared with the internal Rossby radius throughout most of the ocean, while the characteristic time scales are large compared with the periods of gravity modes and barotropic Rossby wave modes. In the present version of the model, the fast modes have been filtered out by a conventional technique of integrating the full primitive equations, including all terms except the nonlinear advection of momentum, by an implicit time integration method. The free surface is also treated prognostically, without invoking a rigid lid approximation. The numerical scheme is unconditionally stable and has the additional advantage that it can be applied uniformly to the entire globe, including the equatorial and coastal current regions. (orig.)
Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

Science.gov (United States)

Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat

2017-01-01

Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
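Note: the idea of multi-level partitioning can be sketched as follows. This is a simplified illustration only: it cuts a design into consecutive, non-overlapping coordinate intervals and ignores everything a real assembly design must handle (overlaps or adapters between blocks, synthesis constraints, sequence content). The function names, the exact length of 783,000 bp for the "783 kb" design, and the fixed block sizes are assumptions; this is not the Genome Partitioner algorithm.

```python
def partition(start, end, block_size):
    """Split the half-open interval [start, end) into consecutive blocks
    of at most block_size bases (hypothetical helper)."""
    return [(s, min(s + block_size, end)) for s in range(start, end, block_size)]

def multi_level_partition(length, level_sizes):
    """Partition a design of the given length level by level.
    level_sizes lists the block size per level, largest first.
    Returns one list of (start, end) intervals per level."""
    levels = []
    parents = [(0, length)]
    for size in level_sizes:
        children = []
        # Every block of the previous level is subdivided independently.
        for s, e in parents:
            children.extend(partition(s, e, size))
        levels.append(children)
        parents = children
    return levels

# Assumed sizes: a 783,000 bp design, 20 kb segments, 1 kb subblocks.
segments, subblocks = multi_level_partition(783_000, [20_000, 1_000])
```

Because each level subdivides the blocks of the level above, every subblock belongs to exactly one segment, which mirrors the hierarchical assembly route described in the abstract (small blocks assembled into larger segments).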
Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

Directory of Open Access Journals (Sweden)

Matthias Christen

Full Text Available Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
Soft X-ray Emission from Large-Scale Galactic Outflows in Seyfert Galaxies

Science.gov (United States)

Colbert, E. J. M.; Baum, S.; O'Dea, C.; Veilleux, S.

1998-01-01

Kiloparsec-scale soft X-ray nebulae extend along the galaxy minor axes in several Seyfert galaxies, including NGC 2992, NGC 4388 and NGC 5506. In these three galaxies, the extended X-ray emission observed in ROSAT HRI images has 0.2-2.4 keV X-ray luminosities of $0.4$-$3.5 \times 10^{40}$ erg s$^{-1}$. The X-ray nebulae are roughly co-spatial with the large-scale radio emission, suggesting that both are produced by large-scale galactic outflows. Assuming pressure balance between the radio and X-ray plasmas, the X-ray filling factor is $\gtrsim 10^{4}$ times as large as the radio plasma filling factor, suggesting that large-scale outflows in Seyfert galaxies are predominantly winds of thermal X-ray emitting gas. We favor an interpretation in which large-scale outflows originate as AGN-driven jets that entrain and heat gas on kpc scales as they make their way out of the galaxy. AGN- and starburst-driven winds are also possible explanations if the winds are oriented along the rotation axis of the galaxy disk. Since large-scale outflows are present in at least 50 percent of Seyfert galaxies, the soft X-ray emission from the outflowing gas may, in many cases, explain the "soft excess" X-ray feature observed below 2 keV in X-ray spectra of many Seyfert 2 galaxies.
Pro website development and operations streamlining DevOps for large-scale websites

CERN Document Server

Sacks, Matthew

2012-01-01

Pro Website Development and Operations gives you the experience you need to create and operate a large-scale production website. Large-scale websites have their own unique set of problems regarding their design, problems that can get worse when agile methodologies are adopted for rapid results. Managing large-scale websites, deploying applications, and ensuring they are performing well often requires a full scale team involving the development and operations sides of the company, two departments that don't always see eye to eye. When departments struggle with each other, it adds unnecessary comp
Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

Directory of Open Access Journals (Sweden)

Morgan Kullberg

Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags (ESTs) for establishing divergences within the tree of placental mammals. This is done using the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable cost, a fast extension of data sampling from species outside the genome projects.
Neutrinos and large-scale structure

International Nuclear Information System (INIS)

Eisenstein, Daniel J.

2015-01-01

I review the use of cosmological large-scale structure to measure properties of neutrinos and other relic populations of light relativistic particles. With experiments to measure the anisotropies of the cosmic microwave background and the clustering of matter at low redshift, we now have securely measured a relativistic background with density appropriate to the cosmic neutrino background. Our limits on the mass of the neutrino continue to shrink. Experiments coming in the next decade will greatly improve the available precision on searches for the energy density of novel relativistic backgrounds and the mass of neutrinos.
Neutrinos and large-scale structure

Energy Technology Data Exchange (ETDEWEB)

Eisenstein, Daniel J. [Harvard-Smithsonian Center for Astrophysics, 60 Garden St., MS #20, Cambridge, MA 02138 (United States)]

2015-07-15

I review the use of cosmological large-scale structure to measure properties of neutrinos and other relic populations of light relativistic particles. With experiments to measure the anisotropies of the cosmic microwave background and the clustering of matter at low redshift, we now have securely measured a relativistic background with density appropriate to the cosmic neutrino background. Our limits on the mass of the neutrino continue to shrink. Experiments coming in the next decade will greatly improve the available precision on searches for the energy density of novel relativistic backgrounds and the mass of neutrinos.
Characterization of Aftershock Sequences from Large Strike-Slip Earthquakes Along Geometrically Complex Faults

Science.gov (United States)

Sexton, E.; Thomas, A.; Delbridge, B. G.

2017-12-01

Large earthquakes often exhibit complex slip distributions and occur along non-planar fault geometries, resulting in variable stress changes throughout the region of the fault hosting aftershocks. To better discern the role of geometric discontinuities on aftershock sequences, we compare areas of enhanced and reduced Coulomb failure stress and mean stress for systematic differences in the time dependence and productivity of these aftershock sequences. In strike-slip faults, releasing structures, including stepovers and bends, experience an increase in both Coulomb failure stress and mean stress during an earthquake, promoting fluid diffusion into the region and further failure. Conversely, Coulomb failure stress and mean stress decrease in restraining bends and stepovers in strike-slip faults, and fluids diffuse away from these areas, discouraging failure. We examine spatial differences in seismicity patterns along structurally complex strike-slip faults which have hosted large earthquakes, such as the 1992 Mw 7.3 Landers, the 2010 Mw 7.2 El Mayor-Cucapah, the 2014 Mw 6.0 South Napa, and the 2016 Mw 7.0 Kumamoto events. We characterize the behavior of these aftershock sequences with the Epidemic Type Aftershock-Sequence Model (ETAS). In this statistical model, the total occurrence rate of aftershocks induced by an earthquake is λ(t) = λ_0 + \sum_{i:t_i
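The rate formula in this record is cut off. As a hedged illustration only, a widely used textbook form of the ETAS conditional intensity is λ(t) = λ_0 + Σ_{i: t_i < t} K exp(α (M_i − M_c)) / (t − t_i + c)^p. Whether the authors used exactly this parameterization is an assumption, and every parameter value and name below (`K`, `alpha`, `c`, `p`, `m_c`, `etas_rate`) is hypothetical.

```python
import math


def etas_rate(t, history, lam0=0.1, K=0.05, alpha=1.0, c=0.01, p=1.1, m_c=3.0):
    """Sketch of an ETAS conditional intensity (events per unit time).

    history is a list of (t_i, M_i) pairs; only events with t_i < t contribute.
    All default parameter values are illustrative assumptions, not fitted values.
    """
    rate = lam0  # background seismicity rate
    for t_i, m_i in history:
        if t_i < t:
            productivity = K * math.exp(alpha * (m_i - m_c))  # larger events trigger more
            rate += productivity / (t - t_i + c) ** p          # Omori-type temporal decay
    return rate


# Usage: one magnitude 7.3 mainshock at t = 0, rate evaluated 1 and 10 days later.
mainshock = [(0.0, 7.3)]
rate_day1 = etas_rate(1.0, mainshock)
rate_day10 = etas_rate(10.0, mainshock)
```

The triggered part of the rate decays with time since each event, so `rate_day1` exceeds `rate_day10`, and both exceed the background rate.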
Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

Directory of Open Access Journals (Sweden)

Edgar E. Lara-Ramírez

2014-01-01

Full Text Available The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution.
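Two of the statistics named above, GC3 and RSCU, can be sketched in a few lines. This is a minimal illustration under the usual definitions (GC3 as the fraction of codons ending in G or C, RSCU as the observed codon count divided by the mean count of its synonymous family under the standard genetic code). It is not the authors' pipeline, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into codons (RNA 'U' is read as 'T')."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3(seq):
    """Fraction of codons with G or C at the third position."""
    cds = codons(seq)
    return sum(c[2] in "GC" for c in cds) / len(cds)


def rscu(seq):
    """Relative synonymous codon usage for amino acids present in seq."""
    counts = Counter(c for c in codons(seq) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


# Usage on a toy sequence of four alanine codons.
toy = "GCUGCUGCCGCA"
toy_gc3 = gc3(toy)
toy_rscu = rscu(toy)
```

An RSCU of 1.0 means a codon is used as often as expected under uniform synonymous usage; values above 1.0 indicate a preferred codon.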
Evaluation of Large-scale Public Sector Reforms

DEFF Research Database (Denmark)

Breidahl, Karen Nielsen; Gjelstrup, Gunnar; Hansen, Hanne Foss

2017-01-01

and more delimited policy areas take place. In our analysis we apply four governance perspectives (rational-instrumental, rational-interest based, institutional-cultural and a chaos perspective) in a comparative analysis of the evaluations of two large-scale public sector reforms in Denmark and Norway. We...
Highly Scalable Trip Grouping for Large Scale Collective Transportation Systems

DEFF Research Database (Denmark)

Gidofalvi, Gyozo; Pedersen, Torben Bach; Risch, Tore

2008-01-01

Transportation-related problems, like road congestion, parking, and pollution, are increasing in most cities. In order to reduce traffic, recent work has proposed methods for vehicle sharing, for example for sharing cabs by grouping "closeby" cab requests and thus minimizing transportation cost...... and utilizing cab space. However, the methods published so far do not scale to large data volumes, which is necessary to facilitate large-scale collective transportation systems, e.g., ride-sharing systems for large cities. This paper presents highly scalable trip grouping algorithms, which generalize previous...
Penalized Estimation in Large-Scale Generalized Linear Array Models

DEFF Research Database (Denmark)

Lund, Adam; Vincent, Martin; Hansen, Niels Richard

2017-01-01

Large-scale generalized linear array models (GLAMs) can be challenging to fit. Computation and storage of its tensor product design matrix can be impossible due to time and memory constraints, and previously considered design matrix free algorithms do not scale well with the dimension...
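The storage problem mentioned above arises because a tensor product design matrix A ⊗ B grows with the product of the marginal dimensions. The sketch below shows the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets a matrix-vector product be computed from the small marginal matrices alone. It illustrates the general idea behind design-matrix-free computation and is not the authors' algorithm; `kron_matvec` is a hypothetical name.

```python
import numpy as np


def kron_matvec(A, B, X):
    """Compute (A kron B) @ vec(X) without forming the Kronecker product.

    A has shape (m1, n1), B has shape (m2, n2), X has shape (n2, n1).
    vec() stacks columns (column-major order), hence order="F".
    """
    return (B @ X @ A.T).reshape(-1, order="F")


# Usage: compare against the explicit Kronecker product on a small problem.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(2, 5))
X = rng.normal(size=(5, 4))
fast = kron_matvec(A, B, X)
explicit = np.kron(A, B) @ X.reshape(-1, order="F")
```

The explicit route stores an (m1·m2) × (n1·n2) matrix, while the array route only ever holds the marginal matrices and the coefficient array.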
Large-scale coastal impact induced by a catastrophic storm

DEFF Research Database (Denmark)

Fruergaard, Mikkel; Andersen, Thorbjørn Joest; Johannessen, Peter N

breaching. Our results demonstrate that violent, millennial-scale storms can trigger significant large-scale and long-term changes on barrier coasts, and that coastal changes assumed to take place over centuries or even millennia may occur in association with a single extreme storm event....
Large-eddy simulation with accurate implicit subgrid-scale diffusion

NARCIS (Netherlands)

B. Koren (Barry); C. Beets

1996-01-01

A method for large-eddy simulation is presented that does not use an explicit subgrid-scale diffusion term. Subgrid-scale effects are modelled implicitly through an appropriate monotone (in the sense of Spekreijse 1987) discretization method for the advective terms. Special attention is
Challenges for Large Scale Structure Theory

CERN Multimedia

CERN. Geneva

2018-01-01

I will describe some of the outstanding questions in cosmology where answers could be provided by observations of the large-scale structure of the Universe at late times. I will discuss some of the theoretical challenges that will have to be overcome to extract this information from the observations. I will describe some of the theoretical tools that might be useful to achieve this goal.
Macroecological factors explain large-scale spatial population patterns of ancient agriculturalists

NARCIS (Netherlands)

Xu, C.; Chen, B.; Abades, S.; Reino, L.; Teng, S.; Ljungqvist, F.C.; Huang, Z.Y.X.; Liu, X.

2015-01-01

Aim: It has been well demonstrated that the large-scale distribution patterns of numerous species are driven by similar macroecological factors. However, understanding of this topic remains limited when applied to our own species. Here we take a large-scale look at ancient agriculturalist
Large Scale Investments in Infrastructure: Competing Policy Regimes to Control Connections

NARCIS (Netherlands)

Otsuki, K.; Read, M.L.; Zoomers, E.B.

2016-01-01

This paper proposes to analyse implications of large-scale investments in physical infrastructure for social and environmental justice. While case studies on the global land rush and climate change have advanced our understanding of how large-scale investments in land, forests and water affect

Rotation invariant fast features for large-scale recognition

Science.gov (United States)

Takacs, Gabriel; Chandrasekhar, Vijay; Tsai, Sam; Chen, David; Grzeszczuk, Radek; Girod, Bernd

2012-10-01

We present an end-to-end feature description pipeline which uses a novel interest point detector and Rotation-Invariant Fast Feature (RIFF) descriptors. The proposed RIFF algorithm is 15× faster than SURF while producing large-scale retrieval results that are comparable to SIFT. Such high-speed features benefit a range of applications from Mobile Augmented Reality (MAR) to web-scale image retrieval and analysis.
Large-scale bioenergy production: how to resolve sustainability trade-offs?

Science.gov (United States)

Humpenöder, Florian; Popp, Alexander; Bodirsky, Benjamin Leon; Weindl, Isabelle; Biewald, Anne; Lotze-Campen, Hermann; Dietrich, Jan Philipp; Klein, David; Kreidenweis, Ulrich; Müller, Christoph; Rolinski, Susanne; Stevanovic, Miodrag

2018-02-01

Large-scale 2nd generation bioenergy deployment is a key element of 1.5 °C and 2 °C transformation pathways. However, large-scale bioenergy production might have negative sustainability implications and thus may conflict with the Sustainable Development Goal (SDG) agenda. Here, we carry out a multi-criteria sustainability assessment of large-scale bioenergy crop production throughout the 21st century (300 EJ in 2100) using a global land-use model. Our analysis indicates that large-scale bioenergy production without complementary measures results in negative effects on the following sustainability indicators: deforestation, CO2 emissions from land-use change, nitrogen losses, unsustainable water withdrawals and food prices. One of our main findings is that single-sector environmental protection measures next to large-scale bioenergy production are prone to involve trade-offs among these sustainability indicators—at least in the absence of more efficient land or water resource use. For instance, if bioenergy production is accompanied by forest protection, deforestation and associated emissions (SDGs 13 and 15) decline substantially whereas food prices (SDG 2) increase. However, our study also shows that this trade-off strongly depends on the development of future food demand. In contrast to environmental protection measures, we find that agricultural intensification lowers some side-effects of bioenergy production substantially (SDGs 13 and 15) without generating new trade-offs—at least among the sustainability indicators considered here. Moreover, our results indicate that a combination of forest and water protection schemes, improved fertilization efficiency, and agricultural intensification would reduce the side-effects of bioenergy production most comprehensively. However, although our study includes more sustainability indicators than previous studies on bioenergy side-effects, our study represents only a small subset of all indicators relevant for the
Large-scale structure in the universe: Theory vs observations

International Nuclear Information System (INIS)

Kashlinsky, A.; Jones, B.J.T.

1990-01-01

A variety of observations constrain models of the origin of large scale cosmic structures. We review here the elements of current theories and comment in detail on which of the current observational data provide the principal constraints. We point out that enough observational data have accumulated to constrain (and perhaps determine) the power spectrum of primordial density fluctuations over a very large range of scales. We discuss the theories in the light of observational data and focus on the potential of future observations in providing even (and ever) tighter constraints. (orig.)
Scalable Kernel Methods and Algorithms for General Sequence Analysis

Science.gov (United States)

Kuksa, Pavel

2011-01-01

Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…
Evaluation of drought propagation in an ensemble mean of large-scale hydrological models

Directory of Open Access Journals (Sweden)

A. F. Van Loon

2012-11-01

Full Text Available Hydrological drought is increasingly studied using large-scale models. It is, however, uncertain whether large-scale models reproduce the development of hydrological drought correctly. The pressing question is: how well do large-scale models simulate the propagation from meteorological to hydrological drought? To answer this question, we evaluated the simulation of drought propagation in an ensemble mean of ten large-scale models, both land-surface models and global hydrological models, that participated in the model intercomparison project of WATCH (WaterMIP). For a selection of case study areas, we studied drought characteristics (number of droughts, duration, severity), drought propagation features (pooling, attenuation, lag, lengthening), and hydrological drought typology (classical rainfall deficit drought, rain-to-snow-season drought, wet-to-dry-season drought, cold snow season drought, warm snow season drought, composite drought).

Drought characteristics simulated by large-scale models clearly reflected drought propagation; i.e. drought events became fewer and longer when moving through the hydrological cycle. However, more differentiation was expected between fast and slowly responding systems, with slowly responding systems having fewer and longer droughts in runoff than fast responding systems. This was not found using large-scale models. Drought propagation features were poorly reproduced by the large-scale models, because runoff reacted immediately to precipitation, in all case study areas. This fast reaction to precipitation, even in cold climates in winter and in semi-arid climates in summer, also greatly influenced the hydrological drought typology as identified by the large-scale models. In general, the large-scale models had the correct representation of drought types, but the percentages of occurrence had some important mismatches, e.g. an overestimation of classical rainfall deficit droughts, and an
Validation of dbEST-SSRs and transferability of some other solanaceous species SSR in ashwagandha [Withania somnifera (L.) Dunal].

Science.gov (United States)

Parmar, Eva K; Fougat, Ranbir S; Patel, Chandni B; Zala, Harshvardhan N; Patel, Mahesh A; Patel, Swati K; Kumar, Sushil

2015-12-01

Cross-species transferability and expressed sequence tags (ESTs) in public databases are cost-effective means for developing simple sequence repeats (SSRs) for less-studied species like medicinal plants. In this study, 11 EST-SSR markers developed from 742 available Withania somnifera EST sequences and 95 SSR primer pairs derived from other solanaceous crops (tomato, eggplant, chili, and tobacco) were tested for amplification and validation. Of the 11 EST-SSRs, 10 showed good amplification quality and produced 13 loci with product sizes ranging between 167 and 291 bp. Similarly, of the 95 cross-genera SSR loci assayed, 20 (21%) markers were transferable to ashwagandha, with transferability of 5, 27, 32, and 14.2% from eggplant, chili, tomato, and tobacco, respectively. In total, these 30 SSR markers will be valuable resources and may be applicable for the analysis of intra- and inter-specific genetic diversity in ashwagandha, for which no SSR information has been available to date.
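Mining SSRs from EST sequences amounts to scanning for short motifs repeated in tandem. The sketch below is a minimal regular-expression scan, not the tool used in the study. The repeat thresholds in `MIN_REPEATS` and the function name `find_ssrs` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if (motif + motif).find(motif, 1) != k:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Usage: a toy EST fragment containing a dinucleotide repeat.
toy_est = "GGG" + "AT" * 6 + "CCC"
toy_hits = find_ssrs(toy_est)
```

Primer design around each hit, which the validation step above depends on, is outside the scope of this sketch.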
Multiresolution comparison of precipitation datasets for large-scale models

Science.gov (United States)

Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

2014-12-01

Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecast and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North America gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance of spatial coherence. In addition to the products comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.
Toward Instructional Leadership: Principals' Perceptions of Large-Scale Assessment in Schools

Science.gov (United States)

Prytula, Michelle; Noonan, Brian; Hellsten, Laurie

2013-01-01

This paper describes a study of the perceptions that Saskatchewan school principals have regarding large-scale assessment reform and their perceptions of how assessment reform has affected their roles as principals. The findings revealed that large-scale assessments, especially provincial assessments, have affected the principal in Saskatchewan…
A large scale field experiment in the Amazon basin (LAMBADA/BATERISTA)

NARCIS (Netherlands)

Dolman, A.J.; Kabat, P.; Gash, J.H.C.; Noilhan, J.; Jochum, A.M.; Nobre, C.

1995-01-01

A description is given of a large-scale field experiment planned in the Amazon basin, aimed at assessing the large-scale balances of energy, water and carbon dioxide. The embedding of this experiment in global change programmes is described, viz. the Biospheric Aspects of the Hydrological Cycle
Large-scale derived flood frequency analysis based on continuous simulation

Science.gov (United States)

Dung Nguyen, Viet; Hundecha, Yeshewatesfa; Guse, Björn; Vorogushyn, Sergiy; Merz, Bruno

2016-04-01

There is an increasing need for spatially consistent flood risk assessments at the regional scale (several 100,000 km²), in particular in the insurance industry and for national risk reduction strategies. However, most large-scale flood risk assessments are composed of smaller-scale assessments and show spatial inconsistencies. To overcome this deficit, a large-scale flood model composed of a weather generator and catchment models was developed, reflecting the spatially inherent heterogeneity. The weather generator is a multisite and multivariate stochastic model capable of generating synthetic meteorological fields (precipitation, temperature, etc.) at daily resolution for the regional scale. These fields respect the observed autocorrelation, spatial correlation and covariance between the variables. They are used as input into catchment models. A long-term simulation of this combined system makes it possible to derive very long discharge series at many catchment locations, serving as a basis for spatially consistent flood risk estimates at the regional scale. This combined model was set up and validated for major river catchments in Germany. The weather generator was trained on 53 years of observation data at 528 stations covering not only the whole of Germany but also parts of France, Switzerland, the Czech Republic and Austria, with an aggregated spatial extent of 443,931 km². 10,000 years of daily meteorological fields for the study area were generated. Likewise, rainfall-runoff simulations with SWIM were performed for the entire Elbe, Rhine, Weser, Donau and Ems catchments. The validation results illustrate a good performance of the combined system, as the simulated flood magnitudes and frequencies agree well with the observed flood data. Based on continuous simulation, this model chain is then used to estimate flood quantiles for the whole of Germany, including upstream headwater catchments in neighbouring countries. This continuous large scale approach overcomes the several
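With a very long simulated discharge series, flood quantiles can be read off empirically rather than extrapolated from a fitted distribution. The sketch below assumes annual maximum discharges have already been extracted and uses Weibull plotting positions. The abstract does not state the estimator the authors used, so this choice and the name `return_level` are assumptions.

```python
def return_level(annual_maxima, return_period):
    """Empirical flood quantile for a return period given in years.

    Uses Weibull plotting positions: the r-th largest of n annual maxima has
    exceedance probability r / (n + 1). Interpolates linearly between ranks.
    """
    ranked = sorted(annual_maxima, reverse=True)
    n = len(ranked)
    rank = (n + 1) / return_period  # fractional rank, 1 = largest value
    if not 1 <= rank <= n:
        raise ValueError("return period outside the range supported by the series")
    lower = int(rank)
    if lower == n:
        return ranked[-1]
    frac = rank - lower
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


# Usage with a toy series of 99 synthetic annual maxima (values 1..99).
toy_maxima = list(range(1, 100))
q10 = return_level(toy_maxima, 10)
q100 = return_level(toy_maxima, 100)
```

A series of n years supports return periods up to about n + 1 years, which is why a 10,000-year simulation makes rare quantiles accessible without extrapolation.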
Neutronics benchmarks of mixed-oxide fuels using the SCALE/CENTRM sequence

International Nuclear Information System (INIS)

Hollenbach, D.F.; Fox, P.B.

2000-01-01

The purpose of this study is to determine and document the reactor physics parameters (multiplication factors, spatially dependent flux ratios, and spatially dependent reaction rates) for several distinct sets of problems using two distinct resonance cross-section processing techniques. In SCALE, by default, resonances are processed using NITAWL, which utilizes the Nordheim Integral Treatment. The results produced using this sequence are considered to be the base results. A second set of results is produced by replacing NITAWL with CENTRM/PMC. CENTRM produces point-wise fluxes for a given geometry configuration and set of isotopes. Using these fluxes, PMC produces problem-dependent self-shielding cross sections. Both sequences use ENDF/B-V cross-section data.
GAIA: A WINDOW TO LARGE-SCALE MOTIONS

Energy Technology Data Exchange (ETDEWEB)

Nusser, Adi [Physics Department and the Asher Space Science Institute-Technion, Haifa 32000 (Israel); Branchini, Enzo [Department of Physics, Universita Roma Tre, Via della Vasca Navale 84, 00146 Rome (Italy); Davis, Marc, E-mail: adi@physics.technion.ac.il, E-mail: branchin@fis.uniroma3.it, E-mail: mdavis@berkeley.edu [Departments of Astronomy and Physics, University of California, Berkeley, CA 94720 (United States)

2012-08-10

Using redshifts as a proxy for galaxy distances, estimates of the two-dimensional (2D) transverse peculiar velocities of distant galaxies could be obtained from future measurements of proper motions. We provide the mathematical framework for analyzing 2D transverse motions and show that they offer several advantages over traditional probes of large-scale motions. They are completely independent of any intrinsic relations between galaxy properties; hence, they are essentially free of selection biases. They are free from homogeneous and inhomogeneous Malmquist biases that typically plague distance indicator catalogs. They provide additional information to traditional probes that yield line-of-sight peculiar velocities only. Further, because of their 2D nature, fundamental questions regarding vorticity of large-scale flows can be addressed. Gaia, for example, is expected to provide proper motions of at least bright galaxies with high central surface brightness, making proper motions a likely contender for traditional probes based on current and future distance indicator measurements.
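The conversion from a proper motion to a transverse velocity, with redshift standing in for distance, can be sketched as follows. It uses the standard relation v_t [km/s] ≈ 4.74 μ [arcsec/yr] d [pc] and the low-redshift Hubble law d ≈ cz/H0. The value H0 = 70 km/s/Mpc and the function names are assumptions for illustration, and the paper's framework is more general than this one-line estimate.

```python
C_KM_S = 299_792.458   # speed of light in km/s
KAPPA = 4.740470       # km/s per (arcsec/yr x pc); equals 1 AU/yr in km/s


def distance_mpc(z, h0=70.0):
    """Low-redshift Hubble-law distance in Mpc (h0 is an assumed value)."""
    return C_KM_S * z / h0


def transverse_velocity(mu_uas_per_yr, z, h0=70.0):
    """Transverse velocity in km/s from a proper motion in micro-arcsec/yr.

    The factors of 1e-6 (micro-arcsec) and 1e6 (Mpc to pc) cancel, so the
    same constant applies to micro-arcsec/yr combined with Mpc.
    """
    return KAPPA * mu_uas_per_yr * distance_mpc(z, h0)


# Usage: a galaxy at z = 0.01 with a proper motion of 1 micro-arcsec/yr.
v_t = transverse_velocity(1.0, 0.01)
```

At z = 0.01 a proper motion of 1 micro-arcsec/yr corresponds to roughly 200 km/s, which shows why micro-arcsecond astrometry is needed to reach typical peculiar velocities.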
Large-scale hydrogen production using nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Ryland, D.; Stolberg, L.; Kettner, A.; Gnanapragasam, N.; Suppiah, S. [Atomic Energy of Canada Limited, Chalk River, ON (Canada)

2014-07-01

For many years, Atomic Energy of Canada Limited (AECL) has been studying the feasibility of using nuclear reactors, such as the Supercritical Water-cooled Reactor, as an energy source for large scale hydrogen production processes such as High Temperature Steam Electrolysis and the Copper-Chlorine thermochemical cycle. Recent progress includes the augmentation of AECL's experimental capabilities by the construction of experimental systems to test high temperature steam electrolysis button cells at ambient pressure and temperatures up to 850 °C and CuCl/HCl electrolysis cells at pressures up to 7 bar and temperatures up to 100 °C. In parallel, detailed models of solid oxide electrolysis cells and the CuCl/HCl electrolysis cell are being refined and validated using experimental data. Process models are also under development to assess options for economic integration of these hydrogen production processes with nuclear reactors. Options for large-scale energy storage, including hydrogen storage, are also under study. (author)
Planck intermediate results XLII. Large-scale Galactic magnetic fields

DEFF Research Database (Denmark)

Adam, R.; Ade, P. A. R.; Alves, M. I. R.

2016-01-01

Recent models for the large-scale Galactic magnetic fields in the literature have been largely constrained by synchrotron emission and Faraday rotation measures. We use three different but representative models to compare their predicted polarized synchrotron and dust emission with that measured ...
Mining of expressed sequence tag libraries of cacao

Indian Academy of Sciences (India)

Expressed sequence tags (ESTs) provide researchers with a quick and inexpensive route for discovering new genes, data on gene expression and regulation, and also provide genic markers that help in constructing genome maps. Cacao is an important perennial crop of humid tropics. Cacao EST sequences, as available ...
A Topology Visualization Early Warning Distribution Algorithm for Large-Scale Network Security Incidents

Directory of Open Access Journals (Sweden)

Hui He

2013-01-01

Full Text Available It is of great significance to research early warning systems for large-scale network security incidents. Such a system can improve the network system’s emergency response capabilities, alleviate the damage of cyber attacks, and strengthen the system’s counterattack ability. A comprehensive early warning system is presented in this paper, which combines active measurement and anomaly detection. The key visualization algorithm and technology of the system are mainly discussed. The plane visualization of the large-scale network system is realized using a divide-and-conquer approach. First, the topology of the large-scale network is divided into several small-scale networks by the MLkP/CR algorithm. Second, the subgraph plane visualization algorithm is applied to each small-scale network. Finally, the small-scale networks’ topologies are combined into a single topology by the automatic distribution algorithm based on force analysis. As the algorithm transforms the large-scale network topology plane visualization problem into a series of small-scale network topology plane visualization and distribution problems, it has higher parallelism and is able to handle the display of ultra-large-scale network topology.
Large-scale preparation of active caspase-3 in E. coli by designing its thrombin-activatable precursors

Directory of Open Access Journals (Sweden)

Park Sung

2008-12-01

Full Text Available Abstract Background: Caspase-3, a principal apoptotic effector that cleaves the majority of cellular substrates, is an important medicinal target for the treatment of cancers and neurodegenerative diseases. Large amounts of the protein are required for drug discovery research. However, previous efforts to express the full-length caspase-3 gene in E. coli have been unsuccessful. Results: Overproducers of thrombin-activatable full-length caspase-3 precursors were prepared by engineering the auto-activation sites of the caspase-3 precursor into a sequence susceptible to thrombin hydrolysis. The engineered precursors were highly expressed as soluble proteins in E. coli, easily purified by affinity chromatography to levels of 10–15 mg from 1 L of E. coli culture, and readily activated by thrombin digestion. Kinetic evaluation disclosed that thrombin digestion enhanced the catalytic activity (kcat/KM) of the precursor proteins by two orders of magnitude. Conclusion: A novel method for large-scale preparation of active caspase-3 was developed by engineering the precursor to lack auto-activation during expression and to carry amino acid sequences susceptible to thrombin, facilitating high-level expression in E. coli. The precursor protein was easily purified and activated through specific cleavage at the engineered sites by thrombin, generating active caspase-3 in high yields.
No Large Scale Curvature Perturbations during Waterfall of Hybrid Inflation

OpenAIRE

Abolhasani, Ali Akbar; Firouzjahi, Hassan

2010-01-01

In this paper the possibility of generating large scale curvature perturbations induced from the entropic perturbations during the waterfall phase transition of the standard hybrid inflation model is studied. We show that whether or not appreciable amounts of large scale curvature perturbations are produced during the waterfall phase transition depends crucially on the competition between the classical and the quantum mechanical back-reactions to terminate inflation. If one considers only the clas...
Large Scale Emerging Properties from Non Hamiltonian Complex Systems

Directory of Open Access Journals (Sweden)

Marco Bianucci

2017-06-01

Full Text Available The concept of “large scale” obviously depends on the phenomenon we are interested in. For example, in the foundation of thermodynamics from microscopic dynamics, the large spatial and time scales are of the order of fractions of millimetres and microseconds, respectively, or less, and are defined in relation to the spatial and time scales of the microscopic systems. In large-scale oceanography or global climate dynamics problems, the scales of interest are of the order of thousands of kilometres for space and many years for time, and are compared to the local and daily/monthly time scales of atmosphere and ocean dynamics. In all these cases a Zwanzig projection approach is, at least in principle, an effective tool to obtain a class of universal smooth “large scale” dynamics for the few degrees of freedom of interest, starting from the complex dynamics of the whole system (usually with many degrees of freedom). The projection approach leads to a very complex calculus with differential operators, which is drastically simplified when the basic dynamics of the system of interest is Hamiltonian, as happens in foundation of thermodynamics problems. However, in geophysical fluid dynamics, biology, and most physical problems, the fundamental building-block equations of motion have a non-Hamiltonian structure. Thus, to continue to apply the useful projection approach in these cases as well, we exploit the generalization of the Hamiltonian formalism given by the Lie algebra of dissipative differential operators. In this way, we are able to deal analytically with the series of differential operators stemming from the projection approach applied to these general cases. Then we apply this formalism to obtain some relevant results concerning the statistical properties of the El Niño Southern Oscillation (ENSO).
A new system of labour management in African large-scale agriculture?

DEFF Research Database (Denmark)

Gibbon, Peter; Riisgaard, Lone

2014-01-01

This paper applies a convention theory (CT) approach to the analysis of labour management systems in African large-scale farming. The reconstruction of previous analyses of high-value crop production on large-scale farms in Africa in terms of CT suggests that, since 1980–95, labour management has...

Pseudoscalar-photon mixing and the large scale alignment of QSO ...

Indian Academy of Sciences (India)

physics pp. 679-682. Pseudoscalar-photon mixing and the large scale alignment of QSO optical polarizations. PANKAJ JAIN, SUKANTA PANDA and S SARALA. Physics Department, Indian Institute of Technology, Kanpur 208 016, India. Abstract. We review the observation of large scale alignment of QSO optical polariza-.
Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets

DEFF Research Database (Denmark)

Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H

2013-01-01

of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined...... in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations....
On the universal character of the large scale structure of the universe

International Nuclear Information System (INIS)

Demianski, M.; International Center for Relativistic Astrophysics; Rome Univ.; Doroshkevich, A.G.

1991-01-01

We review different theories of formation of the large scale structure of the Universe. Special emphasis is put on the theory of inertial instability. We show that for a large class of initial spectra the resulting two point correlation functions are similar. We discuss also the adhesion theory which uses the Burgers equation, Navier-Stokes equation or coagulation process. We review the Zeldovich theory of gravitational instability and discuss the internal structure of pancakes. Finally we discuss the role of the velocity potential in determining the global characteristics of large scale structures (distribution of caustics, scale of voids, etc.). In the last chapter we list the main unsolved problems and main successes of the theory of formation of large scale structure. (orig.)
LAVA: Large scale Automated Vulnerability Addition

Science.gov (United States)

2016-05-23

LAVA: Large-scale Automated Vulnerability Addition Brendan Dolan-Gavitt, Patrick Hulin, Tim Leek, Fredrich Ulrich, Ryan Whelan (Authors listed...released, and thus rapidly become stale. We can expect tools to have been trained to detect bugs that have been released. Given the commercial price tag...low TCN) and dead (low liveness) program data is a powerful one for vulnerability injection. The DUAs it identifies are internal program quantities
Large-Scale Transit Signal Priority Implementation

OpenAIRE

Lee, Kevin S.; Lozner, Bailey

2018-01-01

In 2016, the District Department of Transportation (DDOT) deployed Transit Signal Priority (TSP) at 195 intersections in highly urbanized areas of Washington, DC. In collaboration with a broader regional implementation, and in partnership with the Washington Metropolitan Area Transit Authority (WMATA), DDOT set out to apply a systems engineering–driven process to identify, design, test, and accept a large-scale TSP system. This presentation will highlight project successes and lessons learned.
Probing cosmology with the homogeneity scale of the Universe through large scale structure surveys

International Nuclear Information System (INIS)

Ntelis, Pierros

2017-01-01

This thesis presents my contribution to the measurement of the homogeneity scale using galaxies, with the cosmological interpretation of the results. In physics, any model is characterized by a set of principles. Most models in cosmology are based on the Cosmological Principle, which states that the universe is statistically homogeneous and isotropic on large scales. Today, this principle is considered to be true since it is respected by those cosmological models that accurately describe the observations. However, while the isotropy of the universe is now confirmed by many experiments, this is not the case for homogeneity. To study cosmic homogeneity, we propose not only to test a model but to test directly one of the postulates of modern cosmology. Since the 1998 measurements of cosmic distances using type Ia supernovae, we have known that the universe is now in a phase of accelerated expansion. This phenomenon can be explained by the addition of an unknown energy component, which is called dark energy. Since dark energy is responsible for the accelerated expansion of the universe, we can study this mysterious fluid by measuring the rate of expansion of the universe. The universe has imprinted in its matter distribution a standard ruler, the Baryon Acoustic Oscillation (BAO) scale. By measuring this scale at different times during the evolution of our universe, it is then possible to measure the rate of expansion of the universe and thus characterize this dark energy. Alternatively, we can use the homogeneity scale to study this dark energy. Studying the homogeneity and the BAO scale requires the statistical study of the matter distribution of the universe at large scales, greater than tens of megaparsecs. Galaxies and quasars are formed in the vast overdensities of matter and they are very luminous: these sources trace the distribution of matter. By measuring the emission spectra of these sources using large spectroscopic surveys, such as BOSS and eBOSS, we can measure their positions
Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure.

Directory of Open Access Journals (Sweden)

Gabriela Moura

Full Text Available BACKGROUND: Codon usage and codon-pair context are important gene primary structure features that influence mRNA decoding fidelity. In order to identify general rules that shape codon-pair context and minimize mRNA decoding error, we have carried out a large scale comparative codon-pair context analysis of 119 fully sequenced genomes. METHODOLOGIES/PRINCIPAL FINDINGS: We have developed mathematical and software tools for large scale comparative codon-pair context analysis. These methodologies unveiled general and species-specific codon-pair context rules that govern evolution of mRNAs in the 3 domains of life. We show that evolution of bacterial and archaeal mRNA primary structure is mainly dependent on constraints imposed by the translational machinery, while in eukaryotes DNA methylation and tri-nucleotide repeats impose strong biases on codon-pair context. CONCLUSIONS: The data highlight fundamental differences between prokaryotic and eukaryotic mRNA decoding rules, which are partially independent of codon usage.
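As a rough illustration of what a codon-pair context count involves, the following minimal Python sketch tallies adjacent codon pairs in a single in-frame coding sequence and compares each observed count with the count expected if the two codons occurred independently. The function names `codon_pair_counts` and `pair_bias` are hypothetical, and the observed/expected ratio is a generic stand-in chosen for illustration; it is not the statistical method used by the authors.

```python
from collections import Counter


def split_codons(orf):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_pair_counts(orf):
    """Count every pair of adjacent codons (5' codon, 3' codon) in the sequence."""
    codons = split_codons(orf)
    return Counter(zip(codons, codons[1:]))


def pair_bias(orf):
    """Observed/expected ratio per codon pair.

    The expected count assumes (for illustration only) that the two codons
    of a pair are drawn independently at their overall frequencies.
    """
    codons = split_codons(orf)
    codon_freq = Counter(codons)
    pairs = Counter(zip(codons, codons[1:]))
    n_codons = len(codons)
    n_pairs = n_codons - 1
    ratios = {}
    for (first, second), observed in pairs.items():
        expected = (codon_freq[first] / n_codons) * (codon_freq[second] / n_codons) * n_pairs
        ratios[(first, second)] = observed / expected
    return ratios


if __name__ == "__main__":
    toy_orf = "ATGGCTGCTGCTTAA"  # made-up sequence, not real data
    for pair, ratio in sorted(pair_bias(toy_orf).items()):
        print(pair, round(ratio, 3))
```

A ratio above 1 marks a pair that occurs more often than independent codon usage would predict; a genome-scale analysis would pool counts over all coding sequences before computing any statistic.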
Large-Scale Optimization for Bayesian Inference in Complex Systems

Energy Technology Data Exchange (ETDEWEB)

Willcox, Karen [MIT; Marzouk, Youssef [MIT

2013-11-12

The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT-Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as “reduce then sample” and “sample then reduce.” In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to
Response of deep and shallow tropical maritime cumuli to large-scale processes

Science.gov (United States)

Yanai, M.; Chu, J.-H.; Stark, T. E.; Nitta, T.

1976-01-01

The bulk diagnostic method of Yanai et al. (1973) and a simplified version of the spectral diagnostic method of Nitta (1975) are used for a more quantitative evaluation of the response of various types of cumuliform clouds to large-scale processes, using the same data set in the Marshall Islands area for a 100-day period in 1956. The dependence of the cloud mass flux distribution on radiative cooling, large-scale vertical motion, and evaporation from the sea is examined. It is shown that typical radiative cooling rates in the tropics tend to produce a bimodal distribution of mass spectrum exhibiting deep and shallow clouds. The bimodal distribution is further enhanced when the large-scale vertical motion is upward, and a nearly unimodal distribution of shallow clouds prevails when the relative cooling is compensated by the heating due to the large-scale subsidence. Both deep and shallow clouds are modulated by large-scale disturbances. The primary role of surface evaporation is to maintain the moisture flux at the cloud base.
Accuracy assessment of planimetric large-scale map data for decision-making

Directory of Open Access Journals (Sweden)

Doskocz Adam

2016-06-01

Full Text Available This paper presents decision-making risk estimation based on planimetric large-scale map data, which are data sets or databases which are useful for creating planimetric maps on scales of 1:5,000 or larger. The studies were conducted on four data sets of large-scale map data. Errors of map data were used for a risk assessment of decision-making about the localization of objects, e.g. for land-use planning in realization of investments. An analysis was performed for a large statistical sample set of shift vectors of control points, which were identified with the position errors of these points (errors of map data.
Reviving large-scale projects

International Nuclear Information System (INIS)

Desiront, A.

2003-01-01

For the past decade, most large-scale hydro development projects in northern Quebec have been put on hold due to land disputes with First Nations. Hydroelectric projects have recently been revived following an agreement signed with Aboriginal communities in the province who recognized the need to find new sources of revenue for future generations. Many Cree are working on the project to harness the waters of the Eastmain River located in the middle of their territory. The work involves building an 890 foot long dam, 30 dikes enclosing a 603 square-km reservoir, a spillway, and a power house with 3 generating units with a total capacity of 480 MW of power for start-up in 2007. The project will require the use of 2,400 workers in total. The Cree Construction and Development Company is working on relations between Quebec's 14,000 Crees and the James Bay Energy Corporation, the subsidiary of Hydro-Quebec which is developing the project. Approximately 10 per cent of the $735-million project has been designated for the environmental component. Inspectors ensure that the project complies fully with environmental protection guidelines. Total development costs for Eastmain-1 are in the order of $2 billion of which $735 million will cover work on site and the remainder will cover generating units, transportation and financial charges. Under the treaty known as the Peace of the Braves, signed in February 2002, the Quebec government and Hydro-Quebec will pay the Cree $70 million annually for 50 years for the right to exploit hydro, mining and forest resources within their territory. The project comes at a time when electricity export volumes to the New England states are down due to growth in Quebec's domestic demand. Hydropower is a renewable and non-polluting source of energy that is one of the most acceptable forms of energy where the Kyoto Protocol is concerned. It was emphasized that large-scale hydro-electric projects are needed to provide sufficient energy to meet both
Comparative EST analysis provides insights into the basal aquatic fungus Blastocladiella emersonii

Directory of Open Access Journals (Sweden)

Gomes Suely L

2006-07-01

Full Text Available Abstract Background Blastocladiella emersonii is an aquatic fungus of the Chytridiomycete class, which is at the base of the fungal phylogenetic tree. In this sense, some ancestral characteristics of fungi and animals or fungi and plants could have been retained in this aquatic fungus and lost in members of late-diverging fungal species. To identify in B. emersonii sequences associated with these ancestral characteristics two approaches were followed: (1) a large-scale comparative analysis between putative unigene sequences (uniseqs) from B. emersonii and three databases constructed ad hoc with fungal proteins, animal proteins and plant unigenes deposited in Genbank, and (2) a pairwise comparison between B. emersonii full-length cDNA sequences and their putative orthologues in the ascomycete Neurospora crassa and the basidiomycete Ustilago maydis. Results Comparative analyses of B. emersonii uniseqs with fungi, animal and plant databases through the two approaches mentioned above produced 166 B. emersonii sequences, which were identified as putatively absent from other fungi or not previously described. Through these approaches we found: (1) possible orthologues of genes previously identified as specific to animals and/or plants, and (2) genes conserved in fungi, but with a large difference in divergence rate in B. emersonii. Among these sequences, we observed cDNAs encoding enzymes from coenzyme B12-dependent propionyl-CoA pathway, a metabolic route not previously described in fungi, and validated their expression in Northern blots. Conclusion Using two different approaches involving comparative sequence analyses, we could identify sequences from the early-diverging fungus B. emersonii previously considered specific to animals or plants, and highly divergent sequences from the same fungus relative to other fungi.
Large-scale Flow and Transport of Magnetic Flux in the Solar ...

Indian Academy of Sciences (India)

Abstract. Horizontal large-scale velocity field describes horizontal displacement of the photospheric magnetic flux in zonal and meridian directions. The flow systems of solar plasma, constructed according to the velocity field, create the large-scale cellular-like patterns with up-flow in the center and the down-flow on the ...
Large-scale analysis of antisense transcription in wheat using the Affymetrix GeneChip Wheat Genome Array

Directory of Open Access Journals (Sweden)

Settles Matthew L

2009-05-01

Full Text Available Abstract Background Natural antisense transcripts (NATs) are transcripts of the opposite DNA strand to the sense-strand either at the same locus (cis-encoded) or a different locus (trans-encoded). They can affect gene expression at multiple stages including transcription, RNA processing and transport, and translation. NATs give rise to sense-antisense transcript pairs and the number of these identified has escalated greatly with the availability of DNA sequencing resources and public databases. Traditionally, NATs were identified by the alignment of full-length cDNAs or expressed sequence tags to genome sequences, but an alternative method for large-scale detection of sense-antisense transcript pairs involves the use of microarrays. In this study we developed a novel protocol to assay sense- and antisense-strand transcription on the 55 K Affymetrix GeneChip Wheat Genome Array, which is a 3' in vitro transcription (3'IVT) expression array. We selected five different tissue types for assay to enable maximum discovery, and used the 'Chinese Spring' wheat genotype because most of the wheat GeneChip probe sequences were based on its genomic sequence. This study is the first report of using a 3'IVT expression array to discover the expression of natural sense-antisense transcript pairs, and may be considered as proof-of-concept. Results By using alternative target preparation schemes, both the sense- and antisense-strand derived transcripts were labeled and hybridized to the Wheat GeneChip. Quality assurance verified that successful hybridization did occur in the antisense-strand assay. A stringent threshold for positive hybridization was applied, which resulted in the identification of 110 sense-antisense transcript pairs, as well as 80 potentially antisense-specific transcripts. Strand-specific RT-PCR validated the microarray observations, and showed that antisense transcription is likely to be tissue specific. For the annotated sense
Utilization of Large Scale Surface Models for Detailed Visibility Analyses

Science.gov (United States)

Caha, J.; Kačmařík, M.

2017-11-01

This article demonstrates the utilization of large scale surface models with small spatial resolution and high accuracy, acquired from Unmanned Aerial Vehicle scanning, for visibility analyses. The importance of large scale data for visibility analyses on the local scale, where the detail of the surface model is the most defining factor, is described. The focus is not only on the classic Boolean visibility that is usually determined within GIS, but also on so-called extended viewsheds that aim to provide more information about visibility. The case study with examples of visibility analyses was performed on the Opava River near the city of Ostrava (Czech Republic). The multiple Boolean viewshed analysis and global horizon viewshed were calculated to determine the most prominent features and visibility barriers of the surface. In addition, the extended viewshed showing the angle difference above the local horizon, which describes the angular height of the target area above the barrier, is shown. The case study proved that large scale models are an appropriate data source for visibility analyses on the local level. The discussion summarizes possible future applications and further development directions of visibility analyses.
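The difference between a Boolean viewshed and the "angle difference above the local horizon" can be sketched on a single terrain profile. The Python example below is a minimal sketch under stated assumptions: it works on a one-dimensional list of elevations sampled at a fixed spacing along one line of sight, and the function name `profile_visibility`, the default observer height and the toy profile are all hypothetical. It is not the GIS workflow used in the article.

```python
import math


def profile_visibility(elevations, cell_size, observer_height=1.6):
    """Evaluate visibility along one line of sight.

    elevations[0] is the observer's cell; the remaining cells are targets.
    For each target the function returns a tuple (visible, margin_deg):
      visible    - Boolean visibility (classic viewshed value)
      margin_deg - target's elevation angle minus the local horizon formed
                   by all cells between observer and target, in degrees
                   (None for the first cell, which has no barrier in front)
    """
    eye = elevations[0] + observer_height
    horizon = None  # steepest elevation angle met so far, in radians
    results = []
    for i in range(1, len(elevations)):
        angle = math.atan2(elevations[i] - eye, i * cell_size)
        if horizon is None:
            results.append((True, None))
            horizon = angle
        else:
            results.append((angle >= horizon, math.degrees(angle - horizon)))
            horizon = max(horizon, angle)
    return results


if __name__ == "__main__":
    # Toy profile: flat ground with a 10 m barrier two cells from the observer.
    profile = [0.0, 0.0, 10.0, 0.0, 0.0]
    for cell, (visible, margin) in enumerate(
            profile_visibility(profile, cell_size=10.0, observer_height=0.0), start=1):
        print(cell, visible, margin)
```

A negative margin indicates how far below the local horizon a hidden cell lies, which is the extra information a Boolean viewshed discards.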
Large-scale modeling of rain fields from a rain cell deterministic model

Science.gov (United States)

Féral, Laurent; Sauvageot, Henri; Castanet, Laurent; Lemorton, Joël; Cornet, Frédéric; Leconte, Katia

2006-04-01

A methodology to simulate two-dimensional rain rate fields at large scale (1000 × 1000 km², the scale of a satellite telecommunication beam or a terrestrial fixed broadband wireless access network) is proposed. It relies on a rain rate field cellular decomposition. At small scale (~20 × 20 km²), the rain field is split up into its macroscopic components, the rain cells, described by the Hybrid Cell (HYCELL) cellular model. At midscale (~150 × 150 km²), the rain field results from the conglomeration of rain cells modeled by HYCELL. To account for the rain cell spatial distribution at midscale, the latter is modeled by a doubly aggregative isotropic random walk, the optimal parameterization of which is derived from radar observations at midscale. The extension of the simulation area from the midscale to the large scale (1000 × 1000 km²) requires the modeling of the weather frontal area. The latter is first modeled by a Gaussian field with anisotropic covariance function. The Gaussian field is then turned into a binary field, giving the large-scale locations over which it is raining. This transformation requires the definition of the rain occupation rate over large-scale areas. Its probability distribution is determined from observations by the French operational radar network ARAMIS. The coupling with the rain field modeling at midscale is immediate whenever the large-scale field is split up into midscale subareas. The rain field thus generated accounts for the local CDF at each point, defining a structure spatially correlated at small scale, midscale, and large scale. It is then suggested that this approach be used by system designers to evaluate diversity gain, terrestrial path attenuation, or slant path attenuation for different azimuth and elevation angle directions.
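The step that turns a Gaussian field into a binary rain/no-rain field for a given occupation rate can be sketched as a simple thresholding operation. The Python example below is a minimal sketch, not the authors' model: the helper `smoothed_gaussian_field` builds a crude anisotropic Gaussian field by box-averaging white noise with different window sizes in the two directions (an assumption made only to have a correlated field to work with), and `binarize` picks the threshold empirically so that the requested fraction of cells is marked as raining. Both function names and all parameter values are hypothetical.

```python
import random


def smoothed_gaussian_field(ny, nx, wy, wx, seed=0):
    """Sum white Gaussian noise over a (2*wy+1) x (2*wx+1) window with wrap-around.

    Unequal wy and wx give a different correlation length in each direction,
    a rough stand-in for an anisotropic covariance function.
    """
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(nx)] for _ in range(ny)]
    field = []
    for y in range(ny):
        row = []
        for x in range(nx):
            total = 0.0
            for dy in range(-wy, wy + 1):
                for dx in range(-wx, wx + 1):
                    total += noise[(y + dy) % ny][(x + dx) % nx]
            row.append(total)
        field.append(row)
    return field


def binarize(field, occupation_rate):
    """Mark the highest-valued cells as raining (1) so that about
    occupation_rate of all cells are set; every other cell becomes 0."""
    flat = sorted(value for row in field for value in row)
    n_rain = round(occupation_rate * len(flat))
    if n_rain == 0:
        return [[0 for _ in row] for row in field]
    threshold = flat[len(flat) - n_rain]
    return [[1 if value >= threshold else 0 for value in row] for row in field]


if __name__ == "__main__":
    gaussian = smoothed_gaussian_field(ny=20, nx=20, wy=1, wx=3, seed=42)
    rain_mask = binarize(gaussian, occupation_rate=0.25)
    for row in rain_mask:
        print("".join("#" if cell else "." for cell in row))
```

Choosing the threshold from the empirical distribution of the field keeps the occupied fraction close to the target regardless of how the field was smoothed; a full simulation would draw the occupation rate itself from its observed probability distribution.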
Facile Large-scale synthesis of stable CuO nanoparticles

Science.gov (United States)

Nazari, P.; Abdollahi-Nejand, B.; Eskandari, M.; Kohnehpoushi, S.

2018-04-01

In this work, a novel approach to synthesizing CuO nanoparticles is introduced. A sequential corrosion and detaching process was proposed for the growth and dispersion of CuO nanoparticles at the optimum pH value of eight. The CuO nanoparticles produced were six nm (±2 nm) in diameter and spherical, with high crystallinity and uniformity in size. With this method, large-scale production of CuO nanoparticles (120 grams in an experimental batch) from Cu micro-particles was achieved, which may meet the market criteria for large-scale production of CuO nanoparticles.
Quantifying population genetic differentiation from next-generation sequencing data

DEFF Research Database (Denmark)

Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand

2013-01-01

method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based...... on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled......Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data...
Large-Scale Cooperative Task Distribution on Peer-to-Peer Networks

Science.gov (United States)

2012-01-01

Large-scale cooperative task distribution on peer-to-peer networks...disadvantages of ML-Chord are its fixed size (two layers), and limited scalability for large-scale systems. RC-Chord extends ML- D. Karrels et al...configurable before runtime. This can be improved by incorporating a distributed learning algorithm to tune the number and range of the DLoE tracking
Comparative Analysis of Different Protocols to Manage Large Scale Networks

OpenAIRE

Anil Rao Pimplapure; Dr Jayant Dubey; Prashant Sen

2013-01-01

In recent years the number, complexity and size of large-scale networks have increased. The best example of a large-scale network is the Internet, and more recent ones are data centers in cloud environments. Managing such networks involves several tasks, such as traffic monitoring, security and performance optimization, which is a big job for the network administrator. This research report studies different protocols, i.e. conventional protocols like the Simple Network Management Protocol and newer Gossip bas...
