genomes extending ensembl: Topics by WorldWideScience.org

Sample records for genomes extending ensembl

Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.

Science.gov (United States)

Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan

2012-01-01

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
Ensembl Genomes 2016: more genomes, more complexity.

Science.gov (United States)

Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M

2016-01-04

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ensembl 2002: accommodating comparative genomics.

Science.gov (United States)

Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

2003-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.
Ensembl Genomes 2013: scaling up access to genome-wide data.

Science.gov (United States)

Kersey, Paul Julian; Allen, James E; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Hughes, Daniel Seth Toney; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Langridge, Nicholas; McDowall, Mark D; Maheswari, Uma; Maslen, Gareth; Nuhn, Michael; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Tuli, Mary Ann; Walts, Brandon; Williams, Gareth; Wilson, Derek; Youens-Clark, Ken; Monaco, Marcela K; Stein, Joshua; Wei, Xuehong; Ware, Doreen; Bolser, Daniel M; Howe, Kevin Lee; Kulesha, Eugene; Lawson, Daniel; Staines, Daniel Michael

2014-01-01

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.
The Ensembl genome database project.

Science.gov (United States)

Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

2002-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

Science.gov (United States)

Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

2013-01-01

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
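The anonymous connection described in the abstract above can be scripted. The following is a minimal sketch, assuming the `mysql` command-line client is installed locally; the helper name `mysql_client_argv` is hypothetical, and only the host and user name come from the abstract.

```python
import shlex


def mysql_client_argv(host, user):
    """Build the argument list for the mysql command-line client.

    Hypothetical helper: it only assembles the command quoted in the
    abstract and does not open a connection itself.
    """
    return ["mysql", "-h", host, "-u", user]


# Host and user are the ones given in the abstract.
argv = mysql_client_argv("database.nencki-genomics.org", "public")

# Show the equivalent shell command line.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv)` would start an interactive session, provided the server is still reachable at that address.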
Automated ensemble assembly and validation of microbial genomes

Science.gov (United States)

2014-01-01

Background The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. Results To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Conclusions Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to
A Ruby API to query the Ensembl database for genomic features.

Science.gov (United States)

Strozzi, Francesco; Aerts, Jan

2011-04-01

The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api.
Ensembl 2004.

Science.gov (United States)

Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

2004-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
Collective Dynamics of Specific Gene Ensembles Crucial for Neutrophil Differentiation: The Existence of Genome Vehicles Revealed

Science.gov (United States)

Giuliani, Alessandro; Tomita, Masaru

2010-01-01

Cell fate decision remarkably generates a specific cell differentiation path among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes to switch cell fate decision has indicated the existence of stable attractors guiding the process. However, the origins of the intracellular mechanisms that create the “cellular attractor” still remain unknown. Here, we examined the collective behavior of genome-wide expressions for neutrophil differentiation through two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulties of dealing with noise in single-gene expression, we grouped genes into ensembles and analyzed their expression dynamics in correlation space defined by Pearson correlation and mutual information. The standard deviation of correlation distributions of gene ensembles decreases as the ensemble size is increased, following the inverse square root law, both for ensembles chosen randomly from the whole genome and for ensembles ranked according to expression variances across time. Choosing the ensemble size of 200 genes, we show that the two probability distributions of correlations of randomly selected genes for atRA and DMSO responses overlapped after 48 hours, defining the neutrophil attractor. Next, tracking the ranked ensembles' trajectories, we noticed that only certain ensembles, not all, fall into the attractor in a fractal-like manner. The removal of these genome elements from the whole genomes, for both atRA and DMSO responses, destroys the attractor, providing evidence for the existence of specific genome elements (named “genome vehicle”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. Further investigations along with our findings might
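The inverse square root scaling mentioned in the abstract above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' analysis. It replaces expression data with independent Gaussian noise, and the helper names `pearson` and `correlation_spread` are hypothetical. For such uncorrelated vectors of length n, the standard deviation of the Pearson correlation is roughly 1/sqrt(n - 1), so a fourfold larger ensemble should roughly halve the spread.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_spread(ensemble_size, trials, rng):
    """Sample standard deviation of r between independent random vectors.

    Each trial draws two unrelated Gaussian vectors standing in for the
    expression of one gene ensemble at two time points (toy assumption).
    """
    rs = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(ensemble_size)]
        y = [rng.gauss(0.0, 1.0) for _ in range(ensemble_size)]
        rs.append(pearson(x, y))
    mean = sum(rs) / trials
    return math.sqrt(sum((r - mean) ** 2 for r in rs) / (trials - 1))


rng = random.Random(0)  # fixed seed so the run is reproducible
spread_50 = correlation_spread(50, 1000, rng)
spread_200 = correlation_spread(200, 1000, rng)
ratio = spread_50 / spread_200

print("spread at n=50: ", round(spread_50, 4))
print("spread at n=200:", round(spread_200, 4))
print("ratio (inverse square root law predicts about 2):", round(ratio, 2))
```

Real expression data are correlated across genes, so the prefactor will differ from this noise-only case. Only the scaling with ensemble size is being illustrated.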
Ensembl variation resources

Directory of Open Access Journals (Sweden)

Marin-Garcia Pablo

2010-05-01

Abstract Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.
The Ensembl Web site: mechanics of a genome browser.

Science.gov (United States)

Stalker, James; Gibbins, Brian; Meidl, Patrick; Smith, James; Spooner, William; Hotz, Hans-Rudolf; Cox, Antony V

2004-05-01

The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (approximately 2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species.
Ensembl 2017

OpenAIRE

Aken, Bronwen L.; Achuthan, Premanand; Akanni, Wasiu; Amode, M. Ridwan; Bernsdorff, Friederike; Bhai, Jyothish; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.

2016-01-01

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access ...
Canonical-ensemble extended Lagrangian Born-Oppenheimer molecular dynamics for the linear scaling density functional theory.

Science.gov (United States)

Hirakawa, Teruo; Suzuki, Teppei; Bowler, David R; Miyazaki, Tsuyoshi

2017-10-11

We discuss the development and implementation of a constant temperature (NVT) molecular dynamics scheme that combines the Nosé-Hoover chain thermostat with the extended Lagrangian Born-Oppenheimer molecular dynamics (BOMD) scheme, using a linear scaling density functional theory (DFT) approach. An integration scheme for this canonical-ensemble extended Lagrangian BOMD is developed and discussed in the context of the Liouville operator formulation. Linear scaling DFT canonical-ensemble extended Lagrangian BOMD simulations are tested on bulk silicon and silicon carbide systems to evaluate our integration scheme. The results show that the conserved quantity remains stable with no systematic drift even in the presence of the thermostat.
Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser

Directory of Open Access Journals (Sweden)

Hauser Elizabeth R

2005-04-01

Abstract Background To facilitate efficient selection and the prioritization of candidate complex disease susceptibility genes for association analysis, increasingly comprehensive annotation tools are essential to integrate, visualize and analyze vast quantities of disparate data generated by genomic screens, public human genome sequence annotation and ancillary biological databases. We have developed a plug-in package for Ensembl called "Statistical Viewer" that facilitates the analysis of genomic features and annotation in the regions of interest defined by linkage analysis. Results Statistical Viewer is an add-on package to the open-source Ensembl Genome Browser and Annotation System that displays disease study-specific linkage and/or association data as 2 dimensional plots in new panels in the context of Ensembl's Contig View and Cyto View pages. An enhanced upload server facilitates the upload of statistical data, as well as additional feature annotation to be displayed in DAS tracts, in the form of Excel Files. The Statistical View panel, drawn directly under the ideogram, illustrates lod score values for markers from a study of interest that are plotted against their position in base pairs. A module called "Get Map" easily converts the genetic locations of markers to genomic coordinates. The graph placed under the corresponding ideogram features a synchronized vertical sliding selection box that is seamlessly integrated into Ensembl's Contig- and Cyto- View pages to choose the region to be displayed in Ensembl's "Overview" and "Detailed View" panels. To resolve Association and Fine mapping data plots, a "Detailed Statistic View" plot corresponding to the "Detailed View" may be displayed underneath. Conclusion Features mapping to regions of linkage are accentuated when Statistic View is used in conjunction with the Distributed Annotation System (DAS) to display supplemental laboratory information such as differentially expressed disease
Breaking-Cas—interactive design of guide RNAs for CRISPR-Cas experiments for ENSEMBL genomes

Science.gov (United States)

Oliveros, Juan C.; Franch, Mònica; Tabas-Madrid, Daniel; San-León, David; Montoliu, Lluis; Cubas, Pilar; Pazos, Florencio

2016-01-01

The CRISPR/Cas technology is enabling targeted genome editing in multiple organisms with unprecedented accuracy and specificity by using RNA-guided nucleases. A critical point when planning a CRISPR/Cas experiment is the design of the guide RNA (gRNA), which directs the nuclease and associated machinery to the desired genomic location. This gRNA has to fulfil the requirements of the nuclease and lack homology with other genome sites that could lead to off-target effects. Here we introduce the Breaking-Cas system for the design of gRNAs for CRISPR/Cas experiments, including those based on the Cas9 nuclease as well as others recently introduced. The server has unique features not available in other tools, including the possibility of using all eukaryotic genomes available in ENSEMBL (currently around 700), placing variable PAM sequences at 5′ or 3′ and setting the guide RNA length and the scores per nucleotide. It can be freely accessed at: http://bioinfogp.cnb.csic.es/tools/breakingcas, and the code is available upon request. PMID:27166368
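The first step of gRNA design described above, locating candidate target sites next to a PAM, can be sketched in a few lines. This is a minimal illustration and not the Breaking-Cas implementation. It assumes a 3′ `NGG` PAM (the pattern commonly used with SpCas9), scans the forward strand only, and performs no off-target scoring. The function name `find_guides` and the example sequence are hypothetical.

```python
def find_guides(seq, guide_len=20):
    """Return (start, guide, pam) for every NGG PAM on the forward strand.

    Assumptions: the PAM sits immediately 3' of the guide, and only sites
    with a full-length guide upstream of the PAM are reported.
    """
    seq = seq.upper()
    hits = []
    # i is the index of the first PAM base; the PAM occupies seq[i:i+3].
    for i in range(guide_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":  # 'N' can be any base, so only check the GG
            hits.append((i - guide_len, seq[i - guide_len:i], pam))
    return hits


# Hypothetical 26-nt example: a 20-nt protospacer, then TGG, then AAA.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
for start, guide, pam in find_guides(example):
    print(start, guide, pam)
```

A fuller tool would also scan the reverse complement, allow other PAM patterns and positions (5′ or 3′, as the abstract notes), and check each guide against the rest of the genome.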
HEPS4Power - Extended-range Hydrometeorological Ensemble Predictions for Improved Hydropower Operations and Revenues

Science.gov (United States)

Bogner, Konrad; Monhart, Samuel; Liniger, Mark; Spirig, Christoph; Jordan, Fred; Zappa, Massimiliano

2015-04-01

In recent years, considerable progress has been achieved in the operational prediction of floods and hydrological drought with up to ten days lead time. Both the public and the private sectors are currently using probabilistic runoff forecasts in order to monitor water resources and take action when critical conditions are expected. The use of extended-range predictions with lead times exceeding 10 days is not yet established. The hydropower sector in particular might benefit greatly from using hydrometeorological forecasts for the next 15 to 60 days in order to optimize the operations and the revenues from their watersheds, dams, captions, turbines and pumps. The new Swiss Competence Centers in Energy Research (SCCER) aim at boosting research related to energy issues in Switzerland. The objective of HEPS4POWER is to demonstrate that operational extended-range hydrometeorological forecasts have the potential to become very valuable tools for fine tuning the production of energy from hydropower systems. The project team covers a specific system-oriented value chain starting from the collection and forecast of meteorological data (MeteoSwiss), leading to the operational application of state-of-the-art hydrological models (WSL) and terminating with the experience in data presentation and power production forecasts for end-users (e-dric.ch). The first task of HEPS4POWER will be the downscaling and post-processing of ensemble extended-range meteorological forecasts (EPS). The goal is to provide well-tailored probabilistic forecasts that are statistically reliable and localized at catchment or even station level. The hydrology-related task will consist of feeding the post-processed meteorological forecasts into a HEPS using a multi-model approach by implementing models with different complexity. Also in the case of the hydrological ensemble predictions, post-processing techniques need to be tested in order to improve the quality of the
Extension of the GHJW theorem for operator ensembles

International Nuclear Information System (INIS)

Choi, Jeong Woon; Hong, Dowon; Chang, Ku-Young; Chi, Dong Pyo; Lee, Soojoon

2011-01-01

The Gisin-Hughston-Jozsa-Wootters theorem plays an important role in analyzing various theories about quantum information, quantum communication, and quantum cryptography. It means that any purifications on the extended system which yield indistinguishable state ensembles on their subsystem should have a specific local unitary relation. In this Letter, we show that the local relation is also established even when the indistinguishability of state ensembles is extended to that of operator ensembles.
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

Science.gov (United States)

Chen, Lei; Kamel, Mohamed S.

2016-01-01

In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
On the use of transition matrix methods with extended ensembles.

Science.gov (United States)

Escobedo, Fernando A; Abreu, Charlles R A

2006-03-14

Different extended ensemble schemes for non-Boltzmann sampling (NBS) of a selected reaction coordinate lambda were formulated so that they employ (i) "variable" sampling window schemes (that include the "successive umbrella sampling" method) to comprehensively explore the lambda domain and (ii) transition matrix methods to iteratively obtain the underlying free-energy eta landscape (or "importance" weights) associated with lambda. The connection between "acceptance ratio" and transition matrix methods was first established to form the basis of the approach for estimating eta(lambda). The validity and performance of the different NBS schemes were then assessed using as lambda coordinate the configurational energy of the Lennard-Jones fluid. For the cases studied, it was found that the convergence rate in the estimation of eta is little affected by the use of data from high-order transitions, while it is noticeably improved by the use of a broader window of sampling in the variable window methods. Finally, it is shown how an "elastic" window of sampling can be used to effectively enact (nonuniform) preferential sampling over the lambda domain, and how to stitch the weights from separate one-dimensional NBS runs to produce an eta surface over a two-dimensional domain.
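The core idea behind transition matrix estimation of weights can be sketched on a toy problem. This is a generic illustration, not the specific schemes of the paper. It assumes discrete lambda states with nearest-neighbour moves and detailed balance, p_i T(i→j) = p_j T(j→i), and it uses the sign convention eta_i = ln p_i, which is an assumption. In a real simulation the transition probabilities would be accumulated from acceptance probabilities during sampling. Here they are built analytically from a known distribution so the recovery can be checked. The function names are hypothetical.

```python
import math


def metropolis_matrix(p):
    """Nearest-neighbour Metropolis transition matrix with stationary p."""
    k = len(p)
    t = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in (i - 1, i + 1):
            if 0 <= j < k:
                # Propose each neighbour with probability 1/2, then accept.
                t[i][j] = 0.5 * min(1.0, p[j] / p[i])
        t[i][i] = 1.0 - sum(t[i])  # remaining probability: stay in place
    return t


def log_weights_from_transitions(t):
    """Recover eta_i = ln(p_i / p_0) from ratios of forward/backward moves."""
    eta = [0.0]
    for i in range(len(t) - 1):
        eta.append(eta[-1] + math.log(t[i][i + 1] / t[i + 1][i]))
    return eta


# Hypothetical target distribution over four lambda states.
p = [0.1, 0.2, 0.4, 0.3]
t = metropolis_matrix(p)
eta = log_weights_from_transitions(t)
expected = [math.log(pi / p[0]) for pi in p]

print([round(e, 4) for e in eta])
```

Because only ratios of neighbouring transition probabilities enter, the estimate can be built up window by window, which is what makes the approach compatible with the variable sampling windows the abstract describes.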

Extending Climate Analytics as a Service to the Earth System Grid Federation Progress Report on the Reanalysis Ensemble Service

Science.gov (United States)

Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.

2016-12-01

We are extending climate analytics-as-a-service, including: (1) A high-performance Virtual Real-Time Analytics Testbed supporting six major reanalysis data sets using advanced technologies like the Cloudera Impala-based SQL and Hadoop-based MapReduce analytics over native NetCDF files. (2) A Reanalysis Ensemble Service (RES) that offers a basic set of commonly used operations over the reanalysis collections that are accessible through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib. (3) An Open Geospatial Consortium (OGC) WPS-compliant Web service interface to CDSlib to accommodate ESGF's Web service endpoints. This presentation will report on the overall progress of this effort, with special attention to recent enhancements that have been made to the Reanalysis Ensemble Service, including the following:
- A CDSlib Python library that supports full temporal, spatial, and grid-based resolution services
- A new reanalysis collections reference model to enable operator design and implementation
- An enhanced library of sample queries to demonstrate and develop use case scenarios
- Extended operators that enable single- and multiple-reanalysis area average, vertical average, re-gridding, and trend, climatology, and anomaly computations
- Full support for the MERRA-2 reanalysis and the initial integration of two additional reanalyses
- A prototype Jupyter notebook-based distribution mechanism that combines CDSlib documentation with interactive use case scenarios and personalized project management
- Prototyped uncertainty quantification services that combine ensemble products with comparative observational products
- Convenient, one-stop shopping for commonly used data products from multiple reanalyses, including basic subsetting and arithmetic operations over the data and extractions of trends, climatologies, and anomalies
- The ability to compute and visualize multiple reanalysis intercomparisons
Disease-associated mutations that alter the RNA structural ensemble.

Directory of Open Access Journals (Sweden)

Matthew Halvorsen

2010-08-01

Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g. mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified in the human genome disease-associated SNPs that significantly alter the global conformation of the UTR to which they map. For six disease-states (Hyperferritinemia Cataract Syndrome, beta-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5' UTRs of FTL and RB1) SNP-induced conformational changes analogous to those observed in bacterial regulatory Riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a "RiboSNitch," that is, a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble.
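The difference between a minimum-free-energy view and an ensemble view can be sketched with Boltzmann weights. This is not SNPfold: the three candidate structures and their free energies are hypothetical, and real partition function calculations sum over all possible structures rather than an enumerated handful.

```python
import math

# Gas constant in kcal/(mol*K) and a temperature of 37 C in kelvin.
R_KCAL = 0.0019872
T_KELVIN = 310.15


def boltzmann_probabilities(energies_kcal):
    """Return the partition function Z and the probability of each structure."""
    rt = R_KCAL * T_KELVIN
    weights = [math.exp(-energy / rt) for energy in energies_kcal]
    z = sum(weights)
    return z, [weight / z for weight in weights]


# Hypothetical free energies (kcal/mol) of three candidate folds of one UTR,
# for the reference allele and for an allele carrying a SNP.
wild_type = [-12.0, -11.5, -9.0]
mutant = [-10.0, -11.5, -9.0]

_, p_wt = boltzmann_probabilities(wild_type)
_, p_mut = boltzmann_probabilities(mutant)
for index, (a, b) in enumerate(zip(p_wt, p_mut)):
    print(f"structure {index}: wild type {a:.3f}, mutant {b:.3f}")
```

In this toy case the SNP destabilizes one fold, so probability mass shifts to another structure; a change of that kind across the whole ensemble is what the abstract describes as an altered structural ensemble.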
Evaluations of Extended-Range tropical Cyclone Forecasts in the Western North Pacific by using the Ensemble Reforecasts: Preliminary Results

Science.gov (United States)

Tsai, Hsiao-Chung; Chen, Pang-Cheng; Elsberry, Russell L.

2017-04-01

The objective of this study is to evaluate the predictability of the extended-range forecasts of tropical cyclones (TCs) in the western North Pacific using reforecasts from the National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS) during 1996-2015, and from the Climate Forecast System (CFS) during 1999-2010. Tsai and Elsberry have demonstrated that an opportunity exists to support hydrological operations by using the extended-range TC formation and track forecasts in the western North Pacific from the ECMWF 32-day ensemble. To demonstrate this potential for the decision-making processes regarding water resource management and hydrological operation in Taiwan reservoir watershed areas, special attention is given to the skill of the NCEP GEFS and CFS models in predicting the TCs affecting the Taiwan area. The first objective of this study is to analyze the skill of NCEP GEFS and CFS TC forecasts and quantify the forecast uncertainties via verifications of categorical binary forecasts and probabilistic forecasts. The second objective is to investigate the relationships among the large-scale environmental factors [e.g., El Niño Southern Oscillation (ENSO), Madden-Julian Oscillation (MJO), etc.] and the model forecast errors by using the reforecasts. Preliminary results indicate that the skill of the TC activity forecasts based on the raw forecasts can be further improved if the model biases are minimized by utilizing these reforecasts.
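Verification of probabilistic forecasts can be sketched with the Brier score. The abstract does not say which scores the authors use, so the choice of score, the function names, and the sample numbers below are assumptions for illustration.

```python
def ensemble_probability(member_hits):
    """Fraction of ensemble members that predict the event (1 = yes, 0 = no)."""
    return sum(member_hits) / len(member_hits)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


# Hypothetical weekly forecasts: each row lists which members predict a TC.
weekly_members = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
observed = [1, 0, 1]
probabilities = [ensemble_probability(week) for week in weekly_members]
print(probabilities, brier_score(probabilities, observed))
```

A score of 0 is a perfect probabilistic forecast; comparing the score of raw and bias-corrected forecasts is one way to quantify the improvement the abstract refers to.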
Deterministic Mean-Field Ensemble Kalman Filtering

KAUST Repository

Law, Kody

2016-05-03

The proof of convergence of the standard ensemble Kalman filter (EnKF) from Le Gland, Monbet, and Tran [Large sample asymptotics for the ensemble Kalman filter, in The Oxford Handbook of Nonlinear Filtering, Oxford University Press, Oxford, UK, 2011, pp. 598--631] is extended to non-Gaussian state-space models. A density-based deterministic approximation of the mean-field limit EnKF (DMFEnKF) is proposed, consisting of a PDE solver and a quadrature rule. Given a certain minimal order of convergence k between the two, this extends to the deterministic filter approximation, which is therefore asymptotically superior to standard EnKF for dimension d<2k. The fidelity of approximation of the true distribution is also established using an extension of the total variation metric to random measures. This is limited by a Gaussian bias term arising from nonlinearity/non-Gaussianity of the model, which arises in both deterministic and standard EnKF. Numerical results support and extend the theory.
Deterministic Mean-Field Ensemble Kalman Filtering

KAUST Repository

Law, Kody; Tembine, Hamidou; Tempone, Raul

2016-01-01

The proof of convergence of the standard ensemble Kalman filter (EnKF) from Le Gland, Monbet, and Tran [Large sample asymptotics for the ensemble Kalman filter, in The Oxford Handbook of Nonlinear Filtering, Oxford University Press, Oxford, UK, 2011, pp. 598--631] is extended to non-Gaussian state-space models. A density-based deterministic approximation of the mean-field limit EnKF (DMFEnKF) is proposed, consisting of a PDE solver and a quadrature rule. Given a certain minimal order of convergence k between the two, this extends to the deterministic filter approximation, which is therefore asymptotically superior to standard EnKF for dimension d<2k. The fidelity of approximation of the true distribution is also established using an extension of the total variation metric to random measures. This is limited by a Gaussian bias term arising from nonlinearity/non-Gaussianity of the model, which arises in both deterministic and standard EnKF. Numerical results support and extend the theory.
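For orientation, the analysis step of the standard EnKF that this record takes as its baseline can be sketched for a scalar state observed directly. This is the usual perturbed-observation form, not the deterministic mean-field variant (DMFEnKF) proposed in the abstract; the function name and the numbers are illustrative assumptions.

```python
import random
import statistics


def enkf_analysis(forecast, observation, obs_variance, rng):
    """One stochastic EnKF analysis step for a scalar state observed directly."""
    # The sample variance of the forecast ensemble stands in for the prior variance.
    prior_variance = statistics.variance(forecast)
    gain = prior_variance / (prior_variance + obs_variance)
    obs_sd = obs_variance ** 0.5
    analysis = []
    for member in forecast:
        # Each member assimilates its own perturbed copy of the observation.
        perturbed = observation + rng.gauss(0.0, obs_sd)
        analysis.append(member + gain * (perturbed - member))
    return analysis, gain


rng = random.Random(0)
forecast_ensemble = [0.8, 1.1, 1.4, 0.9, 1.3]
analysis_ensemble, kalman_gain = enkf_analysis(forecast_ensemble, 2.0, 0.25, rng)
print(kalman_gain, statistics.mean(analysis_ensemble))
```

The update is linear in the forecast and the observation, which is consistent with the abstract's remark that a bias term arises when the model is nonlinear or non-Gaussian.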
Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

Science.gov (United States)

Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

2014-08-01

As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but doing so is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
Congruence as a measurement of extended haplotype structure across the genome

Science.gov (United States)

2012-01-01

Background: Historically, extended haplotypes have been defined using only a few data points, such as alleles for several HLA genes in the MHC. High-density SNP data, and the increasing affordability of whole genome SNP typing, creates the opportunity to define higher resolution extended haplotypes. This drives the need for new tools that support quantification and visualization of extended haplotypes as defined by as many as 2000 SNPs. Confronted with high-density SNP data across the major histocompatibility complex (MHC) for 2,300 complete families, compiled by the Type 1 Diabetes Genetics Consortium (T1DGC), we developed software for studying extended haplotypes. Methods: The software, called ExHap (Extended Haplotype), uses a similarity measurement we term congruence to identify and quantify long-range allele identity. Using ExHap, we analyzed congruence in both the T1DGC data and family-phased data from the International HapMap Project. Results: Congruent chromosomes from the T1DGC data have between 96.5% and 99.9% allele identity over 1,818 SNPs spanning 2.64 megabases of the MHC (HLA-DRB1 to HLA-A). Thirty-three of 132 DQ-DR-B-A defined haplotype groups have > 50% congruent chromosomes in this region. For example, 92% of chromosomes within the DR3-B8-A1 haplotype are congruent from HLA-DRB1 to HLA-A (99.8% allele identity). We also applied ExHap to all 22 autosomes for both CEU and YRI cohorts from the International HapMap Project, identifying multiple candidate extended haplotypes. Conclusions: Long-range congruence is not unique to the MHC region. Patterns of allele identity on phased chromosomes provide a simple, straightforward approach to visually and quantitatively inspect complex long-range structural patterns in the genome. Such patterns aid the biologist in appreciating genetic similarities and differences across cohorts, and can lead to hypothesis generation for subsequent studies. PMID:22369243
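Allele identity between two phased chromosomes, the quantity reported in the Results, can be sketched as a simple matching fraction. This is not the ExHap implementation: the function names are hypothetical, and the 0.965 cut-off is an assumption borrowed from the lower end of the identity range quoted above.

```python
def allele_identity(hap_a, hap_b):
    """Fraction of SNP positions at which two phased haplotypes carry the same allele."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    if not hap_a:
        raise ValueError("haplotypes must contain at least one SNP")
    matches = sum(1 for a, b in zip(hap_a, hap_b) if a == b)
    return matches / len(hap_a)


def is_congruent(hap_a, hap_b, threshold=0.965):
    """Hypothetical rule: call two chromosomes congruent above an identity threshold."""
    return allele_identity(hap_a, hap_b) >= threshold


# Toy haplotypes over eight SNPs; real comparisons span hundreds to thousands.
reference = "ACGTACGT"
candidate = "ACGTACGA"
print(allele_identity(reference, candidate), is_congruent(reference, candidate))
```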
Comparing the ensemble and extended Kalman filters for in situ soil moisture assimilation with contrasting conditions

Directory of Open Access Journals (Sweden)

D. Fairbairn

2015-12-01

Two data assimilation (DA) methods are compared for their ability to produce an accurate soil moisture analysis using the Météo-France land surface model: (i) SEKF, a simplified extended Kalman filter, which uses a climatological background-error covariance, and (ii) EnSRF, the ensemble square root filter, which uses an ensemble background-error covariance and approximates random rainfall errors stochastically. In situ soil moisture observations at 5 cm depth are assimilated into the surface layer and 30 cm deep observations are used to evaluate the root-zone analysis on 12 sites in south-western France (SMOSMANIA network). These sites differ in terms of climate and soil texture. The two methods perform similarly and improve on the open loop. Both methods suffer from incorrect linear assumptions which are particularly degrading to the analysis during water-stressed conditions: the EnSRF by a dry bias and the SEKF by an over-sensitivity of the model Jacobian between the surface and the root-zone layers. These problems are less severe for the sites with wetter climates. A simple bias correction technique is tested on the EnSRF. Although this reduces the bias, it modifies the soil moisture fluxes and suppresses the ensemble spread, which degrades the analysis performance. However, the EnSRF flow-dependent background-error covariance evidently captures seasonal variability in the soil moisture errors and should exploit planned improvements in the model physics. Synthetic twin experiments demonstrate that when there is only a random component in the precipitation forcing errors, the correct stochastic representation of these errors enables the EnSRF to perform better than the SEKF. It might therefore be possible for the EnSRF to perform better than the SEKF with real data, if the rainfall uncertainty was accurately captured. However, the simple rainfall error model is not advantageous in our real experiments. More realistic rainfall error models are
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

Science.gov (United States)

Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

2018-03-27

P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
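The training set extension described above, in which corresponding data from two sequences are superposed and averaged, can be sketched as follows. The data layout (one list of samples per epoch) and the decision to keep the original epochs alongside the averaged ones are assumptions; the abstract does not specify them.

```python
def average_epochs(epoch_a, epoch_b):
    """Sample-wise mean of two equally long single-channel epochs."""
    if len(epoch_a) != len(epoch_b):
        raise ValueError("epochs must have the same number of samples")
    return [(a + b) / 2.0 for a, b in zip(epoch_a, epoch_b)]


def extend_training_set(sequence_1, sequence_2):
    """Append the averages of corresponding epochs to the original epochs."""
    averaged = [average_epochs(a, b) for a, b in zip(sequence_1, sequence_2)]
    return sequence_1 + sequence_2 + averaged


# Hypothetical epochs: two stimuli, three samples each, recorded in two sequences.
first_sequence = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
second_sequence = [[3.0, 2.0, 1.0], [2.0, 1.0, 2.0]]
print(extend_training_set(first_sequence, second_sequence))
```

Averaging corresponding epochs tends to suppress uncorrelated noise while preserving the stimulus-locked response, which is a plausible reason the extended set helps the classifier.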
A class of energy-based ensembles in Tsallis statistics

International Nuclear Information System (INIS)

Chandrashekar, R; Naina Mohammed, S S

2011-01-01

A comprehensive investigation is carried out on the class of energy-based ensembles. The eight ensembles are divided into two main classes. In the isothermal class of ensembles the individual members are at the same temperature. A unified framework is evolved to describe the four isothermal ensembles using the currently accepted third constraint formalism. The isothermal–isobaric, grand canonical and generalized ensembles are illustrated through a study of the classical nonrelativistic and extreme relativistic ideal gas models. An exact calculation is possible only in the case of the isothermal–isobaric ensemble. The study of the ideal gas models in the grand canonical and the generalized ensembles has been carried out using a perturbative procedure with the nonextensivity parameter (1 − q) as the expansion parameter. Though all the thermodynamic quantities have been computed up to a particular order in (1 − q), the procedure can be extended up to any arbitrary order in the expansion parameter. In the adiabatic class of ensembles the individual members of the ensemble have the same value of the heat function, and a unified formulation to describe all four ensembles is given. The nonrelativistic and the extreme relativistic ideal gases are studied in the isoenthalpic–isobaric ensemble, the adiabatic ensemble with number fluctuations and the adiabatic ensemble with number and particle fluctuations.
Supersymmetry applied to the spectrum edge of random matrix ensembles

International Nuclear Information System (INIS)

Andreev, A.V.; Simons, B.D.; Taniguchi, N.

1994-01-01

A new matrix ensemble has recently been proposed to describe the transport properties in mesoscopic quantum wires. Both analytical and numerical studies have shown that the ensemble of Laguerre or of chiral random matrices provides a good description of scattering properties in this class of systems. Until now only conventional methods of random matrix theory have been used to study statistical properties within this ensemble. We demonstrate that the supersymmetry method, already employed in the study of Dyson ensembles, can be extended to treat this class of random matrix ensembles. In developing this approach we both investigate new statistical measures and verify known ones. Although we focus on ensembles in which T-invariance is violated, our approach lays the foundation for future studies of T-invariant systems.
An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

Science.gov (United States)

Stanescu, Ana; Caragea, Doina

2015-01-01

Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
JEnsembl: a version-aware Java API to Ensembl data systems.

Science.gov (United States)

Paterson, Trevor; Law, Andy

2012-11-01

The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).
The Ensembl REST API: Ensembl Data for Any Language.

Science.gov (United States)

Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R S; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul

2015-01-01

We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. © The Author 2014. Published by Oxford University Press.
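Because the service is plain HTTP returning JSON, a client needs only a standard library. The sketch below builds a request for the lookup-by-ID endpoint; the use of https, the helper names, and the example stable ID are assumptions, and the fetch function requires network access, so it is defined but not called.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build the URL of the lookup-by-ID endpoint, asking for a JSON response."""
    quoted = urllib.parse.quote(stable_id)
    return f"{SERVER}/lookup/id/{quoted}?content-type=application/json"


def lookup(stable_id, timeout=30):
    """Fetch and decode the JSON record for a stable ID (needs network access)."""
    with urllib.request.urlopen(lookup_url(stable_id), timeout=timeout) as response:
        return json.load(response)


# Printing the URL works offline; calling lookup("ENSG00000157764") would
# contact the server.
print(lookup_url("ENSG00000157764"))
```

The same URL can be requested from any language with an HTTP client, which is the language independence the abstract emphasizes.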
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.

Science.gov (United States)

Hart, Emma; Sim, Kevin

2016-01-01

We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
An Assessment of the Subseasonal Forecast Performance in the Extended Global Ensemble Forecast System (GEFS)

Science.gov (United States)

Sinsky, E.; Zhu, Y.; Li, W.; Guan, H.; Melhauser, C.

2017-12-01

Optimal forecast quality is crucial for the preservation of life and property. Improving monthly forecast performance over both the tropics and extra-tropics requires attention to various physical aspects such as the representation of the underlying SST, model physics and the representation of the model physics uncertainty for an ensemble forecast system. This work focuses on the impact of stochastic physics, SST and the convection scheme on forecast performance for the sub-seasonal scale over the tropics and extra-tropics with emphasis on the Madden-Julian Oscillation (MJO). A 2-year period is evaluated using the National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS). Three experiments with different configurations than the operational GEFS were performed to illustrate the impact of the stochastic physics, SST and convection scheme. These experiments are compared against a control experiment (CTL) which consists of the operational GEFS but its integration is extended from 16 to 35 days. The three configurations are: 1) SPs, which uses a Stochastically Perturbed Physics Tendencies (SPPT), Stochastic Perturbed Humidity (SHUM) and Stochastic Kinetic Energy Backscatter (SKEB); 2) SPs+SST_bc, which uses a combination of SPs and a bias-corrected forecast SST from the NCEP Climate Forecast System Version 2 (CFSv2); and 3) SPs+SST_bc+SA_CV, which combines SPs, a bias-corrected forecast SST and a scale aware convection scheme. When comparing to the CTL experiment, SPs shows substantial improvement. The MJO skill has improved by about 4 lead days during the 2-year period. Improvement is also seen over the extra-tropics due to the updated stochastic physics, where there is a 3.1% and a 4.2% improvement during weeks 3 and 4 over the northern hemisphere and southern hemisphere, respectively. Improvement is also seen when the bias-corrected CFSv2 SST is combined with SPs. Additionally, forecast performance enhances when the scale aware
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

Directory of Open Access Journals (Sweden)

Yoonsik Shim

2016-10-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

Science.gov (United States)

Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

2016-10-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Flood Forecasting Based on TIGGE Precipitation Ensemble Forecast

Directory of Open Access Journals (Sweden)

Jinyin Ye

2016-01-01

TIGGE (THORPEX International Grand Global Ensemble) was a major part of the THORPEX (Observing System Research and Predictability Experiment). It integrates ensemble precipitation products from all the major forecast centers in the world and provides systematic evaluation of the multimodel ensemble prediction system. Development of a meteorologic-hydrologic coupled flood forecasting model and an early warning model based on the TIGGE precipitation ensemble forecast can provide flood probability forecasts, extend the lead time of the flood forecast, and gain more time for decision-makers to make the right decision. In this study, precipitation ensemble forecast products from ECMWF, NCEP, and CMA are used to drive the distributed hydrologic model TOPX. We focus on the Yi River catchment and aim to build a flood forecast and early warning system. The results show that the meteorologic-hydrologic coupled model can satisfactorily predict the flow-process of four flood events. The predicted occurrence time of peak discharges is close to the observations. However, the magnitude of the peak discharges is significantly different due to various performances of the ensemble prediction systems. The coupled forecasting model can accurately predict occurrence of the peak time and the corresponding risk probability of peak discharge based on the probability distribution of peak time and flood warning, which can provide users a strong theoretical foundation and valuable information as a promising new approach.
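A flood probability forecast of the kind mentioned above can be derived from an ensemble by counting the members that exceed a warning level. The sketch assumes that each ensemble member has already been run through a hydrologic model to give a peak discharge; the function name, discharge values, and warning level are hypothetical.

```python
def exceedance_probability(peak_discharges, warning_level):
    """Fraction of ensemble members whose simulated peak exceeds the warning level."""
    if not peak_discharges:
        raise ValueError("need at least one ensemble member")
    exceeding = sum(1 for peak in peak_discharges if peak > warning_level)
    return exceeding / len(peak_discharges)


# Hypothetical peak discharges (m^3/s), one per ensemble member.
member_peaks = [900.0, 1200.0, 1500.0, 800.0]
print(exceedance_probability(member_peaks, warning_level=1000.0))
```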
Realization of Deutsch-like algorithm using ensemble computing

International Nuclear Information System (INIS)

Wei Daxiu; Luo Jun; Sun Xianping; Zeng Xizhi

2003-01-01

The Deutsch-like algorithm [Phys. Rev. A. 63 (2001) 034101] distinguishes between even and odd query functions using fewer function calls than its possible classical counterpart in a two-qubit system. But the similar method cannot be applied to a multi-qubit system. We propose a new approach for solving the Deutsch-like problem using ensemble computing. The proposed algorithm needs an ancillary qubit and can be easily extended to a multi-qubit system with one query. Our ensemble algorithm, beginning with an easily prepared initial state, has three main steps. The classifications of the functions can be obtained directly from the spectra of the ancilla qubit. We also demonstrate the new algorithm in a four-qubit molecular system using nuclear magnetic resonance (NMR). One hydrogen and three carbons are selected as the four qubits, and one of the carbons is the ancilla qubit. We choose two unitary transformations, corresponding to two functions (one odd function and one even function), to validate the ensemble algorithm. The results show that our experiment is successful and our ensemble algorithm for solving the Deutsch-like problem is virtual

Draft genomes and reference transcriptomes extend the coding potential of the fish pathogen Piscirickettsia salmonis

Directory of Open Access Journals (Sweden)

Angela D. Millar

2018-05-01

Background: Draft and complete genome sequences from bacteria are key tools to understand genetic determinants involved in pathogenesis in several disease models. Piscirickettsia salmonis is a Gram-negative bacterium responsible for the Salmon Rickettsial Syndrome (SRS), a bacterial disease that threatens the sustainability of the Chilean salmon industry. In previous reports, complete and draft genome sequences have been generated and annotated. However, the lack of transcriptome data underestimates the genetic potential, does not provide information about transcriptional units and contributes to the dissemination of annotation errors. Results: Here we present the draft genome and transcriptome sequences of four P. salmonis strains. We have identified the transcriptional architecture of previously characterized virulence factors and trait-specific genes associated with cation uptake, metal efflux, antibiotic resistance, secretion systems and other virulence factors. Conclusions: These data have provided a refined genome annotation and also new insights into the transcriptional structures and coding potential of this fish pathogen. How to cite: Millar AD, Tapia P, Gomez FA, et al. Draft genomes and reference transcriptomes extend the coding potential of the fish pathogen Piscirickettsia salmonis. Electron J Biotechnol 2018;33. https://doi.org/10.1016/j.ejbt.2018.04.002. Keywords: Bacterial genomes, Coding potential, Comparative analysis, Draft genome, Piscirickettsia salmonis, Reference transcriptome, Refined annotation, Salmon Rickettsial Syndrome, Salmonids
Pauci ex tanto numero: reduce redundancy in multi-model ensembles

Science.gov (United States)

Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.

2013-08-01

We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date, no attempts in this direction have been documented within the air quality (AQ) community despite the extensive use of ensembles in this field. Common biases and redundancy are the two issues directly deriving from lack of independence, undermining the significance of a multi-model ensemble, and are the subject of this study. Shared, dependent biases among models do not cancel out but instead produce a biased ensemble. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhancement of scores (but this depends upon the skill being investigated), we discourage selecting the members of the ensemble simply on the basis of scores; that is, independence and skills need to be considered disjointly.
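To make the cost of redundancy concrete, consider a textbook idealisation that is not taken from the study itself: an ensemble of n members whose errors e_i all have variance σ² and share one common pairwise correlation ρ (both symbols are assumptions introduced here for illustration). Under those assumptions the variance of the ensemble-mean error, and the number of independent members that would give the same variance, are:

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \sigma^2\left[\frac{1}{n} + \left(1-\frac{1}{n}\right)\rho\right],
\qquad
n_{\mathrm{eff}} = \frac{n}{1+(n-1)\rho}.
```

For example, with n = 10 and ρ = 0.5 this gives n_eff ≈ 1.8, so ten strongly correlated members carry roughly the information of two independent ones. Real ensembles have unequal variances and correlations, so this is only an order-of-magnitude guide.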
Substrate-specific reorganization of the conformational ensemble of CSK implicates novel modes of kinase function.

Directory of Open Access Journals (Sweden)

Michael A Jamros

Full Text Available Protein kinases use ATP as a phosphoryl donor for the posttranslational modification of signaling targets. It is generally thought that the binding of this nucleotide induces conformational changes leading to closed, more compact forms of the kinase domain that ideally orient active-site residues for efficient catalysis. The kinase domain is often flanked by additional ligand binding domains that up- or down-regulate catalytic function. C-terminal Src kinase (Csk) is a multidomain tyrosine kinase that is up-regulated by N-terminal SH2 and SH3 domains. Although the X-ray structure of Csk suggests the enzyme is compact, X-ray scattering studies indicate that the enzyme possesses both compact and open conformational forms in solution. Here, we investigated whether interactions with the ATP analog AMP-PNP and ADP can shift the conformational ensemble of Csk in solution using a combination of small angle X-ray scattering and molecular dynamics simulations. We find that binding of AMP-PNP shifts the ensemble towards more extended rather than more compact conformations. Binding of ADP further shifts the ensemble towards extended conformations, including highly extended conformations adopted neither by the apo protein nor by the AMP-PNP bound protein. These ensembles indicate that any compaction of the kinase domain induced by nucleotide binding does not extend to the overall multi-domain architecture. Instead, assembly of an ATP-bound kinase domain generates further extended forms of Csk that may have relevance for kinase scaffolding and Src regulation in the cell.
Pauci ex tanto numero: reducing redundancy in multi-model ensembles

Science.gov (United States)

Solazzo, E.; Riccio, A.; Kioutsioukis, I.; Galmarini, S.

2013-02-01

We explicitly address the fundamental issue of member diversity in multi-model ensembles. To date, no attempts in this direction have been documented within the air quality (AQ) community, despite the extensive use of ensembles in this field. Common biases and redundancy are the two issues directly deriving from lack of independence, undermining the significance of a multi-model ensemble, and are the subject of this study. Shared biases among models produce a biased ensemble; it is therefore essential that the errors of the ensemble members be independent so that biases can cancel out. Redundancy derives from having too large a portion of common variance among the members of the ensemble, producing overconfidence in the predictions and underestimation of the uncertainty. The two issues of common biases and redundancy are analysed in detail using the AQMEII ensemble of AQ model results for four air pollutants in two European regions. We show that models share large portions of bias and variance, extending well beyond those induced by common inputs. We make use of several techniques to further show that subsets of models can explain the same amount of variance as the full ensemble with the advantage of being poorly correlated. Selecting the members for generating skilful, non-redundant ensembles from such subsets proved, however, non-trivial. We propose and discuss various methods of member selection and rate the ensemble performance they produce. In most cases, the full ensemble is outscored by the reduced ones. We conclude that, although independence of outputs may not always guarantee enhancement of scores (but this depends upon the skill being investigated), we discourage selecting the members of the ensemble simply on the basis of scores; that is, independence and skills need to be considered disjointly.
Ensemble Methods

Science.gov (United States)

Re, Matteo; Valentini, Giorgio

2012-03-01

Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall "ensemble" decision has been rooted in our culture at least since the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of "votes" (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community in ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been
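The majority vote ensemble and the Condorcet Jury Theorem mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical, chosen here for illustration, and the probability calculation assumes that the voters are independent and equally competent, which real classifiers rarely are.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most base learners.

    `predictions` holds one label per learner. Ties are broken in favour
    of the tied label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def majority_correct_probability(n, p):
    """Probability that more than half of n independent voters are right,
    when each voter is right with probability p (binomial tail sum)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


print(majority_vote(["spam", "ham", "spam"]))
print(round(majority_correct_probability(3, 0.6), 3))
```

With three voters who are each right 60% of the time, the committee is right about 64.8% of the time, which is the sense in which a committee of reasonably competent individuals can outperform any single member.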
Genome-Wide Linkage Analysis of Hemodynamic Parameters Under Mental and Physical Stress in Extended Omani Arab Pedigrees : The Oman Family Study

NARCIS (Netherlands)

Hassan, Mohammed O.; Jaju, Deepali; Voruganti, V. Saroja; Bayoumi, Riad A.; Albarwani, Sulayma; Al-Yahyaee, Saeed; Aslani, Afshin; Snieder, Harold; Lopez-Alvarenga, Juan C.; Al-Anqoudi, Zahir M.; Alizadeh, Behrooz Z.; Comuzzie, Anthony G.

Background: We performed a genome-wide scan in a homogeneous Arab population to identify genomic regions linked to blood pressure (BP) and its intermediate phenotypes during mental and physical stress tests. Methods: The Oman Family Study subjects (N = 1277) were recruited from five extended
Statistical ensembles for money and debt

Science.gov (United States)

Viaggiu, Stefano; Lionetto, Andrea; Bargigli, Leonardo; Longo, Michele

2012-10-01

We build a statistical ensemble representation of two economic models describing respectively, in simplified terms, a payment system and a credit market. To this purpose we adopt the Boltzmann-Gibbs distribution where the role of the Hamiltonian is taken by the total money supply (i.e. including money created from debt) of a set of interacting economic agents. As a result, we can read the main thermodynamic quantities in terms of monetary ones. In particular, we define for the credit market model a work term which is related to the impact of monetary policy on credit creation. Furthermore, with our formalism we recover and extend some results concerning the temperature of an economic system, previously presented in the literature by considering only the monetary base as a conserved quantity. Finally, we study the statistical ensemble for the Pareto distribution.
Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

Directory of Open Access Journals (Sweden)

2005-08-01

Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.
Generalized ensemble theory with non-extensive statistics

Science.gov (United States)

Shen, Ke-Ming; Zhang, Ben-Wei; Wang, En-Ke

2017-12-01

The non-extensive canonical ensemble theory is reconsidered with the method of Lagrange multipliers by maximizing Tsallis entropy, with the constraint that the normalized term of Tsallis' q-average of physical quantities, the sum $\sum_j p_j^q$, is independent of the probability $p_i$ for Tsallis parameter $q$. The self-referential problem in the deduced probability and thermal quantities in non-extensive statistics is thus avoided, and thermodynamical relationships are obtained in a consistent and natural way. We also extend the study to the non-extensive grand canonical ensemble theory and obtain the q-deformed Bose-Einstein distribution as well as the q-deformed Fermi-Dirac distribution. The theory is further applied to the generalized Planck law to demonstrate the distinct behaviors of the various generalized q-distribution functions discussed in the literature.
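For orientation, the standard textbook definitions that the abstract appears to build on can be written as follows. The notation (Boltzmann constant k, observable O with values O_i) is an assumption made here and may differ from the paper's own conventions.

```latex
S_q = k\,\frac{1-\sum_i p_i^{\,q}}{q-1},
\qquad
\langle O \rangle_q = \frac{\sum_i p_i^{\,q}\,O_i}{\sum_j p_j^{\,q}},
\qquad
\lim_{q\to 1} S_q = -k\sum_i p_i \ln p_i .
```

The denominator of the normalized q-average is the sum $\sum_j p_j^q$ that the constraint refers to, and the limit q → 1 recovers the usual Boltzmann-Gibbs entropy.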
On Ensemble Nonlinear Kalman Filtering with Symmetric Analysis Ensembles

KAUST Repository

Luo, Xiaodong; Hoteit, Ibrahim; Moroz, Irene M.

2010-01-01

However, by adopting the Monte Carlo method, the EnSRF also incurs certain sampling errors. One way to alleviate this problem is to introduce certain symmetry to the ensembles, which can reduce the sampling errors and spurious modes in evaluation of the means and covariances of the ensembles [7]. In this contribution, we present two methods to produce symmetric ensembles. One is based on the unscented transform [8, 9], which leads to the unscented Kalman filter (UKF) [8, 9] and its variant, the ensemble unscented Kalman filter (EnUKF) [7]. The other is based on Stirling’s interpolation formula (SIF), which results in the divided difference filter (DDF) [10]. Here we propose a simplified divided difference filter (sDDF) in the context of ensemble filtering. The similarity and difference between the sDDF and the EnUKF will be discussed. Numerical experiments will also be conducted to investigate the performance of the sDDF and the EnUKF, and compare them to a well‐established EnSRF, the ensemble transform Kalman filter (ETKF) [2].
Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale.

Directory of Open Access Journals (Sweden)

Daniel L Parton

2016-06-01

Full Text Available The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences, from a single sequence to an entire superfamily, and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics, such as Markov state models (MSMs), which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. We demonstrate the power of this approach by constructing models for all catalytic domains in the human
Conformational Ensemble of the Poliovirus 3CD Precursor Observed by MD Simulations and Confirmed by SAXS: A Strategy to Expand the Viral Proteome?

Science.gov (United States)

Moustafa, Ibrahim M; Gohara, David W; Uchida, Akira; Yennawar, Neela; Cameron, Craig E

2015-11-23

The genomes of RNA viruses are relatively small. To overcome the small-size limitation, RNA viruses assign distinct functions to the processed viral proteins and their precursors. This is exemplified by the poliovirus 3CD protein. 3C protein is a protease and RNA-binding protein. 3D protein is an RNA-dependent RNA polymerase (RdRp). 3CD exhibits unique protease and RNA-binding activities relative to 3C and is devoid of RdRp activity. The origin of these differences is unclear, since the crystal structure of 3CD revealed a "beads-on-a-string" structure with no significant structural differences compared to the fully processed proteins. We performed molecular dynamics (MD) simulations on 3CD to investigate its conformational dynamics. A compact conformation of 3CD was observed that was substantially different from that shown crystallographically. This new conformation explained the unique properties of 3CD relative to the individual proteins. Interestingly, simulations of mutant 3CD showed an altered interface. Additionally, accelerated MD simulations uncovered a conformational ensemble of 3CD. When we elucidated the 3CD conformations in solution using small-angle X-ray scattering (SAXS) experiments, a range of conformations from extended to compact was revealed, validating the MD simulations. The existence of a conformational ensemble of 3CD could be viewed as a way to expand the poliovirus proteome, an observation that may extend to other viruses.
eHive: An Artificial Intelligence workflow system for genomic analysis

Science.gov (United States)

2010-01-01

Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/. PMID:20459813
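The blackboard idea described in this abstract, a central database of jobs that autonomous workers poll and update, can be sketched in a few lines. This is not eHive's schema or API: the table layout, the status values and every function name below are assumptions made for illustration, SQLite stands in for MySQL, and Python stands in for the Perl agent.

```python
import sqlite3


def make_blackboard():
    """Create an in-memory job table standing in for the central database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, species_a TEXT, "
        "species_b TEXT, status TEXT DEFAULT 'READY')"
    )
    return db


def seed_pairwise_jobs(db, species):
    """Post one job per unordered species pair: n*(n-1)/2 jobs in total."""
    for i, a in enumerate(species):
        for b in species[i + 1:]:
            db.execute(
                "INSERT INTO job (species_a, species_b) VALUES (?, ?)", (a, b)
            )
    db.commit()


def run_worker(db, handler):
    """Claim READY jobs one at a time until none remain; return jobs done."""
    done = 0
    while True:
        row = db.execute(
            "SELECT id, species_a, species_b FROM job "
            "WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return done
        job_id, a, b = row
        db.execute("UPDATE job SET status = 'RUN' WHERE id = ?", (job_id,))
        handler(a, b)  # the actual analysis would happen here
        db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        db.commit()
        done += 1


db = make_blackboard()
seed_pairwise_jobs(db, ["human", "mouse", "zebrafish", "chicken"])
aligned = []
print(run_worker(db, lambda a, b: aligned.append((a, b))))
```

Four species yield six pairwise jobs, which illustrates the roughly quadratic growth mentioned in the abstract. The sketch runs a single worker; with several concurrent workers the claim step would need to be atomic, and a fault-tolerant system would also have to recover jobs left in the running state by a crashed worker.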
eHive: An Artificial Intelligence workflow system for genomic analysis

Directory of Open Access Journals (Sweden)

Gordon Leo

2010-05-01

Full Text Available Abstract Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
eHive: an artificial intelligence workflow system for genomic analysis.

Science.gov (United States)

Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier

2010-05-11

The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
On Ensemble Nonlinear Kalman Filtering with Symmetric Analysis Ensembles

KAUST Repository

Luo, Xiaodong

2010-09-19

The ensemble square root filter (EnSRF) [1, 2, 3, 4] is a popular method for data assimilation in high dimensional systems (e.g., geophysics models). Essentially the EnSRF is a Monte Carlo implementation of the conventional Kalman filter (KF) [5, 6]. It differs from the KF mainly at the prediction steps, where ensembles of the system state, rather than its means and covariance matrices, are propagated forward. In doing this, the EnSRF is computationally more efficient than the KF, since propagating a covariance matrix forward in high dimensional systems is prohibitively expensive. In addition, the EnSRF is also very convenient to implement. By propagating the ensembles of the system state, the EnSRF can be directly applied to nonlinear systems without any change in comparison to the assimilation procedures in linear systems. However, by adopting the Monte Carlo method, the EnSRF also incurs certain sampling errors. One way to alleviate this problem is to introduce certain symmetry to the ensembles, which can reduce the sampling errors and spurious modes in evaluation of the means and covariances of the ensembles [7]. In this contribution, we present two methods to produce symmetric ensembles. One is based on the unscented transform [8, 9], which leads to the unscented Kalman filter (UKF) [8, 9] and its variant, the ensemble unscented Kalman filter (EnUKF) [7]. The other is based on Stirling’s interpolation formula (SIF), which results in the divided difference filter (DDF) [10]. Here we propose a simplified divided difference filter (sDDF) in the context of ensemble filtering. The similarity and difference between the sDDF and the EnUKF will be discussed. Numerical experiments will also be conducted to investigate the performance of the sDDF and the EnUKF, and compare them to a well-established EnSRF, the ensemble transform Kalman filter (ETKF) [2].
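One simple way to see why symmetry reduces sampling error is to build an ensemble from mirrored pairs of perturbations about the mean. The sketch below is a generic mirrored-pair construction written for illustration only. It is not the sDDF or the EnUKF described in the abstract, and the function names and parameters are hypothetical.

```python
import random


def symmetric_ensemble(mean, spread, n_pairs, seed=0):
    """Draw n_pairs Gaussian perturbations d and return mean + d and
    mean - d for each, so the members are symmetric about the mean."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_pairs):
        d = [rng.gauss(0.0, spread) for _ in mean]
        members.append([m + di for m, di in zip(mean, d)])
        members.append([m - di for m, di in zip(mean, d)])
    return members


def sample_mean(members):
    """Component-wise average over all ensemble members."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


ens = symmetric_ensemble([1.0, -2.0], spread=0.5, n_pairs=10)
print(len(ens), sample_mean(ens))
```

Because every perturbation is cancelled by its mirror image, the sample mean reproduces the prescribed mean up to floating-point rounding, whereas an ensemble of independent random draws would only approximate it.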
Extending Correlation Filter-Based Visual Tracking by Tree-Structured Ensemble and Spatial Windowing.

Science.gov (United States)

Gundogdu, Erhan; Ozkan, Huseyin; Alatan, A Aydin

2017-11-01

Correlation filters have been successfully used in visual tracking due to their modeling power and computational efficiency. However, the state-of-the-art correlation filter-based (CFB) tracking algorithms tend to quickly discard the previous poses of the target, since they consider only a single filter in their models. In contrast, our approach is to register multiple CFB trackers for previous poses and exploit the registered knowledge when an appearance change occurs. To this end, we propose a novel tracking algorithm [of complexity O(D)] based on a large ensemble of CFB trackers. The ensemble [of size O(2^D)] is organized over a binary tree (depth D), and learns the target appearance subspaces such that each constituent tracker becomes an expert of a certain appearance. During tracking, the proposed algorithm combines only the appearance-aware relevant experts to produce boosted tracking decisions. Additionally, we propose a versatile spatial windowing technique to enhance the individual expert trackers. For this purpose, spatial windows are learned for target objects as well as the correlation filters and then the windowed regions are processed for more robust correlations. In our extensive experiments on benchmark datasets, we achieve a substantial performance increase by using the proposed tracking algorithm together with the spatial windowing.
Control of inhomogeneous atomic ensembles of hyperfine qudits

DEFF Research Database (Denmark)

Mischuck, Brian Edward; Merkel, Seth T.; Deutsch, Ivan H.

2012-01-01

We study the ability to control d-dimensional quantum systems (qudits) encoded in the hyperfine spin of alkali-metal atoms through the application of radio- and microwave-frequency magnetic fields in the presence of inhomogeneities in amplitude and detuning. Such a capability is essential to the design of robust pulses that mitigate the effects of experimental uncertainty and also for application to tomographic addressing of particular members of an extended ensemble. We study the problem of preparing an arbitrary state in the Hilbert space from an initial fiducial state. We prove that inhomogeneous control of qudit ensembles is possible based on a semianalytic protocol that synthesizes the target through a sequence of alternating rf and microwave-driven SU(2) rotations in overlapping irreducible subspaces. Several examples of robust control are studied, and the semianalytic protocol...
Dynamics of heterogeneous oscillator ensembles in terms of collective variables

Science.gov (United States)

Pikovsky, Arkady; Rosenblum, Michael

2011-04-01

We consider general heterogeneous ensembles of phase oscillators, sine coupled to arbitrary external fields. Starting with the infinitely large ensembles, we extend the Watanabe-Strogatz theory, valid for identical oscillators, to cover the case of an arbitrary parameter distribution. The obtained equations yield the description of the ensemble dynamics in terms of collective variables and constants of motion. As a particular case of the general setup we consider hierarchically organized ensembles, consisting of a finite number of subpopulations, where the number of elements in a subpopulation can be either finite or infinite. Next, we link the Watanabe-Strogatz and Ott-Antonsen theories and demonstrate that the latter corresponds to a particular choice of constants of motion. The approach is applied to the standard Kuramoto-Sakaguchi model, to its extension for the case of nonlinear coupling, and to the description of two interacting subpopulations, exhibiting a chimera state. With these examples we illustrate that, although the asymptotic dynamics can be found within the framework of the Ott-Antonsen theory, the transients depend on the constants of motion. The most dramatic effect is the dependence of the basins of attraction of different synchronous regimes on the initial configuration of phases.
Towards a GME ensemble forecasting system: Ensemble initialization using the breeding technique

Directory of Open Access Journals (Sweden)

Jan D. Keller

2008-12-01

Full Text Available The quantitative forecast of precipitation requires a probabilistic background, particularly with regard to forecast lead times of more than 3 days. As only ensemble simulations can provide useful information on the underlying probability density function, we built a new ensemble forecasting system (GME-EFS) based on the GME model of the German Meteorological Service (DWD). For the generation of appropriate initial ensemble perturbations we chose the breeding technique developed by Toth and Kalnay (1993, 1997), which develops perturbations by estimating the regions of largest model error induced uncertainty. This method is applied and tested in the framework of quasi-operational forecasts for a three month period in 2007. The performance of the resulting ensemble forecasts is compared to the operational ensemble prediction systems ECMWF EPS and NCEP GFS by means of ensemble spread of free atmosphere parameters (geopotential and temperature) and ensemble skill of precipitation forecasting. This comparison indicates that the GME ensemble forecasting system (GME-EFS) provides reasonable forecasts with spread skill score comparable to that of the NCEP GFS. An analysis with the continuous ranked probability score exhibits a lack of resolution for the GME forecasts compared to the operational ensembles. However, with significant enhancements during the 3 month test period, the first results of our work with the GME-EFS indicate possibilities for further development as well as the potential for later operational usage.

Using caching and optimization techniques to improve performance of the Ensembl website

Directory of Open Access Journals (Sweden)

Smith James A

2010-05-01

Full Text Available Abstract Background: The Ensembl web site has provided access to genomic information for almost 10 years. During this time the amount of data available through Ensembl has grown dramatically. At the same time, the World Wide Web itself has become a dramatically more important component of the scientific workflow and the way that scientists share and access data and scientific information. Since 2000, the Ensembl web interface has had three major updates and numerous smaller updates. These have largely been in response to expanding data types and valuable representations of existing data types. In 2007 it was realised that a radical new approach would be required in order to serve the project's future requirements, and development therefore focused on identifying suitable web technologies for implementation in the 2008 site redesign. Results: By comparing the Ensembl website to well-known "Web 2.0" sites, we were able to identify two main areas in which cutting-edge technologies could be advantageously deployed: server efficiency and interface latency. We then evaluated the performance of the existing site using browser-based tools and Apache benchmarking, and selected appropriate technologies to overcome any issues found. Solutions included optimization of the Apache web server, introduction of caching technologies and widespread implementation of AJAX code. These improvements were successfully deployed on the Ensembl website in late 2008 and early 2009. Conclusions: Web 2.0 technologies provide a flexible and efficient way to access the terabytes of data now available from Ensembl, enhancing the user experience through improved website responsiveness and a rich, interactive interface.
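The abstract does not say which caching technologies were introduced, so the following is only a generic illustration of the principle: an expensive page fragment is computed once and then served from memory on repeat requests. The function `render_region` and its arguments are hypothetical, and Python's standard `functools.lru_cache` stands in for whatever server-side cache the site actually uses.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive computation really runs


@lru_cache(maxsize=1024)
def render_region(species, region):
    """Stand-in for an expensive page fragment (e.g. a database-backed view)."""
    global calls
    calls += 1
    return f"<div>{species}:{region}</div>"


first = render_region("human", "1:1000-2000")
second = render_region("human", "1:1000-2000")  # served from the cache
print(calls, render_region.cache_info().hits)
```

The second request returns the stored result without recomputing it. A production cache would also need an invalidation policy, for example clearing entries when a new data release is loaded.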
Accounting for model error due to unresolved scales within ensemble Kalman filtering

OpenAIRE

Mitchell, Lewis; Carrassi, Alberto

2014-01-01

We propose a method to account for model error due to unresolved scales in the context of the ensemble transform Kalman filter (ETKF). The approach extends to this class of algorithms the deterministic model error formulation recently explored for variational schemes and extended Kalman filter. The model error statistic required in the analysis update is estimated using historical reanalysis increments and a suitable model error evolution law. Two different versions of the method are describe...
Managing uncertainty in metabolic network structure and improving predictions using EnsembleFBA.

Directory of Open Access Journals (Sweden)

Matthew B Biggs

2017-03-01

Genome-scale metabolic network reconstructions (GENREs) are repositories of knowledge about the metabolic processes that occur in an organism. GENREs have been used to discover and interpret metabolic functions, and to engineer novel network structures. A major barrier preventing more widespread use of GENREs, particularly to study non-model organisms, is the extensive time required to produce a high-quality GENRE. Many automated approaches have been developed which reduce this time requirement, but automatically reconstructed draft GENREs still require curation before useful predictions can be made. We present a novel approach to the analysis of GENREs which improves the predictive capabilities of draft GENREs by representing many alternative network structures, all equally consistent with available data, and generating predictions from this ensemble. This ensemble approach is compatible with many reconstruction methods. We refer to this new approach as Ensemble Flux Balance Analysis (EnsembleFBA). We validate EnsembleFBA by predicting growth and gene essentiality in the model organism Pseudomonas aeruginosa UCBPP-PA14. We demonstrate how EnsembleFBA can be included in a systems biology workflow by predicting essential genes in six Streptococcus species and mapping the essential genes to small molecule ligands from DrugBank. We found that some metabolic subsystems contributed disproportionately to the set of predicted essential reactions in a way that was unique to each Streptococcus species, leading to species-specific outcomes from small molecule interactions. Through our analyses of P. aeruginosa and six Streptococci, we show that ensembles increase the quality of predictions without drastically increasing reconstruction time, thus making GENRE approaches more practical for applications which require predictions for many non-model organisms. All of our functions and accompanying example code are available in an open online repository.
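
The idea of generating one prediction from many alternative networks can be sketched as a vote over per-network calls. This is an illustration only: the abstract does not state how member predictions are combined, so the voting threshold, the function name `ensemble_essentiality` and the gene labels below are all assumptions, and the per-network calls are taken as given rather than computed by flux balance analysis.

```python
def ensemble_essentiality(member_calls, threshold=0.5):
    """Combine per-network essentiality calls into one ensemble call.

    member_calls : list of dicts mapping gene id -> bool, one dict per
                   alternative network structure in the ensemble
    threshold    : fraction of networks that must call a gene essential
                   (assumed voting rule, not taken from the source)
    """
    genes = set().union(*member_calls)
    result = {}
    for gene in sorted(genes):
        # A gene absent from a network's dict counts as "not essential" there
        votes = sum(1 for calls in member_calls if calls.get(gene, False))
        result[gene] = votes / len(member_calls) > threshold
    return result


if __name__ == "__main__":
    # Three hypothetical draft networks disagreeing about two genes
    calls = [
        {"geneA": True, "geneB": False},
        {"geneA": True, "geneB": True},
        {"geneA": False, "geneB": False},
    ]
    print(ensemble_essentiality(calls))
```

Raising the threshold trades recall for precision, which is one reason an ensemble can be tuned in a way that a single draft network cannot.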
Young, intact and nested retrotransposons are abundant in the onion and asparagus genomes.

Science.gov (United States)

Vitte, C; Estep, M C; Leebens-Mack, J; Bennetzen, J L

2013-09-01

Although monocotyledonous plants comprise one of the two major groups of angiosperms and include >65 000 species, comprehensive genome analysis has been focused mainly on the Poaceae (grass) family. Due to this bias, most of the conclusions that have been drawn for monocot genome evolution are based on grasses. It is not known whether these conclusions apply to many other monocots. To extend our understanding of genome evolution in the monocots, Asparagales genomic sequence data were acquired and the structural properties of asparagus and onion genomes were analysed. Specifically, several available onion and asparagus bacterial artificial chromosomes (BACs) with contig sizes >35 kb were annotated and analysed, with a particular focus on the characterization of long terminal repeat (LTR) retrotransposons. The results reveal that LTR retrotransposons are the major components of the onion and garden asparagus genomes. These elements are mostly intact (i.e. with two LTRs), have mainly inserted within the past 6 million years and are piled up into nested structures. Analysis of shotgun genomic sequence data and the observation of two copies for some transposable elements (TEs) in annotated BACs indicates that some families have become particularly abundant, as high as 4-5 % (asparagus) or 3-4 % (onion) of the genome for the most abundant families, as also seen in large grass genomes such as wheat and maize. Although previous annotations of contiguous genomic sequences have suggested that LTR retrotransposons were highly fragmented in these two Asparagales genomes, the results presented here show that this was largely due to the methodology used. In contrast, this current work indicates an ensemble of genomic features similar to those observed in the Poaceae.
Ensemble models of neutrophil trafficking in severe sepsis.

Directory of Open Access Journals (Sweden)

Sang Ok Song

A hallmark of severe sepsis is systemic inflammation which activates leukocytes and can result in their misdirection. This leads to both impaired migration to the locus of infection and increased infiltration into healthy tissues. In order to better understand the pathophysiologic mechanisms involved, we developed a coarse-grained phenomenological model of the acute inflammatory response in CLP (cecal ligation and puncture)-induced sepsis in rats. This model incorporates distinct neutrophil kinetic responses to the inflammatory stimulus and the dynamic interactions between components of a compartmentalized inflammatory response. Ensembles of model parameter sets consistent with experimental observations were statistically generated using Markov chain Monte Carlo sampling. Prediction uncertainty in the model states was quantified over the resulting ensemble parameter sets. Forward simulation of the parameter ensembles successfully captured experimental features and predicted that systemically activated circulating neutrophils display impaired migration to the tissue and neutrophil sequestration in the lung, consequently contributing to tissue damage and mortality. Principal component and multiple regression analyses of the parameter ensembles estimated from survivor and non-survivor cohorts provide insight into pathologic mechanisms dictating outcome in sepsis. Furthermore, the model was extended to incorporate hypothetical mechanisms by which immune modulation using extracorporeal blood purification results in improved outcome in septic rats. Simulations identified a sub-population (about 18% of the treated population) that benefited from blood purification. Survivors displayed enhanced neutrophil migration to tissue and reduced sequestration of lung neutrophils, contributing to improved outcome. The model ensemble presented herein provides a platform for generating and testing hypotheses in silico, as well as motivating further experimental
The Extended Nutrigenomics – Understanding the Interplay between the Genomes of Food, Gut Microbes and Human Host

Directory of Open Access Journals (Sweden)

Martin Kussmann

2011-05-01

Comprehensive investigation of nutritional health effects at the molecular level requires understanding the interplay between three genomes: the food, the gut microbial and the human host genome. Food genomes are researched for exploitation of macro- and micronutrients as well as bioactives, with the genes coding for bioactive proteins and peptides being of central interest. The human gut microbiota encompasses a complex intestinal ecosystem with profound impact on host metabolism. It is studied at the genomic, proteomic and metabolomic level. Humans are characterized at the level of: genetic predisposition and variability in terms of dietary response and direction of health trajectories; epigenetic, metabolic programming at certain life stages with health consequences later in life and for subsequent generations; and acute genomic expression as a holistic response to diet, monitored at gene transcript, protein and metabolite level. Modern nutrition science explores health aspects of bioactive food components, thereby promoting health, preventing or delaying the onset of disease, optimizing performance and assessing benefits and risks. Personalized nutrition means adapting food to individual needs, depending on the human host's life stage, lifestyle and life situation. Traditionally, nutrigenomics and nutri(epi)genetics have been seen as the key sciences to understand human variability in preferences and requirements for diet as well as responses to nutrition. This article puts the three nutrition- and health-relevant genomes into perspective, i.e. the food, the gut microbial and the human host's genome, and calls for an extended nutrigenomics approach to build the future tools for personalized nutrition, health maintenance and disease prevention. We discuss examples of these genomes, proteomes, transcriptomes and metabolomes under the overarching term genomics that covers all Omics rather than the sole study of DNA and RNA.
Entropy of network ensembles

Science.gov (United States)

Bianconi, Ginestra

2009-03-01

In this paper we generalize the concept of random networks to describe network ensembles with nontrivial features by a statistical mechanics approach. This framework is able to describe undirected and directed network ensembles as well as weighted network ensembles. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, they might have a link probability with a nontrivial dependence on the distance between the nodes. These ensembles are characterized by their entropy, which evaluates the cardinality of networks in the ensemble. In particular, in this paper we define and evaluate the structural entropy, i.e., the entropy of the ensembles of undirected uncorrelated simple networks with given degree sequence. We stress the apparent paradox that scale-free degree distributions are characterized by having small structural entropy while they are so widely encountered in natural, social, and technological complex systems. We propose a solution to the paradox by proving that scale-free degree distributions are the most likely degree distribution with the corresponding value of the structural entropy. Finally, the general framework we present in this paper is able to describe microcanonical ensembles of networks as well as canonical or hidden-variable network ensembles with significant implications for the formulation of network-constructing algorithms.
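
The statement that the entropy "evaluates the cardinality of networks in the ensemble" can be made concrete in the simplest case. The sketch below is an illustration under stated assumptions, not the paper's calculation: it treats the ensemble of simple undirected networks with a fixed number of nodes N and links L (rather than the degree-sequence ensembles the abstract focuses on), for which the number of networks is the binomial coefficient C(N(N-1)/2, L). The function name is hypothetical.

```python
import math


def entropy_fixed_links(n_nodes, n_links):
    """Entropy (natural log of the number of networks) of the ensemble of
    simple undirected networks with n_nodes nodes and exactly n_links links.
    """
    n_pairs = n_nodes * (n_nodes - 1) // 2  # possible node pairs
    if not 0 <= n_links <= n_pairs:
        raise ValueError("n_links must lie between 0 and N(N-1)/2")
    # Each network is one choice of n_links pairs out of n_pairs
    return math.log(math.comb(n_pairs, n_links))


if __name__ == "__main__":
    for links in (0, 5, 22, 45):
        print(links, entropy_fixed_links(10, links))
```

The entropy vanishes for the empty and the complete graph, where the ensemble contains a single network, and is largest when half of the possible links are present.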
De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

Science.gov (United States)

Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

2017-11-14

Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.
Creating nitrogen–vacancy ensembles in diamond for coupling with flux qubit

International Nuclear Information System (INIS)

Zheng Ya-Rui; Xing Jian; Chang Yan-Chun; Yan Zhi-Guang; Deng Hui; Wu Yu-Lin; Lü Li; Pan Xin-Yu; Zhu Xiao-Bo; Zheng Dong-Ning

2017-01-01

Hybrid quantum systems of negatively charged nitrogen-vacancy (NV⁻) centers in diamond and superconducting qubits provide the possibility to extend the performance of both systems. In this work, we numerically simulate the coupling strength between NV⁻ ensembles and superconducting flux qubits and obtain a lower bound of 10¹⁶ cm⁻³ for the NV⁻ concentration to achieve a sufficiently strong coupling of 10 MHz when the gap between the NV ensemble and the flux qubit is 0. Moreover, we create NV⁻ ensembles in different types of diamond by ¹⁴N⁺ and ¹²C⁺ ion implantation, electron irradiation, and high-temperature annealing. We obtain an NV⁻ concentration of 1.05 × 10¹⁶ cm⁻³ in the diamond with 1-ppm nitrogen impurity, which is expected to have a long coherence time owing to the low nitrogen impurity concentration. This shows a step toward performance improvement of the flux qubit-NV⁻ hybrid system. (paper)
Two-photon superradiance in extended medium

International Nuclear Information System (INIS)

Branzan, V.; Enache, N.

1993-01-01

The possibility of collectivization of an ensemble of atoms of an extended system (the distance between atoms is larger than or equal to the wavelength of the spontaneously emitted radiation) during two-photon spontaneous decay is theoretically investigated. It is demonstrated that such systems of inverted atoms should emit phase-correlated pairs of photons. The time-space correlation among atoms arises from two-photon exchange through the vacuum of the electromagnetic field. An increase of the spontaneous decay rate of the two-atom inverted ensemble is demonstrated. The dependence of two-photon superradiance on the sample geometry is investigated. A non-equilibrium method for eliminating the Fermi operators of the atomic levels is proposed. (Author)
NYYD Ensemble

Index Scriptorium Estoniae

2002-01-01

On the NYYD Ensemble duo Traksmann-Lukk and E.-S. Tüür's work "Symbiosis", which is also recorded on the recently released NYYD Ensemble CD. Concerts on 2 March in the small hall of the Rakvere Theatre and on 3 March at the Rotermann Salt Storage (Rotermanni Soolaladu); the programme includes Tüür, Kaumann, Berio, Reich, Yun, Hauta-aho, Buckinx.
Ensembles and Experiments in Classical and Quantum Physics

Science.gov (United States)

Neumaier, Arnold

A philosophically consistent axiomatic approach to classical and quantum mechanics is given. The approach realizes a strong formal implementation of Bohr's correspondence principle. In all instances, classical and quantum concepts are fully parallel: the same general theory has a classical realization and a quantum realization. Extending the "probability via expectation" approach of Whittle to noncommuting quantities, this paper defines quantities, ensembles, and experiments as mathematical concepts and shows how to model complementarity, uncertainty, probability, nonlocality and dynamics in these terms. The approach carries no connotation of unlimited repeatability; hence it can be applied to unique systems such as the universe. Consistent experiments provide an elegant solution to the reality problem, confirming the insistence of the orthodox Copenhagen interpretation that there is nothing but ensembles, while avoiding its elusive reality picture. The weak law of large numbers explains the emergence of classical properties for macroscopic systems.
Dynamical mean-field theory of noisy spiking neuron ensembles: Application to the Hodgkin-Huxley model

International Nuclear Information System (INIS)

Hasegawa, Hideo

2003-01-01

A dynamical mean-field approximation (DMA) previously proposed by the present author [H. Hasegawa, Phys. Rev. E 67, 041903 (2003)] has been extended to ensembles described by a general noisy spiking neuron model. Ensembles of N-unit neurons, each of which is expressed by coupled K-dimensional differential equations (DEs), are assumed to be subject to spatially correlated white noises. The original KN-dimensional stochastic DEs have been replaced by K(K+2)-dimensional deterministic DEs expressed in terms of means and the second-order moments of local and global variables: the fourth-order contributions are taken into account by the Gaussian decoupling approximation. Our DMA has been applied to an ensemble of Hodgkin-Huxley (HH) neurons (K=4), for which effects of the noise, the coupling strength, and the ensemble size on the response to a single-spike input have been investigated. Numerical results calculated by the DMA theory are in good agreement with those obtained by direct simulations, although the former computation is about a thousand times faster than the latter for a typical HH neuron ensemble with N=100.
'Lazy' quantum ensembles

International Nuclear Information System (INIS)

Parfionov, George; Zapatrin, Roman

2006-01-01

We compare different strategies for preparing an ensemble with a given density matrix ρ. Preparing the ensemble of eigenstates of ρ with appropriate probabilities can be treated as a 'generous' strategy: it provides maximal accessible information about the state. The other extreme is the so-called 'Scrooge' ensemble, which is the most stingy in sharing information. We introduce 'lazy' ensembles, which require minimal effort to prepare the density matrix by selecting pure states with respect to a completely random choice. We consider two parties, Alice and Bob, playing a kind of game. Bob wishes to guess which pure state is prepared by Alice. His null hypothesis, based on the lack of any information about Alice's intention, is that Alice prepares any pure state with equal probability. Then, the average quantum state measured by Bob turns out to be ρ, and he has to make a new hypothesis about Alice's intention solely based on the information that the observed density matrix is ρ. The arising 'lazy' ensemble is shown to be the alternative hypothesis which minimizes type I error.
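
The premise that several different ensembles can realise the same density matrix can be checked numerically. The sketch below is a generic single-qubit illustration, not the construction of the 'lazy' ensemble itself: it builds ρ = Σ p_k |ψ_k⟩⟨ψ_k| for two distinct ensembles (the computational basis and the |+⟩, |−⟩ basis, each with probability 1/2) and shows that both give the maximally mixed state. The helper name `density_matrix` is hypothetical.

```python
import math


def density_matrix(ensemble):
    """Return rho = sum_k p_k |psi_k><psi_k| as a nested list.

    ensemble : list of (probability, state) pairs, where each state is a
               list of amplitudes of a normalised pure state
    """
    dim = len(ensemble[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for prob, psi in ensemble:
        for i in range(dim):
            for j in range(dim):
                # Outer product element psi_i * conj(psi_j), weighted by prob
                rho[i][j] += prob * psi[i] * psi[j].conjugate()
    return rho


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    computational = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
    hadamard = [(0.5, [s, s]), (0.5, [s, -s])]
    print(density_matrix(computational))
    print(density_matrix(hadamard))
```

Because the two ensembles are indistinguishable through ρ alone, the choice among preparations has to be made on other grounds, such as the information they reveal or the effort they require.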
On the proper use of Ensembles for Predictive Uncertainty assessment

Science.gov (United States)

Todini, Ezio; Coccia, Gabriele; Ortiz, Enrique

2015-04-01

Probabilistic forecasting has become popular in the last decades. Hydrological probabilistic forecasts have been based either on uncertainty processors (Krzysztofowicz, 1999; Todini, 2004; Todini, 2008) or on ensembles, following traditional meteorological approaches and the establishment of the HEPEX program (http://hepex.irstea.fr). Unfortunately, the direct use of ensembles as a measure of the predictive density is an incorrect practice, because the ensemble measures the spread of the forecast instead of, following the definition of predictive uncertainty, the probability of the future outcome conditional on the forecast. Only a few approaches reported in the literature correctly use the ensemble to estimate an expected conditional predictive density (Reggiani et al., 2009), similarly to what is done when several predictive models are available, as in the BMA (Raftery et al., 2005) or MCP (Todini, 2008; Coccia and Todini, 2011) approaches. A major problem limiting the correct use of ensembles is in fact the difficulty of defining the time dependence of the ensemble members, due to the lack of a consistent ranking: in other words, when dealing with multiple models, the ith model remains the ith model regardless of the time of forecast, while this does not happen when dealing with ensemble members, since there is no definition for the ith member of an ensemble. Nonetheless, the MCP approach (Todini, 2008; Coccia and Todini, 2011), essentially based on a multiple regression in the Normal space, can be easily extended to use ensembles to represent the local (in time) smaller or larger conditional predictive uncertainty, as a function of the ensemble spread. This is done by modifying the classical linear regression equations, implying perfectly observed predictors, to alternative regression equations similar to the Kalman filter ones, allowing for uncertain predictors. In this way, each prediction in time accounts for both the predictive
Annotation-Based Whole Genomic Prediction and Selection

DEFF Research Database (Denmark)

Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

Genomic selection is widely used in both animal and plant species; however, it is performed with no input from the known genomic or biological role of genetic variants and is therefore a black-box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...
On the v-representability of ensemble densities of electron systems

Science.gov (United States)

Gonis, A.; Däne, M.

2018-05-01

Analogously to the case at zero temperature, where the density of the ground state of an interacting many-particle system determines uniquely (within an arbitrary additive constant) the external potential acting on the system, the thermal average of the density over an ensemble defined by the Boltzmann distribution at the minimum of the thermodynamic potential, or the free energy, determines the external potential uniquely (and not just modulo a constant) acting on a system described by this thermodynamic potential or free energy. The paper describes a formal procedure that generates the domain of a constrained search over general ensembles (at zero or elevated temperatures) that lead to a given density, including as a special case a density thermally averaged at a given temperature, and in the case of a v-representable density determines the external potential leading to the ensemble density. As an immediate consequence of the general formalism, the concept of v-representability is extended beyond the hitherto discussed case of ground state densities to encompass excited states as well. Specific application to thermally averaged densities solves the v-representability problem in connection with the Mermin functional in a manner analogous to that in which this problem was recently settled with respect to the Hohenberg and Kohn functional. The main formalism is illustrated with numerical results for ensembles of one-dimensional, non-interacting systems of particles under a harmonic potential.
Imprinting and recalling cortical ensembles.

Science.gov (United States)

Carrillo-Reid, Luis; Yang, Weijian; Bando, Yuki; Peterka, Darcy S; Yuste, Rafael

2016-08-12

Neuronal ensembles are coactive groups of neurons that may represent building blocks of cortical circuits. These ensembles could be formed by Hebbian plasticity, whereby synapses between coactive neurons are strengthened. Here we report that repetitive activation with two-photon optogenetics of neuronal populations from ensembles in the visual cortex of awake mice builds neuronal ensembles that recur spontaneously after being imprinted and do not disrupt preexisting ones. Moreover, imprinted ensembles can be recalled by single-cell stimulation and remain coactive on consecutive days. Our results demonstrate the persistent reconfiguration of cortical circuits by two-photon optogenetics into neuronal ensembles that can perform pattern completion. Copyright © 2016, American Association for the Advancement of Science.
World Music Ensemble: Kulintang

Science.gov (United States)

Beegle, Amy C.

2012-01-01

As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…
The adaptation of Escherichia coli cells grown in simulated microgravity for an extended period is both phenotypic and genomic.

Science.gov (United States)

Tirumalai, Madhan R; Karouia, Fathi; Tran, Quyen; Stepanov, Victor G; Bruce, Rebekah J; Ott, C Mark; Pierson, Duane L; Fox, George E

2017-01-01

Microorganisms impact spaceflight in a variety of ways. They play a positive role in biological systems, such as waste water treatment but can be problematic through buildups of biofilms that can affect advanced life support. Of special concern is the possibility that during extended missions, the microgravity environment will provide positive selection for undesirable genomic changes. Such changes could affect microbial antibiotic sensitivity and possibly pathogenicity. To evaluate this possibility, Escherichia coli (lac plus) cells were grown for over 1000 generations on Luria Broth medium under low-shear modeled microgravity conditions in a high aspect rotating vessel. This is the first study of its kind to grow bacteria for multiple generations over an extended period under low-shear modeled microgravity. Comparisons were made to a non-adaptive control strain using growth competitions. After 1000 generations, the final low-shear modeled microgravity-adapted strain readily outcompeted the unadapted lac minus strain. A portion of this advantage was maintained when the low-shear modeled microgravity strain was first grown in a shake flask environment for 10, 20, or 30 generations of growth. Genomic sequencing of the 1000 generation strain revealed 16 mutations. Of the five changes affecting codons, none were neutral. It is not clear how significant these mutations are as individual changes or as a group. It is concluded that part of the long-term adaptation to low-shear modeled microgravity is likely genomic. The strain was monitored for acquisition of antibiotic resistance by VITEK analysis throughout the adaptation period. Despite the evidence of genomic adaptation, resistance to a variety of antibiotics was never observed.

Producing genome structure populations with the dynamic and automated PGS software.

Science.gov (United States)

Hua, Nan; Tjong, Harianto; Shin, Hanjun; Gong, Ke; Zhou, Xianghong Jasmine; Alber, Frank

2018-05-01

Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topological-associated domain (TAD)-level resolution.
Assessment of managed aquifer recharge potential using ensembles of local models.

Science.gov (United States)

Smith, Anthony J; Pollock, Daniel W

2012-01-01

A simple quantitative approach for assessing the artificial recharge potential of large regions using spatial ensembles of local models is proposed. The method extends existing qualitative approaches and enables rapid assessments within a programmable environment. Spatial discretization of a water resource region into continuous local domains allows simple local models to be applied independently in each domain using lumped parameters. The ensemble results can be analyzed directly or combined with other quantitative and thematic information and visualized as regional suitability maps. A case study considers the hydraulic potential for surface infiltration across a large water resource region using a published analytic model for basin recharge. The model solution was implemented within a geographic information system and evaluated independently in >21,000 local domains using lumped parameters derived from existing regional datasets. Computer execution times to run the whole ensemble and process the results were in the order of a few minutes. Relevant aspects of the case study results and general conclusions concerning the utility and limitations of the method are discussed. © 2011, CSIRO. Ground Water © 2011, National Ground Water Association.
Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing

KAUST Repository

Toye, Habib

2017-05-26

We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation and of the Data Assimilation Research Testbed (DART) for ensemble data assimilation. DART has been configured to integrate all members of an ensemble adjustment Kalman filter (EAKF) in parallel, based on which we adapted the ensemble operations in DART to use an invariant ensemble, i.e., an ensemble Optimal Interpolation (EnOI) algorithm. This approach requires only a single forward model integration in the forecast step and therefore saves substantial computational cost. To deal with the strong seasonal variability of the Red Sea, the EnOI ensemble is then seasonally selected from a climatology of long-term model outputs. Remote-sensing observations of sea surface height (SSH) and sea surface temperature (SST) are assimilated every 3 days. Real-time atmospheric fields from the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) are used as forcing in different assimilation experiments. We investigate the behaviors of the EAKF and (seasonal-) EnOI and compare their performances for assimilating and forecasting the circulation of the Red Sea. We further assess the sensitivity of the assimilation system to various filtering parameters (ensemble size, inflation) and atmospheric forcing.
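
Why EnOI needs only one forward integration can be seen from the analysis step: the covariance comes from a fixed ensemble, while only the single background state is propagated by the model. The sketch below is a textbook-style illustration for one directly observed state variable; it does not use or reproduce the DART interface, and the function name, the scaling factor `alpha` and the toy numbers are assumptions.

```python
def enoi_update(background, static_ensemble, obs_index, obs_value,
                obs_var, alpha=1.0):
    """Single-observation EnOI analysis: x_a = x_b + K (y - x_b[obs_index]).

    background      : list of n state values (the one integrated forecast)
    static_ensemble : list of m >= 2 state vectors, fixed in time, used only
                      to estimate the background error covariance
    obs_index       : index of the directly observed state variable
    obs_value       : observation y
    obs_var         : observation error variance
    alpha           : scaling of the static covariance (assumed tuning knob)
    """
    m = len(static_ensemble)
    n = len(background)
    means = [sum(member[i] for member in static_ensemble) / m
             for i in range(n)]
    anomalies = [[member[i] - means[i] for i in range(n)]
                 for member in static_ensemble]
    # Scaled sample covariance between each state variable and the observed one
    cov_with_obs = [
        alpha * sum(a[i] * a[obs_index] for a in anomalies) / (m - 1)
        for i in range(n)
    ]
    innovation = obs_value - background[obs_index]
    denom = cov_with_obs[obs_index] + obs_var
    return [background[i] + cov_with_obs[i] / denom * innovation
            for i in range(n)]


if __name__ == "__main__":
    # Toy two-variable state; the static ensemble correlates both variables
    print(enoi_update([1.0, 5.0], [[0.0, 0.0], [2.0, 2.0]],
                      obs_index=0, obs_value=3.0, obs_var=2.0))
```

In a seasonally varying system the list passed as `static_ensemble` would simply be swapped for the members selected for the current season, leaving the update itself unchanged.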
Capturing Three-Dimensional Genome Organization in Individual Cells by Single-Cell Hi-C.

Science.gov (United States)

Nagano, Takashi; Wingett, Steven W; Fraser, Peter

2017-01-01

Hi-C is a powerful method to investigate genome-wide, higher-order chromatin and chromosome conformations averaged from a population of cells. To expand the potential of Hi-C for single-cell analysis, we developed single-cell Hi-C. Similar to the existing "ensemble" Hi-C method, single-cell Hi-C detects proximity-dependent ligation events between cross-linked and restriction-digested chromatin fragments in cells. A major difference between the single-cell Hi-C and ensemble Hi-C protocol is that the proximity-dependent ligation is carried out in the nucleus. This allows the isolation of individual cells in which nearly the entire Hi-C procedure has been carried out, enabling the production of a Hi-C library and data from individual cells. With this new method, we studied genome conformations and found evidence for conserved topological domain organization from cell to cell, but highly variable interdomain contacts and chromosome folding genome wide. In addition, we found that the single-cell Hi-C protocol provided cleaner results with less technical noise suggesting it could be used to improve the ensemble Hi-C technique.
Predicting gene function using hierarchical multi-label decision tree ensembles

Directory of Open Access Journals (Sweden)

Kocev Dragi

2010-01-01

Full Text Available Abstract Background S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
Girsanov reweighting for path ensembles and Markov state models

Science.gov (United States)

Donati, L.; Hartmann, C.; Keller, B. G.

2017-06-01

The sensitivity of molecular dynamics to changes in the potential energy function plays an important role in understanding the dynamics and function of complex molecules. We present a method to obtain path ensemble averages of a perturbed dynamics from a set of paths generated by a reference dynamics. It is based on the concept of path probability measure and the Girsanov theorem, a result from stochastic analysis to estimate a change of measure of a path ensemble. Since Markov state models (MSMs) of the molecular dynamics can be formulated as a combined phase-space and path ensemble average, the method can be extended to reweight MSMs by combining it with a reweighting of the Boltzmann distribution. We demonstrate how to efficiently implement the Girsanov reweighting in a molecular dynamics simulation program by calculating parts of the reweighting factor "on the fly" during the simulation, and we benchmark the method on test systems ranging from a two-dimensional diffusion process and an artificial many-body system to alanine dipeptide and valine dipeptide in implicit and explicit water. The method can be used to study the sensitivity of molecular dynamics to external perturbations as well as to reweight trajectories generated by enhanced sampling schemes to the original dynamics.
Localization of atomic ensembles via superfluorescence

International Nuclear Information System (INIS)

Macovei, Mihai; Evers, Joerg; Keitel, Christoph H.; Zubairy, M. Suhail

2007-01-01

The subwavelength localization of an ensemble of atoms concentrated to a small volume in space is investigated. The localization relies on the interaction of the ensemble with a standing wave laser field. The light scattered in the interaction of the standing wave field and the atom ensemble depends on the position of the ensemble relative to the standing wave nodes. This relation can be described by a fluorescence intensity profile, which depends on the standing wave field parameters and the ensemble properties and which is modified due to collective effects in the ensemble of nearby particles. We demonstrate that the intensity profile can be tailored to suit different localization setups. Finally, we apply these results to two localization schemes. First, we show how to localize an ensemble fixed at a certain position in the standing wave field. Second, we discuss localization of an ensemble passing through the standing wave field.
Ensemble Sampling

OpenAIRE

Lu, Xiuyuan; Van Roy, Benjamin

2017-01-01

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applica...
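As a rough illustration of the idea in this abstract, the sketch below runs ensemble sampling on a Gaussian multi-armed bandit. It is a simplification under stated assumptions, not the authors' implementation: each ensemble member starts from its own random prior draw, every member is updated with the observed reward plus an independent perturbation, and at each step one member is chosen uniformly at random and followed greedily. The function name and all parameters are hypothetical.

```python
import random


def ensemble_sampling(true_means, n_models=10, horizon=500,
                      noise_sd=1.0, prior_sd=1.0, seed=0):
    """Return how often each arm was pulled (illustrative Gaussian bandit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    prior_prec = 1.0 / prior_sd ** 2
    noise_prec = 1.0 / noise_sd ** 2

    # Per model and arm: precision-weighted sum and total precision.
    # Each model's starting estimate is a random draw from the prior N(0, prior_sd^2).
    sums = [[rng.gauss(0.0, prior_sd) * prior_prec for _ in range(n_arms)]
            for _ in range(n_models)]
    precs = [[prior_prec] * n_arms for _ in range(n_models)]
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Sample one model uniformly; this stands in for a posterior sample.
        m = rng.randrange(n_models)
        estimates = [s / p for s, p in zip(sums[m], precs[m])]
        arm = max(range(n_arms), key=estimates.__getitem__)

        reward = rng.gauss(true_means[arm], noise_sd)
        pulls[arm] += 1

        # Every model sees the reward with its own independent perturbation,
        # which keeps the ensemble diverse.
        for k in range(n_models):
            perturbed = reward + rng.gauss(0.0, noise_sd)
            sums[k][arm] += perturbed * noise_prec
            precs[k][arm] += noise_prec

    return pulls


pulls = ensemble_sampling([0.0, 1.0], horizon=200)
```

The per-step cost is one greedy decision plus an incremental update of each member, which is what keeps the approach tractable when exact posterior sampling is not.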
Optimized expanded ensembles for simulations involving molecular insertions and deletions. II. Open systems

Science.gov (United States)

Escobedo, Fernando A.

2007-11-01

In the Grand Canonical, osmotic, and Gibbs ensembles, chemical potential equilibrium is attained via transfers of molecules between the system and either a reservoir or another subsystem. In this work, the expanded ensemble (EXE) methods described in part I [F. A. Escobedo and F. J. Martínez-Veracoechea, J. Chem. Phys. 127, 174103 (2007)] of this series are extended to these ensembles to overcome the difficulties associated with implementing such whole-molecule transfers. In EXE, such moves occur via a target molecule that undergoes transitions through a number of intermediate coupling states. To minimize the tunneling time between the fully coupled and fully decoupled states, the intermediate states could be either: (i) sampled with an optimal frequency distribution (the sampling problem) or (ii) selected with an optimal spacing distribution (staging problem). The sampling issue is addressed by determining the biasing weights that would allow generating an optimal ensemble; discretized versions of this algorithm (well suited for small number of coupling stages) are also presented. The staging problem is addressed by selecting the intermediate stages in such a way that a flat histogram is the optimized ensemble. The validity of the advocated methods is demonstrated by their application to two model problems, the solvation of large hard spheres into a fluid of small and large spheres, and the vapor-liquid equilibrium of a chain system.
EnsembleGASVR: A novel ensemble method for classifying missense single nucleotide polymorphisms

KAUST Repository

Rapakoulia, Trisevgeni; Theofilatos, Konstantinos A.; Kleftogiannis, Dimitrios A.; Likothanasis, Spiridon D.; Tsakalidis, Athanasios K.; Mavroudi, Seferina P.

2014-01-01

do not support their predictions with confidence scores. Results: To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel
Are there laws of genome evolution?

Directory of Open Access Journals (Sweden)

Eugene V Koonin

2011-08-01

Full Text Available Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.
EnsembleGraph: Interactive Visual Analysis of Spatial-Temporal Behavior for Ensemble Simulation Data

Energy Technology Data Exchange (ETDEWEB)

Shu, Qingya; Guo, Hanqi; Che, Limei; Yuan, Xiaoru; Liu, Junfeng; Liang, Jie

2016-04-19

We present a novel visualization framework, EnsembleGraph, for analyzing ensemble simulation data, in order to help scientists understand behavior similarities between ensemble members over space and time. A graph-based representation is used to visualize individual spatiotemporal regions with similar behaviors, which are extracted by hierarchical clustering algorithms. A user interface with multiple linked views is provided, which enables users to explore, locate, and compare regions that have similar behaviors between ensemble members; users can then investigate and analyze the selected regions in detail. The driving application of this paper is the study of regional emission influences on tropospheric ozone, based on ensemble simulations conducted with different anthropogenic emission absences using the MOZART-4 (model of ozone and related tracers, version 4) model. We demonstrate the effectiveness of our method by visualizing the MOZART-4 ensemble simulation data and evaluating the relative regional emission influences on tropospheric ozone concentrations. Positive feedback from domain experts and two case studies demonstrate the efficiency of our method.
Polarimetric SAR Image Classification Using Multiple-feature Fusion and Ensemble Learning

Directory of Open Access Journals (Sweden)

Sun Xun

2016-12-01

Full Text Available In this paper, we propose a supervised classification algorithm for Polarimetric Synthetic Aperture Radar (PolSAR) images using multiple-feature fusion and ensemble learning. First, we extract different polarimetric features, including extended polarimetric feature space, Hoekman, Huynen, H/alpha/A, and four-component scattering features of PolSAR images. Next, we randomly select two types of features each time from all feature sets to guarantee the reliability and diversity of later ensembles and use a support vector machine as the basic classifier for predicting classification results. Finally, we concatenate all prediction probabilities of basic classifiers as the final feature representation and employ the random forest method to obtain final classification results. Experimental results at the pixel and region levels show the effectiveness of the proposed algorithm.
Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

Science.gov (United States)

Kim, Woonsu; Park, Hyesun; Seo, Seongwon

2016-01-01

The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline of an organism for amalgamating genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxification. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID
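The gene counts reported in this abstract can be checked with a short sketch, which also shows the kind of partition that amalgamating two annotation sources produces. The `amalgamate` helper and the toy identifiers are hypothetical; in particular, matching genes by identical identifier is an assumption made here for illustration and is not a description of the authors' pipeline.

```python
def amalgamate(ids_a, ids_b):
    """Partition two annotation sets into shared and source-specific genes."""
    a, b = set(ids_a), set(ids_b)
    return {"consensus": a & b, "only_a": a - b, "only_b": b - a}


# Toy identifiers standing in for two annotation sources.
partition = amalgamate(["g1", "g2", "g3"], ["g2", "g3", "g4"])

# Counts as reported in the abstract.
reported = {
    "consensus": 19123,
    "ncbi_only": 8410,
    "ensembl_only": 5493,
    "ncbi_scaffolds": 266,
}
total = sum(reported.values())  # 33292, matching the stated total of 33,292 genes
```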
Impacts of calibration strategies and ensemble methods on ensemble flood forecasting over Lanjiang basin, Southeast China

Science.gov (United States)

Liu, Li; Xu, Yue-Ping

2017-04-01

Ensemble flood forecasting driven by numerical weather prediction products is becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system based on the Variable Infiltration Capacity (VIC) model and quantitative precipitation forecasts from the TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by a parallel programmed ɛ-NSGAII multi-objective algorithm, and two respectively parameterized models are determined to simulate daily flows and peak flows coupled with a modular approach. The results indicate that the ɛ-NSGAII algorithm permits more efficient optimization and rational determination of parameter settings. It is demonstrated that the multimodel ensemble streamflow mean has better skill than the best single-model ensemble mean (ECMWF), and that the multimodel ensembles weighted on members and skill scores outperform other multimodel ensembles. For a typical flood event, it is shown that the flood can be predicted 3-4 days in advance, but the flows in the rising limb can be captured only 1-2 days ahead due to the flash feature. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from either single-model or multimodel ensembles are generally underestimated, as the extreme values are smoothed out by the ensemble process.
The canonical ensemble redefined - 1: Formalism

International Nuclear Information System (INIS)

Venkataraman, R.

1984-12-01

For studying the thermodynamic properties of systems we propose an ensemble that lies in between the familiar canonical and microcanonical ensembles. We point out the transition from the canonical to microcanonical ensemble and prove from a comparative study that all these ensembles do not yield the same results even in the thermodynamic limit. An investigation of the coupling between two or more systems with these ensembles suggests that the state of thermodynamical equilibrium is a special case of statistical equilibrium. (author)
An ensemble self-training protein interaction article classifier.

Science.gov (United States)

Chen, Yifei; Hou, Ping; Manderick, Bernard

2014-01-01

Protein-protein interaction (PPI) is essential to understand the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence it is desirable to automatically classify articles as PPI-related or not. In order to build interaction article classification systems, an annotated corpus is needed. However, it is usually the case that only a small number of labeled articles can be obtained manually. Meanwhile, a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A biological background based feature weighting strategy is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from unlabeled data to improve the performance further. Experimental results show that EST_IACer can classify PPI-related articles effectively and efficiently.
Dynamic principle for ensemble control tools.

Science.gov (United States)

Samoletov, A; Vasiev, B

2017-11-28

Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.
Ensemble methods for handwritten digit recognition

DEFF Research Database (Denmark)

Hansen, Lars Kai; Liisberg, Christian; Salamon, P.

1992-01-01

Neural network ensembles are applied to handwritten digit recognition. The individual networks of the ensemble are combinations of sparse look-up tables (LUTs) with random receptive fields. It is shown that the consensus of a group of networks outperforms the best individual of the ensemble. It is further shown that it is possible to estimate the ensemble performance as well as the learning curve on a medium-size database. In addition the authors present preliminary analysis of experiments on a large database and show that state-of-the-art performance can be obtained using the ensemble approach by optimizing the receptive fields. It is concluded that it is possible to improve performance significantly by introducing moderate-size ensembles; in particular, a 20-25% improvement has been found. The ensemble random LUTs, when trained on a medium-size database, reach a performance (without rejects) of 94...
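The consensus rule referred to in this abstract can be illustrated with a plurality vote over member predictions. This is a generic sketch rather than the authors' LUT networks; the function names and the toy predictions are hypothetical. Ties are resolved in favor of the label encountered first, which is the ordering `Counter.most_common` gives for equal counts.

```python
from collections import Counter


def consensus(predictions):
    """Return the label predicted by the largest number of ensemble members."""
    label, _ = Counter(predictions).most_common(1)[0]
    return label


def accuracy(predicted, truth):
    """Fraction of samples where the prediction equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def ensemble_predictions(member_predictions):
    """member_predictions[k][i] is member k's label for sample i."""
    return [consensus(votes) for votes in zip(*member_predictions)]


# Three toy members that each misclassify a different digit.
truth = [1, 2, 3, 4]
members = [
    [1, 2, 3, 0],
    [1, 2, 0, 4],
    [0, 2, 3, 4],
]
member_scores = [accuracy(m, truth) for m in members]
ensemble_score = accuracy(ensemble_predictions(members), truth)
```

In this toy case every member scores 0.75 while the consensus scores 1.0, because the members' errors fall on different samples.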
Eigenfunction statistics of Wishart Brownian ensembles

International Nuclear Information System (INIS)

Shukla, Pragya

2017-01-01

We theoretically analyze the eigenfunction fluctuation measures for a Hermitian ensemble which appears as an intermediate state of the perturbation of a stationary ensemble by another stationary ensemble of Wishart (Laguerre) type. Similar to the perturbation by a Gaussian stationary ensemble, the measures undergo a diffusive dynamics in terms of the perturbation parameter but the energy-dependence of the fluctuations is different in the two cases. This may have important consequences for the eigenfunction dynamics as well as phase transition studies in many areas of complexity where Brownian ensembles appear. (paper)

Measuring social interaction in music ensembles.

Science.gov (United States)

Volpe, Gualtiero; D'Ausilio, Alessandro; Badino, Leonardo; Camurri, Antonio; Fadiga, Luciano

2016-05-05

Music ensembles are an ideal test-bed for quantitative analysis of social interaction. Music is an inherently social activity, and music ensembles offer a broad variety of scenarios which are particularly suitable for investigation. Small ensembles, such as string quartets, are deemed a significant example of self-managed teams, where all musicians contribute equally to a task. In bigger ensembles, such as orchestras, the relationship between a leader (the conductor) and a group of followers (the musicians) clearly emerges. This paper presents an overview of recent research on social interaction in music ensembles with a particular focus on (i) studies from cognitive neuroscience; and (ii) studies adopting a computational approach for carrying out automatic quantitative analysis of ensemble music performances. © 2016 The Author(s).
Estimation of the uncertainty of a climate model using an ensemble simulation

Science.gov (United States)

Barth, A.; Mathiot, P.; Goosse, H.

2012-04-01

The atmospheric forcings play an important role in the study of the ocean and sea-ice dynamics of the Southern Ocean. Error in the atmospheric forcings will inevitably result in uncertain model results. The sensitivity of the model results to errors in the atmospheric forcings is studied with ensemble simulations using multivariate perturbations of the atmospheric forcing fields. The numerical ocean model used is NEMO-LIM in a global configuration with a horizontal resolution of 2°. NCEP reanalyses are used to provide air temperature and wind data to force the ocean model over the last 50 years. A climatological mean is used to prescribe relative humidity, cloud cover and precipitation. In a first step, the model results are compared with OSTIA SST and OSI SAF sea ice concentration of the southern hemisphere. The seasonal behavior of the RMS difference and bias in SST and ice concentration is highlighted as well as the regions with relatively high RMS errors and biases such as the Antarctic Circumpolar Current and near the ice-edge. Ensemble simulations are performed to statistically characterize the model error due to uncertainties in the atmospheric forcings. Such information is a crucial element for future data assimilation experiments. Ensemble simulations are performed with perturbed air temperature and wind forcings. A Fourier decomposition of the NCEP wind vectors and air temperature for 2007 is used to generate ensemble perturbations. The perturbations are scaled such that the resulting ensemble spread matches approximately the RMS differences between the satellite SST and sea ice concentration. The ensemble spread and covariance are analyzed for the minimum and maximum sea ice extent. It is shown that errors in the atmospheric forcings can extend to several hundred meters in depth near the Antarctic Circumpolar Current.
Joys of Community Ensemble Playing: The Case of the Happy Roll Elastic Ensemble in Taiwan

Science.gov (United States)

Hsieh, Yuan-Mei; Kao, Kai-Chi

2012-01-01

The Happy Roll Elastic Ensemble (HREE) is a community music ensemble supported by Tainan Culture Centre in Taiwan. With enjoyment and friendship as its primary goals, it aims to facilitate the joys of ensemble playing and the spirit of social networking. This article highlights the key aspects of HREE's development in its first two years…
Ensemble Data Mining Methods

Science.gov (United States)

Oza, Nikunj C.

2004-01-01

Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary; any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
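The committee argument in this abstract can be made concrete with a small calculation. The sketch below assumes an idealized case that the abstract does not state: a binary decision, members that err independently, and a common per-member accuracy `p_correct`. Under those assumptions the probability that a strict majority is right follows the binomial distribution. The function name is hypothetical.

```python
from math import comb


def majority_correct_probability(n_members, p_correct):
    """Probability that a strict majority of independent members is correct."""
    need = n_members // 2 + 1
    return sum(
        comb(n_members, k) * p_correct ** k * (1 - p_correct) ** (n_members - k)
        for k in range(need, n_members + 1)
    )


single = majority_correct_probability(1, 0.7)      # one member on its own
committee = majority_correct_probability(3, 0.7)   # three complementary members
```

With three independent members at 0.7 accuracy the majority is right with probability 0.784. If the members always agreed, the committee would behave like a single member and stay at 0.7, which is the sense in which non-complementary members add nothing.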
Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics

Science.gov (United States)

Lazarus, S. M.; Holman, B. P.; Splitt, M. E.

2017-12-01

A computationally efficient method is developed that performs gridded post processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations is generated to provide physically consistent high resolution winds over a coastal domain characterized by an intricate land / water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both the deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicates the post processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
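The role of the EMOS predictive mean and variance can be sketched for a single scalar quantity. This is a univariate simplification of the wind vector setting described in the abstract, and it assumes the common Gaussian form in which the predictive mean is a linear function of the ensemble mean and the predictive variance a linear function of the ensemble variance. The coefficients `a`, `b`, `c`, `d` are hypothetical placeholders for values that would normally be fitted to past forecasts and observations, for example by minimizing the CRPS computed below.

```python
from math import erf, exp, pi, sqrt
from statistics import mean, variance


def emos_parameters(members, a, b, c, d):
    """Gaussian EMOS: mean = a + b * ensemble mean, variance = c + d * ensemble variance."""
    mu = a + b * mean(members)
    sigma2 = c + d * variance(members)
    return mu, sigma2


def gaussian_crps(mu, sigma, obs):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2)."""
    z = (obs - mu) / sigma
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / sqrt(pi))


# Toy three-member ensemble of one wind component (units arbitrary).
mu, sigma2 = emos_parameters([4.0, 6.0, 8.0], a=1.0, b=0.5, c=0.5, d=0.25)
score = gaussian_crps(mu, sqrt(sigma2), obs=4.5)
```

A lower CRPS means a sharper and better calibrated predictive distribution, so comparing the score of the calibrated forecast with that of the raw ensemble is one way to quantify the kind of improvement the abstract reports.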
Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

Science.gov (United States)

Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.

2015-01-01

We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167
The classicality and quantumness of a quantum ensemble

International Nuclear Information System (INIS)

Zhu Xuanmin; Pang Shengshi; Wu Shengjun; Liu Quanhui

2011-01-01

In this Letter, we investigate the classicality and quantumness of a quantum ensemble. We define a quantity called ensemble classicality based on classical cloning strategy (ECCC) to characterize how classical a quantum ensemble is. An ensemble of commuting states has a unit ECCC, while a general ensemble can have an ECCC less than 1. We also study how quantum an ensemble is by defining a related quantity called quantumness. We find that the classicality of an ensemble is closely related to how perfectly the ensemble can be cloned, and that the quantumness of the ensemble used in a quantum key distribution (QKD) protocol is exactly the attainable lower bound of the error rate in the sifted key. - Highlights: → A quantity is defined to characterize how classical a quantum ensemble is. → The classicality of an ensemble is closely related to the cloning performance. → Another quantity is also defined to investigate how quantum an ensemble is. → This quantity gives the lower bound of the error rate in a QKD protocol.
The semantic similarity ensemble

Directory of Open Access Journals (Sweden)

Andrea Ballatore

2013-12-01

Full Text Available Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.
An Organic Computing Approach to Self-organising Robot Ensembles

Directory of Open Access Journals (Sweden)

Sebastian Albrecht von Mammen

2016-11-01

Full Text Available Similar to the Autonomous Computing initiative, that has mainly been advancing techniques for self-optimisation focussing on computing systems and infrastructures, Organic Computing (OC) has been driving the development of system design concepts and algorithms for self-adaptive systems at large. Examples of application domains include, for instance, traffic management and control, cloud services, communication protocols, and robotic systems. Such an OC system typically consists of a potentially large set of autonomous and self-managed entities, where each entity acts with a local decision horizon. By means of cooperation of the individual entities, the behaviour of the entire ensemble system is derived. In this article, we present our work on how autonomous, adaptive robot ensembles can benefit from OC technology. Our elaborations are aligned with the different layers of an observer/controller framework which provides the foundation for the individuals' adaptivity at system design-level. Relying on an extended Learning Classifier System (XCS) in combination with adequate simulation techniques, this basic system design empowers robot individuals to improve their individual and collaborative performances, e.g. by means of adapting to changing goals and conditions. Not only for the sake of generalisability, but also because of its enormous transformative potential, we stage our research in the domain of robot ensembles that are typically comprised of several quad-rotors and that organise themselves to fulfil spatial tasks such as maintenance of building facades or the collaborative search for mobile targets. Our elaborations detail the architectural concept, provide examples of individual self-optimisation as well as of the optimisation of collaborative efforts, and we show how the user can control the ensembles at multiple levels of abstraction. We conclude with a summary of our approach and an outlook on possible future steps.
Quantum ensembles of quantum classifiers.

Science.gov (United States)

Schuld, Maria; Petruccione, Francesco

2018-02-09

Quantum machine learning witnesses an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which, similar to Bayesian learning, the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
Musical ensembles in Ancient Mesopotamia

NARCIS (Netherlands)

Krispijn, T.J.H.; Dumbrill, R.; Finkel, I.

2010-01-01

Identification of musical instruments from ancient Mesopotamia by comparing musical ensembles attested in Sumerian and Akkadian texts with depicted ensembles. Lexicographical contributions to the Sumerian and Akkadian lexicon.
PSO-Ensemble Demo Application

DEFF Research Database (Denmark)

2004-01-01

Within the framework of the PSO-Ensemble project (FU2101) a demo application has been created. The application uses ECMWF ensemble forecasts. Two instances of the application are running: one for Nysted Offshore and one for the total production (except Horns Rev) in the Eltra area. The output...
Multilevel ensemble Kalman filter

KAUST Repository

Chernov, Alexey; Hoel, Haakon; Law, Kody; Nobile, Fabio; Tempone, Raul

2016-01-01

This work embeds a multilevel Monte Carlo (MLMC) sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF). In terms of computational cost vs. approximation error, the asymptotic performance of the multilevel ensemble Kalman filter (MLEnKF) is superior to that of the EnKF.
Multilevel ensemble Kalman filter

KAUST Repository

Chernov, Alexey

2016-01-06

This work embeds a multilevel Monte Carlo (MLMC) sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF). In terms of computational cost vs. approximation error, the asymptotic performance of the multilevel ensemble Kalman filter (MLEnKF) is superior to that of the EnKF.
Multi-Model Ensemble Wake Vortex Prediction

Science.gov (United States)

Koerner, Stephan; Holzaepfel, Frank; Ahmad, Nash'at N.

2015-01-01

Several multi-model ensemble methods are investigated for predicting wake vortex transport and decay. This study is a joint effort between National Aeronautics and Space Administration and Deutsches Zentrum fuer Luft- und Raumfahrt to develop a multi-model ensemble capability using their wake models. An overview of different multi-model ensemble methods and their feasibility for wake applications is presented. The methods include Reliability Ensemble Averaging, Bayesian Model Averaging, and Monte Carlo Simulations. The methodologies are evaluated using data from wake vortex field experiments.
Ensemble-based Kalman Filters in Strongly Nonlinear Dynamics

Institute of Scientific and Technical Information of China (English)

Zhaoxia PU; Joshua HACKER

2009-01-01

This study examines the effectiveness of ensemble Kalman filters in data assimilation with the strongly nonlinear dynamics of the Lorenz-63 model, and in particular their use in predicting the regime transition that occurs when the model jumps from one basin of attraction to the other. Four configurations of ensemble-based Kalman filtering data assimilation techniques (the ensemble Kalman filter, ensemble adjustment Kalman filter, ensemble square root filter and ensemble transform Kalman filter) are evaluated on their ability to predict the regime transition (also called phase transition) and are compared in terms of their sensitivity to both observational and sampling errors. The sensitivity of each ensemble-based filter to the size of the ensemble is also examined.
Monthly ENSO Forecast Skill and Lagged Ensemble Size

Science.gov (United States)

Trenary, L.; DelSole, T.; Tippett, M. K.; Pegion, K.

2018-04-01

The mean square error (MSE) of a lagged ensemble of monthly forecasts of the Niño 3.4 index from the Climate Forecast System (CFSv2) is examined with respect to ensemble size and configuration. Although the real-time forecast is initialized 4 times per day, it is possible to infer the MSE for arbitrary initialization frequency and for burst ensembles by fitting error covariances to a parametric model and then extrapolating to arbitrary ensemble size and initialization frequency. Applying this method to real-time forecasts, we find that the MSE consistently reaches a minimum for a lagged ensemble size between one and eight days, when four initializations per day are included. This ensemble size is consistent with the 8-10 day lagged ensemble configuration used operationally. Interestingly, the skill of both ensemble configurations is close to the estimated skill of the infinite ensemble. The skill of the weighted, lagged, and burst ensembles is found to be comparable. Certain unphysical features of the estimated error growth were tracked down to problems with the climatology and data discontinuities.
New technique for ensemble dressing combining Multimodel SuperEnsemble and precipitation PDF

Science.gov (United States)

Cane, D.; Milelli, M.

2009-09-01

The Multimodel SuperEnsemble technique (Krishnamurti et al., Science 285, 1548-1550, 1999) is a postprocessing method for the estimation of weather forecast parameters that reduces direct model output errors. It differs from other ensemble analysis techniques in its use of an adequate weighting of the input forecast models to obtain a combined estimate of meteorological parameters. Weights are calculated by least-squares minimization of the difference between the model and the observed field during a so-called training period. The technique can be applied successfully to continuous parameters such as temperature, humidity, wind speed and mean sea level pressure (Cane and Milelli, Meteorologische Zeitschrift, 15, 2, 2006), and it also gives good results when applied to precipitation, a parameter quite difficult to handle with standard post-processing methods. Here we present our methodology for Multimodel precipitation forecasts, applied to a wide spectrum of results over the very dense non-GTS weather station network of Piemonte. We focus particularly on an accurate statistical method for bias correction and on ensemble dressing in agreement with the observed precipitation forecast-conditioned PDF. Acknowledgement: this work is supported by the Italian Civil Defence Department.
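The least-squares weighting described in this record can be illustrated with a small sketch. It assumes the commonly cited form of the combination, in which the forecast is the observed training mean plus a weighted sum of each model's anomaly from its own training mean. The function names (`fit_superensemble`, `superensemble_forecast`, `solve_linear_system`) are hypothetical and are not taken from the authors' software; the bias correction and ensemble dressing steps of the paper are not covered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("model anomalies are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_superensemble(forecasts, observations):
    """Fit one weight per model over a training period.

    forecasts[i][t] is model i's forecast at training time t (assumed layout).
    Returns (weights, model_means, obs_mean).
    """
    n_models, n_times = len(forecasts), len(observations)
    obs_mean = sum(observations) / n_times
    model_means = [sum(f) / n_times for f in forecasts]
    # Anomalies relative to the training-period means.
    fa = [[v - mean for v in f] for f, mean in zip(forecasts, model_means)]
    oa = [o - obs_mean for o in observations]
    # Normal equations of the least-squares problem min ||oa - sum_i w_i fa_i||^2.
    a = [[sum(fa[i][t] * fa[j][t] for t in range(n_times))
          for j in range(n_models)] for i in range(n_models)]
    b = [sum(fa[i][t] * oa[t] for t in range(n_times)) for i in range(n_models)]
    return solve_linear_system(a, b), model_means, obs_mean


def superensemble_forecast(weights, model_means, obs_mean, new_forecasts):
    """Combine new model forecasts using the trained weights."""
    return obs_mean + sum(
        w * (f - mean) for w, f, mean in zip(weights, new_forecasts, model_means)
    )
```

Calling `fit_superensemble` on a training period and then `superensemble_forecast` on a new set of model values gives the combined estimate. Solving the normal equations directly keeps the sketch dependency-free; a production implementation would more likely use a library least-squares routine.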
Thermostating extended Lagrangian Born-Oppenheimer molecular dynamics.

Science.gov (United States)

Martínez, Enrique; Cawkwell, Marc J; Voter, Arthur F; Niklasson, Anders M N

2015-04-21

Extended Lagrangian Born-Oppenheimer molecular dynamics is developed and analyzed for applications in canonical (NVT) simulations. Three different approaches are considered: the Nosé and Andersen thermostats and Langevin dynamics. We have tested the temperature distribution under different conditions of self-consistent field (SCF) convergence and time step and compared the results to analytical predictions. We find that the simulations based on the extended Lagrangian Born-Oppenheimer framework provide accurate canonical distributions even under approximate SCF convergence, often requiring only a single diagonalization per time step, whereas regular Born-Oppenheimer formulations exhibit unphysical fluctuations unless a sufficiently high degree of convergence is reached at each time step. The thermostated extended Lagrangian framework thus offers an accurate approach to sample processes in the canonical ensemble at a fraction of the computational cost of regular Born-Oppenheimer molecular dynamics simulations.
S-AMP: Approximate Message Passing for General Matrix Ensembles

DEFF Research Database (Denmark)

Cakmak, Burak; Winther, Ole; Fleury, Bernard H.

2014-01-01

We propose a novel iterative estimation algorithm for linear observation models called S-AMP. The fixed points of S-AMP are the stationary points of the exact Gibbs free energy under a set of (first- and second-) moment consistency constraints in the large system limit. S-AMP extends the approximate message-passing (AMP) algorithm to general matrix ensembles with a well-defined large system size limit. The generalization is based on the S-transform (in free probability) of the spectrum of the measurement matrix. Furthermore, we show that the optimality of S-AMP follows directly from its...

Genomic Dissection of Travel-Associated Extended-Spectrum-Beta-Lactamase-Producing Salmonella enterica Serovar Typhi Isolates Originating from the Philippines: a One-Off Occurrence or a Threat to Effective Treatment of Typhoid Fever?

DEFF Research Database (Denmark)

Hendriksen, Rene S.; Leekitcharoenphon, Pimlapas; Mikoleit, Matthew

2015-01-01

One unreported case of extended-spectrum-beta-lactamase (ESBL)-producing Salmonella enterica serovar Typhi was identified, whole-genome sequence typed, among other analyses, and compared to other available genomes of S. Typhi. The reported strain was similar to a previously published strain harbo...
A Theoretical Analysis of Why Hybrid Ensembles Work

Directory of Open Access Journals (Sweden)

Kuo-Wei Hsu

2017-01-01

Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose using a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains open. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Through this paper, we therefore provide a complement to the theoretical foundation of creating and using hybrid ensembles.
Ensembles of gustatory cortical neurons anticipate and discriminate between tastants in a single lick

Directory of Open Access Journals (Sweden)

Jennifer R Stapleton

2007-10-01

The gustatory cortex (GC) processes chemosensory and somatosensory information and is involved in learning and anticipation. Previously we found that a subpopulation of GC neurons responded to tastants in a single lick (Stapleton et al., 2006). Here we extend this investigation to determine if small ensembles of GC neurons, obtained while rats received blocks of tastants on a fixed ratio schedule (FR5), can discriminate between tastants and their concentrations after a single 50 µL delivery. In the FR5 schedule subjects received tastants every fifth (reinforced) lick and the intervening licks were unreinforced. The ensemble firing patterns were analyzed with a Bayesian generalized linear model whose parameters included the firing rates and temporal patterns of the spike trains. We found that when both the temporal and rate parameters were included, 12 of 13 ensembles correctly identified single tastant deliveries. We also found that the activity during the unreinforced licks contained signals regarding the identity of the upcoming tastant, which suggests that GC neurons contain anticipatory information about the next tastant delivery. To support this finding we performed experiments in which tastant delivery was randomized within each block and found that the neural activity following the unreinforced licks did not predict the upcoming tastant. Collectively, these results suggest that after a single lick ensembles of GC neurons can discriminate between tastants, that they may utilize both temporal and rate information, and that when the tastant delivery is repetitive the ensembles contain information about the identity of the upcoming tastant delivery.
Genomic prediction using subsampling.

Science.gov (United States)

Xavier, Alencar; Xu, Shizhong; Muir, William; Rainey, Katy Martin

2017-03-24

Genome-wide assisted selection is a critical tool for the genetic improvement of plants and animals. Whole-genome regression models in a Bayesian framework represent the main family of prediction methods. Fitting such models with a large number of observations involves a prohibitive computational burden. We propose the use of a subsampling bootstrap Markov chain in genomic prediction. This method consists of fitting whole-genome regression models by subsampling observations in each round of a Markov Chain Monte Carlo. We evaluated the effect of subsampling bootstrap on prediction and computational parameters. Across datasets, we observed an optimal subsampling proportion of observations around 50% with replacement, and around 33% without replacement. Subsampling provided a substantial decrease in computation time, reducing the time to fit the model by half. On average, losses on predictive properties imposed by subsampling were negligible, usually below 1%. For each dataset, an optimal subsampling point that improves prediction properties was observed, but the improvements were also negligible. Combining subsampling with Gibbs sampling is an interesting ensemble algorithm. The investigation indicates that the subsampling bootstrap Markov chain algorithm substantially reduces the computational burden associated with model fitting, and it may slightly enhance prediction properties.
Effect of land model ensemble versus coupled model ensemble on the simulation of precipitation climatology and variability

Science.gov (United States)

Wei, Jiangfeng; Dirmeyer, Paul A.; Yang, Zong-Liang; Chen, Haishan

2017-10-01

Through a series of model simulations with an atmospheric general circulation model coupled to three different land surface models, this study investigates the impacts of land model ensembles and coupled model ensemble on precipitation simulation. It is found that coupling an ensemble of land models to an atmospheric model has a very minor impact on the improvement of precipitation climatology and variability, but a simple ensemble average of the precipitation from three individually coupled land-atmosphere models produces better results, especially for precipitation variability. The generally weak impact of land processes on precipitation should be the main reason that the land model ensembles do not improve precipitation simulation. However, if there are big biases in the land surface model or land surface data set, correcting them could improve the simulated climate, especially for well-constrained regional climate simulations.
Semi-Supervised Multi-View Ensemble Learning Based On Extracting Cross-View Correlation

Directory of Open Access Journals (Sweden)

ZALL, R.

2016-05-01

Correlated information between different views is useful for learning from multi-view data. Canonical correlation analysis (CCA) plays an important role in extracting this information. However, CCA only extracts the correlated information between paired data and cannot preserve correlated information between within-class samples. In this paper, we propose a two-view semi-supervised learning method called semi-supervised random correlation ensemble based on spectral clustering (SS_RCE). SS_RCE uses a multi-view method based on spectral clustering which takes advantage of discriminative information in multiple views to estimate labeling information of unlabeled samples. In order to enhance the discriminative power of CCA features, we incorporate the labeling information of both unlabeled and labeled samples into CCA. Then, we use random correlation between within-class samples from cross view to extract diverse correlated features for training component classifiers. Furthermore, we extend a general model, namely SSMV_RCE, to construct an ensemble method that tackles semi-supervised learning in the presence of multiple views. Finally, we compare the proposed methods with existing multi-view feature extraction methods using multi-view semi-supervised ensembles. Experimental results on various multi-view data sets are presented to demonstrate the effectiveness of the proposed methods.
Evaluating an ensemble classification approach for crop diversity verification in Danish greening subsidy control

DEFF Research Database (Denmark)

Chellasamy, Menaka; Ferre, Ty; Greve, Mogens Humlekrog

2016-01-01

Beginning in 2015, Danish farmers are obliged to meet specific crop diversification rules based on total land area and number of crops cultivated to be eligible for new greening subsidies. Hence, there is a need for the Danish government to extend their subsidy control system to verify farmers’ declarations to warrant greening payments under the new crop diversification rules. Remote Sensing (RS) technology has been used since 1992 to control farmers’ subsidies in Denmark. However, a proper RS-based approach is yet to be finalised to validate new crop diversity requirements designed for assessing compliance under the recent subsidy scheme (2014–2020). This study uses an ensemble classification approach (proposed by the authors in previous studies) for validating the crop diversity requirements of the new rules. The approach uses a neural network ensemble classification system with bi-temporal (spring...
New concept of statistical ensembles

International Nuclear Information System (INIS)

Gorenstein, M.I.

2009-01-01

An extension of the standard concept of statistical ensembles is suggested. Namely, statistical ensembles with extensive quantities fluctuating according to an externally given distribution are introduced. Applications in the statistical models of multiple hadron production in high energy physics are discussed.
Advanced Atmospheric Ensemble Modeling Techniques

Energy Technology Data Exchange (ETDEWEB)

Buckley, R. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Chiswell, S. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Kurzeja, R. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Maze, G. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Viner, B. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Werth, D. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL)

2017-09-29

Ensemble modeling (EM), the creation of multiple atmospheric simulations for a given time period, has become an essential tool for characterizing uncertainties in model predictions. We explore two novel ensemble modeling techniques: (1) perturbation of model parameters (Adaptive Programming, AP), and (2) data assimilation (Ensemble Kalman Filter, EnKF). The current research is an extension of work from last year and examines transport on a small spatial scale (<100 km) in complex terrain, for more rigorous testing of the ensemble technique. Two different release cases were studied: a coastal release (SF6) and an inland release (Freon), which consisted of two release times. Observations of tracer concentration and meteorology are used to judge the ensemble results. In addition, adaptive grid techniques have been developed to reduce required computing resources for transport calculations. Using a 20-member ensemble, the standard approach generated downwind transport that was quantitatively good for both releases; however, the EnKF method produced additional improvement for the coastal release, where the spatial and temporal differences due to interior valley heating lead to the inland movement of the plume. The AP technique showed improvements for both release cases, with more improvement shown in the inland release. This research demonstrated that transport accuracy can be improved when models are adapted to a particular location/time or when important local data is assimilated into the simulation, and it enhances SRNL’s capability in atmospheric transport modeling in support of its current customer base and local site missions, as well as our ability to attract new customers within the intelligence community.
Layered Ensemble Architecture for Time Series Forecasting.

Science.gov (United States)

Rahman, Md Mustafizur; Islam, Md Monirul; Murase, Kazuyuki; Yao, Xin

2016-01-01

Time series forecasting (TSF) has been widely used in many application areas such as science, engineering, and finance. The phenomena generating time series are usually unknown and the information available for forecasting is limited to the past values of the series. It is, therefore, necessary to use an appropriate number of past values, termed lag, for forecasting. This paper proposes a layered ensemble architecture (LEA) for TSF problems. Our LEA consists of two layers, each of which uses an ensemble of multilayer perceptron (MLP) networks. While the first ensemble layer tries to find an appropriate lag, the second ensemble layer employs the obtained lag for forecasting. Unlike most previous work on TSF, the proposed architecture considers both accuracy and diversity of the individual networks in constructing an ensemble. LEA trains different networks in the ensemble by using different training sets with the aim of maintaining diversity among the networks. However, it uses the appropriate lag and combines the best trained networks to construct the ensemble. This indicates LEA's emphasis on accuracy of the networks. The proposed architecture has been tested extensively on time series data of the NN3 and NN5 neural network competitions. It has also been tested on several standard benchmark time series data. In terms of forecasting accuracy, our experimental results have revealed clearly that LEA is better than other ensemble and nonensemble methods.
Rotationally invariant family of Levy-like random matrix ensembles

International Nuclear Information System (INIS)

Choi, Jinmyung; Muttalib, K A

2009-01-01

We introduce a family of rotationally invariant random matrix ensembles characterized by a parameter λ. While λ = 1 corresponds to well-known critical ensembles, we show that λ ≠ 1 describes 'Levy-like' ensembles, characterized by power-law eigenvalue densities. For λ > 1 the density is bounded, as in Gaussian ensembles, but λ < 1 describes ensembles characterized by densities with long tails. In particular, the model allows us to evaluate, in terms of a novel family of orthogonal polynomials, the eigenvalue correlations for Levy-like ensembles. These correlations differ qualitatively from those in either the Gaussian or the critical ensembles. (fast track communication)
Diversity in random subspacing ensembles

NARCIS (Netherlands)

Tsymbal, A.; Pechenizkiy, M.; Cunningham, P.; Kambayashi, Y.; Mohania, M.K.; Wöß, W.

2004-01-01

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It was shown experimentally and theoretically that in order for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are
Squeezing of Collective Excitations in Spin Ensembles

DEFF Research Database (Denmark)

Kraglund Andersen, Christian; Mølmer, Klaus

2012-01-01

We analyse the possibility to create two-mode spin squeezed states of two separate spin ensembles by inverting the spins in one ensemble and allowing spin exchange between the ensembles via a near resonant cavity field. We investigate the dynamics of the system using a combination of numerical an...
AUC-Maximizing Ensembles through Metalearning.

Science.gov (United States)

LeDell, Erin; van der Laan, Mark J; Petersen, Maya

2016-05-01

Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, out-perform non-AUC-maximizing metalearning methods, with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree.
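To make the idea of choosing ensemble weights by AUC concrete, here is a minimal sketch. It computes AUC as the fraction of positive/negative pairs ranked correctly (ties count one half) and then searches a coarse grid of convex weights for two base learners. The grid search is a stand-in chosen for brevity: the record describes nonlinear optimization of the cross-validated AUC, which this sketch does not implement, and the function names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correct ranking
    return wins / (len(pos) * len(neg))


def best_convex_weight(labels, scores_a, scores_b, steps=20):
    """Grid-search w in [0, 1] maximizing the AUC of w*a + (1-w)*b.

    In practice the scores would be cross-validated predictions of the
    base learners; here they are whatever the caller passes in.
    """
    best_w, best_auc = 0.0, -1.0
    for k in range(steps + 1):
        w = k / steps
        blended = [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]
        value = auc(labels, blended)
        if value > best_auc:
            best_w, best_auc = w, value
    return best_w, best_auc
```

Because the grid includes w = 0 and w = 1, the blended AUC on the supplied data is never lower than that of the better single learner. AUC is piecewise constant in the weights, which is one reason derivative-free or specialised optimizers are used for this problem.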
Urban runoff forecasting with ensemble weather predictions

DEFF Research Database (Denmark)

Pedersen, Jonas Wied; Courdent, Vianney Augustin Thomas; Vezzaro, Luca

This research shows how ensemble weather forecasts can be used to generate urban runoff forecasts up to 53 hours into the future. The results highlight systematic differences between ensemble members that need to be accounted for when these forecasts are used in practice.
Spatial Ensemble Postprocessing of Precipitation Forecasts Using High Resolution Analyses

Science.gov (United States)

Lang, Moritz N.; Schicker, Irene; Kann, Alexander; Wang, Yong

2017-04-01

shows a mean improvement of more than 40% in CRPS when compared to bilinearly interpolated uncalibrated ensemble forecasts. The validation on randomly selected grid points, representing the true height distribution over Austria, still indicates a mean improvement of 35%. The applied statistical model is currently set up for 6-hourly and daily accumulation periods, but will be extended to a temporal resolution of 1-3 hours within a new probabilistic nowcasting system operated by ZAMG.
Translating the cancer genome: Going beyond p values

Energy Technology Data Exchange (ETDEWEB)

Chin, Lynda; Gray, Joe W.

2008-04-03

Cancer cells are endowed with diverse biological capabilities driven by myriad inherited and somatic genetic and epigenetic aberrations that commandeer key cancer-relevant pathways. Efforts to elucidate these aberrations began with Boveri's hypothesis of aberrant mitoses causing cancer and continue today with a suite of powerful high-resolution technologies that enable detailed catalogues of genomic aberrations and epigenomic modifications. Tomorrow will likely bring the complete atlas of reversible and irreversible alteration in individual cancers. The challenge now is to discern causal molecular abnormalities from genomic and epigenomic 'noise', to understand how the ensemble of these aberrations collaborate to drive cancer pathophysiology. Here, we highlight lessons learned from now classical examples of successful translation of genomic discoveries into clinical practice, lessons that may be used to guide and accelerate translation of emerging genomic insights into practical clinical endpoints that can impact on practice of cancer medicine.
Multilevel ensemble Kalman filtering

KAUST Repository

Hoel, Haakon

2016-01-08

The ensemble Kalman filter (EnKF) is a sequential filtering method that uses an ensemble of particle paths to estimate the means and covariances required by the Kalman filter by the use of sample moments, i.e., the Monte Carlo method. EnKF is often both robust and efficient, but its performance may suffer in settings where the computational cost of accurate simulations of particles is high. The multilevel Monte Carlo method (MLMC) is an extension of classical Monte Carlo methods which, by sampling stochastic realizations on a hierarchy of resolutions, may reduce the computational cost of moment approximations by orders of magnitude. In this work we have combined the ideas of MLMC and EnKF to construct the multilevel ensemble Kalman filter (MLEnKF) for the setting of finite dimensional state and observation spaces. The main idea of this method is to compute particle paths on a hierarchy of resolutions and to apply multilevel estimators on the ensemble hierarchy of particles to compute Kalman filter means and covariances. Theoretical results and a numerical study of the performance gains of MLEnKF over EnKF will be presented. Some ideas on the extension of MLEnKF to settings with infinite dimensional state spaces will also be presented.
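The use of sample moments in the EnKF can be sketched for the simplest possible case: a scalar state that is observed directly. The sketch below is a single-level, perturbed-observation analysis step, not the multilevel estimator of the record; the function names and the choice of a scalar state are assumptions made for illustration.

```python
import random


def sample_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Unbiased sample variance, the Monte Carlo estimate of the prior covariance."""
    mean = sample_mean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def enkf_update(ensemble, observation, obs_variance, rng):
    """One EnKF analysis step for a scalar state observed directly.

    Each member is nudged toward its own perturbed copy of the observation,
    with a Kalman gain built from the ensemble's sample variance.
    """
    prior_var = sample_variance(ensemble)
    gain = prior_var / (prior_var + obs_variance)
    obs_std = obs_variance ** 0.5
    return [
        x + gain * (observation + rng.gauss(0.0, obs_std) - x)
        for x in ensemble
    ]
```

A typical call draws a prior ensemble, for example `prior = [rng.gauss(0.0, 1.0) for _ in range(200)]` with `rng = random.Random(0)`, and then evaluates `enkf_update(prior, 2.0, 0.25, rng)`. The analysis ensemble moves toward the observation and its spread shrinks; the multilevel variant replaces the single sample variance with a telescoping sum of estimators over a hierarchy of resolutions.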
Multilevel ensemble Kalman filtering

KAUST Repository

Hoel, Haakon; Chernov, Alexey; Law, Kody; Nobile, Fabio; Tempone, Raul

2016-01-01

The ensemble Kalman filter (EnKF) is a sequential filtering method that uses an ensemble of particle paths to estimate the means and covariances required by the Kalman filter by the use of sample moments, i.e., the Monte Carlo method. EnKF is often both robust and efficient, but its performance may suffer in settings where the computational cost of accurate simulations of particles is high. The multilevel Monte Carlo method (MLMC) is an extension of classical Monte Carlo methods which, by sampling stochastic realizations on a hierarchy of resolutions, may reduce the computational cost of moment approximations by orders of magnitude. In this work we have combined the ideas of MLMC and EnKF to construct the multilevel ensemble Kalman filter (MLEnKF) for the setting of finite dimensional state and observation spaces. The main idea of this method is to compute particle paths on a hierarchy of resolutions and to apply multilevel estimators on the ensemble hierarchy of particles to compute Kalman filter means and covariances. Theoretical results and a numerical study of the performance gains of MLEnKF over EnKF will be presented. Some ideas on the extension of MLEnKF to settings with infinite dimensional state spaces will also be presented.
The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data.

Science.gov (United States)

Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

2017-01-04

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Bayesian ensemble refinement by replica simulations and reweighting

Science.gov (United States)

Hummer, Gerhard; Köfinger, Jürgen

2015-12-01

We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Protein folding simulations by generalized-ensemble algorithms.

Science.gov (United States)

Yoda, Takao; Sugita, Yuji; Okamoto, Yuko

2014-01-01

In the protein folding problem, conventional simulations in physical statistical mechanical ensembles, such as the canonical ensemble with fixed temperature, face great difficulty. This is because there exist a huge number of local-minimum-energy states in the system and the conventional simulations tend to get trapped in these states, giving wrong results. Generalized-ensemble algorithms are based on artificial unphysical ensembles and overcome the above difficulty by performing random walks in potential energy, volume, and other physical quantities or their corresponding conjugate parameters such as temperature, pressure, etc. The advantage of generalized-ensemble simulations lies in the fact that they not only avoid getting trapped in states of energy local minima but also allow the calculation of physical quantities as functions of temperature or other parameters from a single simulation run. In this article we review the generalized-ensemble algorithms. Four examples, multicanonical algorithm, replica-exchange method, replica-exchange multicanonical algorithm, and multicanonical replica-exchange method, are described in detail. Examples of their applications to the protein folding problem are presented.
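
Of the four algorithms named above, the replica-exchange method has the most compact core step: two replicas at different temperatures occasionally try to swap configurations, and the swap is accepted with a Metropolis probability. The sketch below shows only that acceptance rule in its standard textbook form; the function names are hypothetical and the replicas are reduced to their energies.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    Replica i currently has energy_i at inverse temperature beta_i,
    replica j has energy_j at inverse temperature beta_j.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # delta >= 0: the swap never raises the energy held by the colder
    # replica, so it is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(betas, energies, i, j, rng):
    """Try to swap the configurations (here only energies) of replicas i, j."""
    p = swap_probability(betas[i], betas[j], energies[i], energies[j])
    if rng.random() < p:
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False


betas = [1.0, 0.5]      # replica 0 is colder (larger beta)
energies = [2.0, 1.0]   # the colder replica holds the higher energy
accepted = attempt_swap(betas, energies, 0, 1, random.Random(0))
print(accepted, energies)
```

Because swaps let a configuration wander up to high temperature and back, a replica stuck in a local minimum gets a chance to escape, which is the trapping problem the abstract describes.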
Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China

Science.gov (United States)

Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping

2017-11-01

Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprised of an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination on parameter estimation, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single model ensembles and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at three stations can be satisfactory up to ten days, however the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For
Ensemble method for dengue prediction.

Science.gov (United States)

Buczak, Anna L; Baugher, Benjamin; Moniz, Linda J; Bagley, Thomas; Babin, Steven M; Guven, Erhan

2018-01-01

In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season); 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for transmission season and peak height for Iquitos, Peru.
Ensemble method for dengue prediction.

Directory of Open Access Journals (Sweden)

Anna L Buczak

In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season); 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for transmission season and peak height for Iquitos, Peru.
Contact planarization of ensemble nanowires

Science.gov (United States)

Chia, A. C. E.; LaPierre, R. R.

2011-06-01

The viability of four organic polymers (S1808, SC200, SU8 and Cyclotene) as filling materials to achieve planarization of ensemble nanowire arrays is reported. Analysis of the porosity, surface roughness and thermal stability of each filling material was performed. Sonication was used as an effective method to remove the tops of the nanowires (NWs) to achieve complete planarization. Ensemble nanowire devices were fully fabricated and I-V measurements confirmed that Cyclotene effectively planarizes the NWs while still serving the role as an insulating layer between the top and bottom contacts. These processes and analysis can be easily implemented into future characterization and fabrication of ensemble NWs for optoelectronic device applications.
Ensemble forecasting of species distributions.

Science.gov (United States)

Araújo, Miguel B; New, Mark

2007-01-01

Concern over implications of climate change for biodiversity has led to the use of bioclimatic models to forecast the range shifts of species under future climate-change scenarios. Recent studies have demonstrated that projections by alternative models can be so variable as to compromise their usefulness for guiding policy decisions. Here, we advocate the use of multiple models within an ensemble forecasting framework and describe alternative approaches to the analysis of bioclimatic ensembles, including bounding box, consensus and probabilistic techniques. We argue that, although improved accuracy can be delivered through the traditional tasks of trying to build better models with improved data, more robust forecasts can also be achieved if ensemble forecasts are produced and analysed appropriately.
Reproducing multi-model ensemble average with Ensemble-averaged Reconstructed Forcings (ERF) in regional climate modeling

Science.gov (United States)

Erfanian, A.; Fomenko, L.; Wang, G.

2016-12-01

Multi-model ensemble (MME) average is considered the most reliable for simulating both present-day and future climates. It has been a primary reference for making conclusions in major coordinated studies, e.g., the IPCC Assessment Reports and CORDEX. The biases of individual models cancel each other out in the MME average, enabling the ensemble mean to outperform individual members in simulating the mean climate. This enhancement, however, comes with tremendous computational cost, which is especially inhibiting for regional climate modeling as model uncertainties can originate from both RCMs and the driving GCMs. Here we propose the Ensemble-based Reconstructed Forcings (ERF) approach to regional climate modeling that achieves a similar level of bias reduction at a fraction of the cost compared with the conventional MME approach. The new method constructs a single set of initial and boundary conditions (IBCs) by averaging the IBCs of multiple GCMs, and drives the RCM with this ensemble average of IBCs to conduct a single run. Using a regional climate model (RegCM4.3.4-CLM4.5), we tested the method over West Africa for multiple combinations of (up to six) GCMs. Our results indicate that the performance of the ERF method is comparable to that of the MME average in simulating the mean climate. The bias reduction seen in ERF simulations is achieved by using more realistic IBCs in solving the system of equations underlying the RCM physics and dynamics. This endows the new method with a theoretical advantage in addition to reducing computational cost. The ERF output is an unaltered solution of the RCM as opposed to a climate state that might not be physically plausible due to the averaging of multiple solutions with the conventional MME approach. The ERF approach should be considered for use in major international efforts such as CORDEX. Key words: Multi-model ensemble, ensemble analysis, ERF, regional climate modeling
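
The difference between the two workflows is where the averaging happens: MME averages the outputs of many runs, ERF averages the driving conditions and runs once. A toy sketch makes the cost difference and the non-equivalence visible. Everything here is hypothetical: `toy_rcm` is a made-up nonlinear function standing in for a regional model, and each GCM's IBCs are collapsed to a single number.

```python
def toy_rcm(boundary_value):
    """Hypothetical stand-in for a regional model: a nonlinear response."""
    return boundary_value ** 2


def mme_average(forcings, model):
    """Conventional MME: one model run per driving GCM, then average outputs."""
    outputs = [model(f) for f in forcings]
    return sum(outputs) / len(outputs)


def erf_run(forcings, model):
    """ERF: average the driving conditions first, then do a single run."""
    mean_forcing = sum(forcings) / len(forcings)
    return model(mean_forcing)


gcm_forcings = [1.0, 2.0, 3.0]  # made-up boundary values from three GCMs
print(mme_average(gcm_forcings, toy_rcm))  # three model runs
print(erf_run(gcm_forcings, toy_rcm))      # one model run
```

For a linear model the two results coincide; for a nonlinear one they generally differ, which is why the abstract evaluates ERF empirically against the MME average rather than assuming equivalence.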
Conductor gestures influence evaluations of ensemble performance.

Science.gov (United States)

Morrison, Steven J; Price, Harry E; Smedley, Eric M; Meals, Cory D

2014-01-01

Previous research has found that listener evaluations of ensemble performances vary depending on the expressivity of the conductor's gestures, even when performances are otherwise identical. It was the purpose of the present study to test whether this effect of visual information was evident in the evaluation of specific aspects of ensemble performance: articulation and dynamics. We constructed a set of 32 music performances that combined auditory and visual information and were designed to feature a high degree of contrast along one of two target characteristics: articulation and dynamics. We paired each of four music excerpts recorded by a chamber ensemble in both a high- and low-contrast condition with video of four conductors demonstrating high- and low-contrast gesture specifically appropriate to either articulation or dynamics. Using one of two equivalent test forms, college music majors and non-majors (N = 285) viewed sixteen 30 s performances and evaluated the quality of the ensemble's articulation, dynamics, technique, and tempo along with overall expressivity. Results showed significantly higher evaluations for performances featuring high rather than low conducting expressivity regardless of the ensemble's performance quality. Evaluations for both articulation and dynamics were strongly and positively correlated with evaluations of overall ensemble expressivity.
An ensemble approach to simulate CO2 emissions from natural fires

Science.gov (United States)

Eliseev, A. V.; Mokhov, I. I.; Chernokulsky, A. V.

2014-06-01

This paper presents ensemble simulations with the global climate model developed at the A. M. Obukhov Institute of Atmospheric Physics, Russian Academy of Sciences (IAP RAS CM). These simulations are forced by historical reconstructions of concentrations of well-mixed greenhouse gases (CO2, CH4, and N2O), sulfate aerosols (both in the troposphere and stratosphere), extent of crops and pastures, and total solar irradiance for AD 850-2005 (hereafter all years are taken as being AD) and by the Representative Concentration Pathway (RCP) scenarios for the same forcing agents until the year 2300. Our model implements GlobFIRM (Global FIRe Model) as a scheme for calculating characteristics of natural fires. Compared to the original GlobFIRM model, in our implementation, the scheme is extended by a module accounting for CO2 release from soil during fires. The novel approach of our paper is to simulate natural fires in an ensemble fashion. Different ensemble members in the present paper are constructed by varying the values of parameters of the natural fires module. These members are constrained by the GFED-3.1 data set for the burnt area and CO2 release from fires and further subjected to Bayesian averaging. Our simulations are the first coupled model assessment of future changes in gross characteristics of natural fires. In our model, the present-day (1998-2011) global area burnt due to natural fires is (2.1 ± 0.4) × 10⁶ km² yr⁻¹ (ensemble mean and intra-ensemble standard deviation are presented), and the respective CO2 emissions to the atmosphere are (1.4 ± 0.2) Pg C yr⁻¹. The latter value is in agreement with the corresponding GFED estimates. The area burnt by natural fires is generally larger than the GFED estimates except in boreal Eurasia, where it is realistic, and in Australia, where it is smaller than these estimates. Regionally, the modelled CO2 emissions are larger (smaller) than the GFED estimates in Europe (in the tropics and north-eastern Eurasia). From
Assessing the potential for improving S2S forecast skill through multimodel ensembling

Science.gov (United States)

Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.

2016-12-01

Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probability distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.
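
The reason ELR yields mutually consistent threshold probabilities is that the threshold itself enters the regression as a predictor, so one coefficient set defines a single cumulative distribution. A minimal sketch under stated assumptions: the threshold enters linearly (other transforms are possible), the coefficients below are invented rather than fitted, and the function names are hypothetical.

```python
import math


def elr_cdf(predictor, threshold, intercept, slope, threshold_coef):
    """P(rainfall <= threshold | predictor) under an extended logistic model.

    The threshold enters as an extra predictor, so one coefficient set
    serves every threshold. threshold_coef must be positive so that the
    probability increases with the threshold.
    """
    z = intercept + slope * predictor + threshold_coef * threshold
    return 1.0 / (1.0 + math.exp(-z))


def tercile_probabilities(predictor, lower_tercile, upper_tercile, coefs):
    """Below-, near- and above-normal probabilities from two ELR evaluations."""
    p_low = elr_cdf(predictor, lower_tercile, *coefs)
    p_up = elr_cdf(predictor, upper_tercile, *coefs)
    return p_low, p_up - p_low, 1.0 - p_up


# Hypothetical coefficients (intercept, slope, threshold_coef); in practice
# they would be fitted to re-forecasts.
coefs = (0.5, -0.08, 0.06)
probs = tercile_probabilities(predictor=20.0, lower_tercile=10.0,
                              upper_tercile=30.0, coefs=coefs)
print(probs)
```

With separate logistic regressions per threshold, the two fitted curves could cross and produce a negative near-normal probability; sharing coefficients rules that out.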
Sequential ensemble-based optimal design for parameter estimation

Energy Technology Data Exchange (ETDEWEB)

Man, Jun [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China]; Zhang, Jiangjiang [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China]; Li, Weixuan [Pacific Northwest National Laboratory, Richland Washington USA]; Zeng, Lingzao [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China]; Wu, Laosheng [Department of Environmental Sciences, University of California, Riverside California USA]

2016-10-01

The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are each used to design the optimal sampling strategy. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, a larger ensemble size improves the parameter estimation and the convergence of the optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can equally be applied to other hydrological problems.
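
Since the three metrics are built from first- and second-order statistics, they have closed forms once prior and posterior are treated as Gaussian. The sketch below gives the standard scalar Gaussian expressions; whether the paper uses exactly these forms is an assumption, and the function names are hypothetical.

```python
import math


def shannon_entropy_difference(prior_var, post_var):
    """Entropy reduction from prior to posterior for scalar Gaussians."""
    return 0.5 * math.log(prior_var / post_var)


def degrees_of_freedom_for_signal(prior_var, post_var):
    """Scalar form of trace(I - P_post * P_prior^-1)."""
    return 1.0 - post_var / prior_var


def relative_entropy(prior_mean, prior_var, post_mean, post_var):
    """Kullback-Leibler divergence of the posterior from the prior."""
    ratio = post_var / prior_var
    shift = (post_mean - prior_mean) ** 2 / prior_var
    return 0.5 * (ratio + shift - 1.0 - math.log(ratio))


# Made-up statistics: a measurement shrinks the variance from 4 to 1
# and moves the mean from 0 to 1.
print(shannon_entropy_difference(4.0, 1.0))
print(degrees_of_freedom_for_signal(4.0, 1.0))
print(relative_entropy(0.0, 4.0, 1.0, 1.0))
```

SD and DFS depend only on the variance reduction, while RE also rewards a shift in the mean. A design loop would evaluate a metric for each candidate measurement location and pick the maximizer.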
Modality-Driven Classification and Visualization of Ensemble Variance

Energy Technology Data Exchange (ETDEWEB)

Bensema, Kevin; Gosink, Luke; Obermaier, Harald; Joy, Kenneth I.

2016-10-01

Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
Bioactive focus in conformational ensembles: a pluralistic approach

Science.gov (United States)

Habgood, Matthew

2017-12-01

Computational generation of conformational ensembles is key to contemporary drug design. Selecting the members of the ensemble that will approximate the conformation most likely to bind to a desired target (the bioactive conformation) is difficult, given that the potential energy usually used to generate and rank the ensemble is a notoriously poor discriminator between bioactive and non-bioactive conformations. In this study an approach to generating a focused ensemble is proposed in which each conformation is assigned multiple rankings based not just on potential energy but also on solvation energy, hydrophobic or hydrophilic interaction energy, radius of gyration, and on a statistical potential derived from Cambridge Structural Database data. The best ranked structures derived from each system are then assembled into a new ensemble that is shown to be better focused on bioactive conformations. This pluralistic approach is tested on ensembles generated by the Molecular Operating Environment's Low Mode Molecular Dynamics module, and by the Cambridge Crystallographic Data Centre's conformation generator software.
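
The selection step described above, taking the best-ranked structures under each criterion and pooling them, can be sketched as a union of per-criterion top-k lists. This is an illustration only: the scores are invented, treating lower scores as better for every criterion is an assumption, and `focused_ensemble` is a hypothetical name.

```python
def focused_ensemble(scores_by_criterion, top_k):
    """Union of the top_k conformers under each ranking criterion.

    scores_by_criterion maps a criterion name to {conformer_id: score};
    lower scores are treated as better (an assumption of this sketch).
    """
    selected = []
    for scores in scores_by_criterion.values():
        ranked = sorted(scores, key=scores.get)
        for conformer in ranked[:top_k]:
            if conformer not in selected:
                selected.append(conformer)
    return selected


# Made-up scores for four conformers under two of the criteria.
scores = {
    "potential_energy": {"c1": 0.0, "c2": 1.2, "c3": 3.5, "c4": 4.1},
    "solvation_energy": {"c1": -2.0, "c2": -1.0, "c3": -6.5, "c4": -3.0},
}
print(focused_ensemble(scores, top_k=1))
```

Here a conformer that ranks poorly on potential energy (`c3`) still enters the focused ensemble because it ranks first on another criterion, which is the point of using several rankings.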
EMUDRA: Ensemble of Multiple Drug Repositioning Approaches to Improve Prediction Accuracy.

Science.gov (United States)

Zhou, Xianxiao; Wang, Minghui; Katsyv, Igor; Irie, Hanna; Zhang, Bin

2018-04-24

Availability of large-scale genomic, epigenetic and proteomic data in complex diseases makes it possible to objectively and comprehensively identify therapeutic targets that can lead to new therapies. The Connectivity Map has been widely used to explore novel indications of existing drugs. However, the prediction accuracy of the existing methods, such as the Kolmogorov-Smirnov statistic, remains low. Here we present a novel high-performance drug repositioning approach that improves over the state-of-the-art methods. We first designed an expression weighted cosine method (EWCos) to minimize the influence of the uninformative expression changes and then developed an ensemble approach termed EMUDRA (Ensemble of Multiple Drug Repositioning Approaches) to integrate EWCos and three existing state-of-the-art methods. EMUDRA significantly outperformed individual drug repositioning methods when applied to simulated and independent evaluation datasets. Using EMUDRA, we predicted and experimentally validated the antibiotic rifabutin as an inhibitor of cell growth in triple negative breast cancer. EMUDRA can identify drugs that more effectively target disease gene signatures and will thus be a useful tool for identifying novel therapies for complex diseases and predicting new indications for existing drugs. The EMUDRA R package is available at doi:10.7303/syn11510888. Contact: bin.zhang@mssm.edu or zhangb@hotmail.com. Supplementary data are available at Bioinformatics online.
Demonstrating the value of larger ensembles in forecasting physical systems

Directory of Open Access Journals (Sweden)

Reason L. Machete

2016-12-01

Ensemble simulation propagates a collection of initial states forward in time in a Monte Carlo fashion. Depending on the fidelity of the model and the properties of the initial ensemble, the goal of ensemble simulation can range from merely quantifying variations in the sensitivity of the model all the way to providing actionable probability forecasts of the future. Whatever the goal is, success depends on the properties of the ensemble, and there is a longstanding discussion in meteorology as to the size of initial condition ensemble most appropriate for Numerical Weather Prediction. In terms of resource allocation: how is one to divide finite computing resources between model complexity, ensemble size, data assimilation and other components of the forecast system? One wishes to avoid undersampling information available from the model's dynamics, yet one also wishes to use the highest fidelity model available. Arguably, a higher fidelity model can better exploit a larger ensemble; nevertheless it is often suggested that a relatively small ensemble, say ~16 members, is sufficient and that larger ensembles are not an effective investment of resources. This claim is shown to be dubious when the goal is probabilistic forecasting, even in settings where the forecast model is informative but imperfect. Probability forecasts for a ‘simple’ physical system are evaluated at different lead times; ensembles of up to 256 members are considered. The pure density estimation context (where ensemble members are drawn from the same underlying distribution as the target) differs from the forecasting context, where one is given a high fidelity (but imperfect) model. In the forecasting context, the information provided by additional members depends also on the fidelity of the model, the ensemble formation scheme (data assimilation), the ensemble interpretation and the nature of the observational noise. The effect of increasing the ensemble size is quantified by
Ensemble Weight Enumerators for Protograph LDPC Codes

Science.gov (United States)

Divsalar, Dariush

2006-01-01

Recently LDPC codes with projected graph, or protograph structures have been proposed. In this paper, finite length ensemble weight enumerators for LDPC codes with protograph structures are obtained. Asymptotic results are derived as the block size goes to infinity. In particular we are interested in obtaining ensemble average weight enumerators for protograph LDPC codes which have minimum distance that grows linearly with block size. As with irregular ensembles, linear minimum distance property is sensitive to the proportion of degree-2 variable nodes. In this paper the derived results on ensemble weight enumerators show that linear minimum distance condition on degree distribution of unstructured irregular LDPC codes is a sufficient but not a necessary condition for protograph LDPC codes.
Ensemble atmospheric dispersion calculations for decision support systems

International Nuclear Information System (INIS)

Borysiewicz, M.; Potempski, S.; Galkowski, A.; Zelazny, R.

2003-01-01

This document describes two approaches to long-range atmospheric dispersion of pollutants based on the ensemble concept. In the first part of the report some experiences related to the exercises undertaken under the ENSEMBLE project of the European Union are presented. The second part is devoted to the implementation of the mesoscale numerical prediction model RAMS and the atmospheric dispersion model HYPACT on a Beowulf cluster and their usage for ensemble forecasting and long-range atmospheric ensemble dispersion calculations based on available meteorological data from NCEO, NOAA (USA). (author)
The other side of comparative genomics: genes with no orthologs between the cow and other mammalian species

Directory of Open Access Journals (Sweden)

Ajmone-Marsan Paolo

2009-12-01

Background: With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genome may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. Results: A BLAST-based method was exploited to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach, based on sequence similarity and genomic comparison. Conclusions: The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may potentially have orthologs which were missed in the annotation process, despite having a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
Forecasting European cold waves based on subsampling strategies of CMIP5 and Euro-CORDEX ensembles

Science.gov (United States)

Cordero-Llana, Laura; Braconnot, Pascale; Vautard, Robert; Vrac, Mathieu; Jezequel, Aglae

2016-04-01

Forecasting future extreme events under the present changing climate represents a difficult task. Currently there are a large number of ensembles of simulations for climate projections that take into account different models and scenarios. However, there is a need for reducing the size of the ensemble to make the interpretation of these simulations more manageable for impact studies or climate risk assessment. This can be achieved by developing subsampling strategies to identify a limited number of simulations that best represent the ensemble. In this study, cold waves are chosen to test different approaches for subsampling available simulations. The definition of cold waves depends on the criteria used, but they are generally defined using a minimum temperature threshold, the duration of the cold spell, and their geographical extent. These climate indicators are not universal, highlighting the difficulty of directly comparing different studies. As part of the CLIPC European project, we use daily surface temperature data obtained from CMIP5 outputs as well as Euro-CORDEX simulations to predict future cold wave events in Europe. From these simulations a clustering method is applied to minimise the number of ensembles required. Furthermore, we analyse the different uncertainties that arise from the different model characteristics and definitions of climate indicators. Finally, we will test if the same subsampling strategy can be used for different climate indicators. This will facilitate the use of the subsampling results for a wide number of impact assessment studies.

Ensemble-Based Data Assimilation in Reservoir Characterization: A Review

Directory of Open Access Journals (Sweden)

Seungpil Jung

2018-02-01

Full Text Available This paper presents a review of ensemble-based data assimilation for strongly nonlinear problems on the characterization of heterogeneous reservoirs with different production histories. It concentrates on the ensemble Kalman filter (EnKF) and the ensemble smoother (ES) as representative frameworks, discusses their pros and cons, and investigates recent progress to overcome their drawbacks. The typical weaknesses of ensemble-based methods are non-Gaussian parameters, improper prior ensembles and finite population size. Three categories of approaches to mitigate these limitations are reviewed with recent accomplishments: improvement of Kalman gains, add-on of transformation functions, and independent evaluation of observed data. Data assimilation in heterogeneous reservoirs, applying the improved ensemble methods, is discussed with regard to predicting unknown dynamic data in reservoir characterization.
Various multistage ensembles for prediction of heating energy consumption

Directory of Open Access Journals (Sweden)

Radisa Jovanovic

2015-04-01

Full Text Available Feedforward neural network models are created for prediction of the daily heating energy consumption of the NTNU university campus Gloshaugen, using actual measured data for training and testing. Improvement of prediction accuracy is proposed by using a neural network ensemble. Previously trained feedforward neural networks are first separated into clusters using the k-means algorithm, and then the best network of each cluster is chosen as a member of the ensemble. Two conventional averaging methods for obtaining the ensemble output are applied: simple and weighted. In order to achieve better prediction results, a multistage ensemble is investigated. At the second level, an adaptive neuro-fuzzy inference system with various clustering and membership functions is used to aggregate the selected ensemble members. A feedforward neural network in the second stage is also analyzed. It is shown that an ensemble of neural networks can predict heating energy consumption with better accuracy than the best trained single neural network, while the best results are achieved with the multistage ensemble.
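
The two averaging schemes named in this abstract can be illustrated with a minimal sketch. It assumes each ensemble member has already produced a list of daily predictions; the function names, the sample numbers and the weights are hypothetical and are not taken from the paper, which does not state how its weights were chosen.

```python
def simple_average(member_preds):
    """Unweighted mean across ensemble members, one value per day."""
    n = len(member_preds)
    return [sum(day) / n for day in zip(*member_preds)]


def weighted_average(member_preds, weights):
    """Weighted mean across members; weights are normalised to sum to 1."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, day)) / total
        for day in zip(*member_preds)
    ]


# Hypothetical data: three members (rows), two days (columns).
preds = [[10.0, 12.0], [14.0, 16.0], [12.0, 14.0]]
# Hypothetical weights, e.g. proportional to inverse validation error.
weights = [2.0, 1.0, 2.0]

simple = simple_average(preds)
weighted = weighted_average(preds, weights)
```

With equal weights the two functions return the same values, which is a quick sanity check when swapping in a different weighting rule.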
Ensemble Machine Learning Methods and Applications

CERN Document Server

Ma, Yunqian

2012-01-01

It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object tracking and bioinformatics. Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...
Popular Music and the Instrumental Ensemble.

Science.gov (United States)

Boespflug, George

1999-01-01

Discusses popular music, the role of the musical performer as a creator, and the styles of jazz and popular music. Describes the pop ensemble at the college level, focusing on improvisation, rehearsals, recording, and performance. Argues that pop ensembles be used in junior and senior high school. (CMK)
Ensemble Kalman methods for inverse problems

International Nuclear Information System (INIS)

Iglesias, Marco A; Law, Kody J H; Stuart, Andrew M

2013-01-01

The ensemble Kalman filter (EnKF) was introduced by Evensen in 1994 (Evensen 1994 J. Geophys. Res. 99 10143–62) as a novel method for data assimilation: state estimation for noisily observed time-dependent problems. Since that time it has had enormous impact in many application domains because of its robustness and ease of implementation, and numerical evidence of its accuracy. In this paper we propose the application of an iterative ensemble Kalman method for the solution of a wide class of inverse problems. In this context we show that the estimate of the unknown function that we obtain with the ensemble Kalman method lies in a subspace A spanned by the initial ensemble. Hence the resulting error may be bounded above by the error found from the best approximation in this subspace. We provide numerical experiments which compare the error incurred by the ensemble Kalman method for inverse problems with the error of the best approximation in A, and with variants on traditional least-squares approaches, restricted to the subspace A. In so doing we demonstrate that the ensemble Kalman method for inverse problems provides a derivative-free optimization method with comparable accuracy to that achieved by traditional least-squares approaches. Furthermore, we also demonstrate that the accuracy is of the same order of magnitude as that achieved by the best approximation. Three examples are used to demonstrate these assertions: inversion of a compact linear operator; inversion of piezometric head to determine hydraulic conductivity in a Darcy model of groundwater flow; and inversion of Eulerian velocity measurements at positive times to determine the initial condition in an incompressible fluid. (paper)
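
The subspace property stated in this abstract can be checked numerically with a minimal sketch. It assumes a scalar observation, a hypothetical linear forward map and a single update step with perturbed observations; the paper's method is iterative and more general, and none of the names below come from it.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def enkf_step(ensemble, forward, y, noise_var, rng):
    """One ensemble Kalman update for a scalar observation y.

    ensemble  : list of parameter vectors (lists of floats)
    forward   : map from a parameter vector to a scalar prediction
    noise_var : observation-noise variance (must be > 0)
    """
    J, d = len(ensemble), len(ensemble[0])
    preds = [forward(u) for u in ensemble]
    u_mean = [sum(u[i] for u in ensemble) / J for i in range(d)]
    p_mean = sum(preds) / J
    # Sample cross-covariance (a vector) and prediction variance (a scalar).
    c_up = [
        sum((u[i] - u_mean[i]) * (p - p_mean) for u, p in zip(ensemble, preds)) / (J - 1)
        for i in range(d)
    ]
    c_pp = sum((p - p_mean) ** 2 for p in preds) / (J - 1)
    gain = [c / (c_pp + noise_var) for c in c_up]
    updated = []
    for u, p in zip(ensemble, preds):
        y_pert = y + rng.gauss(0.0, noise_var ** 0.5)  # perturbed observation
        updated.append([u[i] + gain[i] * (y_pert - p) for i in range(d)])
    return updated


rng = random.Random(0)
a = [1.0, 2.0, -1.0]  # hypothetical forward map G(u) = a . u
initial = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # spans the x-y plane only
updated = enkf_step(initial, lambda u: dot(a, u), y=0.5, noise_var=0.01, rng=rng)
```

Because the gain is built from deviations of the members about their mean, every updated member stays in the plane spanned by the initial ensemble: the third component remains zero even though the forward map depends on it.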
Genetic Algorithm Optimized Neural Networks Ensemble as ...

African Journals Online (AJOL)

Marquardt algorithm by varying conditions such as inputs, hidden neurons, initialization, training sets and random Gaussian noise injection to ... Several such ensembles formed the population which was evolved to generate the fittest ensemble.
Steric sea level variability (1993-2010) in an ensemble of ocean reanalyses and objective analyses

Science.gov (United States)

Storto, Andrea; Masina, Simona; Balmaseda, Magdalena; Guinehut, Stéphanie; Xue, Yan; Szekely, Tanguy; Fukumori, Ichiro; Forget, Gael; Chang, You-Soon; Good, Simon A.; Köhl, Armin; Vernieres, Guillaume; Ferry, Nicolas; Peterson, K. Andrew; Behringer, David; Ishii, Masayoshi; Masuda, Shuhei; Fujii, Yosuke; Toyoda, Takahiro; Yin, Yonghong; Valdivieso, Maria; Barnier, Bernard; Boyer, Tim; Lee, Tony; Gourrion, Jérome; Wang, Ou; Heimback, Patrick; Rosati, Anthony; Kovach, Robin; Hernandez, Fabrice; Martin, Matthew J.; Kamachi, Masafumi; Kuragano, Tsurane; Mogensen, Kristian; Alves, Oscar; Haines, Keith; Wang, Xiaochun

2017-08-01

Quantifying the effect of seawater density changes on sea level variability is of crucial importance for climate change studies, as the cumulative sea level rise can be regarded as both an important climate change indicator and a possible danger for human activities in coastal areas. In this work, as part of the Ocean Reanalysis Intercomparison Project, the global and regional steric sea level changes are estimated and compared from an ensemble of 16 ocean reanalyses and 4 objective analyses. These estimates are initially compared with a satellite-derived (altimetry minus gravimetry) dataset for a short period (2003-2010). The ensemble mean exhibits a significantly high correlation at both global and regional scales, and the ensemble of ocean reanalyses outperforms that of objective analyses, in particular in the Southern Ocean. The reanalysis ensemble mean thus represents a valuable tool for further analyses, although large uncertainties remain for the inter-annual trends. Within the extended intercomparison period that spans the altimetry era (1993-2010), we find that the ensembles of reanalyses and objective analyses are in good agreement, and both detect a trend of the global steric sea level of 1.0 and 1.1 ± 0.05 mm/year, respectively. However, the spread among the products of the halosteric component trend exceeds the mean trend itself, questioning the reliability of its estimate. This is related to the scarcity of salinity observations before the Argo era. Furthermore, the impact of deep ocean layers is non-negligible on the steric sea level variability (22 and 12 % for the layers below 700 and 1500 m of depth, respectively), although the small deep ocean trends are not significant with respect to the products' spread.
Ensemble of regional climate model projections for Ireland

Science.gov (United States)

Nolan, Paul; McGrath, Ray

2016-04-01

of over 35 days per year. Results show significant projected decreases in mean annual, spring and summer precipitation amounts by mid-century. The projected decreases are largest for summer, with "likely" reductions ranging from 0% to 20%. The frequencies of heavy precipitation events show notable increases (approximately 20%) during the winter and autumn months. The number of extended dry periods is projected to increase substantially during autumn and summer. Regional variations of projected precipitation change remain statistically elusive. The energy content of the wind is projected to significantly decrease for the future spring, summer and autumn months. Projected increases for winter were found to be statistically insignificant. The projected decreases were largest for summer, with "likely" values ranging from 3% to 15%. Results suggest that the tracks of intense storms are projected to extend further south over Ireland relative to those in the reference simulation. As extreme storm events are rare, the storm-tracking research needs to be extended. Future work will focus on analysing a larger ensemble, thus allowing a robust statistical analysis of extreme storm track projections.
Global Ensemble Forecast System (GEFS) [1 Deg.

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The Global Ensemble Forecast System (GEFS) is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental...
Bidirectional Modulation of Intrinsic Excitability in Rat Prelimbic Cortex Neuronal Ensembles and Non-Ensembles after Operant Learning.

Science.gov (United States)

Whitaker, Leslie R; Warren, Brandon L; Venniro, Marco; Harte, Tyler C; McPherson, Kylie B; Beidel, Jennifer; Bossert, Jennifer M; Shaham, Yavin; Bonci, Antonello; Hope, Bruce T

2017-09-06

Learned associations between environmental stimuli and rewards drive goal-directed learning and motivated behavior. These memories are thought to be encoded by alterations within specific patterns of sparsely distributed neurons called neuronal ensembles that are activated selectively by reward-predictive stimuli. Here, we use the Fos promoter to identify strongly activated neuronal ensembles in rat prelimbic cortex (PLC) and assess altered intrinsic excitability after 10 d of operant food self-administration training (1 h/d). First, we used the Daun02 inactivation procedure in male FosLacZ-transgenic rats to ablate selectively Fos-expressing PLC neurons that were active during operant food self-administration. Selective ablation of these neurons decreased food seeking. We then used male FosGFP-transgenic rats to assess selective alterations of intrinsic excitability in Fos-expressing neuronal ensembles (FosGFP+) that were activated during food self-administration and compared these with alterations in less activated non-ensemble neurons (FosGFP-). Using whole-cell recordings of layer V pyramidal neurons in an ex vivo brain slice preparation, we found that operant self-administration increased excitability of FosGFP+ neurons and decreased excitability of FosGFP- neurons. Increased excitability of FosGFP+ neurons was driven by increased steady-state input resistance. Decreased excitability of FosGFP- neurons was driven by increased contribution of small-conductance calcium-activated potassium (SK) channels. Injections of the specific SK channel antagonist apamin into PLC increased Fos expression but had no effect on food seeking. Overall, operant learning increased intrinsic excitability of PLC Fos-expressing neuronal ensembles that play a role in food seeking but decreased intrinsic excitability of Fos-negative non-ensembles. SIGNIFICANCE STATEMENT Prefrontal cortex activity plays a critical role in operant learning, but the underlying cellular mechanisms are
Derivation of Mayer Series from Canonical Ensemble

International Nuclear Information System (INIS)

Wang Xian-Zhi

2016-01-01

Mayer derived the Mayer series from both the canonical ensemble and the grand canonical ensemble by use of the cluster expansion method. In 2002, we conjectured a recursion formula of the canonical partition function of a fluid (X.Z. Wang, Phys. Rev. E 66 (2002) 056102). In this paper we give a proof for this formula by developing an appropriate expansion of the integrand of the canonical partition function. We further derive the Mayer series solely from the canonical ensemble by use of this recursion formula. (paper)
Derivation of Mayer Series from Canonical Ensemble

Science.gov (United States)

Wang, Xian-Zhi

2016-02-01

Mayer derived the Mayer series from both the canonical ensemble and the grand canonical ensemble by use of the cluster expansion method. In 2002, we conjectured a recursion formula of the canonical partition function of a fluid (X.Z. Wang, Phys. Rev. E 66 (2002) 056102). In this paper we give a proof for this formula by developing an appropriate expansion of the integrand of the canonical partition function. We further derive the Mayer series solely from the canonical ensemble by use of this recursion formula.
Ensemble inequivalence: Landau theory and the ABC model

International Nuclear Information System (INIS)

Cohen, O; Mukamel, D

2012-01-01

It is well known that systems with long-range interactions may exhibit different phase diagrams when studied within two different ensembles. In many of the previously studied examples of ensemble inequivalence, the phase diagrams differ only when the transition in one of the ensembles is first order. By contrast, in a recent study of a generalized ABC model, the canonical and grand-canonical ensembles of the model were shown to differ even when they both exhibit a continuous transition. Here we show that the order of the transition where ensemble inequivalence may occur is related to the symmetry properties of the order parameter associated with the transition. This is done by analyzing the Landau expansion of a generic model with long-range interactions. The conclusions drawn from the generic analysis are demonstrated for the ABC model by explicit calculation of its Landau expansion. (paper)
Regionalization of post-processed ensemble runoff forecasts

Directory of Open Access Journals (Sweden)

J. O. Skøien

2016-05-01

Full Text Available For many years, meteorological models have been run with perturbed initial conditions or parameters to produce ensemble forecasts that are used as a proxy for the uncertainty of the forecasts. However, the ensembles are usually both biased (the mean is systematically too high or too low compared with the observed weather) and subject to dispersion errors (the ensemble variance indicates a too low or too high confidence in the forecast compared with the observed weather). The ensembles are therefore commonly post-processed to correct for these shortcomings. Here we look at one of these techniques, referred to as Ensemble Model Output Statistics (EMOS; Gneiting et al., 2005). Originally, the post-processing parameters were identified as a fixed set of parameters for a region. The application of our work is the European Flood Awareness System (http://www.efas.eu), where a distributed model is run with meteorological ensembles as input. We are therefore dealing with a considerably larger data set than previous analyses. We also want to regionalize the parameters themselves for locations other than the calibration gauges. The post-processing parameters are therefore estimated for each calibration station, but with a spatial penalty for deviations from neighbouring stations, depending on the expected semivariance between the calibration catchment and these stations. The estimated post-processing parameters can then be regionalized for uncalibrated locations using top-kriging in the rtop package (Skøien et al., 2006, 2014). We will show results from cross-validation of the methodology and, although our interest is mainly in identifying exceedance probabilities for certain return levels, we will also show how the rtop package can be used for creating a set of post-processed ensembles through simulations.
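
For orientation, the Gaussian form of EMOS introduced by Gneiting et al. (2005) can be written as below. The notation is an illustrative choice, and the abstract does not say which variant or variable transformation is used for runoff, so this should be read as the generic form rather than the authors' exact model.

```latex
\[
  y \mid x_1, \dots, x_m \;\sim\;
  \mathcal{N}\!\left( a + \sum_{k=1}^{m} b_k x_k ,\; c + d\,S^2 \right)
\]
```

Here $x_1, \dots, x_m$ are the ensemble member forecasts, $S^2$ is the ensemble variance, and $a, b_1, \dots, b_m, c, d$ are the post-processing parameters. The coefficients $a$ and $b_k$ address the bias described above, while $c$ and $d$ address the dispersion error. It is this kind of parameter set that the abstract proposes to estimate per station and then regionalize.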
A Comparison of Ensemble Kalman Filters for Storm Surge Assimilation

KAUST Repository

Altaf, Muhammad

2014-08-01

This study evaluates and compares the performances of several variants of the popular ensemble Kalman filter for the assimilation of storm surge data with the advanced circulation (ADCIRC) model. Using meteorological data from Hurricane Ike to force the ADCIRC model on a domain including the Gulf of Mexico coastline, the authors implement and compare the standard stochastic ensemble Kalman filter (EnKF) and three deterministic square root EnKFs: the singular evolutive interpolated Kalman (SEIK) filter, the ensemble transform Kalman filter (ETKF), and the ensemble adjustment Kalman filter (EAKF). Covariance inflation and localization are implemented in all of these filters. The results from twin experiments suggest that the square root ensemble filters could lead to very comparable performances with appropriate tuning of inflation and localization, suggesting that practical implementation details are at least as important as the choice of the square root ensemble filter itself. These filters also perform reasonably well with a relatively small ensemble size, whereas the stochastic EnKF requires larger ensemble sizes to provide similar accuracy for forecasts of storm surge.
A Comparison of Ensemble Kalman Filters for Storm Surge Assimilation

KAUST Repository

Altaf, Muhammad; Butler, T.; Mayo, T.; Luo, X.; Dawson, C.; Heemink, A. W.; Hoteit, Ibrahim

2014-01-01

This study evaluates and compares the performances of several variants of the popular ensemble Kalman filter for the assimilation of storm surge data with the advanced circulation (ADCIRC) model. Using meteorological data from Hurricane Ike to force the ADCIRC model on a domain including the Gulf of Mexico coastline, the authors implement and compare the standard stochastic ensemble Kalman filter (EnKF) and three deterministic square root EnKFs: the singular evolutive interpolated Kalman (SEIK) filter, the ensemble transform Kalman filter (ETKF), and the ensemble adjustment Kalman filter (EAKF). Covariance inflation and localization are implemented in all of these filters. The results from twin experiments suggest that the square root ensemble filters could lead to very comparable performances with appropriate tuning of inflation and localization, suggesting that practical implementation details are at least as important as the choice of the square root ensemble filter itself. These filters also perform reasonably well with a relatively small ensemble size, whereas the stochastic EnKF requires larger ensemble sizes to provide similar accuracy for forecasts of storm surge.
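
Covariance inflation, one of the tuning devices mentioned in this abstract, is often applied multiplicatively: deviations from the ensemble mean are scaled by a factor slightly above one. The sketch below assumes that form and a scalar state for brevity; the abstract does not state which inflation scheme or factor the authors used, so the names and numbers are hypothetical.

```python
from statistics import mean, variance


def inflate(members, factor):
    """Multiplicative inflation: scale each member's deviation from the mean."""
    m = mean(members)
    return [m + factor * (x - m) for x in members]


# Hypothetical scalar ensemble, e.g. surge height at one grid point.
members = [1.0, 2.0, 3.0, 6.0]
inflated = inflate(members, 1.1)
```

The ensemble mean is unchanged while the sample variance grows by the square of the factor (1.21 here), which counteracts the tendency of small ensembles to underestimate forecast uncertainty.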
The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects.

Science.gov (United States)

Papanicolaou, Alexie

2016-01-01

Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called "genome projects". The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.
HIGH-RESOLUTION ATMOSPHERIC ENSEMBLE MODELING AT SRNL

Energy Technology Data Exchange (ETDEWEB)

Buckley, R.; Werth, D.; Chiswell, S.; Etherton, B.

2011-05-10

The High-Resolution Mid-Atlantic Forecasting Ensemble (HME) is a federated effort to improve operational forecasts related to precipitation, convection and boundary layer evolution, and fire weather utilizing data and computing resources from a diverse group of cooperating institutions in order to create a mesoscale ensemble from independent members. Collaborating organizations involved in the project include universities, National Weather Service offices, and national laboratories, including the Savannah River National Laboratory (SRNL). The ensemble system is produced from an overlapping numerical weather prediction model domain and parameter subsets provided by each contributing member. The coordination, synthesis, and dissemination of the ensemble information are performed by the Renaissance Computing Institute (RENCI) at the University of North Carolina-Chapel Hill. This paper discusses background related to the HME effort, SRNL participation, and example results available from the RENCI website.
Extended Range Prediction of Indian Summer Monsoon: Current status

Science.gov (United States)

Sahai, A. K.; Abhilash, S.; Borah, N.; Joseph, S.; Chattopadhyay, R.; S, S.; Rajeevan, M.; Mandal, R.; Dey, A.

2014-12-01

The main focus of this study is to develop forecast consensus in the extended range prediction (ERP) of monsoon intraseasonal oscillations using a suite of different variants of the Climate Forecast System (CFS) model. In this CFS-based Grand MME prediction system (CGMME), the ensemble members are generated by perturbing the initial condition and using different configurations of CFSv2. This is to address the role of different physical mechanisms known to have control on the error growth in the ERP in the 15-20 day time scale. The final formulation of CGMME is based on 21 ensembles of the standalone Global Forecast System (GFS) forced with bias corrected forecasted SST from CFS, 11 low resolution CFST126 and 11 high resolution CFST382. Thus, we develop the multi-model consensus forecast for the ERP of Indian summer monsoon (ISM) using a suite of different variants of CFS model. This coordinated international effort led to the development of specific tailor-made regional forecast products over the Indian region. The skill of deterministic and probabilistic categorical rainfall forecasts, as well as the verification of large-scale low-frequency monsoon intraseasonal oscillations, has been assessed using hindcasts from 2001-2012 during the monsoon season, in which all models are initialized every five days from 16 May to 28 September. The skill of the deterministic forecast from CGMME is better than that of the best participating single model ensemble configuration (SME). The CGMME approach is believed to quantify the uncertainty in both initial conditions and model formulation. The main improvement is attained in the probabilistic forecast, which is because of an increase in the ensemble spread, thereby reducing the error due to over-confident ensembles in a single model configuration. For the probabilistic forecast, three tercile ranges are determined by a ranking method based on the percentage of ensemble members from all the participating models falling in those three categories. CGMME further
The Hydrologic Ensemble Prediction Experiment (HEPEX)

Science.gov (United States)

Wood, A. W.; Thielen, J.; Pappenberger, F.; Schaake, J. C.; Hartman, R. K.

2012-12-01

The Hydrologic Ensemble Prediction Experiment was established in March 2004 at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF). With support from the US National Weather Service (NWS) and the European Commission (EC), the HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support in emergency management and water resources sectors. The strategy to meet this goal includes meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. HEPEX has organized about a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Today, the HEPEX mission is to demonstrate the added value of hydrological ensemble prediction systems (HEPS) for emergency management and water resources sectors to make decisions that have important consequences for economy, public health, safety, and the environment. HEPEX is now organised around six major themes that represent core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

The NASA Reanalysis Ensemble Service - Advanced Capabilities for Integrated Reanalysis Access and Intercomparison

Science.gov (United States)

Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.

2017-12-01

NASA's efforts to advance climate analytics-as-a-service are making new capabilities available to the research community: (1) A full-featured Reanalysis Ensemble Service (RES) comprising monthly means data from multiple reanalysis data sets, accessible through an enhanced set of extraction, analytic, arithmetic, and intercomparison operations. The operations are made accessible through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib; (2) A cloud-based, high-performance Virtual Real-Time Analytics Testbed supporting a select set of climate variables. This near real-time capability enables advanced technologies like Spark and Hadoop-based MapReduce analytics over native NetCDF files; and (3) A WPS-compliant Web service interface to our climate data analytics service that will enable greater interoperability with next-generation systems such as ESGF. The Reanalysis Ensemble Service includes the following: - New API that supports full temporal, spatial, and grid-based resolution services with sample queries - A Docker-ready RES application to deploy across platforms - Extended capabilities that enable single- and multiple reanalysis area average, vertical average, re-gridding, standard deviation, and ensemble averages - Convenient, one-stop shopping for commonly used data products from multiple reanalyses including basic sub-setting and arithmetic operations (e.g., avg, sum, max, min, var, count, anomaly) - Full support for the MERRA-2 reanalysis dataset in addition to, ECMWF ERA-Interim, NCEP CFSR, JMA JRA-55 and NOAA/ESRL 20CR… - A Jupyter notebook-based distribution mechanism designed for client use cases that combines CDSlib documentation with interactive scenarios and personalized project management - Supporting analytic services for NASA GMAO Forward Processing datasets - Basic uncertainty quantification services that combine heterogeneous ensemble products with comparative observational products (e
Data assimilation in integrated hydrological modeling using ensemble Kalman filtering

DEFF Research Database (Denmark)

Rasmussen, Jørn; Madsen, H.; Jensen, Karsten Høgh

2015-01-01

Groundwater head and stream discharge are assimilated using the ensemble transform Kalman filter in an integrated hydrological model with the aim of studying the relationship between the filter performance and the ensemble size. In an attempt to reduce the required number of ensemble members...... and estimating parameters requires a much larger ensemble size than just assimilating groundwater head observations. However, the required ensemble size can be greatly reduced with the use of adaptive localization, which by far outperforms distance-based localization. The study is conducted using synthetic data...
Decadal climate predictions improved by ocean ensemble dispersion filtering

Science.gov (United States)

Kadow, C.; Illing, S.; Kröner, I.; Ulbrich, U.; Cubasch, U.

2017-06-01

Decadal predictions by Earth system models aim to capture the state and phase of the climate several years in advance. Atmosphere-ocean interaction plays an important role for such climate forecasts. While short-term weather forecasts represent an initial value problem and long-term climate projections represent a boundary condition problem, the decadal climate prediction falls in between these two time scales. In recent years, more precise initialization techniques of coupled Earth system models and increased ensemble sizes have improved decadal predictions. However, climate models in general start losing the initialized signal and its predictive skill from one forecast year to the next. Here we show that the climate prediction skill of an Earth system model can be improved by a shift of the ocean state toward the ensemble mean of its individual members at seasonal intervals. We found that this procedure, called ensemble dispersion filter, yields more accurate results than the standard decadal prediction. Global mean and regional temperature, precipitation, and winter cyclone predictions show an increased skill up to 5 years ahead. Furthermore, the novel technique outperforms predictions with larger ensembles and higher resolution. Our results demonstrate how decadal climate predictions benefit from ocean ensemble dispersion filtering toward the ensemble mean. Plain Language Summary: Decadal predictions aim to predict the climate several years in advance. Atmosphere-ocean interaction plays an important role for such climate forecasts. The ocean memory due to its heat capacity holds big potential skill. In recent years, more precise initialization techniques of coupled Earth system models (incl. atmosphere and ocean) have improved decadal predictions. Ensembles are another important aspect. Applying slightly perturbed predictions to trigger the famous butterfly effect results in an ensemble. Instead of evaluating one prediction, but the whole ensemble with its
Ensemble-based Probabilistic Forecasting at Horns Rev

DEFF Research Database (Denmark)

Pinson, Pierre; Madsen, Henrik

2009-01-01

forecasting methodology. In a first stage, ensemble forecasts of meteorological variables are converted to power through a suitable power curve model. This model employs local polynomial regression, and is adaptively estimated with an orthogonal fitting method. The obtained ensemble forecasts of wind power...
DroidEnsemble: Detecting Android Malicious Applications with Ensemble of String and Structural Static Features

KAUST Repository

Wang, Wei

2018-05-11

The Android platform has come to dominate the operating systems of mobile devices. However, the dramatic increase in Android malicious applications (malapps) has caused serious software failures in the Android system and posed a great threat to users. The effective detection of Android malapps has thus become an emerging yet crucial issue. Characterizing the behaviors of Android applications (apps) is essential to detecting malapps. Most existing work on detecting Android malapps has been based mainly on string static features such as permissions and API usage extracted from apps. There also exists work on the detection of Android malapps with structural features, such as the Control Flow Graph (CFG) and Data Flow Graph (DFG). As Android malapps have become increasingly polymorphic and sophisticated, using only one type of static feature may result in false negatives. In this work, we propose DroidEnsemble, which takes advantage of both string features and structural features to systematically and comprehensively characterize the static behaviors of Android apps and thus build a more accurate model for the detection of Android malapps. We extract each app’s string features, including permissions, hardware features, filter intents, restricted API calls, used permissions, code patterns, as well as structural features like the function call graph. We then use three machine learning algorithms, namely, Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Random Forest (RF), to evaluate the performance of these two types of features and of their ensemble. In the experiments, we evaluate our methods and models with 1386 benign apps and 1296 malapps. Extensive experimental results demonstrate the effectiveness of DroidEnsemble. It achieves a detection accuracy of 95.8% with only string features and of 90.68% with only structural features. DroidEnsemble reaches a detection accuracy of 98.4% with the ensemble of both types of features, reducing 9 false positives and 12 false
Measurement of the quantum superposition state of an imaging ensemble of photons prepared in orbital angular momentum states using a phase-diversity method

International Nuclear Information System (INIS)

Uribe-Patarroyo, Nestor; Alvarez-Herrero, Alberto; Belenguer, Tomas

2010-01-01

We propose the use of a phase-diversity technique to estimate the orbital angular momentum (OAM) superposition state of an ensemble of photons that passes through an optical system, proceeding from an extended object. The phase-diversity technique permits the estimation of the optical transfer function (OTF) of an imaging optical system. As the OTF is derived directly from the wave-front characteristics of the observed light, we redefine the phase-diversity technique in terms of a superposition of OAM states. We test this new technique experimentally and find coherent results among different tests, which gives us confidence in the estimation of the photon ensemble state. We find that this technique not only allows us to estimate the square of the amplitude of each OAM state, but also the relative phases among all states, thus providing complete information about the quantum state of the photons. This technique could be used to measure the OAM spectrum of extended objects in astronomy or in an optical communication scheme using OAM states. In this sense, the use of extended images could lead to new techniques in which the communication is further multiplexed along the field.
Powerful Tests for Multi-Marker Association Analysis Using Ensemble Learning.

Directory of Open Access Journals (Sweden)

Badri Padhukasahasram

Multi-marker approaches have received a lot of attention recently in genome-wide association studies and can enhance power to detect new associations under certain conditions. Gene-, gene-set- and pathway-based association tests are increasingly being viewed as useful supplements to the more widely used single-marker association analysis, which has successfully uncovered numerous disease variants. A major drawback of single-marker based methods is that they do not look at the joint effects of multiple genetic variants which individually may have weak or moderate signals. Here, we describe novel tests for multi-marker association analyses that are based on phenotype predictions obtained from machine learning algorithms. Instead of assuming a linear or logistic regression model, we propose the use of ensembles of diverse machine learning algorithms for prediction. We show that phenotype predictions obtained from ensemble learning algorithms provide a new framework for multi-marker association analysis. They can be used for constructing tests for the joint association of multiple variants, adjusting for covariates and testing for the presence of interactions. To demonstrate the power and utility of this new approach, we first apply our method to simulated SNP datasets. We show that the proposed method has the correct Type-1 error rates and can be considerably more powerful than alternative approaches in some situations. Then, we apply our method to previously studied asthma-related genes in 2 independent asthma cohorts to conduct association tests.
Crossover ensembles of random matrices and skew-orthogonal polynomials

International Nuclear Information System (INIS)

Kumar, Santosh; Pandey, Akhilesh

2011-01-01

Highlights: → We study crossover ensembles of Jacobi family of random matrices. → We consider correlations for orthogonal-unitary and symplectic-unitary crossovers. → We use the method of skew-orthogonal polynomials and quaternion determinants. → We prove universality of spectral correlations in crossover ensembles. → We discuss applications to quantum conductance and communication theory problems. - Abstract: In a recent paper (S. Kumar, A. Pandey, Phys. Rev. E, 79, 2009, p. 026211) we considered Jacobi family (including Laguerre and Gaussian cases) of random matrix ensembles and reported exact solutions of crossover problems involving time-reversal symmetry breaking. In the present paper we give details of the work. We start with Dyson's Brownian motion description of random matrix ensembles and obtain universal hierarchic relations among the unfolded correlation functions. For arbitrary dimensions we derive the joint probability density (jpd) of eigenvalues for all transitions leading to unitary ensembles as equilibrium ensembles. We focus on the orthogonal-unitary and symplectic-unitary crossovers and give generic expressions for jpd of eigenvalues, two-point kernels and n-level correlation functions. This involves generalization of the theory of skew-orthogonal polynomials to crossover ensembles. We also consider crossovers in the circular ensembles to show the generality of our method. In the large dimensionality limit, correlations in spectra with arbitrary initial density are shown to be universal when expressed in terms of a rescaled symmetry breaking parameter. Applications of our crossover results to communication theory and quantum conductance problems are also briefly discussed.
Representing Color Ensembles.

Science.gov (United States)

Chetverikov, Andrey; Campana, Gianluca; Kristjánsson, Árni

2017-10-01

Colors are rarely uniform, yet little is known about how people represent color distributions. We introduce a new method for studying color ensembles based on intertrial learning in visual search. Participants looked for an oddly colored diamond among diamonds with colors taken from either uniform or Gaussian color distributions. On test trials, the targets had various distances in feature space from the mean of the preceding distractor color distribution. Targets on test trials therefore served as probes into probabilistic representations of distractor colors. Test-trial response times revealed a striking similarity between the physical distribution of colors and their internal representations. The results demonstrate that the visual system represents color ensembles in a more detailed way than previously thought, coding not only mean and variance but, most surprisingly, the actual shape (uniform or Gaussian) of the distribution of colors in the environment.
Efficient Kernel-Based Ensemble Gaussian Mixture Filtering

KAUST Repository

Liu, Bo

2015-11-11

We consider the Bayesian filtering problem for data assimilation following the kernel-based ensemble Gaussian-mixture filtering (EnGMF) approach introduced by Anderson and Anderson (1999). In this approach, the posterior distribution of the system state is propagated with the model using the ensemble Monte Carlo method, providing a forecast ensemble that is then used to construct a prior Gaussian-mixture (GM) based on the kernel density estimator. This results in two update steps: a Kalman filter (KF)-like update of the ensemble members and a particle filter (PF)-like update of the weights, followed by a resampling step to start a new forecast cycle. After formulating EnGMF for any observational operator, we analyze the influence of the bandwidth parameter of the kernel function on the covariance of the posterior distribution. We then focus on two aspects: i) the efficient implementation of EnGMF with (relatively) small ensembles, where we propose a new deterministic resampling strategy preserving the first two moments of the posterior GM to limit the sampling error; and ii) the analysis of the effect of the bandwidth parameter on contributions of KF and PF updates and on the weights variance. Numerical results using the Lorenz-96 model are presented to assess the behavior of EnGMF with deterministic resampling, study its sensitivity to different parameters and settings, and evaluate its performance against ensemble KFs. The proposed EnGMF approach with deterministic resampling suggests improved estimates in all tested scenarios, and is shown to require less localization and to be less sensitive to the choice of filtering parameters.
Big Data Analysis of Human Genome Variations

KAUST Repository

Gojobori, Takashi

2016-01-01

Since the human genome draft sequence was in public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human
Design ensemble machine learning model for breast cancer diagnosis.

Science.gov (United States)

Hsieh, Sheau-Ling; Hsieh, Sung-Huai; Cheng, Po-Hsun; Chen, Chi-Huang; Hsu, Kai-Ping; Lee, I-Shun; Wang, Zhenyu; Lai, Feipei

2012-10-01

In this paper, we classify breast cancer from medical diagnostic data. Information gain has been adapted for feature selection. Neural fuzzy (NF), k-nearest neighbor (KNN), and quadratic classifier (QC) schemes have been developed for classification, each as a single model and as an associated ensemble. In addition, a combined ensemble model with these three schemes has been constructed for further validation. The experimental results indicate that ensemble learning performs better than the individual single models. Moreover, the combined ensemble model shows the highest breast cancer classification accuracy among all models.
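
As a rough illustration of how a combined ensemble of three classifiers might be formed, here is a minimal sketch using a plain majority vote. The abstract does not state how the NF, KNN and QC outputs are combined, so the voting rule, the function name `majority_vote`, and the example labels are all assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three base classifiers on four cases
# ("M" = malignant, "B" = benign); these are made-up labels, not study data.
nf_pred = ["M", "B", "B", "M"]
knn_pred = ["M", "M", "B", "B"]
qc_pred = ["B", "M", "B", "M"]

combined = majority_vote([nf_pred, knn_pred, qc_pred])
print(combined)
```

With three models and two classes a tie cannot occur; other configurations would need an explicit tie-breaking rule.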
Skill prediction of local weather forecasts based on the ECMWF ensemble

Directory of Open Access Journals (Sweden)

C. Ziehmann

2001-01-01

Ensemble Prediction has become an essential part of numerical weather forecasting. In this paper we investigate the ability of ensemble forecasts to provide an a priori estimate of the expected forecast skill. Several quantities derived from the local ensemble distribution are investigated for a two year data set of European Centre for Medium-Range Weather Forecasts (ECMWF) temperature and wind speed ensemble forecasts at 30 German stations. The results indicate that the population of the ensemble mode provides useful information for the uncertainty in temperature forecasts. The ensemble entropy is a similarly good measure. This is not true for the spread if it is simply calculated as the variance of the ensemble members with respect to the ensemble mean. The number of clusters in the C regions is almost unrelated to the local skill. For wind forecasts, the results are less promising.
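
The three local ensemble quantities compared in this abstract (spread, mode population and entropy) can be illustrated with a minimal sketch. The abstract does not give exact definitions, so the fixed-width binning (`bin_width`), the natural logarithm, and the function name `ensemble_summary` are assumptions made for illustration.

```python
import math
from collections import Counter

def ensemble_summary(members, bin_width=1.0):
    """Return (spread, mode_population, entropy) for one ensemble forecast."""
    n = len(members)
    mean = sum(members) / n
    # Spread: variance of the members about the ensemble mean.
    spread = sum((m - mean) ** 2 for m in members) / n
    # Assign each member to a fixed-width bin (the bin width is an assumption).
    counts = Counter(math.floor(m / bin_width) for m in members)
    probs = [c / n for c in counts.values()]
    # Mode population: fraction of members in the most populated bin.
    mode_population = max(probs)
    # Entropy of the binned member distribution.
    entropy = -sum(p * math.log(p) for p in probs)
    return spread, mode_population, entropy

# Made-up temperature ensembles (degrees C): one tightly grouped, one dispersed.
sharp = ensemble_summary([10.1, 10.2, 10.4, 10.3, 10.6, 10.8])
broad = ensemble_summary([6.5, 8.2, 10.4, 12.3, 14.6, 9.8])
print(sharp, broad)
```

A tightly grouped ensemble puts all members in one bin (high mode population, zero entropy), while a dispersed one spreads them across bins.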
Ensemble methods for seasonal limited area forecasts

DEFF Research Database (Denmark)

Arritt, Raymond W.; Anderson, Christopher J.; Takle, Eugene S.

2004-01-01

The ensemble prediction methods used for seasonal limited area forecasts were examined by comparing methods for generating ensemble simulations of seasonal precipitation. The summer 1993 model over the north-central US was used as a test case. The four methods examined included the lagged-average...
Creating ensembles of decision trees through sampling

Science.gov (United States)

Kamath, Chandrika; Cantu-Paz, Erick

2005-08-30

A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
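
The step of evaluating a potential split on a random sample of the data can be sketched as follows. The record does not name the splitting criterion or the sampling scheme, so the Gini impurity, the midpoint thresholds, and the names `gini`, `split_score` and `best_split_on_sample` are assumptions chosen for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_score(rows, feature, threshold):
    """Weighted Gini impurity after splitting rows at x[feature] <= threshold."""
    left = [y for x, y in rows if x[feature] <= threshold]
    right = [y for x, y in rows if x[feature] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split_on_sample(rows, feature, sample_size, rng):
    """Pick the threshold that minimizes impurity on a random sample of rows."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    sample.sort(key=lambda r: r[0][feature])
    values = [x[feature] for x, _ in sample]
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]
    if not candidates:
        return None
    return min(candidates, key=lambda t: split_score(sample, feature, t))

# Toy data: one feature, class 0 below 50 and class 1 from 50 upward.
rows = [((float(i),), 0 if i < 50 else 1) for i in range(100)]
threshold = best_split_on_sample(rows, feature=0, sample_size=30, rng=random.Random(0))
print(threshold, split_score(rows, 0, threshold))
```

Evaluating candidates on the sample rather than on all rows trades some accuracy in the chosen threshold for less work per split.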
An Efficient Ensemble Learning Method for Gene Microarray Classification

Directory of Open Access Journals (Sweden)

Alireza Osareh

2013-01-01

Gene microarray analysis and classification have proven an effective way to diagnose diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Sub-Ensemble Coastal Flood Forecasting: A Case Study of Hurricane Sandy

Directory of Open Access Journals (Sweden)

Justin A. Schulte

2017-12-01

In this paper, it is proposed that coastal flood ensemble forecasts be partitioned into sub-ensemble forecasts using cluster analysis in order to produce representative statistics and to measure forecast uncertainty arising from the presence of clusters. After clustering the ensemble members, the ability to predict the cluster into which the observation will fall can be measured using a cluster skill score. Additional sub-ensemble and composite skill scores are proposed for assessing the forecast skill of a clustered ensemble forecast. A recently proposed method for statistically increasing the number of ensemble members is used to improve sub-ensemble probabilistic estimates. Through the application of the proposed methodology to Sandy coastal flood reforecasts, it is demonstrated that statistics computed using only ensemble members belonging to a specific cluster are more representative than those computed using all ensemble members simultaneously. A cluster skill-cluster uncertainty index relationship is identified, which is the cluster analog of the documented spread-skill relationship. Two sub-ensemble skill scores are shown to be positively correlated with cluster forecast skill, suggesting that skillfully forecasting the cluster into which the observation will fall is important to overall forecast skill. The identified relationships also suggest that the number of ensemble members within each cluster can be used as guidance for assessing the potential for forecast error. The inevitable existence of ensemble member clusters in tidally dominated total water level prediction systems suggests that clustering is a necessary post-processing step for producing representative and skillful total water level forecasts.
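
A small sketch can show why statistics from a clustered ensemble may be unrepresentative. The abstract does not say which clustering algorithm is used, so splitting the sorted members at the largest gap is only a stand-in, and the function name `split_at_largest_gap` and the water-level values are hypothetical.

```python
def split_at_largest_gap(members):
    """Split a 1-D ensemble into two sub-ensembles at its largest gap."""
    ordered = sorted(members)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

def mean(values):
    return sum(values) / len(values)

# Made-up peak water levels (m) from a seven-member ensemble.
members = [1.0, 1.2, 1.1, 2.9, 3.1, 3.0, 1.3]
low, high = split_at_largest_gap(members)

overall = mean(members)
print("full-ensemble mean:", overall)
print("sub-ensemble means:", mean(low), mean(high))
print("members per cluster:", len(low), len(high))
```

In this toy case the full-ensemble mean falls in the gap between the two clusters, a value that no member forecasts, whereas each sub-ensemble mean sits among its own members.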
Ensemble Bayesian forecasting system Part I: Theory and algorithms

Science.gov (United States)

Herr, Henry D.; Krzysztofowicz, Roman

2015-05-01

The ensemble Bayesian forecasting system (EBFS), whose theory was published in 2001, is developed for the purpose of quantifying the total uncertainty about a discrete-time, continuous-state, non-stationary stochastic process such as a time series of stages, discharges, or volumes at a river gauge. The EBFS is built of three components: an input ensemble forecaster (IEF), which simulates the uncertainty associated with random inputs; a deterministic hydrologic model (of any complexity), which simulates physical processes within a river basin; and a hydrologic uncertainty processor (HUP), which simulates the hydrologic uncertainty (an aggregate of all uncertainties except input). It works as a Monte Carlo simulator: an ensemble of time series of inputs (e.g., precipitation amounts) generated by the IEF is transformed deterministically through a hydrologic model into an ensemble of time series of outputs, which is next transformed stochastically by the HUP into an ensemble of time series of predictands (e.g., river stages). Previous research indicated that in order to attain an acceptable sampling error, the ensemble size must be on the order of hundreds (for probabilistic river stage forecasts and probabilistic flood forecasts) or even thousands (for probabilistic stage transition forecasts). The computing time needed to run the hydrologic model this many times renders the straightforward simulations operationally infeasible. This motivates the development of the ensemble Bayesian forecasting system with randomization (EBFSR), which takes full advantage of the analytic meta-Gaussian HUP and generates multiple ensemble members after each run of the hydrologic model; this auxiliary randomization reduces the required size of the meteorological input ensemble and makes it operationally feasible to generate a Bayesian ensemble forecast of large size. Such a forecast quantifies the total uncertainty, is well calibrated against the prior (climatic) distribution of
Modeling task-specific neuronal ensembles improves decoding of grasp

Science.gov (United States)

Smith, Ryan J.; Soares, Alcimar B.; Rouse, Adam G.; Schieber, Marc H.; Thakor, Nitish V.

2018-06-01

Objective. Dexterous movement involves the activation and coordination of networks of neuronal populations across multiple cortical regions. Attempts to model firing of individual neurons commonly treat the firing rate as directly modulating with motor behavior. However, motor behavior may additionally be associated with modulations in the activity and functional connectivity of neurons in a broader ensemble. Accounting for variations in neural ensemble connectivity may provide additional information about the behavior being performed. Approach. In this study, we examined neural ensemble activity in primary motor cortex (M1) and premotor cortex (PM) of two male rhesus monkeys during performance of a center-out reach, grasp and manipulate task. We constructed point process encoding models of neuronal firing that incorporated task-specific variations in the baseline firing rate as well as variations in functional connectivity with the neural ensemble. Models were evaluated both in terms of their encoding capabilities and their ability to properly classify the grasp being performed. Main results. Task-specific ensemble models correctly predicted the performed grasp with over 95% accuracy and were shown to outperform models of neuronal activity that assume only a variable baseline firing rate. Task-specific ensemble models exhibited superior decoding performance in 82% of units in both monkeys (p < 0.01). Inclusion of ensemble activity also broadly improved the ability of models to describe observed spiking. Encoding performance of task-specific ensemble models, measured by spike timing predictability, improved upon baseline models in 62% of units. Significance. These results suggest that additional discriminative information about motor behavior found in the variations in functional connectivity of neuronal ensembles located in motor-related cortical regions is relevant to decode complex tasks such as grasping objects, and may serve the basis for more
Improving Climate Projections Using "Intelligent" Ensembles

Science.gov (United States)

Baker, Noel C.; Taylor, Patrick C.

2015-01-01

Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics, or the systematic determination of model biases, succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data, such as the Coupled Model Intercomparison Project (CMIP), provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is possible to use unequal weights to produce ensemble means, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and
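
The difference between an equal-weight consensus projection and a performance-weighted ensemble mean can be sketched in a few lines. The abstract does not specify how weights are derived from performance metrics, so the inverse-error weighting, the function names, and all numbers below are assumptions.

```python
def ensemble_mean(projections, weights=None):
    """Weighted mean of model projections; equal weights if none are given."""
    if weights is None:
        weights = [1.0] * len(projections)
    return sum(w * p for w, p in zip(weights, projections)) / sum(weights)

def weights_from_errors(errors):
    """Assumed rule: weight each model by the inverse of its error metric."""
    return [1.0 / e for e in errors]

# Hypothetical warming projections (K) and present-day error metrics.
projections = [2.0, 3.0, 4.5]
errors = [0.5, 1.0, 2.0]

equal = ensemble_mean(projections)
weighted = ensemble_mean(projections, weights_from_errors(errors))
print(equal, weighted)
```

Under this rule the model with the smallest present-day error pulls the weighted mean toward its own projection.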

Generation of scenarios from calibrated ensemble forecasts with a dual ensemble copula coupling approach

DEFF Research Database (Denmark)

Ben Bouallègue, Zied; Heppelmann, Tobias; Theis, Susanne E.

2016-01-01

the original ensemble forecasts. Based on the assumption of error stationarity, parametric methods aim to fully describe the forecast dependence structures. In this study, the concept of ECC is combined with past data statistics in order to account for the autocorrelation of the forecast error. The new...... approach, called d-ECC, is applied to wind forecasts from the high resolution ensemble system COSMO-DE-EPS run operationally at the German weather service. Scenarios generated by ECC and d-ECC are compared and assessed in the form of time series by means of multivariate verification tools and in a product...
Encoding of Spatial Attention by Primate Prefrontal Cortex Neuronal Ensembles

Science.gov (United States)

Treue, Stefan

2018-01-01

Single neurons in the primate lateral prefrontal cortex (LPFC) encode information about the allocation of visual attention and the features of visual stimuli. However, how this compares to the performance of neuronal ensembles at encoding the same information is poorly understood. Here, we recorded the responses of neuronal ensembles in the LPFC of two macaque monkeys while they performed a task that required attending to one of two moving random dot patterns positioned in different hemifields and ignoring the other pattern. We found single units selective for the location of the attended stimulus as well as for its motion direction. To determine the coding of both variables in the population of recorded units, we used a linear classifier and progressively built neuronal ensembles by iteratively adding units according to their individual performance (best single units), or by iteratively adding units based on their contribution to the ensemble performance (best ensemble). For both methods, ensembles of relatively small sizes (n decoding performance relative to individual single units. However, the decoder reached similar performance using fewer neurons with the best ensemble building method compared with the best single units method. Our results indicate that neuronal ensembles within the LPFC encode more information about the attended spatial and nonspatial features of visual stimuli than individual neurons. They further suggest that efficient coding of attention can be achieved by relatively small neuronal ensembles characterized by a certain relationship between signal and noise correlation structures. PMID:29568798
BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.

2015-08-18

Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

Science.gov (United States)

Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

2015-08-18

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
Three-model ensemble wind prediction in southern Italy

Science.gov (United States)

Torcasio, Rosa Claudia; Federico, Stefano; Calidonna, Claudia Roberta; Avolio, Elenio; Drofa, Oxana; Landi, Tony Christian; Malguzzi, Piero; Buzzi, Andrea; Bonasoni, Paolo

2016-03-01

Quality of wind prediction is of great importance since a good wind forecast allows the prediction of available wind power, improving the penetration of renewable energies into the energy market. Here, a 1-year (1 December 2012 to 30 November 2013) three-model ensemble (TME) experiment for wind prediction is considered. The models employed, run operationally at National Research Council - Institute of Atmospheric Sciences and Climate (CNR-ISAC), are RAMS (Regional Atmospheric Modelling System), BOLAM (BOlogna Limited Area Model), and MOLOCH (MOdello LOCale in H coordinates). The area considered for the study is southern Italy and the measurements used for the forecast verification are those of the GTS (Global Telecommunication System). Comparison with observations is made every 3 h up to 48 h of forecast lead time. Results show that the three-model ensemble outperforms the forecast of each individual model. The RMSE improvement compared to the best model is between 22 and 30 %, depending on the season. It is also shown that the three-model ensemble outperforms the IFS (Integrated Forecasting System) of the ECMWF (European Centre for Medium-Range Weather Forecast) for the surface wind forecasts. Notably, the three-model ensemble forecast performs better than each unbiased model, showing the added value of the ensemble technique. Finally, the sensitivity of the three-model ensemble RMSE to the length of the training period is analysed.
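
A minimal sketch of a bias-corrected multi-model ensemble and its RMSE follows. The abstract does not describe how the three models are combined, so removing each model's mean bias over a training period and then averaging with equal weights is an assumption, as are the function names and the wind-speed values.

```python
import math

def rmse(forecast, observed):
    """Root-mean-square error between a forecast and observations."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

def mean_bias(forecast, observed):
    """Mean forecast-minus-observation difference."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def multi_model_ensemble(train_forecasts, train_obs, test_forecasts):
    """Subtract each model's training-period bias, then average the models."""
    biases = [mean_bias(f, train_obs) for f in train_forecasts]
    debiased = [[v - b for v in f] for f, b in zip(test_forecasts, biases)]
    return [sum(vals) / len(vals) for vals in zip(*debiased)]

# Made-up wind speeds (m/s); each toy model has a constant offset.
obs = [5.0, 6.0, 7.0, 8.0]
model_a = [o + 1.0 for o in obs]
model_b = [o - 2.0 for o in obs]
model_c = [o + 0.5 for o in obs]
models = [model_a, model_b, model_c]

# For brevity the same period serves as training and test data here.
ensemble = multi_model_ensemble(models, obs, models)
print([rmse(m, obs) for m in models], rmse(ensemble, obs))
```

A real verification would keep the training and test periods separate, and the length of the training period is itself a parameter whose effect on the RMSE the study analyses.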
Global Ensemble Forecast System (GEFS) [2.5 Deg.]

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The Global Ensemble Forecast System (GEFS) is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental...
On-line Learning of Unlearnable True Teacher through Mobile Ensemble Teachers

Science.gov (United States)

Hirama, Takeshi; Hukushima, Koji

2008-09-01

The on-line learning of a hierarchical learning model is studied by a method based on statistical mechanics. In our model, a student of a simple perceptron learns from not a true teacher directly, but ensemble teachers who learn from a true teacher with a perceptron learning rule. Since the true teacher and ensemble teachers are expressed as nonmonotonic and simple perceptrons, respectively, the ensemble teachers go around the unlearnable true teacher with the distance between them fixed in an asymptotic steady state. The generalization performance of the student is shown to exceed that of the ensemble teachers in a transient state, as was shown in similar ensemble-teachers models. Furthermore, it is found that moving the ensemble teachers even in the steady state, in contrast to the fixed ensemble teachers, is efficient for the performance of the student.
An ensemble classifier to predict track geometry degradation

International Nuclear Information System (INIS)

Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

2017-01-01

Railway operations are inherently complex and a source of several problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach which entails the construction of an ensemble classifier to forecast the degradation of track geometry. Our classifier is constructed by solving the problem from three different perspectives: deterioration, regression and classification. We considered a different model from each perspective and our results show that using an ensemble method improves the predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.
Neural Network Ensembles

DEFF Research Database (Denmark)

Hansen, Lars Kai; Salamon, Peter

1990-01-01

We propose several means for improving the performance and training of neural networks for classification. We use crossvalidation as a tool for optimizing network parameters and architecture. We show further that the remaining generalization error can be reduced by invoking ensembles of similar...... networks....
Development of a regional ensemble prediction method for probabilistic weather prediction

International Nuclear Information System (INIS)

Nohara, Daisuke; Tamura, Hidetoshi; Hirakuchi, Hiromaru

2015-01-01

A regional ensemble prediction method has been developed to provide probabilistic weather prediction using a numerical weather prediction model. To obtain perturbations consistent with the synoptic weather pattern, both the initial and lateral boundary perturbations were given by differences between the control and the ensemble members of the Japan Meteorological Agency (JMA)'s operational one-week ensemble forecast. The method provides multiple ensemble members with a horizontal resolution of 15 km for 48 hours, based on a downscaling of the JMA's operational global forecast accompanied by the perturbations. The ensemble prediction was examined for the heavy snowfall event in the Kanto area on January 14, 2013. The results showed that the predictions represent different features of the high-resolution spatiotemporal distribution of precipitation, affected by the intensity and location of the extra-tropical cyclone in each ensemble member. Although the ensemble prediction has model bias in the mean values and variances of some variables such as wind speed and solar radiation, it has the potential to add probabilistic information to a deterministic prediction.
The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review

Directory of Open Access Journals (Sweden)

Gulshan Kumar

2012-01-01

In supervised learning-based classification, ensembles have been successfully employed in different application domains. In the literature, many researchers have proposed different ensembles by considering different combination methods, training datasets, base classifiers, and many other factors. Artificial-intelligence (AI)-based techniques play a prominent role in the development of ensembles for intrusion detection (ID) and have many benefits over other techniques. However, there is no comprehensive review of ensembles in general and AI-based ensembles for ID to examine and understand their current research status in solving the ID problem. Here, an updated review of ensembles and their taxonomies is presented in general. The paper also presents an updated review of various AI-based ensembles for ID (in particular during the last decade). The related studies of AI-based ensembles are compared by a set of evaluation metrics derived from (1) the architecture and approach followed; (2) the different methods utilized in different phases of ensemble learning; and (3) other measures used to evaluate the classification performance of the ensembles. The paper also provides future directions for research in this area. The paper will help the better understanding of the different directions in which research on ensembles has been done in general and specifically in the field of intrusion detection systems (IDSs).
Concrete ensemble Kalman filters with rigorous catastrophic filter divergence.

Science.gov (United States)

Kelly, David; Majda, Andrew J; Tong, Xin T

2015-08-25

The ensemble Kalman filter and ensemble square root filters are data assimilation methods used to combine high-dimensional, nonlinear dynamical models with observed data. Ensemble methods are indispensable tools in science and engineering and have enjoyed great success in geophysical sciences, because they allow for computationally cheap low-ensemble-state approximation for extremely high-dimensional turbulent forecast models. From a theoretical perspective, the dynamical properties of these methods are poorly understood. One of the central mysteries is the numerical phenomenon known as catastrophic filter divergence, whereby ensemble-state estimates explode to machine infinity, despite the true state remaining in a bounded region. In this article we provide a breakthrough insight into the phenomenon, by introducing a simple and natural forecast model that transparently exhibits catastrophic filter divergence under all ensemble methods and a large set of initializations. For this model, catastrophic filter divergence is not an artifact of numerical instability, but rather a true dynamical property of the filter. The divergence is not only validated numerically but also proven rigorously. The model cleanly illustrates mechanisms that give rise to catastrophic divergence and confirms intuitive accounts of the phenomena given in past literature.
ggbio: an R package for extending the grammar of graphics for genomic data

Science.gov (United States)

2012-01-01

We introduce ggbio, a new methodology to visualize and explore genomics annotations and high-throughput data. The plots provide detailed views of genomic regions, summary views of sequence alignments and splicing patterns, and genome-wide overviews with karyogram, circular and grand linear layouts. The methods leverage the statistical functionality available in R, the grammar of graphics and the data handling capabilities of the Bioconductor project. The plots are specified within a modular framework that enables users to construct plots in a systematic way, and are generated directly from Bioconductor data structures. The ggbio R package is available at http://www.bioconductor.org/packages/2.11/bioc/html/ggbio.html. PMID:22937822
Quark ensembles with infinite correlation length

OpenAIRE

Molodtsov, S. V.; Zinovjev, G. M.

2014-01-01

By studying quark ensembles with infinite correlation length we formulate the quantum field theory model that, as we show, is exactly integrable and develops an instability of its standard vacuum ensemble (the Dirac sea). We argue such an instability is rooted in high ground state degeneracy (for 'realistic' space-time dimensions) featuring a fairly specific form of energy distribution, and with the cutoff parameter going to infinity this inherent energy distribution becomes infinitely narrow...
GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes.

Science.gov (United States)

Yuan, Lina; Yu, Yang; Zhu, Yanmin; Li, Yulai; Li, Changqing; Li, Rujiao; Ma, Qin; Siu, Gilman Kit-Hang; Yu, Jun; Jiang, Taijiao; Xiao, Jingfa; Kang, Yu

2017-01-25

Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genome assembly, but tend to result in false assembly when the assigned reference has extensive rearrangements. Herein, we present GAAP, a genome assembly pipeline for scaffolding based on core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in order and orientation of the scaffolds and build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. GAAP uses both cGOF and pair-end reads to create assemblies in genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false locations, especially for species with extensive rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS.
A multi-model ensemble approach to seabed mapping

Science.gov (United States)

Diesing, Markus; Stephens, David

2015-06-01

Seabed habitat mapping based on swath acoustic data and ground-truth samples is an emergent and active marine science discipline. Significant progress could be achieved by transferring techniques and approaches that have been successfully developed and employed in such fields as terrestrial land cover mapping. One such promising approach is the multiple classifier system, which aims at improving classification performance by combining the outputs of several classifiers. Here we present results of a multi-model ensemble applied to multibeam acoustic data covering more than 5000 km2 of seabed in the North Sea with the aim to derive accurate spatial predictions of seabed substrate. A suite of six machine learning classifiers (k-Nearest Neighbour, Support Vector Machine, Classification Tree, Random Forest, Neural Network and Naïve Bayes) was trained with ground-truth sample data classified into seabed substrate classes and their prediction accuracy was assessed with an independent set of samples. The three and five best performing models were combined to classifier ensembles. Both ensembles led to increased prediction accuracy as compared to the best performing single classifier. The improvements were however not statistically significant at the 5% level. Although the three-model ensemble did not perform significantly better than its individual component models, we noticed that the five-model ensemble did perform significantly better than three of the five component models. A classifier ensemble might therefore be an effective strategy to improve classification performance. Another advantage is the fact that the agreement in predicted substrate class between the individual models of the ensemble could be used as a measure of confidence. We propose a simple and spatially explicit measure of confidence that is based on model agreement and prediction accuracy.
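The combination step described in this abstract, where several classifiers vote and their agreement serves as a confidence measure, can be sketched in a few lines. This is a minimal illustration, not the authors' method: the function name `ensemble_vote`, the substrate labels, and the use of the plain vote fraction as the confidence value are all assumptions (the abstract's measure also incorporates prediction accuracy, which is not modelled here).

```python
from collections import Counter


def ensemble_vote(predictions):
    """Combine per-model class labels for one sample by majority vote.

    Returns (label, agreement), where agreement is the fraction of models
    that voted for the winning label. Ties go to the label seen first.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical substrate labels predicted by three models at two locations.
per_location = [
    ["sand", "sand", "mud"],
    ["gravel", "gravel", "gravel"],
]
results = [ensemble_vote(p) for p in per_location]
for label, agreement in results:
    print(label, round(agreement, 2))
```

Mapping the agreement value at every grid cell would give a spatially explicit picture of where the component models disagree.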
Critical Listening in the Ensemble Rehearsal: A Community of Learners

Science.gov (United States)

Bell, Cindy L.

2018-01-01

This article explores a strategy for engaging ensemble members in critical listening analysis of performances and presents opportunities for improving ensemble sound through rigorous dialogue, reflection, and attentive rehearsing. Critical listening asks ensemble members to draw on individual playing experience and knowledge to describe what they…
SVM and SVM Ensembles in Breast Cancer Prediction.

Science.gov (United States)

Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

2017-01-01

Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction.

Directory of Open Access Journals (Sweden)

Min-Wei Huang

Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
Ensemble Data Mining Methods

Data.gov (United States)

National Aeronautics and Space Administration — Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve...

Ensemble Network Architecture for Deep Reinforcement Learning

Directory of Open Access Journals (Sweden)

Xi-liang Chen

2018-01-01

The popular deep Q-learning algorithm is known to be unstable because of oscillations in the Q-value and the overestimation of action values under certain conditions. These issues tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning that is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and improves performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
Shallow cumuli ensemble statistics for development of a stochastic parameterization

Science.gov (United States)

Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs

2014-05-01

According to a conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large scale forcing produces a unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection: as is known from empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to the given large scale forcing. Moreover, this distribution gets broader with increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the spread and shape of the sub-grid states distribution on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation model (LES) and a cloud tracking algorithm, followed by a conditional sampling of clouds at the cloud base level, to retrieve the information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of the shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on the empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. The stochastic model simulates a compound random process, with the number of convective elements drawn from a
Selecting a climate model subset to optimise key ensemble properties

Directory of Open Access Journals (Sweden)

N. Herger

2018-02-01

End users studying impacts and risks caused by human-induced climate change are often presented with large multi-model ensembles of climate projections whose composition and size are arbitrarily determined. An efficient and versatile method that finds a subset which maintains certain key properties from the full ensemble is needed, but very little work has been done in this area. Therefore, users typically make their own somewhat subjective subset choices and commonly use the equally weighted model mean as a best estimate. However, different climate model simulations cannot necessarily be regarded as independent estimates due to the presence of duplicated code and shared development history. Here, we present an efficient and flexible tool that makes better use of the ensemble as a whole by finding a subset with improved mean performance compared to the multi-model mean while at the same time maintaining the spread and addressing the problem of model interdependence. Out-of-sample skill and reliability are demonstrated using model-as-truth experiments. This approach is illustrated with one set of optimisation criteria but we also highlight the flexibility of cost functions, depending on the focus of different users. The technique is useful for a range of applications that, for example, minimise present-day bias to obtain an accurate ensemble mean, reduce dependence in ensemble spread, maximise future spread, ensure good performance of individual models in an ensemble, reduce the ensemble size while maintaining important ensemble characteristics, or optimise several of these at the same time. As in any calibration exercise, the final ensemble is sensitive to the metric, observational product, and pre-processing steps used.
Selecting a climate model subset to optimise key ensemble properties

Science.gov (United States)

Herger, Nadja; Abramowitz, Gab; Knutti, Reto; Angélil, Oliver; Lehmann, Karsten; Sanderson, Benjamin M.

2018-02-01

End users studying impacts and risks caused by human-induced climate change are often presented with large multi-model ensembles of climate projections whose composition and size are arbitrarily determined. An efficient and versatile method that finds a subset which maintains certain key properties from the full ensemble is needed, but very little work has been done in this area. Therefore, users typically make their own somewhat subjective subset choices and commonly use the equally weighted model mean as a best estimate. However, different climate model simulations cannot necessarily be regarded as independent estimates due to the presence of duplicated code and shared development history. Here, we present an efficient and flexible tool that makes better use of the ensemble as a whole by finding a subset with improved mean performance compared to the multi-model mean while at the same time maintaining the spread and addressing the problem of model interdependence. Out-of-sample skill and reliability are demonstrated using model-as-truth experiments. This approach is illustrated with one set of optimisation criteria but we also highlight the flexibility of cost functions, depending on the focus of different users. The technique is useful for a range of applications that, for example, minimise present-day bias to obtain an accurate ensemble mean, reduce dependence in ensemble spread, maximise future spread, ensure good performance of individual models in an ensemble, reduce the ensemble size while maintaining important ensemble characteristics, or optimise several of these at the same time. As in any calibration exercise, the final ensemble is sensitive to the metric, observational product, and pre-processing steps used.
On evaluation of ensemble precipitation forecasts with observation-based ensembles

Directory of Open Access Journals (Sweden)

S. Jaun

2007-04-01

Spatial interpolation of precipitation data is uncertain. How important is this uncertainty and how can it be considered in the evaluation of high-resolution probabilistic precipitation forecasts? These questions are discussed by experimental evaluation of the COSMO consortium's limited-area ensemble prediction system COSMO-LEPS. The applied performance measure is the often used Brier skill score (BSS). The observational references in the evaluation are (a) rain gauge data analyzed by ordinary Kriging and (b) ensembles of rain gauge data interpolated by stochastic simulation. This permits the consideration of either a deterministic reference (the event is observed or not with 100% certainty) or a probabilistic reference that makes allowance for uncertainties in spatial averaging. The evaluation experiments show that the evaluation uncertainties are substantial even for the large area (41 300 km²) of Switzerland with a mean rain gauge distance as good as 7 km: the one- to three-day precipitation forecasts have skill decreasing with forecast lead time, but the one- and two-day forecast performances do not differ significantly.
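For readers unfamiliar with the Brier skill score, the following sketch shows the standard definition: the Brier score is the mean squared difference between forecast probability and observed outcome, and the skill score compares it with a reference forecast. The choice of sample climatology as the reference and the function names are assumptions made for illustration, not details taken from the study. Outcomes may be 0/1 (deterministic reference) or probabilities between 0 and 1 (probabilistic reference).

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and outcomes."""
    n = len(forecast_probs)
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, outcomes)) / n


def brier_skill_score(forecast_probs, outcomes):
    """BSS = 1 - BS / BS_ref, with sample climatology as the reference.

    Assumes the outcomes are not all identical; otherwise BS_ref is zero
    and the score is undefined.
    """
    climatology = sum(outcomes) / len(outcomes)
    bs = brier_score(forecast_probs, outcomes)
    bs_ref = brier_score([climatology] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref


# Hypothetical event probabilities for four days and the observed events.
probs = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 1, 0]
print(brier_skill_score(probs, observed))
```

A score of 1 indicates a perfect forecast, 0 indicates no improvement over the reference, and negative values indicate a forecast worse than the reference.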
Random ensemble learning for EEG classification.

Science.gov (United States)

Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid

2018-01-01

Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility.
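The random subspace idea in this abstract (train each base classifier on a randomly chosen subset of features, then combine by majority vote) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVMs used in the paper; that substitution, the toy data, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Nearest-centroid base classifier restricted to a feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroids(centroids, features, row):
    """Return the class whose centroid is closest in the feature subset."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
    return min(centroids, key=dist)


def random_subspace_ensemble(X, y, n_models=5, subspace_size=2, seed=0):
    """Train n_models base classifiers, each on a random feature subset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        features = rng.sample(range(len(X[0])), subspace_size)
        models.append((features, train_centroids(X, y, features)))
    return models


def predict(models, row):
    """Majority vote over the base classifiers."""
    votes = Counter(predict_centroids(c, f, row) for f, c in models)
    return votes.most_common(1)[0][0]


# Toy three-feature data: class 0 near the origin, class 1 near 10.
X = [[0, 0, 0], [1, 1, 1], [10, 10, 10], [9, 9, 9]]
y = [0, 0, 1, 1]
models = random_subspace_ensemble(X, y)
print(predict(models, [0.5, 0.5, 0.5]), predict(models, [9.5, 9.5, 9.5]))
```

Using an odd number of base classifiers avoids tied votes in the two-class case.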
Polarized ensembles of random pure states

Science.gov (United States)

Deelan Cunden, Fabio; Facchi, Paolo; Florio, Giuseppe

2013-08-01

A new family of polarized ensembles of random pure states is presented. These ensembles are obtained by linear superposition of two random pure states with suitable distributions, and are quite manageable. We will use the obtained results for two purposes: on the one hand we will be able to derive an efficient strategy for sampling states from isopurity manifolds. On the other, we will characterize the deviation of a pure quantum state from separability under the influence of noise.
An Extended Guinier Analysis for Intrinsically Disordered Proteins.

Science.gov (United States)

Zheng, Wenwei; Best, Robert B

2018-03-21

Guinier analysis allows model-free determination of the radius of gyration (R_g) of a biomolecule from X-ray or neutron scattering data, in the limit of very small scattering angles. Its range of validity is well understood for globular proteins, but is known to be more restricted for unfolded or intrinsically disordered proteins (IDPs). We have used ensembles of disordered structures from molecular dynamics simulations to investigate which structural properties cause deviations from the Guinier approximation at small scattering angles. We find that the deviation from the Guinier approximation is correlated with the polymer scaling exponent ν describing the unfolded ensemble. We therefore introduce an empirical, ν-dependent, higher-order correction term, to augment the standard Guinier analysis. We test the new fitting scheme using all-atom simulation data for several IDPs and experimental data for both an IDP and a destabilized mutant of a folded protein. In all cases tested, we achieve an accuracy of the inferred R_g within ∼3% of the true R_g. The method is straightforward to implement and extends the range of validity to a maximum qR_g of ∼2 versus ∼1.1 for Guinier analysis. Compared with the Guinier or Debye approaches, our method allows data from wider angles with lower noise to be used to analyze scattering data accurately. In addition to R_g, our fitting scheme also yields estimates of the scaling exponent ν in excellent agreement with the reference ν determined from the underlying molecular ensemble.
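As background, the standard Guinier approximation that this work extends is I(q) ≈ I0·exp(−q²·Rg²/3), so a straight-line fit of ln I against q² yields the radius of gyration from the slope. The sketch below implements only that standard fit on synthetic, noise-free data; the ν-dependent correction term proposed in the paper is not given in the abstract and is not reproduced here. The function name, the q values, and their units are assumptions.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I = ln I0 - (Rg^2 / 3) * q^2.

    Returns (Rg, I0). Assumes all intensities are positive and the fitted
    slope is negative, as expected in the Guinier regime.
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data with Rg = 20 and I0 = 100, restricted to q * Rg <= 1.
true_rg, true_i0 = 20.0, 100.0
q = [0.005 * k for k in range(1, 11)]
intensity = [true_i0 * math.exp(-(qi * true_rg) ** 2 / 3.0) for qi in q]
rg, i0 = guinier_fit(q, intensity)
print(rg, i0)
```

With real data the fit is usually restricted to the small-angle range where q·Rg stays below the validity limit quoted in the abstract.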
An automated approach to network features of protein structure ensembles

Science.gov (United States)

Bhattacharyya, Moitrayee; Bhat, Chanda R; Vishveshwara, Saraswathi

2013-01-01

Network theory applied to protein structures provides insights into numerous problems of biological relevance. The explosion in structural data available from PDB and simulations establishes a need to introduce a standalone-efficient program that assembles network concepts/parameters under one hood in an automated manner. Herein, we discuss the development/application of an exhaustive, user-friendly, standalone program package named PSN-Ensemble, which can handle structural ensembles generated through molecular dynamics (MD) simulation/NMR studies or from multiple X-ray structures. The novelty in network construction lies in the explicit consideration of side-chain interactions among amino acids. The program evaluates network parameters dealing with topological organization and long-range allosteric communication. The introduction of a flexible weighing scheme in terms of residue pairwise cross-correlation/interaction energy in PSN-Ensemble brings dynamical/chemical knowledge into the network representation. Also, the results are mapped on a graphical display of the structure, giving the general biological community easy access to network analysis. The potential of PSN-Ensemble for examining structural ensembles is exemplified using MD trajectories of a ubiquitin-conjugating enzyme (UbcH5b). Furthermore, insights derived from network parameters evaluated using PSN-Ensemble for single-static structures of active/inactive states of β2-adrenergic receptor and the ternary tRNA complexes of tyrosyl tRNA synthetases (from organisms across kingdoms) are discussed. PSN-Ensemble is freely available from http://vishgraph.mbu.iisc.ernet.in/PSN-Ensemble/psn_index.html. PMID:23934896
Developing an Ensemble Prediction System based on COSMO-DE

Science.gov (United States)

Theis, S.; Gebhardt, C.; Buchhold, M.; Ben Bouallègue, Z.; Ohl, R.; Paulat, M.; Peralta, C.

2010-09-01

The numerical weather prediction model COSMO-DE is a configuration of the COSMO model with a horizontal grid size of 2.8 km. It has been running operationally at DWD since 2007, it covers the area of Germany and produces forecasts with a lead time of 0-21 hours. The model COSMO-DE is convection-permitting, which means that it does without a parametrisation of deep convection and simulates deep convection explicitly. One aim is an improved forecast of convective heavy rain events. Convection-permitting models are in operational use at several weather services, but currently not in ensemble mode. It is expected that an ensemble system could reveal the advantages of a convection-permitting model even better. The probabilistic approach is necessary, because the explicit simulation of convective processes for more than a few hours cannot be viewed as a deterministic forecast anymore. This is due to the chaotic behaviour and short life cycle of the processes which are simulated explicitly now. In the framework of the project COSMO-DE-EPS, DWD is developing and implementing an ensemble prediction system (EPS) for the model COSMO-DE. The project COSMO-DE-EPS comprises the generation of ensemble members, as well as the verification and visualization of the ensemble forecasts and also statistical postprocessing. A pre-operational mode of the EPS with 20 ensemble members is foreseen to start in 2010. Operational use is envisaged to start in 2012, after an upgrade to 40 members and inclusion of statistical postprocessing. The presentation introduces the project COSMO-DE-EPS and describes the design of the ensemble as it is planned for the pre-operational mode. In particular, the currently implemented method for the generation of ensemble members will be explained and discussed. The method includes variations of initial conditions, lateral boundary conditions, and model physics. At present, pragmatic methods are applied which resemble the basic ideas of a multi-model approach
The Hydrologic Ensemble Prediction Experiment (HEPEX)

Science.gov (United States)

Wood, Andy; Wetterhall, Fredrik; Ramos, Maria-Helena

2015-04-01

The Hydrologic Ensemble Prediction Experiment was established in March 2004 at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF) and co-sponsored by the US National Weather Service (NWS) and the European Commission (EC). The HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support. HEPEX pursues this goal through research efforts and practical implementations involving six core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. HEPEX has grown through meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. In the last decade, HEPEX has organized over a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Through these interactions and an active online blog (www.hepex.org), HEPEX has built a strong and active community of nearly 400 researchers and practitioners around the world. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.
Ensemble-based forecasting at Horns Rev: Ensemble conversion and kernel dressing

DEFF Research Database (Denmark)

Pinson, Pierre; Madsen, Henrik

. The obtained ensemble forecasts of wind power are then converted into predictive distributions with an original adaptive kernel dressing method. The shape of the kernels is driven by a mean-variance model, the parameters of which are recursively estimated in order to maximize the overall skill of obtained...
Lattice gauge theory in the microcanonical ensemble

International Nuclear Information System (INIS)

Callaway, D.J.E.; Rahman, A.

1983-01-01

The microcanonical-ensemble formulation of lattice gauge theory proposed recently is examined in detail. Expectation values in this new ensemble are determined by solving a large set of coupled ordinary differential equations, after the fashion of a molecular dynamics simulation. Following a brief review of the microcanonical ensemble, calculations are performed for the gauge groups U(1), SU(2), and SU(3). The results are compared and contrasted with standard methods of computation. Several advantages of the new formalism are noted. For example, no random numbers are required to update the system. Also, this update is performed in a simultaneous fashion. Thus the microcanonical method presumably adapts well to parallel processing techniques, especially when the action is highly nonlocal (such as when fermions are included).
Exploring and Listening to Chinese Classical Ensembles in General Music

Science.gov (United States)

Zhang, Wenzhuo

2017-01-01

Music diversity is valued in theory, but the extent to which it is efficiently presented in music class remains limited. Within this article, I aim to bridge this gap by introducing four genres of Chinese classical ensembles--Qin and Xiao duets, Jiang Nan bamboo and silk ensembles, Cantonese ensembles, and contemporary Chinese orchestras--into the…
Polarized ensembles of random pure states

International Nuclear Information System (INIS)

Cunden, Fabio Deelan; Facchi, Paolo; Florio, Giuseppe

2013-01-01

A new family of polarized ensembles of random pure states is presented. These ensembles are obtained by linear superposition of two random pure states with suitable distributions, and are quite manageable. We will use the obtained results for two purposes: on the one hand we will be able to derive an efficient strategy for sampling states from isopurity manifolds. On the other, we will characterize the deviation of a pure quantum state from separability under the influence of noise. (paper)
Probabilistic Determination of Native State Ensembles of Proteins

DEFF Research Database (Denmark)

Olsson, Simon; Vögeli, Beat Rolf; Cavalli, Andrea

2014-01-01

ensembles of proteins by the combination of physical force fields and experimental data through modern statistical methodology. As an example, we use NMR residual dipolar couplings to determine a native state ensemble of the extensively studied third immunoglobulin binding domain of protein G (GB3...
Ensembles of a small number of conformations with relative populations

Energy Technology Data Exchange (ETDEWEB)

Vammi, Vijay, E-mail: vsvammi@iastate.edu; Song, Guang, E-mail: gsong@iastate.edu [Iowa State University, Bioinformatics and Computational Biology Program, Department of Computer Science (United States)

2015-12-15

In our previous work, we proposed a new way to represent protein native states, using ensembles of a small number of conformations with relative populations, or ESP for short. Using Ubiquitin as an example, we showed that using a small number of conformations could greatly reduce the potential for overfitting, and that assigning relative populations to protein ensembles could significantly improve their quality. To demonstrate that ESP is indeed an excellent alternative for representing protein native states, in this work we compare the quality of two ESP ensembles of Ubiquitin with several well-known regular ensembles or average structure representations. An extensive amount of experimental data is employed to achieve a thorough assessment. Our results demonstrate that ESP ensembles, though much smaller in size compared to regular ensembles, perform equally well or sometimes even better on all four types of experimental data used in the assessment, namely, the residual dipolar couplings, residual chemical shift anisotropy, hydrogen exchange rates, and solution scattering profiles. This work further underlines the significance of having relative populations in describing the native states.
Ensemble Kalman filtering with one-step-ahead smoothing

KAUST Repository

Raboudi, Naila F.

2018-01-11

The ensemble Kalman filter (EnKF) is widely used for sequential data assimilation. It operates as a succession of forecast and analysis steps. In realistic large-scale applications, EnKFs are implemented with small ensembles and poorly known model error statistics. This limits their representativeness of the background error covariances and, thus, their performance. This work explores the efficiency of the one-step-ahead (OSA) smoothing formulation of the Bayesian filtering problem to enhance the data assimilation performance of EnKFs. Filtering with OSA smoothing introduces an updated step with future observations, conditioning the ensemble sampling with more information. This should provide an improved background ensemble in the analysis step, which may help to mitigate the suboptimal character of EnKF-based methods. Here, the authors demonstrate the efficiency of a stochastic EnKF with OSA smoothing for state estimation. They then introduce a deterministic-like EnKF-OSA based on the singular evolutive interpolated ensemble Kalman (SEIK) filter. The authors show that the proposed SEIK-OSA outperforms both SEIK, as it efficiently exploits the data twice, and the stochastic EnKF-OSA, as it avoids observational error undersampling. They present extensive assimilation results from numerical experiments conducted with the Lorenz-96 model to demonstrate SEIK-OSA’s capabilities.
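The analysis step that the abstract builds on can be illustrated with a minimal sketch. It shows only a standard stochastic (perturbed-observation) EnKF update for a scalar state that is observed directly; it is not the OSA smoothing or SEIK-OSA scheme of the authors. The function name `enkf_analysis`, the ensemble size and all numerical values are assumptions chosen for illustration.

```python
import random
import statistics


def enkf_analysis(prior, observation, obs_error_var, rng):
    """Stochastic EnKF analysis step for a scalar, directly observed state."""
    # Background error variance is estimated from the ensemble itself.
    prior_var = statistics.variance(prior)
    # Kalman gain for a direct observation (observation operator H = 1).
    gain = prior_var / (prior_var + obs_error_var)
    obs_error_std = obs_error_var ** 0.5
    analysis = []
    for member in prior:
        # Each member assimilates its own perturbed copy of the observation.
        perturbed_obs = observation + rng.gauss(0.0, obs_error_std)
        analysis.append(member + gain * (perturbed_obs - member))
    return analysis


# Hypothetical setup: 200 members drawn from N(0, 1), one observation at 2.0.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
analysis = enkf_analysis(prior, observation=2.0, obs_error_var=0.25, rng=rng)

print("mean:    ", statistics.mean(prior), "->", statistics.mean(analysis))
print("variance:", statistics.variance(prior), "->", statistics.variance(analysis))
```

With a small ensemble the sample variance used for the gain becomes noisy, which is the limitation the abstract refers to when it mentions poorly represented background error covariances.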
Universal critical wrapping probabilities in the canonical ensemble

Directory of Open Access Journals (Sweden)

Hao Hu

2015-09-01

Universal dimensionless quantities, such as Binder ratios and wrapping probabilities, play an important role in the study of critical phenomena. We study the finite-size scaling behavior of the wrapping probability for the Potts model in the random-cluster representation, under the constraint that the total number of occupied bonds is fixed, so that the canonical ensemble applies. We derive that, in the limit L→∞, the critical values of the wrapping probability are different from those of the unconstrained model, i.e. the model in the grand-canonical ensemble, but still universal, for systems with 2y_t−d>0, where y_t=1/ν is the thermal renormalization exponent and d is the spatial dimension. Similar modifications apply to other dimensionless quantities, such as Binder ratios. For systems with 2y_t−d≤0, these quantities share the same universal critical values in the two ensembles. It is also derived that new finite-size corrections are induced. These findings apply more generally to systems in the canonical ensemble, e.g. the dilute Potts model with a fixed total number of vacancies. Finally, we formulate an efficient cluster-type algorithm for the canonical ensemble, and confirm these predictions by extensive simulations.
A novel hybrid ensemble learning paradigm for nuclear energy consumption forecasting

International Nuclear Information System (INIS)

Tang, Ling; Yu, Lean; Wang, Shuai; Li, Jianping; Wang, Shouyang

2012-01-01

Highlights: ► A hybrid ensemble learning paradigm integrating EEMD and LSSVR is proposed. ► The hybrid ensemble method is useful to predict time series with high volatility. ► The ensemble method can be used for both one-step and multi-step ahead forecasting. - Abstract: In this paper, a novel hybrid ensemble learning paradigm integrating ensemble empirical mode decomposition (EEMD) and least squares support vector regression (LSSVR) is proposed for nuclear energy consumption forecasting, based on the principle of “decomposition and ensemble”. This hybrid ensemble learning paradigm is formulated specifically to address difficulties in modeling nuclear energy consumption, which has inherently high volatility, complexity and irregularity. In the proposed hybrid ensemble learning paradigm, EEMD, as a competitive decomposition method, is first applied to decompose original data of nuclear energy consumption (i.e. a difficult task) into a number of independent intrinsic mode functions (IMFs) of original data (i.e. some relatively easy subtasks). Then LSSVR, as a powerful forecasting tool, is implemented to predict all extracted IMFs independently. Finally, these predicted IMFs are aggregated into an ensemble result as final prediction, using another LSSVR. For illustration and verification purposes, the proposed learning paradigm is used to predict nuclear energy consumption in China. Empirical results demonstrate that the novel hybrid ensemble learning paradigm can outperform some other popular forecasting models in both level prediction and directional forecasting, indicating that it is a promising tool to predict complex time series with high volatility and irregularity.

Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure

Directory of Open Access Journals (Sweden)

Xiaodong Zeng

2014-01-01

A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of a classifier ensemble and to assist in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: the members of a robust classifier ensemble should not only be accurate but also differ from one another. In fact, accuracy and diversity are mutually restraining factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, a genetic algorithm and a forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to the others in most cases.
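The scoring idea described above, a harmonic mean of accuracy and diversity balanced by two weights, can be sketched as follows. This is a generic weighted harmonic mean and not necessarily the exact WAD formula of the paper; the function name, the default weights and the candidate ensembles with their accuracy and diversity values are all hypothetical.

```python
def weighted_harmonic_mean(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of two scores that lie in (0, 1]."""
    if accuracy <= 0.0 or diversity <= 0.0:
        # A harmonic mean collapses to zero when either term vanishes.
        return 0.0
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)


# Hypothetical candidate ensembles: name -> (accuracy, diversity).
candidates = {
    "accurate_but_uniform": (0.90, 0.10),
    "balanced": (0.80, 0.50),
    "diverse_but_weak": (0.55, 0.60),
}

scores = {
    name: weighted_harmonic_mean(acc, div)
    for name, (acc, div) in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because the harmonic mean is dominated by the smaller of its two arguments, an ensemble cannot reach a high score through accuracy alone or diversity alone, which is the trade-off the abstract describes.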
Visualization and classification of physiological failure modes in ensemble hemorrhage simulation

Science.gov (United States)

Zhang, Song; Pruett, William Andrew; Hester, Robert

2015-01-01

In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse responses to hemorrhage in the human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, revealing the patterns and trends in an ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble members into subpopulations for analysis, and predict future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically collected physiological data in the future.
Relation between native ensembles and experimental structures of proteins

DEFF Research Database (Denmark)

Best, R. B.; Lindorff-Larsen, Kresten; DePristo, M. A.

2006-01-01

Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of "high-sequence similarity Protein Data Bank" (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest...
The pig genome project has plenty to squeal about.

Science.gov (United States)

Fan, B; Gorbach, D M; Rothschild, M F

2011-01-01

Significant progress on pig genetics and genomics research has been witnessed in recent years due to the integration of advanced molecular biology techniques, bioinformatics and computational biology, and the collaborative efforts of researchers in the swine genomics community. Progress on expanding the linkage map has slowed down, but the efforts have created a higher-resolution physical map integrating the clone map and BAC end sequence. The number of QTL mapped is still growing and most of the updated QTL mapping results are available through PigQTLdb. Additionally, expression studies using high-throughput microarrays and other gene expression techniques have made significant advancements. The number of identified non-coding RNAs is rapidly increasing and their exact regulatory functions are being explored. A publishable draft (build 10) of the swine genome sequence was available for the pig genomics community by the end of December 2010. Build 9 of the porcine genome is currently available with Ensembl annotation; manual annotation is ongoing. These drafts provide useful tools for such endeavors as comparative genomics and SNP scans for fine QTL mapping. A recent community-wide effort to create a 60K porcine SNP chip has greatly facilitated whole-genome association analyses, haplotype block construction and linkage disequilibrium mapping, which can contribute to whole-genome selection. The future 'systems biology' that integrates and optimizes the information from all research levels can enhance the pig community's understanding of the full complexity of the porcine genome. These recent technological advances and where they may lead are reviewed. Copyright © 2011 S. Karger AG, Basel.
On the structure and phase transitions of power-law Poissonian ensembles

Science.gov (United States)

Eliazar, Iddo; Oshanin, Gleb

2012-10-01

Power-law Poissonian ensembles are Poisson processes that are defined on the positive half-line, and that are governed by power-law intensities. Power-law Poissonian ensembles are stochastic objects of fundamental significance; they uniquely display an array of fractal features and they uniquely generate a span of important applications. In this paper we apply three different methods—oligarchic analysis, Lorenzian analysis and heterogeneity analysis—to explore power-law Poissonian ensembles. The amalgamation of these analyses, combined with the topology of power-law Poissonian ensembles, establishes a detailed and multi-faceted picture of the statistical structure and the statistical phase transitions of these elemental ensembles.
Dissipation induced asymmetric steering of distant atomic ensembles

Science.gov (United States)

Cheng, Guangling; Tan, Huatang; Chen, Aixi

2018-04-01

The asymmetric steering effects of separated atomic ensembles, represented by effective bosonic modes, have been explored by means of quantum reservoir engineering in a setting of cascaded cavities, each of which contains an atomic ensemble. It is shown that the steady-state asymmetric steering of the mesoscopic objects is unconditionally achieved via the dissipation of the cavities, by which the nonlocal interaction occurs between two atomic ensembles, and the direction of steering could be easily controlled through variation of certain tunable system parameters. One advantage of the present scheme is that it could be rather robust against parameter fluctuations, and does not require accurate control of the evolution time or of the initial state of the system. Furthermore, the double-channel Raman transitions between the long-lived atomic ground states are used and the atomic ensembles act as the quantum network nodes, which makes our scheme insensitive to the collective spontaneous emission of atoms.
Skill forecasting from different wind power ensemble prediction methods

International Nuclear Information System (INIS)

Pinson, Pierre; Nielsen, Henrik A; Madsen, Henrik; Kariniotakis, George

2007-01-01

This paper investigates alternative approaches to providing uncertainty estimates associated with point predictions of wind generation. Focus is given to skill forecasts in the form of prediction risk indices, aiming at giving a comprehensive signal on the expected level of forecast uncertainty. Ensemble predictions of wind generation are used as input. A proposal for the definition of prediction risk indices is given. Such skill forecasts are based on the dispersion of ensemble members for a single prediction horizon, or over a set of successive look-ahead times. It is shown on the test case of a Danish offshore wind farm how prediction risk indices may be related to several levels of forecast uncertainty (and energy imbalances). Wind power ensemble predictions are derived from the transformation of ECMWF and NCEP ensembles of meteorological variables to power, as well as from an alternative lagged-average approach. The ability of risk indices calculated from the various types of ensemble forecasts to resolve among situations with different levels of uncertainty is discussed.
Establishing and storing of deterministic quantum entanglement among three distant atomic ensembles.

Science.gov (United States)

Yan, Zhihui; Wu, Liang; Jia, Xiaojun; Liu, Yanhong; Deng, Ruijie; Li, Shujing; Wang, Hai; Xie, Changde; Peng, Kunchi

2017-09-28

It is crucial for the physical realization of quantum information networks to first establish entanglement among multiple space-separated quantum memories and then, at a user-controlled moment, to transfer the stored entanglement to quantum channels for distribution and conveyance of information. Here we present an experimental demonstration on generation, storage, and transfer of deterministic quantum entanglement among three spatially separated atomic ensembles. The off-line prepared multipartite entanglement of optical modes is mapped into three distant atomic ensembles to establish entanglement of atomic spin waves via electromagnetically induced transparency light-matter interaction. Then the stored atomic entanglement is transferred into a tripartite quadrature entangled state of light, which is space-separated and can be dynamically allocated to three quantum channels for conveying quantum information. The existence of entanglement among three released optical modes verifies that the system has the capacity to preserve multipartite entanglement. The presented protocol can be directly extended to larger quantum networks with more nodes. Continuous-variable encoding is a promising approach for quantum information and communication networks. Here, the authors show how to map entanglement from three spatial optical modes to three separated atomic samples via electromagnetically induced transparency, releasing it later on demand.
Using ensemble forecasting for wind power

Energy Technology Data Exchange (ETDEWEB)

Giebel, G.; Landberg, L.; Badger, J. [Risoe National Lab., Roskilde (Denmark); Sattler, K.

2003-07-01

Short-term prediction of wind power has a long tradition in Denmark. It is an essential tool for the operators to keep the grid from becoming unstable in a region like Jutland, where more than 27% of the electricity consumption comes from wind power. This means that the minimum load is already lower than the maximum production from wind energy alone. Danish utilities have therefore used short-term prediction of wind energy since the mid-1990s. However, the accuracy is still far from sufficient in the eyes of the utilities (which are used to load forecasts accurate to within 5% on a one-week horizon). The Ensemble project tries to alleviate the dependency of the forecast quality on one model by using multiple models, and will also investigate the possibilities of using the model spread of multiple models or of dedicated ensemble runs for a prediction of the uncertainty of the forecast. Usually, short-term forecasting works (especially for the horizon beyond 6 hours) by gathering input from a Numerical Weather Prediction (NWP) model. This input data is used together with online data in statistical models (this is the case e.g. in Zephyr/WPPT) to yield the output of the wind farms or of a whole region for the next 48 hours (only limited by the NWP model horizon). For the accuracy of the final production forecast, the accuracy of the NWP prediction is paramount. While many efforts are underway to increase the accuracy of the NWP forecasts themselves (which ultimately are limited by the amount of computing power available, the lack of a tight observational network on the Atlantic and limited physics modelling), another approach is to use ensembles of different models or different model runs. This can be either an ensemble of different models' output for the same area, using different data assimilation schemes and different model physics, or a dedicated ensemble run by a large institution, where the same model is run with slight variations in initial conditions and
Operational hydrological forecasting in Bavaria. Part II: Ensemble forecasting

Science.gov (United States)

Ehret, U.; Vogelbacher, A.; Moritz, K.; Laurent, S.; Meyer, I.; Haag, I.

2009-04-01

In part I of this study, the operational flood forecasting system in Bavaria and an approach to identify and quantify forecast uncertainty were introduced. The approach is split into the calculation of an empirical 'overall error' from archived forecasts and the calculation of an empirical 'model error' based on hydrometeorological forecast tests, where rainfall observations were used instead of forecasts. The 'model error' can, especially in upstream catchments where forecast uncertainty is strongly dependent on the current predictability of the atmosphere, be superimposed on the spread of a hydrometeorological ensemble forecast. In Bavaria, two meteorological ensemble prediction systems are currently tested for operational use: the 16-member COSMO-LEPS forecast and a poor man's ensemble composed of DWD GME, DWD Cosmo-EU, NCEP GFS, Aladin-Austria, MeteoSwiss Cosmo-7. The determination of the overall forecast uncertainty is dependent on the catchment characteristics:
1. Upstream catchment with high influence of weather forecast
a) A hydrological ensemble forecast is calculated using each of the meteorological forecast members as forcing.
b) Corresponding to the characteristics of the meteorological ensemble forecast, each resulting forecast hydrograph can be regarded as equally likely.
c) The 'model error' distribution, with parameters dependent on hydrological case and lead time, is added to each forecast timestep of each ensemble member.
d) For each forecast timestep, the overall error distribution (i.e. over the 'model error' distributions of all ensemble members) is calculated.
e) From this distribution, the uncertainty range on a desired level (here: the 10% and 90% percentile) is extracted and drawn as forecast envelope.
f) As the mean or median of an ensemble forecast does not necessarily exhibit meteorologically sound temporal evolution, a single hydrological forecast termed 'lead forecast' is chosen and shown in addition to the uncertainty bounds. This can be
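Steps c) to e) of the upstream-catchment procedure can be sketched for a single forecast timestep. The sketch assumes that the 'model error' is available as a list of empirical, additive error samples and that all members are equally likely; the function names, the interpolation rule for the percentiles and the discharge values are illustrative assumptions and are not taken from the Bavarian system.

```python
def percentile(sorted_values, fraction):
    """Linearly interpolated percentile of an already sorted list."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] + weight * (sorted_values[upper] - sorted_values[lower])


def forecast_envelope(member_values, model_errors, low=0.10, high=0.90):
    """Uncertainty envelope for one timestep of a hydrological ensemble."""
    # Step c: superimpose every model-error sample on every ensemble member.
    # Step d: pool the results into one overall distribution (equal weights).
    pooled = sorted(m + e for m in member_values for e in model_errors)
    # Step e: extract the lower and upper bounds of the envelope.
    return percentile(pooled, low), percentile(pooled, high)


# Hypothetical discharge forecasts (m^3/s) of two members at one lead time,
# and hypothetical additive model-error samples for that lead time.
members = [100.0, 120.0]
errors = [-10.0, 0.0, 10.0]
lower_bound, upper_bound = forecast_envelope(members, errors)
print(lower_bound, upper_bound)
```

Repeating the call for every forecast timestep, with error samples that depend on lead time, would give the envelope that is drawn around the 'lead forecast'.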
Exploring diversity in ensemble classification: Applications in large area land cover mapping

Science.gov (United States)

Mellor, Andrew; Boukir, Samia

2017-07-01

Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area
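The ensemble margin mentioned above can be illustrated with a short sketch. It uses one common supervised definition, the votes for the true class minus the largest vote count for any other class, divided by the number of voters; the study may use a different variant. The class labels and vote counts are hypothetical.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Supervised margin of one instance, in the range [-1, 1]."""
    counts = Counter(votes)
    true_votes = counts.get(true_label, 0)
    # Largest vote count among all classes other than the true one.
    strongest_rival = max(
        (n for label, n in counts.items() if label != true_label), default=0
    )
    return (true_votes - strongest_rival) / len(votes)


# Hypothetical votes of ten trees for one pixel.
votes = ["forest"] * 7 + ["woodland"] * 2 + ["open"] * 1
print(ensemble_margin(votes, "forest"))    # confidently correct instance
print(ensemble_margin(votes, "woodland"))  # misclassified instance
```

Instances with a margin near zero or below are the low-margin training samples that, according to the abstract, are emphasized to induce diversity.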
Reliability of multi-model and structurally different single-model ensembles

Energy Technology Data Exchange (ETDEWEB)

Yokohata, Tokuta [National Institute for Environmental Studies, Center for Global Environmental Research, Tsukuba, Ibaraki (Japan); Annan, James D.; Hargreaves, Julia C. [Japan Agency for Marine-Earth Science and Technology, Research Institute for Global Change, Yokohama, Kanagawa (Japan); Collins, Matthew [University of Exeter, College of Engineering, Mathematics and Physical Sciences, Exeter (United Kingdom); Jackson, Charles S.; Tobis, Michael [The University of Texas at Austin, Institute of Geophysics, 10100 Burnet Rd., ROC-196, Mail Code R2200, Austin, TX (United States); Webb, Mark J. [Met Office Hadley Centre, Exeter (United Kingdom)

2012-08-15

The performance of several state-of-the-art climate model ensembles, including two multi-model ensembles (MMEs) and four structurally different (perturbed parameter) single model ensembles (SMEs), is investigated for the first time using the rank histogram approach. In this method, the reliability of a model ensemble is evaluated from the point of view of whether the observations can be regarded as being sampled from the ensemble. Our analysis reveals that, in the MMEs, the climate variables we investigated are broadly reliable on the global scale, with a tendency towards overdispersion. On the other hand, in the SMEs, the reliability differs depending on the ensemble and variable field considered. In general, the mean state and historical trend of surface air temperature, and mean state of precipitation are reliable in the SMEs. However, variables such as sea level pressure or top-of-atmosphere clear-sky shortwave radiation do not cover a sufficiently wide range in some SMEs. It is not possible to assess whether this is a fundamental feature of SMEs generated with a particular model, or a consequence of the algorithm used to select and perturb the values of the parameters. As under-dispersion is a potentially more serious issue when using ensembles to make projections, we recommend the application of rank histograms to assess reliability when designing and running perturbed physics SMEs. (orig.)
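The rank histogram approach can be sketched in a few lines. For each case, the observation is ranked among the ensemble members, and the ranks are counted over all cases. The sketch ignores ties and observational uncertainty, and the function name and data are hypothetical.

```python
def rank_histogram(cases):
    """Count, over all cases, how many members lie below the observation."""
    n_members = len(cases[0][0])
    counts = [0] * (n_members + 1)
    for members, observation in cases:
        # Rank 0: observation below all members; rank n: above all members.
        rank = sum(1 for value in members if value < observation)
        counts[rank] += 1
    return counts


# Hypothetical cases: (ensemble member values, observed value).
cases = [
    ([1.0, 2.0, 3.0], 0.5),
    ([1.0, 2.0, 3.0], 2.5),
    ([1.0, 2.0, 3.0], 3.5),
    ([1.0, 2.0, 3.0], 2.5),
]
print(rank_histogram(cases))
```

A roughly flat histogram is consistent with a reliable ensemble, a U shape (observations often outside the ensemble range) indicates under-dispersion, and a dome shape indicates over-dispersion.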
On the forecast skill of a convection-permitting ensemble

Science.gov (United States)

Schellander-Gorgas, Theresa; Wang, Yong; Meier, Florian; Weidle, Florian; Wittmann, Christoph; Kann, Alexander

2017-01-01

The 2.5 km convection-permitting (CP) ensemble AROME-EPS (Applications of Research to Operations at Mesoscale - Ensemble Prediction System) is evaluated by comparison with the regional 11 km ensemble ALADIN-LAEF (Aire Limitée Adaption dynamique Développement InterNational - Limited Area Ensemble Forecasting) to show whether a benefit is provided by a CP EPS. The evaluation focuses on the abilities of the ensembles to quantitatively predict precipitation during a 3-month convective summer period over areas consisting of mountains and lowlands. The statistical verification uses surface observations and 1 km × 1 km precipitation analyses, and the verification scores involve state-of-the-art statistical measures for deterministic and probabilistic forecasts as well as novel spatial verification methods. The results show that the convection-permitting ensemble with higher-resolution AROME-EPS outperforms its mesoscale counterpart ALADIN-LAEF for precipitation forecasts. The positive impact is larger for the mountainous areas than for the lowlands. In particular, the diurnal precipitation cycle is improved in AROME-EPS, which leads to a significant improvement of scores at the concerned times of day (up to approximately one-third of the scored verification measure). Moreover, there are advantages for higher precipitation thresholds at small spatial scales, which are due to the improved simulation of the spatial structure of precipitation.
Ensemble computing for the petroleum industry

International Nuclear Information System (INIS)

Annaratone, M.; Dossa, D.

1995-01-01

Computer downsizing is one of the most often used buzzwords in today's competitive business, and the petroleum industry is at the forefront of this revolution. Ensemble computing provides the key for computer downsizing with its first incarnation, i.e., workstation farms. This paper concerns the importance of increasing the productivity cycle and not just the execution time of a job. The authors introduce the concept of ensemble computing and workstation farms. Then they discuss how different computing paradigms can be addressed by workstation farms.
Observing copepods through a genomic lens

Directory of Open Access Journals (Sweden)

Johnson Stewart C

2011-09-01

Background: Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion: The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to
Observing copepods through a genomic lens

Science.gov (United States)

2011-01-01

Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to provide genomics tools for
Ocean Predictability and Uncertainty Forecasts Using Local Ensemble Transfer Kalman Filter (LETKF)

Science.gov (United States)

Wei, M.; Hogan, P. J.; Rowley, C. D.; Smedstad, O. M.; Wallcraft, A. J.; Penny, S. G.

2017-12-01

Ocean predictability and uncertainty are studied with an ensemble system that has been developed based on the US Navy's operational HYCOM using the Local Ensemble Transfer Kalman Filter (LETKF) technology. One of the advantages of this method is that the best possible initial analysis states for the HYCOM forecasts are provided by the LETKF which assimilates operational observations using ensemble method. The background covariance during this assimilation process is implicitly supplied with the ensemble avoiding the difficult task of developing tangent linear and adjoint models out of HYCOM with the complicated hybrid isopycnal vertical coordinate for 4D-VAR. The flow-dependent background covariance from the ensemble will be an indispensable part in the next generation hybrid 4D-Var/ensemble data assimilation system. The predictability and uncertainty for the ocean forecasts are studied initially for the Gulf of Mexico. The results are compared with another ensemble system using Ensemble Transfer (ET) method which has been used in the Navy's operational center. The advantages and disadvantages are discussed.
REAL - Ensemble radar precipitation estimation for hydrology in a mountainous region

OpenAIRE

Germann, Urs; Berenguer Ferrer, Marc; Sempere Torres, Daniel; Zappa, Massimiliano

2009-01-01

An elegant solution to characterise the residual errors in radar precipitation estimates is to generate an ensemble of precipitation fields. The paper proposes a radar ensemble generator designed for usage in the Alps using LU decomposition (REAL), and presents first results from a real-time implementation coupling the radar ensemble with a semi-distributed rainfall–runoff model for flash flood modelling in a steep Alpine catchment. Each member of the radar ensemble is a possible realisati...
Non-Boltzmann Ensembles and Monte Carlo Simulations

International Nuclear Information System (INIS)

Murthy, K. P. N.

2016-01-01

Boltzmann sampling based on Metropolis algorithm has been extensively used for simulating a canonical ensemble and for calculating macroscopic properties of a closed system at desired temperatures. An estimate of a mechanical property, like energy, of an equilibrium system, is made by averaging over a large number of microstates generated by Boltzmann Monte Carlo methods. This is possible because we can assign a numerical value for energy to each microstate. However, a thermal property like entropy is not easily accessible to these methods. The reason is simple. We cannot assign a numerical value for entropy to a microstate. Entropy is not a property associated with any single microstate. It is a collective property of all the microstates. Toward calculating entropy and other thermal properties, a non-Boltzmann Monte Carlo technique called umbrella sampling was proposed some forty years ago. Umbrella sampling has since undergone several metamorphoses and we now have multi-canonical Monte Carlo, entropic sampling, flat histogram methods, the Wang-Landau algorithm, etc. This class of methods generates non-Boltzmann ensembles which are un-physical. However, physical quantities can be calculated as follows. First un-weight a microstate of the entropic ensemble; then re-weight it to the desired physical ensemble. Carry out a weighted average over the entropic ensemble to estimate physical quantities. In this talk I shall tell you of the most recent non-Boltzmann Monte Carlo method and show how to calculate free energy for a few systems. We first consider estimation of free energy as a function of energy at different temperatures to characterize phase transition in a hairpin DNA in the presence of an unzipping force. Next we consider free energy as a function of order parameter and to this end we estimate the density of states g(E, M) as a function of both energy E and order parameter M. This is carried out in two stages. We estimate g(E) in the first stage
Noodles: a tool for visualization of numerical weather model ensemble uncertainty.

Science.gov (United States)

Sanyal, Jibonananda; Zhang, Song; Dyer, Jamie; Mercer, Andrew; Amburn, Philip; Moorhead, Robert J

2010-01-01

Numerical weather prediction ensembles are routinely used for operational weather forecasting. The members of these ensembles are individual simulations with either slightly perturbed initial conditions or different model parameterizations, or occasionally both. Multi-member ensemble output is usually large, multivariate, and challenging to interpret interactively. Forecast meteorologists are interested in understanding the uncertainties associated with numerical weather prediction; specifically variability between the ensemble members. Currently, visualization of ensemble members is mostly accomplished through spaghetti plots of a single mid-troposphere pressure surface height contour. In order to explore new uncertainty visualization methods, the Weather Research and Forecasting (WRF) model was used to create a 48-hour, 18 member parameterization ensemble of the 13 March 1993 "Superstorm". A tool was designed to interactively explore the ensemble uncertainty of three important weather variables: water-vapor mixing ratio, perturbation potential temperature, and perturbation pressure. Uncertainty was quantified using individual ensemble member standard deviation, inter-quartile range, and the width of the 95% confidence interval. Bootstrapping was employed to overcome the dependence on normality in the uncertainty metrics. A coordinated view of ribbon and glyph-based uncertainty visualization, spaghetti plots, iso-pressure colormaps, and data transect plots was provided to two meteorologists for expert evaluation. They found it useful in assessing uncertainty in the data, especially in finding outliers in the ensemble run and therefore avoiding the WRF parameterizations that lead to these outliers. Additionally, the meteorologists could identify spatial regions where the uncertainty was significantly high, allowing for identification of poorly simulated storm environments and physical interpretation of these model issues.
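The three spread measures named in the abstract above can be sketched per grid point as follows. This is a minimal illustration, not the Noodles implementation: the function name `ensemble_spread`, the array layout, and the choice to bootstrap the confidence interval of the ensemble mean are all assumptions made for the example.

```python
import numpy as np

def ensemble_spread(members, n_boot=1000, seed=0):
    """Per-grid-point spread of an ensemble (hypothetical helper).

    members: array of shape (n_members, ...) holding one variable.
    Returns (std, iqr, ci_width), each with the grid shape.
    """
    members = np.asarray(members, dtype=float)
    n = members.shape[0]

    # Sample standard deviation across ensemble members.
    std = members.std(axis=0, ddof=1)

    # Inter-quartile range across ensemble members.
    q75, q25 = np.percentile(members, [75, 25], axis=0)
    iqr = q75 - q25

    # Bootstrap: resample members with replacement and take the
    # 2.5th/97.5th percentiles of the resampled means (assumed target).
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = members[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5], axis=0)
    return std, iqr, hi - lo
```

The bootstrap interval makes no normality assumption about the members, which is the motivation the abstract gives for resampling.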

Next generation extended Lagrangian first principles molecular dynamics.

Science.gov (United States)

Niklasson, Anders M N

2017-08-07

Extended Lagrangian Born-Oppenheimer molecular dynamics [A. M. N. Niklasson, Phys. Rev. Lett. 100, 123004 (2008)] is formulated for general Hohenberg-Kohn density-functional theory and compared with the extended Lagrangian framework of first principles molecular dynamics by Car and Parrinello [Phys. Rev. Lett. 55, 2471 (1985)]. It is shown how extended Lagrangian Born-Oppenheimer molecular dynamics overcomes several shortcomings of regular, direct Born-Oppenheimer molecular dynamics, while improving or maintaining important features of Car-Parrinello simulations. The accuracy of the electronic degrees of freedom in extended Lagrangian Born-Oppenheimer molecular dynamics, with respect to the exact Born-Oppenheimer solution, is of second-order in the size of the integration time step and of fourth order in the potential energy surface. Improved stability over recent formulations of extended Lagrangian Born-Oppenheimer molecular dynamics is achieved by generalizing the theory to finite temperature ensembles, using fractional occupation numbers in the calculation of the inner-product kernel of the extended harmonic oscillator that appears as a preconditioner in the electronic equations of motion. Material systems that normally exhibit slow self-consistent field convergence can be simulated using integration time steps of the same order as in direct Born-Oppenheimer molecular dynamics, but without the requirement of an iterative, non-linear electronic ground-state optimization prior to the force evaluations and without a systematic drift in the total energy. In combination with proposed low-rank and on the fly updates of the kernel, this formulation provides an efficient and general framework for quantum-based Born-Oppenheimer molecular dynamics simulations.
Ensemble Deep Learning for Biomedical Time Series Classification

Directory of Open Access Journals (Sweden)

Lin-peng Jin

2016-01-01

Full Text Available Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.
Device and Method for Gathering Ensemble Data Sets

Science.gov (United States)

Racette, Paul E. (Inventor)

2014-01-01

An ensemble detector uses calibrated noise references to produce ensemble sets of data from which properties of non-stationary processes may be extracted. The ensemble detector comprising: a receiver; a switching device coupled to the receiver, the switching device configured to selectively connect each of a plurality of reference noise signals to the receiver; and a gain modulation circuit coupled to the receiver and configured to vary a gain of the receiver based on a forcing signal; whereby the switching device selectively connects each of the plurality of reference noise signals to the receiver to produce an output signal derived from the plurality of reference noise signals and the forcing signal.
Combining 2-m temperature nowcasting and short range ensemble forecasting

Directory of Open Access Journals (Sweden)

A. Kann

2011-12-01

Full Text Available During recent years, numerical ensemble prediction systems have become an important tool for estimating the uncertainties of dynamical and physical processes as represented in numerical weather models. The latest generation of limited area ensemble prediction systems (LAM-EPSs) allows for probabilistic forecasts at high resolution in both space and time. However, these systems still suffer from systematic deficiencies. Especially for nowcasting (0–6 h) applications the ensemble spread is smaller than the actual forecast error. This paper tries to generate probabilistic short range 2-m temperature forecasts by combining a state-of-the-art nowcasting method and a limited area ensemble system, and compares the results with statistical methods. The Integrated Nowcasting Through Comprehensive Analysis (INCA) system, which has been in operation at the Central Institute for Meteorology and Geodynamics (ZAMG) since 2006 (Haiden et al., 2011), provides short range deterministic forecasts at high temporal (15 min–60 min) and spatial (1 km) resolution. An INCA Ensemble (INCA-EPS) of 2-m temperature forecasts is constructed by applying a dynamical approach, a statistical approach, and a combined dynamic-statistical method. The dynamical method takes uncertainty information (i.e. ensemble variance) from the operational limited area ensemble system ALADIN-LAEF (Aire Limitée Adaptation Dynamique Développement InterNational Limited Area Ensemble Forecasting), which is running operationally at ZAMG (Wang et al., 2011). The purely statistical method assumes a well-calibrated spread-skill relation and applies ensemble spread according to the skill of the INCA forecast of the most recent past. The combined dynamic-statistical approach adapts the ensemble variance gained from ALADIN-LAEF with non-homogeneous Gaussian regression (NGR), which yields a statistical correction of the first and second moment (mean bias and dispersion) for Gaussian distributed continuous
Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models.

Directory of Open Access Journals (Sweden)

Nikola Simidjievski

Full Text Available Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase of the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles with sampling domain-specific knowledge instead of sampling data. We apply the proposed method to and evaluate its performance on a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to the one of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.
IASI Radiance Data Assimilation in Local Ensemble Transform Kalman Filter

Science.gov (United States)

Cho, K.; Hyoung-Wook, C.; Jo, Y.

2016-12-01

The Korea Institute of Atmospheric Prediction Systems (KIAPS) is developing an NWP model with data assimilation systems. The Local Ensemble Transform Kalman Filter (LETKF) system, one of these data assimilation systems, has been developed for the KIAPS Integrated Model (KIM), which is based on a cubed-sphere grid, and has successfully assimilated real data. The LETKF data assimilation system has been extended to 4D-LETKF, which considers the time-evolving error covariance within the assimilation window, and to IASI radiance data assimilation using KPOP (KIAPS Package for Observation Processing) with RTTOV (Radiative Transfer for TOVS). The LETKF system has been running semi-operational prediction with conventional (sonde, aircraft) observations and AMSU-A (Advanced Microwave Sounding Unit-A) radiance data since April. In July, the semi-operational prediction system was updated to include GPS-RO, AMV and IASI (Infrared Atmospheric Sounding Interferometer) data. A set of KIM simulations with ne30np4 resolution and 50 vertical levels (model top at 0.3 hPa) was carried out for short-range forecasts (10 days) within the semi-operational LETKF prediction system with a 50-member ensemble forecast. To isolate the impact of IASI, our experiments used only conventional and IASI radiance data in the same semi-operational prediction setup. We carried out sensitivity tests for the IASI thinning method (3D and 4D). The number of IASI observations was increased by temporal (4D) thinning, and an improvement in the impact of IASI radiance data on the forecast skill of the model is expected.
Dispersion of aerosol particles in the free atmosphere using ensemble forecasts

Directory of Open Access Journals (Sweden)

T. Haszpra

2013-10-01

Full Text Available The dispersion of aerosol particle pollutants is studied using 50 members of an ensemble forecast in the example of a hypothetical free atmospheric emission above Fukushima over a period of 2.5 days. Considerable differences are found among the dispersion predictions of the different ensemble members, as well as between the ensemble mean and the deterministic result at the end of the observation period. The variance is found to decrease with the particle size. The geographical area where a threshold concentration is exceeded in at least one ensemble member expands to a 5–10 times larger region than the area from the deterministic forecast, both for air column "concentration" and in the "deposition" field. We demonstrate that the root-mean-square distance of any particle from its own clones in the ensemble members can reach values on the order of one thousand kilometers. Even the centers of mass of the particle cloud of the ensemble members deviate considerably from that obtained by the deterministic forecast. All these indicate that an investigation of the dispersion of aerosol particles in the spirit of ensemble forecast contains useful hints for the improvement of risk assessment.
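The comparison of affected areas described in the abstract above can be sketched as a union of per-member exceedance masks. This is only an illustration: the function name `exceedance_area`, the gridded array layout, and the uniform cell area are assumptions, not details taken from the paper.

```python
import numpy as np

def exceedance_area(conc, threshold, cell_area=1.0):
    """Area where at least one member exceeds the threshold (hypothetical helper).

    conc: array of shape (n_members, ny, nx) with concentration or deposition.
    cell_area: area of one grid cell, assumed uniform here.
    """
    conc = np.asarray(conc, dtype=float)
    # A cell counts if any ensemble member exceeds the threshold there.
    any_exceeds = (conc > threshold).any(axis=0)
    return float(any_exceeds.sum() * cell_area)
```

Passing a single deterministic field as `field[np.newaxis]` gives the deterministic area, so the ratio of the two calls corresponds to the expansion factor the abstract reports.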
Orbital magnetism in ensembles of ballistic billiards

International Nuclear Information System (INIS)

Ullmo, D.; Richter, K.; Jalabert, R.A.

1993-01-01

The magnetic response of ensembles of small two-dimensional structures at finite temperatures is calculated. Using semiclassical methods and numerical calculation it is demonstrated that only short classical trajectories are relevant. The magnetic susceptibility is enhanced in regular systems, where these trajectories appear in families. For ensembles of squares large paramagnetic susceptibility is obtained, in good agreement with recent measurements in the ballistic regime. (authors). 20 refs., 2 figs
Ensemble dispersion forecasting - Part 2. Application and evaluation

DEFF Research Database (Denmark)

Galmarini, S.; Bianconi, R.; Addis, R.

2004-01-01

of the dispersion of ETEX release 1 and the model ensemble is compared with the monitoring data. The scope of the comparison is to estimate to what extent the ensemble analysis is an improvement with respect to the single model results and represents a superior analysis of the process evolution. (C) 2004 Elsevier...
A variational ensemble scheme for noisy image data assimilation

Science.gov (United States)

Yang, Yin; Robinson, Cordelia; Heitz, Dominique; Mémin, Etienne

2014-05-01

Data assimilation techniques aim at recovering a system state variables trajectory denoted as X, along time from partially observed noisy measurements of the system denoted as Y. These procedures, which couple dynamics and noisy measurements of the system, fulfill indeed a twofold objective. On one hand, they provide a denoising - or reconstruction - procedure of the data through a given model framework and on the other hand, they provide estimation procedures for unknown parameters of the dynamics. A standard variational data assimilation problem can be formulated as the minimization of the following objective function with respect to the initial discrepancy, η, from the background initial guess: $J(\eta(x)) = \frac{1}{2}\|X_b(x) - X(t_0,x)\|^2_B + \frac{1}{2}\int_{t_0}^{t_f}\|H(X(t,x)) - Y(t,x)\|^2_R\,dt$ (1), where the observation operator H links the state variable and the measurements. The cost function can be interpreted as the log likelihood function associated to the a posteriori distribution of the state given the past history of measurements and the background. In this work, we aim at studying ensemble based optimal control strategies for data assimilation. Such formulation nicely combines the ingredients of ensemble Kalman filters and variational data assimilation (4DVar). It is also formulated as the minimization of the objective function (1), but similarly to ensemble filter, it introduces in its objective function an empirical ensemble-based background-error covariance defined as: $B \equiv \langle (X_b - \langle X_b \rangle)(X_b - \langle X_b \rangle)^T \rangle$ (2). Thus, it works in an off-line smoothing mode rather than on the fly like sequential filters. Such resulting ensemble variational data assimilation technique corresponds to a relatively new family of methods [1,2,3]. It presents two main advantages: first, it does not require anymore to construct the adjoint of the dynamics tangent linear operator, which is a considerable advantage with respect to the method's implementation, and second, it enables the handling of a flow
Ensemble atmospheric dispersion modeling for emergency response consequence assessments

International Nuclear Information System (INIS)

Addis, R.P.; Buckley, R.L.

2003-01-01

Full text: Prognostic atmospheric dispersion models are used to generate consequence assessments, which assist decision-makers in the event of a release from a nuclear facility. Differences in the forecast wind fields generated by various meteorological agencies, differences in the transport and diffusion models themselves, as well as differences in the way these models treat the release source term, all may result in differences in the simulated plumes. This talk will address the U.S. participation in the European ENSEMBLE project, and present a perspective on how ensemble techniques may be used to enable atmospheric modelers to provide decision-makers with a more realistic understanding of how both the atmosphere and the models behave. Meteorological forecasts generated by numerical models from national and multinational meteorological agencies provide individual realizations of three-dimensional, time dependent atmospheric wind fields. These wind fields may be used to drive atmospheric dispersion (transport and diffusion) models, or they may be used to initiate other, finer resolution meteorological models, which in turn drive dispersion models. Many modeling agencies now utilize ensemble-modeling techniques to determine how sensitive the prognostic fields are to minor perturbations in the model parameters. However, the European Union programs RTMOD and ENSEMBLE are the first projects to utilize a Web-based ensemble approach to interpret the output from atmospheric dispersion models. The ensembles produced are different from those generated by meteorological forecasting centers in that they are ensembles of dispersion model outputs from many different atmospheric transport and diffusion models utilizing prognostic atmospheric fields from several different forecast centers. As such, they enable a decision-maker to consider the uncertainty in the plume transport and growth as a result of the differences in the forecast wind fields as well as the differences in the
Benchmarking Commercial Conformer Ensemble Generators.

Science.gov (United States)

Friedrich, Nils-Ole; de Bruyn Kops, Christina; Flachsenberg, Florian; Sommer, Kai; Rarey, Matthias; Kirchmair, Johannes

2017-11-27

We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.
Mass Conservation and Positivity Preservation with Ensemble-type Kalman Filter Algorithms

Science.gov (United States)

Janjic, Tijana; McLaughlin, Dennis B.; Cohn, Stephen E.; Verlaan, Martin

2013-01-01

Maintaining conservative physical laws numerically has long been recognized as being important in the development of numerical weather prediction (NWP) models. In the broader context of data assimilation, concerted efforts to maintain conservation laws numerically and to understand the significance of doing so have begun only recently. In order to enforce physically based conservation laws of total mass and positivity in the ensemble Kalman filter, we incorporate constraints to ensure that the filter ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. We show that the analysis steps of the ensemble transform Kalman filter (ETKF) algorithm and the ensemble Kalman filter (EnKF) algorithm can conserve the mass integral, but do not preserve positivity. Further, if localization is applied or if negative values are simply set to zero, then the total mass is not conserved either. In order to ensure mass conservation, a projection matrix that corrects for localization effects is constructed. In order to maintain both mass conservation and positivity preservation through the analysis step, we construct a data assimilation algorithm based on quadratic programming and ensemble Kalman filtering. Mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate constraints. Some simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. The results show clear improvements in both analyses and forecasts, particularly in the presence of localized features. Behavior of the algorithm is also tested in the presence of model error.
Conductor gestures influence evaluations of ensemble performance

Directory of Open Access Journals (Sweden)

Steven eMorrison

2014-07-01

Full Text Available Previous research has found that listener evaluations of ensemble performances vary depending on the expressivity of the conductor’s gestures, even when performances are otherwise identical. It was the purpose of the present study to test whether this effect of visual information was evident in the evaluation of specific aspects of ensemble performance, articulation and dynamics. We constructed a set of 32 music performances that combined auditory and visual information and were designed to feature a high degree of contrast along one of two target characteristics: articulation and dynamics. We paired each of four music excerpts recorded by a chamber ensemble in both a high- and low-contrast condition with video of four conductors demonstrating high- and low-contrast gesture specifically appropriate to either articulation or dynamics. Using one of two equivalent test forms, college music majors and nonmajors (N = 285) viewed sixteen 30-second performances and evaluated the quality of the ensemble’s articulation, dynamics, technique and tempo along with overall expressivity. Results showed significantly higher evaluations for performances featuring high rather than low conducting expressivity regardless of the ensemble’s performance quality. Evaluations for both articulation and dynamics were strongly and positively correlated with evaluations of overall ensemble expressivity.
Curve Boxplot: Generalization of Boxplot for Ensembles of Curves.

Science.gov (United States)

Mirzargar, Mahsa; Whitaker, Ross T; Kirby, Robert M

2014-12-01

In simulation science, computational scientists often study the behavior of their simulations by repeated solutions with variations in parameters and/or boundary values or initial conditions. Through such simulation ensembles, one can try to understand or quantify the variability or uncertainty in a solution as a function of the various inputs or model assumptions. In response to a growing interest in simulation ensembles, the visualization community has developed a suite of methods for allowing users to observe and understand the properties of these ensembles in an efficient and effective manner. An important aspect of visualizing simulations is the analysis of derived features, often represented as points, surfaces, or curves. In this paper, we present a novel, nonparametric method for summarizing ensembles of 2D and 3D curves. We propose an extension of a method from descriptive statistics, data depth, to curves. We also demonstrate a set of rendering and visualization strategies for showing rank statistics of an ensemble of curves, which is a generalization of traditional whisker plots or boxplots to multidimensional curves. Results are presented for applications in neuroimaging, hurricane forecasting and fluid dynamics.
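To make the idea of ranking ensemble members by data depth concrete, here is a minimal sketch of band depth for functions sampled on a common grid. It is a simplified stand-in, not the paper's curve boxplot for 2D and 3D curves: the function name `band_depth`, the use of pairs of curves to form bands, and the requirement that a curve lie inside the band at every sample are assumptions chosen for illustration.

```python
import numpy as np
from itertools import combinations

def band_depth(curves):
    """Band depth of each sampled function (hypothetical helper).

    curves: array of shape (n_curves, n_samples), all on the same grid.
    Returns, per curve, the fraction of curve pairs whose band contains it.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    pairs = list(combinations(range(n), 2))
    depth = np.zeros(n)
    for j, k in pairs:
        # Pointwise envelope spanned by the two curves.
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        # A curve scores only if it stays inside the band everywhere.
        inside = np.all((curves >= lower) & (curves <= upper), axis=1)
        depth += inside
    return depth / len(pairs)
```

Sorting members by depth gives the rank statistics a boxplot needs: the deepest curve plays the role of the median, the deepest half defines the box, and low-depth curves are outlier candidates.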
Conservation of Mass and Preservation of Positivity with Ensemble-Type Kalman Filter Algorithms

Science.gov (United States)

Janjic, Tijana; Mclaughlin, Dennis; Cohn, Stephen E.; Verlaan, Martin

2014-01-01

This paper considers the incorporation of constraints to enforce physically based conservation laws in the ensemble Kalman filter. In particular, constraints are used to ensure that the ensemble members and the ensemble mean conserve mass and remain nonnegative through measurement updates. In certain situations filtering algorithms such as the ensemble Kalman filter (EnKF) and ensemble transform Kalman filter (ETKF) yield updated ensembles that conserve mass but are negative, even though the actual states must be nonnegative. In such situations if negative values are set to zero, or a log transform is introduced, the total mass will not be conserved. In this study, mass and positivity are both preserved by formulating the filter update as a set of quadratic programming problems that incorporate non-negativity constraints. Simple numerical experiments indicate that this approach can have a significant positive impact on the posterior ensemble distribution, giving results that are more physically plausible both for individual ensemble members and for the ensemble mean. In two examples, an update that includes a non-negativity constraint is able to properly describe the transport of a sharp feature (e.g., a triangle or cone). A number of implementation questions still need to be addressed, particularly the need to develop a computationally efficient quadratic programming update for large ensemble.
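The point about clipping can be illustrated with the simplest possible quadratic program: find the nonnegative state closest to the unconstrained analysis that still has the prescribed total mass. The sketch below assumes an identity weighting in the objective, whereas the paper's update would weight by the filter's error covariances, and the function name `project_mass_nonneg` is hypothetical. With identity weighting the problem has a closed-form solution by sorting, the well-known Euclidean projection onto a scaled simplex.

```python
import numpy as np

def project_mass_nonneg(analysis, mass):
    """Solve min ||x - analysis||^2 s.t. sum(x) = mass, x >= 0 (mass > 0).

    Hypothetical helper using identity weighting; the solution has the
    form x = max(analysis - tau, 0) for a single shift tau.
    """
    a = np.asarray(analysis, dtype=float)
    u = np.sort(a)[::-1]                # values in descending order
    css = np.cumsum(u)
    j = np.arange(1, a.size + 1)
    # Largest index whose entry stays positive after the shift.
    rho = np.nonzero(u - (css - mass) / j > 0)[0][-1]
    tau = (css[rho] - mass) / (rho + 1)
    return np.maximum(a - tau, 0.0)

# An analysis with total mass 1.0 and one negative entry.
analysis = np.array([0.5, -0.2, 0.7])
clipped = np.maximum(analysis, 0.0)           # total mass becomes 1.2
constrained = project_mass_nonneg(analysis, 1.0)
```

Clipping alone inflates the total mass, while the constrained solution removes the negative value and redistributes the correction so that the sum is unchanged.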
Momentum distribution functions in ensembles: the inequivalence of microcannonical and canonical ensembles in a finite ultracold system.

Science.gov (United States)

Wang, Pei; Xianlong, Gao; Li, Haibin

2013-08-01

Many thermodynamics textbooks demonstrate that the different ensembles become equivalent in the thermodynamic limit. In the present work we discuss the inequivalence of microcanonical and canonical ensembles in a finite ultracold system at low energies. We calculate the microcanonical momentum distribution function (MDF) in a system of identical fermions (bosons). We find that the microcanonical MDF deviates from the canonical one, which is the Fermi-Dirac (Bose-Einstein) function, in a finite system at low energies where the single-particle density of states and its inverse are finite.
A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.

Science.gov (United States)

Nowell, Reuben W; Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J; Wheat, Christopher W; Saastamoinen, Marjo; Saccheri, Ilik J; Van't Hof, Arjen E; Wasik, Bethany R; Connahs, Heidi; Aslam, Muhammad L; Kumar, Sujai; Challis, Richard J; Monteiro, Antónia; Brakefield, Paul M; Blaxter, Mark

2017-07-01

The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). © The Authors 2017. Published by Oxford University Press.
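The scaffold N50 quoted in the abstract is the length L such that sequences of length at least L contain at least half of the assembled bases. A minimal sketch of that calculation follows; the scaffold lengths are hypothetical toy values, not the B. anynana assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_scaffolds = [2, 3, 4, 5, 6]  # hypothetical lengths; total is 20
print(n50(toy_scaffolds))        # 6 + 5 = 11 covers half of 20, so the N50 is 5
```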
An educational model for ensemble streamflow simulation and uncertainty analysis

Directory of Open Access Journals (Sweden)

A. AghaKouchak

2013-02-01

This paper presents the hands-on modeling toolbox, HBV-Ensemble, designed as a complement to theoretical hydrology lectures, to teach hydrological processes and their uncertainties. The HBV-Ensemble can be used for in-class lab practices and homework assignments, and assessment of students' understanding of hydrological processes. Using this modeling toolbox, students can gain more insights into how hydrological processes (e.g., precipitation, snowmelt and snow accumulation, soil moisture, evapotranspiration and runoff generation) are interconnected. The educational toolbox includes a MATLAB Graphical User Interface (GUI) and an ensemble simulation scheme that can be used for teaching uncertainty analysis, parameter estimation, ensemble simulation and model sensitivity. HBV-Ensemble was administered in a class for both in-class instruction and a final project, and students submitted their feedback about the toolbox. The results indicate that this educational software had a positive impact on students' understanding and knowledge of uncertainty in hydrological modeling.
Inhomogeneous ensembles of radical pairs in chemical compasses

Science.gov (United States)

Procopio, Maria; Ritz, Thorsten

2016-11-01

The biophysical basis for the ability of animals to detect the geomagnetic field and to use it for finding directions remains a mystery of sensory biology. One much debated hypothesis suggests that an ensemble of specialized light-induced radical pair reactions can provide the primary signal for a magnetic compass sensor. The question arises what features of such a radical pair ensemble could be optimized by evolution so as to improve the detection of the direction of weak magnetic fields. Here, we focus on the overlooked aspect of the noise arising from inhomogeneity of copies of biomolecules in a realistic biological environment. Such inhomogeneity leads to variations of the radical pair parameters, thereby deteriorating the signal arising from an ensemble and providing a source of noise. We investigate the effect of variations in hyperfine interactions between different copies of simple radical pairs on the directional response of a compass system. We find that the choice of radical pair parameters greatly influences how strongly the directional response of an ensemble is affected by inhomogeneity.

Enhancing COSMO-DE ensemble forecasts by inexpensive techniques

Directory of Open Access Journals (Sweden)

Zied Ben Bouallègue

2013-02-01

COSMO-DE-EPS, a convection-permitting ensemble prediction system based on the high-resolution numerical weather prediction model COSMO-DE, has been pre-operational since December 2010, providing probabilistic forecasts which cover Germany. This ensemble system comprises 20 members based on variations of the lateral boundary conditions, the physics parameterizations and the initial conditions. In order to increase the sample size in a computationally inexpensive way, COSMO-DE-EPS is combined with alternative ensemble techniques: the neighborhood method and the time-lagged approach. Their impact on the quality of the resulting probabilistic forecasts is assessed. Objective verification is performed over a six-month period; scores based on the Brier score and its decomposition are shown for June 2011. The combination of the ensemble system with the alternative approaches improves probabilistic forecasts of precipitation, in particular for high precipitation thresholds. Moreover, combining COSMO-DE-EPS with only the time-lagged approach improves the skill of area probabilities for precipitation and does not deteriorate the skill of 2 m-temperature and wind gust forecasts.
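Two ingredients of this abstract lend themselves to a short illustration: enlarging the sample by pooling the latest run with an older (time-lagged) run, and scoring probabilities with the Brier score, the mean squared difference between forecast probability and the observed 0/1 outcome. The sketch below assumes equally weighted members; all values and function names are hypothetical and unrelated to COSMO-DE-EPS data.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose value exceeds the threshold."""
    return sum(1 for value in members if value > threshold) / len(members)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("need one outcome per forecast probability")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


# Hypothetical precipitation amounts (mm) at one grid point.
latest_run = [0.0, 1.2, 5.5, 7.1]
lagged_run = [6.0, 8.2, 9.0, 0.0]  # older run valid at the same time

p_latest = exceedance_probability(latest_run, 5.0)
p_pooled = exceedance_probability(latest_run + lagged_run, 5.0)  # time-lagged pooling

print(p_latest, p_pooled)
print(brier_score([0.5, 1.0], [1, 1]))  # lower is better; 0 is a perfect score
```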
Genomic footprinting in mammalian cells with ultraviolet light

International Nuclear Information System (INIS)

Becker, M.M.; Wang, Z.; Grossmann, G.; Becherer, K.A.

1989-01-01

A simple and accurate genomic primer extension method has been developed to detect ultraviolet footprinting patterns of regulatory protein-DNA interactions in mammalian genomic DNA. The technique can also detect footprinting or sequencing patterns introduced into genomic DNA by other methods. Purified genomic DNA, containing either damaged bases or strand breaks introduced by footprinting or sequencing reactions, is first cut with a convenient restriction enzyme to reduce its molecular weight. A highly radioactive single-stranded DNA primer that is complementary to a region of genomic DNA whose sequence or footprint one wishes to examine is then mixed with 50 micrograms of restriction enzyme-cut genomic DNA. The primer is approximately 100 bases long and contains 85 radioactive phosphates, each of specific activity 3000 Ci/mmol (1 Ci = 37 GBq). A simple and fast method for preparing such primers is described. Following brief heat denaturation at 100 degrees C, the solution of genomic DNA and primer is cooled to 74 degrees C and a second solution containing Taq polymerase (Thermus aquaticus DNA polymerase) and the four deoxynucleotide triphosphates is added to initiate primer extension of genomic DNA. Taq polymerase extends genomic hybridized primer until its polymerization reaction is terminated either by a damaged base or strand break in genomic DNA or by the addition of dideoxynucleotide triphosphates in the polymerization reaction. The concurrent primer hybridization-extension reaction is terminated after 5 hr and unhybridized primer is digested away by mung bean nuclease. Primer-extended genomic DNA is then denatured and electrophoresed on a polyacrylamide sequencing gel, and radioactive primer extension products are revealed by autoradiography.
Spectral statistics in semiclassical random-matrix ensembles

International Nuclear Information System (INIS)

Feingold, M.; Leitner, D.M.; Wilkinson, M.

1991-01-01

A novel random-matrix ensemble is introduced which mimics the global structure inherent in the Hamiltonian matrices of autonomous, ergodic systems. Changes in its parameters induce a transition between a Poisson and a Wigner distribution for the level spacings, P(s). The intermediate distributions are uniquely determined by a single scaling variable. Semiclassical constraints force the ensemble to be in a regime with Wigner P(s) for systems with more than two degrees of freedom.
Multivariate localization methods for ensemble Kalman filtering

OpenAIRE

S. Roh; M. Jun; I. Szunyogh; M. G. Genton

2015-01-01

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of ...
Decimated Input Ensembles for Improved Generalization

Science.gov (United States)

Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)

1999-01-01

Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on the ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this reduces the number of training patterns each classifier sees, often resulting in considerably worsened generalization performance (particularly for high dimensional data domains) for each individual classifier. Generally, this drop in the accuracy of the individual classifiers more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.
Tailored Random Graph Ensembles

International Nuclear Information System (INIS)

Roberts, E S; Annibale, A; Coolen, A C C

2013-01-01

Tailored graph ensembles are a developing bridge between biological networks and statistical mechanics. The aim is to use this concept to generate a suite of rigorous tools that can be used to quantify and compare the topology of cellular signalling networks, such as protein-protein interaction networks and gene regulation networks. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies of random graph ensembles constrained with degree distribution and degree-degree correlation. We also construct an ergodic detailed balance Markov chain with non-trivial acceptance probabilities which converges to a strictly uniform measure and is based on edge swaps that conserve all degrees. The acceptance probabilities can be generalized to define Markov chains that target any alternative desired measure on the space of directed or undirected graphs, in order to generate graphs with more sophisticated topological features.
Deviations from Wick's theorem in the canonical ensemble

Science.gov (United States)

Schönhammer, K.

2017-07-01

Wick's theorem for the expectation values of products of field operators for a system of noninteracting fermions or bosons plays an important role in the perturbative approach to the quantum many-body problem. A finite-temperature version holds in the framework of the grand canonical ensemble, but not for the canonical ensemble appropriate for systems with fixed particle number such as ultracold quantum gases in optical lattices. Here we present formulas for expectation values of products of field operators in the canonical ensemble using a method in the spirit of Gaudin's proof of Wick's theorem for the grand canonical case. The deviations from Wick's theorem are examined quantitatively for two simple models of noninteracting fermions.
A hybrid nudging-ensemble Kalman filter approach to data assimilation. Part I: application in the Lorenz system

Directory of Open Access Journals (Sweden)

Lili Lei

2012-05-01

A hybrid data assimilation approach combining nudging and the ensemble Kalman filter (EnKF) for dynamic analysis and numerical weather prediction is explored here using the non-linear Lorenz three-variable model system with the goal of a smooth, continuous and accurate data assimilation. The hybrid nudging-EnKF (HNEnKF) computes the hybrid nudging coefficients from the flow-dependent, time-varying error covariance matrix from the EnKF's ensemble forecasts. It extends the standard diagonal nudging terms to additional off-diagonal statistical correlation terms for greater inter-variable influence of the innovations in the model's predictive equations to assist in the data assimilation process. The HNEnKF promotes a better fit of an analysis to data compared to that achieved by either nudging or incremental analysis update (IAU). When model error is introduced, it produces similar or better root mean square errors compared to the EnKF while minimising the error spikes/discontinuities created by the intermittent EnKF. It provides a continuous data assimilation with better inter-variable consistency and improved temporal smoothness compared with the EnKF. Data assimilation experiments are also compared to the ensemble Kalman smoother (EnKS). The HNEnKF has similar or better temporal smoothness than the EnKS, with much smaller central processing unit (CPU) time and data storage requirements.
Preferences of and Attitudes toward Treble Choral Ensembles

Science.gov (United States)

Wilson, Jill M.

2012-01-01

In choral ensembles, a pursuit where females far outnumber males, concern exists that females are being devalued. Attitudes of female choral singers may be negatively affected by the gender imbalance that exists in mixed choirs and by the placement of the mixed choir as the most select ensemble in a program. The purpose of this research was to…
A note on the multi model super ensemble technique for reducing forecast errors

International Nuclear Information System (INIS)

Kantha, L.; Carniel, S.; Sclavo, M.

2008-01-01

The multi model super ensemble (SE) technique has been used with considerable success to improve meteorological forecasts and is now being applied to ocean models. Although the technique has been shown to produce deterministic forecasts that can be superior to the individual models in the ensemble or a simple multi model ensemble forecast, there is a clear need to understand its strengths and limitations. This paper is an attempt to do so in simple, easily understood contexts. The results demonstrate that the SE forecast is almost always better than the simple ensemble forecast, the degree of improvement depending on the properties of the models in the ensemble. However, the skill of the SE forecast with respect to the true forecast depends on a number of factors, principal among which is the skill of the models in the ensemble. As can be expected, if the ensemble consists of models with poor skill, the SE forecast will also be poor, although better than the ensemble forecast. On the other hand, the inclusion of even a single skillful model in the ensemble increases the forecast skill significantly.
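The contrast between a simple multi model mean and a super ensemble can be sketched with two models. The example assumes the commonly described form of the technique, in which regression weights for the model anomalies are fitted by least squares against observations over a training period; the abstract does not spell out the formulation used in the paper, and every number and function name below is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def fit_two_model_weights(model1, model2, observed):
    """Least-squares anomaly weights for two models over a training period."""
    m1, m2, mo = mean(model1), mean(model2), mean(observed)
    a1 = [x - m1 for x in model1]
    a2 = [x - m2 for x in model2]
    ao = [x - mo for x in observed]
    s11 = sum(x * x for x in a1)
    s22 = sum(x * x for x in a2)
    s12 = sum(x * y for x, y in zip(a1, a2))
    b1 = sum(x * y for x, y in zip(a1, ao))
    b2 = sum(x * y for x, y in zip(a2, ao))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("model anomalies are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    w1 = (b1 * s22 - b2 * s12) / det
    w2 = (b2 * s11 - b1 * s12) / det
    return m1, m2, mo, w1, w2


def super_ensemble(x1, x2, params):
    """Observed training mean plus weighted model anomalies."""
    m1, m2, mo, w1, w2 = params
    return mo + w1 * (x1 - m1) + w2 * (x2 - m2)


observed = [1.0, 2.0, 3.0, 4.0]
skilful = [3.0, 5.0, 7.0, 9.0]    # hypothetical model: biased but tracks the observations
unskilful = [2.0, 1.0, 4.0, 1.0]  # hypothetical model: unrelated to the observations

params = fit_two_model_weights(skilful, unskilful, observed)
se_forecast = super_ensemble(11.0, 3.0, params)  # new forecasts from the two models
simple_mean = (11.0 + 3.0) / 2

print(se_forecast, simple_mean)
```

With these toy values the fitted weights are 0.5 for the skilful model and 0 for the unskilful one, so the bias is removed and the poor model is ignored, whereas the simple mean inherits both problems.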
Comparing Mycobacterium tuberculosis genomes using genome topology networks.

Science.gov (United States)

Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

2015-02-14

Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. With this tool, many immunogenic-related and drug resistance-related genes
Ensemble prediction of floods – catchment non-linearity and forecast probabilities

Directory of Open Access Journals (Sweden)

C. Reszler

2007-07-01

Quantifying the uncertainty of flood forecasts by ensemble methods is becoming increasingly important for operational purposes. The aim of this paper is to examine how the ensemble distribution of precipitation forecasts propagates in the catchment system, and to interpret the flood forecast probabilities relative to the forecast errors. We use the 622 km² Kamp catchment in Austria as an example where a comprehensive data set, including a 500 yr and a 1000 yr flood, is available. A spatially-distributed continuous rainfall-runoff model is used along with ensemble and deterministic precipitation forecasts that combine rain gauge data, radar data and the forecast fields of the ALADIN and ECMWF numerical weather prediction models. The analyses indicate that, for long lead times, the variability of the precipitation ensemble is amplified as it propagates through the catchment system as a result of non-linear catchment response. In contrast, for lead times shorter than the catchment lag time (e.g. 12 h and less), the variability of the precipitation ensemble is decreased as the forecasts are mainly controlled by observed upstream runoff and observed precipitation. Assuming that all ensemble members are equally likely, the statistical analyses for five flood events at the Kamp showed that the ensemble spread of the flood forecasts is always narrower than the distribution of the forecast errors. This is because the ensemble forecasts focus on the uncertainty in forecast precipitation as the dominant source of uncertainty, and other sources of uncertainty are not accounted for. However, a number of analyses, including Relative Operating Characteristic diagrams, indicate that the ensemble spread is a useful indicator to assess potential forecast errors for lead times larger than 12 h.
Rainfall downscaling of weekly ensemble forecasts using self-organising maps

Directory of Open Access Journals (Sweden)

Masamichi Ohba

2016-03-01

This study presents an application of self-organising maps (SOMs) to downscaling medium-range ensemble forecasts and probabilistic prediction of local precipitation in Japan. SOM was applied to analyse and connect the relationship between atmospheric patterns over Japan and local high-resolution precipitation data. Multiple SOM was simultaneously employed on four variables derived from the JRA-55 reanalysis over the area of study (south-western Japan), and a two-dimensional lattice of weather patterns (WPs) was obtained. Weekly ensemble forecasts can be downscaled to local precipitation using the obtained multiple SOM. The downscaled precipitation is derived by the five SOM lattices based on the WPs of the global model ensemble forecasts for a particular day in 2009–2011. Because this method effectively handles the stochastic uncertainties from the large number of ensemble members, a probabilistic local precipitation is easily and quickly obtained from the ensemble forecasts. This downscaling of ensemble forecasts provides results better than those from a 20-km global spectral model (i.e. capturing the relatively detailed precipitation distribution over the region). To capture the effect of the detailed pattern differences in each SOM node, a statistical model is additionally constructed for each SOM node. The predictability skill of the ensemble forecasts is significantly improved under the neural network-statistics hybrid-downscaling technique, which then brings a much better skill score than the traditional method. It is expected that the results of this study will provide better guidance to the user community and contribute to the future development of dam-management models.
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

Science.gov (United States)

Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

2017-06-01

Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Cluster ensembles, quantization and the dilogarithm

DEFF Research Database (Denmark)

Fock, Vladimir; Goncharov, Alexander B.

2009-01-01

A cluster ensemble is a pair (X, A) of positive spaces (i.e. varieties equipped with positive atlases), coming with an action of a symmetry group Γ. The space A is closely related to the spectrum of a cluster algebra [12]. The two spaces are related by a morphism p: A → X. The space A is equipped with a closed 2-form, possibly degenerate, and the space X has a Poisson structure. The map p is compatible with these structures. The dilogarithm together with its motivic and quantum avatars plays a central role in the cluster ensemble structure. We define a non-commutative q-deformation of the X-space. When q is a root of unity...
Nonequilibrium statistical mechanics ensemble method

CERN Document Server

Eu, Byung Chan

1998-01-01

In this monograph, nonequilibrium statistical mechanics is developed by means of ensemble methods on the basis of the Boltzmann equation, the generic Boltzmann equations for classical and quantum dilute gases, and a generalised Boltzmann equation for dense simple fluids. The theories are developed in forms parallel with the equilibrium Gibbs ensemble theory in a way fully consistent with the laws of thermodynamics. The generalised hydrodynamics equations are the integral part of the theory and describe the evolution of macroscopic processes in accordance with the laws of thermodynamics of systems far removed from equilibrium. Audience: This book will be of interest to researchers in the fields of statistical mechanics, condensed matter physics, gas dynamics, fluid dynamics, rheology, irreversible thermodynamics and nonequilibrium phenomena.
Ensembl Genomes 2013

DEFF Research Database (Denmark)

Kersey, Paul Julian; Allen, James E; Christensen, Mikkel

2014-01-01

, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update...
Short text sentiment classification based on feature extension and ensemble classifier

Science.gov (United States)

Liu, Yang; Zhu, Xie

2018-05-01

With the rapid development of Internet social media, mining the emotional tendencies of short texts on the Internet to acquire useful information has attracted the attention of researchers. At present, the commonly used methods fall into two groups: rule-based classification and statistical machine learning classification. Although micro-blog sentiment analysis has made good progress, shortcomings remain, such as limited accuracy and a strong dependence on the sentiment classification effect. Aiming at the characteristics of Chinese short texts, such as less information, sparse features, and diverse expressions, this paper considers expanding the original text by mining related semantic information from the reviews, forwarding and other related information. First, this paper uses Word2vec to compute word similarity to extend the feature words. It then uses an ensemble classifier composed of SVM, KNN and HMM to analyze the emotion of micro-blog short texts. The experimental results show that the proposed method can make good use of the comment and forwarding information to extend the original features. Compared with the traditional method, the accuracy, recall and F1 value obtained by this method are improved.
Quantum canonical ensemble: A projection operator approach

Science.gov (United States)

Magnus, Wim; Lemmens, Lucien; Brosens, Fons

2017-09-01

Knowing the exact number of particles N, and taking this knowledge into account, the quantum canonical ensemble imposes a constraint on the occupation number operators. The constraint particularly hampers the systematic calculation of the partition function and any relevant thermodynamic expectation value for arbitrary but fixed N. On the other hand, fixing only the average number of particles, one may remove the above constraint and simply factorize the traces in Fock space into traces over single-particle states. As is well known, that would be the strategy of the grand-canonical ensemble which, however, comes with an additional Lagrange multiplier to impose the average number of particles. The appearance of this multiplier can be avoided by invoking a projection operator that enables a constraint-free computation of the partition function and its derived quantities in the canonical ensemble, at the price of an angular or contour integration. Introduced in the recent past to handle various issues related to particle-number projected statistics, the projection operator approach proves beneficial to a wide variety of problems in condensed matter physics for which the canonical ensemble offers a natural and appropriate environment. In this light, we present a systematic treatment of the canonical ensemble that embeds the projection operator into the formalism of second quantization while explicitly fixing N, the very number of particles rather than the average. Being applicable to both bosonic and fermionic systems in arbitrary dimensions, transparent integral representations are provided for the partition function Z_N and the Helmholtz free energy F_N as well as for two- and four-point correlation functions. The chemical potential is not a Lagrange multiplier regulating the average particle number but can be extracted from F_{N+1} - F_N, as illustrated for a two-dimensional fermion gas.
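For noninteracting fermions on a finite set of single-particle levels, the number projection described above can be sketched numerically. The example below is not the paper's formalism: it assumes a finite level set, for which the angular integral can be replaced exactly by a discrete sum over M + 1 angles (M being the number of levels), and it checks the result against a brute-force sum over occupations. The level energies, the inverse temperature and all function names are hypothetical.

```python
import cmath
from itertools import combinations
from math import exp, log, pi


def canonical_partition_function(energies, n_particles, beta):
    """Z_N for noninteracting fermions via number projection on an angle grid."""
    n_angles = len(energies) + 1  # the generating polynomial has degree M
    total = 0j
    for j in range(n_angles):
        theta = 2 * pi * j / n_angles
        z = cmath.exp(1j * theta)
        product = 1 + 0j
        for energy in energies:
            product *= 1 + z * exp(-beta * energy)  # factorised trace over one level
        total += cmath.exp(-1j * n_particles * theta) * product
    return (total / n_angles).real


def brute_force_partition_function(energies, n_particles, beta):
    """Direct sum over all ways of occupying exactly N distinct levels."""
    return sum(exp(-beta * sum(levels)) for levels in combinations(energies, n_particles))


def free_energy(energies, n_particles, beta):
    """Helmholtz free energy F_N = -ln(Z_N) / beta."""
    return -log(canonical_partition_function(energies, n_particles, beta)) / beta


levels = [0.0, 1.0, 2.0, 3.0]  # hypothetical single-particle energies
beta = 1.0

z2 = canonical_partition_function(levels, 2, beta)
mu = free_energy(levels, 3, beta) - free_energy(levels, 2, beta)  # F_{N+1} - F_N

print(z2, brute_force_partition_function(levels, 2, beta))
print(mu)
```

The discrete sum is exact here because the product over levels is a polynomial of degree M in exp(iθ), so M + 1 equally spaced angles isolate the coefficient of the N-particle term.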
On extending Kohn-Sham density functionals to systems with fractional number of electrons.

Science.gov (United States)

Li, Chen; Lu, Jianfeng; Yang, Weitao

2017-06-07

We analyze four ways of formulating the Kohn-Sham (KS) density functionals with a fractional number of electrons, through extending the constrained search space from the Kohn-Sham and the generalized Kohn-Sham (GKS) non-interacting v-representable density domain for integer systems to four different sets of densities for fractional systems. In particular, these density sets are (I) ensemble interacting N-representable densities, (II) ensemble non-interacting N-representable densities, (III) non-interacting densities by the Janak construction, and (IV) non-interacting densities whose composing orbitals satisfy the Aufbau occupation principle. By proving the equivalence of the underlying first order reduced density matrices associated with these densities, we show that sets (I), (II), and (III) are equivalent, and all reduce to the Janak construction. Moreover, for functionals with the ensemble v-representable assumption at the minimizer, (III) reduces to (IV) and thus justifies the previous use of the Aufbau protocol within the (G)KS framework in the study of the ground state of fractional electron systems, as defined in the grand canonical ensemble at zero temperature. By further analyzing the Aufbau solution for different density functional approximations (DFAs) in the (G)KS scheme, we rigorously prove that there can be one and only one fractional occupation for the Hartree-Fock functional, while there can be multiple fractional occupations for general DFAs in the presence of degeneracy. This has been confirmed by numerical calculations using the local density approximation as a representative of general DFAs. This work thus clarifies important issues on density functional theory calculations for fractional electron systems.

Ensemble data assimilation in the Red Sea: sensitivity to ensemble selection and atmospheric forcing

KAUST Repository

Toye, Habib; Zhan, Peng; Gopalakrishnan, Ganesh; Kartadikaria, Aditya R.; Huang, Huang; Knio, Omar; Hoteit, Ibrahim

2017-01-01

We present our efforts to build an ensemble data assimilation and forecasting system for the Red Sea. The system consists of the high-resolution Massachusetts Institute of Technology general circulation model (MITgcm) to simulate ocean circulation
An iterative ensemble Kalman filter for reservoir engineering applications

NARCIS (Netherlands)

Krymskaya, M.V.; Hanea, R.G.; Verlaan, M.

2009-01-01

The study has been focused on examining the usage and the applicability of ensemble Kalman filtering techniques to the history matching procedures. The ensemble Kalman filter (EnKF) is often applied nowadays to solving such a problem. Meanwhile, traditional EnKF requires assumption of the
Improving the ensemble-optimization method through covariance-matrix adaptation

NARCIS (Netherlands)

Fonseca, R.M.; Leeuwenburgh, O.; Hof, P.M.J. van den; Jansen, J.D.

2015-01-01

Ensemble optimization (referred to throughout the remainder of the paper as EnOpt) is a rapidly emerging method for reservoir-model-based production optimization. EnOpt uses an ensemble of controls to approximate the gradient of the objective function with respect to the controls. Current
Extending the cereus group genomics to putative food-borne pathogens of different toxicity

Energy Technology Data Exchange (ETDEWEB)

Lapidus, Alla; Goltsman, Eugene; Auger, Sandrine; Galleron, Nathalie; Segurens, Beatrice; Dossat, Carole; Land, Miriam L.; Broussole, Veronique; Brillard, Julien; Guinebretiere, Marie-Helene; Sanchis, Vincent; Nguen-the, Christophe; Lereclus, Didier; Richardson, Paul; Winker, Patrick; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

2006-08-24

The cereus group represents sporulating soil bacteria containing pathogenic strains which may cause diarrheic or emetic food poisoning outbreaks. Multiple locus sequence typing revealed a presence in natural samples of these bacteria of about thirty clonal complexes. Application of genomic methods to this group was however biased due to the major interest for representatives closely related to B. anthracis. Albeit the most important food-borne pathogens were not yet defined, existing data indicate that they are scattered all over the phylogenetic tree. The preliminary analysis of the sequences of three genomes discussed in this paper narrows down the gaps in our knowledge of the cereus group. The strain NVH391-98 is a rare but particularly severe food-borne pathogen. Sequencing revealed that the strain must be a representative of a novel bacterial species, for which the name Bacillus cytotoxis is proposed. This strain has a reduced genome size compared to other cereus group strains. Genome analysis revealed absence of sigma B factor and the presence of genes encoding diarrheic Nhe toxin, not detected earlier. The strain B. cereus F837/76 represents a clonal complex close to that of B. anthracis. Including F837/76, three such B. cereus strains had been sequenced. Alignment of genomes suggests that B. anthracis is their common ancestor. Since such strains often emerge from clinical cases, they merit a special attention. The third strain, KBAB4, is a typical psychrotrophe characteristic to unbiased soil communities. Phylogenic studies show that in nature it is the most active group in terms of gene exchange. Genomic sequence revealed high presence of extra-chromosomal genetic material (about 530 kb) that may account for this phenomenon. Genes coding Nhe-like toxin were found on a big plasmid in this strain. This may indicate a potential mechanism of toxicity spread from the psychrotrophic strain community. The results of this genomic work and ecological compartments of different strains incite
Parallel quantum computing in a single ensemble quantum computer

International Nuclear Information System (INIS)

Long Guilu; Xiao, L.

2004-01-01

We propose a parallel quantum computing mode for an ensemble quantum computer. In this mode, some qubits are in pure states while other qubits are in mixed states. It enables a single ensemble quantum computer to perform 'single-instruction-multidata' type of parallel computation. Parallel quantum computing can provide additional speedup in Grover's algorithm and Shor's algorithm. In addition, it also makes fuller use of qubit resources in an ensemble quantum computer. As a result, some qubits discarded in the preparation of an effective pure state in the Schulman-Vazirani and the Cleve-DiVincenzo algorithms can be reutilized.
Fluctuation, stationarity, and ergodic properties of random-matrix ensembles

International Nuclear Information System (INIS)

Pandey, A.

1979-01-01

The properties of random-matrix ensembles and the application of such ensembles to energy-level fluctuations and strength fluctuations are discussed. The two-point correlation function for complex spectra described by the three standard Gaussian ensembles is calculated, and its essential simplicity displayed, by an elementary procedure that derives from the dominance of binary correlations. The resultant function is exact for the unitary case and a very good approximation to the orthogonal and symplectic cases. The same procedure yields the spectrum for a Gaussian orthogonal ensemble (GOE) deformed by a pairing interaction. Several extensions are given and relationships to other problems of current interest are discussed. The standard fluctuation measures are rederived for the GOE, and their extensions to the unitary and symplectic cases are given. The measures are shown to derive, for the most part, from the two-point function, and new relationships between them are established, answering some long-standing questions. Some comparisons with experimental values are also made. All the cluster functions, and therefore the fluctuation measures, are shown to be stationary and strongly ergodic, thus justifying the use of random matrices for individual spectra. Strength fluctuations in the orthogonal ensemble are also considered. The Porter-Thomas distribution in its various forms is rederived and its ergodicity is established.
Convergence of the Square Root Ensemble Kalman Filter in the Large Ensemble Limit

Czech Academy of Sciences Publication Activity Database

Kwiatkowski, E.; Mandel, Jan

2015-01-01

Vol. 3, No. 1 (2015), pp. 1-17 ISSN 2166-2525 R&D Projects: GA ČR GA13-34856S Institutional support: RVO:67985807 Keywords: data assimilation; Lp laws of large numbers; Hilbert space; ensemble Kalman filter Subject RIV: IN - Informatics, Computer Science
Spatio-temporal behaviour of medium-range ensemble forecasts

Science.gov (United States)

Kipling, Zak; Primo, Cristina; Charlton-Perez, Andrew

2010-05-01

Using the recently-developed mean-variance of logarithms (MVL) diagram, together with the TIGGE archive of medium-range ensemble forecasts from nine different centres, we present an analysis of the spatio-temporal dynamics of their perturbations, and show how the differences between models and perturbation techniques can explain the shape of their characteristic MVL curves. We also consider the use of the MVL diagram to compare the growth of perturbations within the ensemble with the growth of the forecast error, showing that there is a much closer correspondence for some models than others. We conclude by looking at how the MVL technique might assist in selecting models for inclusion in a multi-model ensemble, and suggest an experiment to test its potential in this context.
Microcanonical ensemble and algebra of conserved generators for generalized quantum dynamics

International Nuclear Information System (INIS)

Adler, S.L.; Horwitz, L.P.

1996-01-01

It has recently been shown, by application of statistical mechanical methods to determine the canonical ensemble governing the equilibrium distribution of operator initial values, that complex quantum field theory can emerge as a statistical approximation to an underlying generalized quantum dynamics. This result was obtained by an argument based on a Ward identity analogous to the equipartition theorem of classical statistical mechanics. We construct here a microcanonical ensemble which forms the basis of this canonical ensemble. This construction enables us to define the microcanonical entropy and free energy of the field configuration of the equilibrium distribution and to study the stability of the canonical ensemble. We also study the algebraic structure of the conserved generators from which the microcanonical and canonical ensembles are constructed, and the flows they induce on the phase space. Copyright 1996 American Institute of Physics.
Body maps on the human genome.

Science.gov (United States)

Cherniak, Christopher; Rodriguez-Esteban, Raul

2013-12-20

Chromosomes have territories, or preferred locales, in the cell nucleus. When these sites are taken into account, some large-scale structure of the human genome emerges. The synoptic picture is that genes highly expressed in particular topologically compact tissues are not randomly distributed on the genome. Rather, such tissue-specific genes tend to map somatotopically onto the complete chromosome set. They seem to form a "genome homunculus": a multi-dimensional, genome-wide body representation extending across chromosome territories of the entire sperm cell nucleus. The antero-posterior axis of the body significantly corresponds to the head-tail axis of the nucleus, and the dorso-ventral body axis to the central-peripheral nucleus axis. This large-scale genomic structure includes thousands of genes. One rationale for a homuncular genome structure would be to minimize connection costs in genetic networks. Somatotopic maps in cerebral cortex have been reported for over a century.
Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

2015-01-01

Background: Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results: The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. Conclusions: We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
Competitive Learning Neural Network Ensemble Weighted by Predicted Performance

Science.gov (United States)

Ye, Qiang

2010-01-01

Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…
Robust Ensemble Filtering and Its Relation to Covariance Inflation in the Ensemble Kalman Filter

KAUST Repository

Luo, Xiaodong

2011-12-01

A robust ensemble filtering scheme based on the H∞ filtering theory is proposed. The optimal H∞ filter is derived by minimizing the supremum (or maximum) of a predefined cost function, a criterion different from the minimum variance used in the Kalman filter. By design, the H∞ filter is more robust than the Kalman filter, in the sense that the estimation error in the H∞ filter in general has a finite growth rate with respect to the uncertainties in assimilation, except for a special case that corresponds to the Kalman filter. The original form of the H∞ filter contains global constraints in time, which may be inconvenient for sequential data assimilation problems. Therefore a variant is introduced that solves some time-local constraints instead, and hence it is called the time-local H∞ filter (TLHF). By analogy to the ensemble Kalman filter (EnKF), the concept of ensemble time-local H∞ filter (EnTLHF) is also proposed. The general form of the EnTLHF is outlined, and some of its special cases are discussed. In particular, it is shown that an EnKF with certain covariance inflation is essentially an EnTLHF. In this sense, the EnTLHF provides a general framework for conducting covariance inflation in the EnKF-based methods. Some numerical examples are used to assess the relative robustness of the TLHF–EnTLHF in comparison with the corresponding KF–EnKF method.
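The covariance inflation mentioned in this abstract can be illustrated with a minimal sketch, assuming a scalar state that is observed directly. It applies multiplicative inflation to the ensemble anomalies and then a deterministic (square-root style) Kalman analysis step. This is not the EnTLHF of the paper; the function name `enkf_scalar_update` and the parameters `obs_var` and `inflation` are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def enkf_scalar_update(ensemble, obs, obs_var, inflation=1.0):
    """One analysis step for a scalar state observed directly (H = 1).

    Illustrative assumption: a deterministic square-root update, so that
    the analysis ensemble variance equals (1 - K) times the (inflated)
    forecast variance.
    """
    m = mean(ensemble)
    # Multiplicative inflation: stretch the anomalies about the ensemble mean.
    anomalies = [inflation * (x - m) for x in ensemble]
    # Unbiased sample variance of the inflated forecast ensemble.
    p = sum(a * a for a in anomalies) / (len(ensemble) - 1)
    # Scalar Kalman gain.
    k = p / (p + obs_var)
    # Update the mean towards the observation and shrink the anomalies.
    m_a = m + k * (obs - m)
    scale = math.sqrt(1.0 - k)
    return [m_a + scale * a for a in anomalies]


analysis = enkf_scalar_update([1.0, 2.0, 3.0], obs=4.0, obs_var=1.0)
inflated = enkf_scalar_update([1.0, 2.0, 3.0], obs=4.0, obs_var=1.0,
                              inflation=math.sqrt(2.0))
```

In this sketch an inflation factor above 1 enlarges the forecast variance, which raises the gain and pulls the analysis mean closer to the observation.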
Ensemble system for Part-of-Speech tagging

OpenAIRE

Dell'Orletta, Felice

2009-01-01

The paper contains a description of the Felice-POS-Tagger and of its performance in Evalita 2009. Felice-POS-Tagger is an ensemble system that combines six different POS taggers. When evaluated on the official test set, the ensemble system outperforms each of the single tagger components and achieves the highest accuracy score in Evalita 2009 POS Closed Task. It is shown first that the errors made from the different taggers are complementary, and then how to use this complementary behavior to the...
Learning to Run with Actor-Critic Ensemble

OpenAIRE

Huang, Zhewei; Zhou, Shuchang; Zhuang, BoEr; Zhou, Xinyu

2017-01-01

We introduce an Actor-Critic Ensemble (ACE) method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, our method uses a critic ensemble to select the best action from proposals of multiple actors running in parallel. By having a larger candidate set, our method can avoid actions that have fatal consequences, while staying deterministic. Using ACE, we have won the 2nd place in NIPS'17 Learning to Run competition, under the name of "Megvii-hzw...
Robust Ensemble Filtering and Its Relation to Covariance Inflation in the Ensemble Kalman Filter

KAUST Repository

Luo, Xiaodong; Hoteit, Ibrahim

2011-01-01

A robust ensemble filtering scheme based on the H∞ filtering theory is proposed. The optimal H∞ filter is derived by minimizing the supremum (or maximum) of a predefined cost function, a criterion different from the minimum variance used
Time-dependent generalized Gibbs ensembles in open quantum systems

Science.gov (United States)

Lange, Florian; Lenarčič, Zala; Rosch, Achim

2018-04-01

Generalized Gibbs ensembles have been used as powerful tools to describe the steady state of integrable many-particle quantum systems after a sudden change of the Hamiltonian. Here, we demonstrate numerically that they can be used for a much broader class of problems. We consider integrable systems in the presence of weak perturbations which both break integrability and drive the system to a state far from equilibrium. Under these conditions, we show that the steady state and the time evolution on long timescales can be accurately described by a (truncated) generalized Gibbs ensemble with time-dependent Lagrange parameters, determined from simple rate equations. We compare the numerically exact time evolutions of density matrices for small systems with a theory based on block-diagonal density matrices (diagonal ensemble) and a time-dependent generalized Gibbs ensemble containing only a small number of approximately conserved quantities, using the one-dimensional Heisenberg model with perturbations described by Lindblad operators as an example.
Precision bounds for gradient magnetometry with atomic ensembles

Science.gov (United States)

Apellaniz, Iagoba; Urizar-Lanz, Iñigo; Zimborás, Zoltán; Hyllus, Philipp; Tóth, Géza

2018-05-01

We study gradient magnetometry with an ensemble of atoms with arbitrary spin. We calculate precision bounds for estimating the gradient of the magnetic field based on the quantum Fisher information. For quantum states that are invariant under homogeneous magnetic fields, we need to measure a single observable to estimate the gradient. On the other hand, for states that are sensitive to homogeneous fields, a simultaneous measurement is needed, as the homogeneous field must also be estimated. We prove that for the cases studied in this paper, such a measurement is feasible. We present a method to calculate precision bounds for gradient estimation with a chain of atoms or with two spatially separated atomic ensembles. We also consider a single atomic ensemble with an arbitrary density profile, where the atoms cannot be addressed individually, and which is a very relevant case for experiments. Our model can take into account even correlations between particle positions. While in most of the discussion we consider an ensemble of localized particles that are classical with respect to their spatial degree of freedom, we also discuss the case of gradient metrology with a single Bose-Einstein condensate.
Embedded random matrix ensembles in quantum physics

CERN Document Server

Kota, V K B

2014-01-01

Although used with increasing frequency in many branches of physics, random matrix ensembles are not always sufficiently specific to account for important features of the physical system at hand. One refinement which retains the basic stochastic approach but allows for such features consists in the use of embedded ensembles. The present text is an exhaustive introduction to and survey of this important field. Starting with an easy-to-read introduction to general random matrix theory, the text then develops the necessary concepts from the beginning, accompanying the reader to the frontiers of present-day research. With some notable exceptions, to date these ensembles have primarily been applied in nuclear spectroscopy. A characteristic example is the use of a random two-body interaction in the framework of the nuclear shell model. Yet, topics in atomic physics, mesoscopic physics, quantum information science and statistical mechanics of isolated finite quantum systems can also be addressed using these ensemb...
Understanding ensemble protein folding at atomic detail

International Nuclear Information System (INIS)

Wallin, Stefan; Shakhnovich, Eugene I

2008-01-01

Although far from routine, simulating the folding of specific short protein chains on the computer, at a detailed atomic level, is starting to become a reality. This remarkable progress, which has been made over the last decade or so, allows a fundamental aspect of the protein folding process to be addressed, namely its statistical nature. In order to make quantitative comparisons with experimental kinetic data a complete ensemble view of folding must be achieved, with key observables averaged over the large number of microscopically different folding trajectories available to a protein chain. Here we review recent advances in atomic-level protein folding simulations and the new insight provided by them into the protein folding process. An important element in understanding ensemble folding kinetics are methods for analyzing many separate folding trajectories, and we discuss techniques developed to condense the large amount of information contained in an ensemble of trajectories into a manageable picture of the folding process. (topical review)

Ensemble-free configurational temperature for spin systems

Science.gov (United States)

Palma, G.; Gutiérrez, G.; Davis, S.

2016-12-01

An estimator for the dynamical temperature in an arbitrary ensemble is derived in the framework of the conjugate variables theorem. We prove directly that its average indeed gives the inverse temperature and that it is independent of the ensemble. We test this estimator numerically by a simulation of the two-dimensional XY model in the canonical ensemble. As this model is critical in the whole region of temperatures below the Berezinski-Kosterlitz-Thouless critical temperature T_BKT, we use a generalization of Wolff's unicluster algorithm. The numerical results allow us to confirm the robustness of the analytical expression for the microscopic estimator of the temperature. This microscopic estimator has also the advantage that it gives a direct measure of the thermalization process and can be used to compute absolute errors associated with statistical fluctuations. In consequence, this estimator allows for a direct, absolute, and stringent test of the ergodicity of the underlying Markov process, which encodes the algorithm used in a numerical simulation.
Online probabilistic learning with an ensemble of forecasts

Science.gov (United States)

Thorey, Jean; Mallet, Vivien; Chaussin, Christophe

2016-04-01

Our objective is to produce a calibrated weighted ensemble to forecast a univariate time series. In addition to a meteorological ensemble of forecasts, we rely on observations or analyses of the target variable. The celebrated Continuous Ranked Probability Score (CRPS) is used to evaluate the probabilistic forecasts. However, applying the CRPS to weighted empirical distribution functions (deriving from the weighted ensemble) may introduce a bias, because of which minimizing the CRPS does not produce the optimal weights. Thus we propose an unbiased version of the CRPS which relies on clusters of members and is strictly proper. We adapt online learning methods for the minimization of the CRPS. These methods generate the weights associated to the members in the forecasted empirical distribution function. The weights are updated before each forecast step using only past observations and forecasts. Our learning algorithms provide the theoretical guarantee that, in the long run, the CRPS of the weighted forecasts is at least as good as the CRPS of any weighted ensemble with weights constant in time. In particular, the performance of our forecast is better than that of any subset ensemble with uniform weights. A noteworthy advantage of our algorithm is that it does not require any assumption on the distributions of the observations and forecasts, both for the application and for the theoretical guarantee to hold. As an application example on meteorological forecasts for photovoltaic production integration, we show that our algorithm generates a calibrated probabilistic forecast, with significant performance improvements on probabilistic diagnostic tools (the CRPS, the reliability diagram and the rank histogram).
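For orientation, the CRPS of a weighted empirical distribution can be computed with the standard closed form CRPS = Σ_i w_i |x_i − y| − ½ Σ_i Σ_j w_i w_j |x_i − x_j|. The sketch below implements only this plain form, which is the quantity the abstract says may be biased; it is not the cluster-based unbiased variant proposed by the authors. The function name `crps_weighted_ensemble` is hypothetical.

```python
def crps_weighted_ensemble(members, weights, observation):
    """CRPS of the weighted empirical distribution of `members`
    evaluated against a single scalar `observation`.

    Weights are normalised to sum to one before use.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    # Expected absolute distance between a weighted member and the observation.
    term_obs = sum(wi * abs(xi - observation) for wi, xi in zip(w, members))
    # Expected absolute distance between two independently drawn members.
    term_spread = sum(
        wi * wj * abs(xi - xj)
        for wi, xi in zip(w, members)
        for wj, xj in zip(w, members)
    )
    return term_obs - 0.5 * term_spread


score = crps_weighted_ensemble([0.0, 2.0], [0.5, 0.5], 1.0)
```

For a single member the score reduces to the absolute error, which is a convenient sanity check.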
Orchestrating the Selection and Packaging of Genomic RNA by Retroviruses: An Ensemble of Viral and Host Factors

Science.gov (United States)

Kaddis Maldonado, Rebecca J.; Parent, Leslie J.

2016-01-01

Infectious retrovirus particles contain two copies of unspliced viral RNA that serve as the viral genome. Unspliced retroviral RNA is transcribed in the nucleus by the host RNA polymerase II and has three potential fates: (1) it can be spliced into subgenomic messenger RNAs (mRNAs) for the translation of viral proteins; or it can remain unspliced to serve as either (2) the mRNA for the translation of Gag and Gag–Pol; or (3) the genomic RNA (gRNA) that is packaged into virions. The Gag structural protein recognizes and binds the unspliced viral RNA to select it as a genome, which is selected in preference to spliced viral RNAs and cellular RNAs. In this review, we summarize the current state of understanding about how retroviral packaging is orchestrated within the cell and explore potential new mechanisms based on recent discoveries in the field. We discuss the cis-acting elements in the unspliced viral RNA and the properties of the Gag protein that are required for their interaction. In addition, we discuss the role of host factors in influencing the fate of the newly transcribed viral RNA, current models for how retroviruses distinguish unspliced viral mRNA from viral genomic RNA, and the possible subcellular sites of genomic RNA dimerization and selection by Gag. Although this review centers primarily on the wealth of data available for the alpharetrovirus Rous sarcoma virus, in which a discrete RNA packaging sequence has been identified, we have also summarized the cis- and trans-acting factors as well as the mechanisms governing gRNA packaging of other retroviruses for comparison. PMID:27657110
Improving the ensemble optimization method through covariance matrix adaptation (CMA-EnOpt)

NARCIS (Netherlands)

Fonseca, R.M.; Leeuwenburgh, O.; Hof, P.M.J. van den; Jansen, J.D.

2013-01-01

Ensemble Optimization (EnOpt) is a rapidly emerging method for reservoir model based production optimization. EnOpt uses an ensemble of controls to approximate the gradient of the objective function with respect to the controls. Current implementations of EnOpt use a Gaussian ensemble with a
Scalable quantum information processing with atomic ensembles and flying photons

International Nuclear Information System (INIS)

Mei Feng; Yu Yafei; Feng Mang; Zhang Zhiming

2009-01-01

We present a scheme for scalable quantum information processing with atomic ensembles and flying photons. Using the Rydberg blockade, we encode the qubits in the collective atomic states, which could be manipulated fast and easily due to the enhanced interaction in comparison to the single-atom case. We demonstrate that our proposed gating could be applied to generation of two-dimensional cluster states for measurement-based quantum computation. Moreover, the atomic ensembles also function as quantum repeaters useful for long-distance quantum state transfer. We show that our scheme can work in the bad-cavity or weak-coupling regime, which could considerably relax the experimental requirements. The efficient coherent operations on the ensemble qubits enable our scheme to be switchable between quantum computation and quantum communication using atomic ensembles.
Statistical Analysis of Protein Ensembles

Science.gov (United States)

Máté, Gabriell; Heermann, Dieter

2014-04-01

As 3D protein-configuration data is piling up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially since the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics which studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled of ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.
A J-modulated protonless NMR experiment characterizes the conformational ensemble of the intrinsically disordered protein WIP

Energy Technology Data Exchange (ETDEWEB)

Rozentur-Shkop, Eva; Goobes, Gil; Chill, Jordan H., E-mail: Jordan.Chill@biu.ac.il [Bar Ilan University, Department of Chemistry (Israel)

2016-12-15

Intrinsically disordered proteins (IDPs) are multi-conformational polypeptides that lack a single stable three-dimensional structure. It has become increasingly clear that the versatile IDPs play key roles in a multitude of biological processes, and, given their flexible nature, NMR is a leading method to investigate IDP behavior on the molecular level. Here we present an IDP-tailored J-modulated experiment designed to monitor changes in the conformational ensemble characteristic of IDPs by accurately measuring backbone one- and two-bond J(¹⁵N,¹³Cα) couplings. This concept was realized using a unidirectional (H)NCO ¹³C-detected experiment suitable for poor spectral dispersion and optimized for maximum coverage of amino acid types. To demonstrate the utility of this approach we applied it to the disordered actin-binding N-terminal domain of WASp interacting protein (WIP), a ubiquitous key modulator of cytoskeletal changes in a range of biological systems. One- and two-bond J(¹⁵N,¹³Cα) couplings were acquired for WIP residues 2–65 at various temperatures, and in denaturing and crowding environments. Under native conditions fitted J-couplings identified in the WIP conformational ensemble a propensity for extended conformation at residues 16–23 and 45–60, and a helical tendency at residues 28–42. These findings are consistent with a previous study based upon chemical shift and RDC data and confirm that the WIP²⁻⁶⁵ conformational ensemble is biased towards the structure assumed by this fragment in its actin-bound form. The effects of environmental changes upon this ensemble were readily apparent in the J-coupling data, which reflected a significant decrease in structural propensity at higher temperatures, in the presence of 8 M urea, and under the influence of a bacterial cell lysate. The latter suggests that crowding can cause protein unfolding through protein–protein interactions that stabilize the unfolded
Ensemble Forecasts with Useful Skill-Spread Relationships for African meningitis and Asia Streamflow Forecasting

Science.gov (United States)

Hopson, T. M.

2014-12-01

One potential benefit of an ensemble prediction system (EPS) is its capacity to forecast its own forecast error through the ensemble spread-error relationship. In practice, an EPS is often quite limited in its ability to represent the variable expectation of forecast error through the variable dispersion of the ensemble, and perhaps more fundamentally, in its ability to provide enough variability in the ensemble's dispersion to make the skill-spread relationship even potentially useful (irrespective of whether the EPS is well-calibrated or not). In this paper we examine the ensemble skill-spread relationship of an ensemble constructed from the TIGGE (THORPEX Interactive Grand Global Ensemble) dataset of global forecasts and a combination of multi-model and post-processing approaches. Both the multi-model and post-processing techniques are based on quantile regression (QR) under a step-wise forward selection framework leading to ensemble forecasts with both good reliability and sharpness. The methodology utilizes the ensemble's ability to self-diagnose forecast instability to produce calibrated forecasts with informative skill-spread relationships. A context for these concepts is provided by assessing the constructed ensemble in forecasting district-level humidity impacting the incidence of meningitis in the meningitis belt of Africa, and in forecasting flooding events in the Brahmaputra and Ganges basins of South Asia.
Adiabatic passage and ensemble control of quantum systems

International Nuclear Information System (INIS)

Leghtas, Z; Sarlette, A; Rouchon, P

2011-01-01

This paper considers population transfer between eigenstates of a finite quantum ladder controlled by a classical electric field. Using an appropriate change of variables, we show that this setting can be cast in the framework of adiabatic passage, which is known to facilitate ensemble control of quantum systems. Building on this insight, we present a mathematical proof of robustness for a control protocol, the chirped pulse, practised by experimentalists to drive an ensemble of quantum systems from the ground state to the most excited state. We then propose new adiabatic control protocols using a single chirped and amplitude-shaped pulse, to robustly perform any permutation of eigenstate populations, on an ensemble of systems with unknown coupling strengths. These adiabatic control protocols are illustrated by simulations on a four-level ladder.
Conductor and Ensemble Performance Expressivity and State Festival Ratings

Science.gov (United States)

Price, Harry E.; Chang, E. Christina

2005-01-01

This study is the second in a series examining the relationship between conducting and ensemble performance. The purpose was to further examine the associations among conductor, ensemble performance expressivity, and festival ratings. Participants were asked to rate the expressivity of video-only conducting and parallel audio-only excerpts from a…
Constructing Support Vector Machine Ensembles for Cancer Classification Based on Proteomic Profiling

Institute of Scientific and Technical Information of China (English)

Yong Mao; Xiao-Bo Zhou; Dao-Ying Pi; You-Xian Sun

2005-01-01

In this study, we present a constructive algorithm for training cooperative support vector machine ensembles (CSVMEs). CSVME combines ensemble architecture design with cooperative training for individual SVMs in ensembles. Unlike most previous studies on training ensembles, CSVME puts emphasis on both accuracy and collaboration among individual SVMs in an ensemble. A group of SVMs selected on the basis of recursive classifier elimination is used in CSVME, and the number of the individual SVMs selected to construct CSVME is determined by 10-fold cross-validation. This kind of SVME has been tested on two ovarian cancer datasets previously obtained by proteomic mass spectrometry. By combining several individual SVMs, the proposed method achieves better performance than the SVME of all base SVMs.
Sequential Ensembles Tolerant to Synthetic Aperture Radar (SAR) Soil Moisture Retrieval Errors

Directory of Open Access Journals (Sweden)

Ju Hyoung Lee

2016-04-01

Due to complicated and undefined systematic errors in satellite observations, data assimilation integrating model states with satellite observations is more complicated than field measurement-based data assimilation at a local scale. In the case of Synthetic Aperture Radar (SAR) soil moisture, the systematic errors arising from uncertainties in roughness conditions are significant and unavoidable, but current satellite bias correction methods do not resolve the problems very well. Thus, apart from the bias correction process of satellite observations, it is important to assess the inherent capability of satellite data assimilation under such sub-optimal but more realistic observational error conditions. To this end, the time-evolving sequential ensembles of the Ensemble Kalman Filter (EnKF) are compared with the stationary ensemble of the Ensemble Optimal Interpolation (EnOI) scheme, which does not evolve the ensembles over time. As the sensitivity analysis demonstrated that the SAR retrievals are more sensitive to surface roughness than to measurement errors, this study monitors how data assimilation alters the effects of roughness on SAR soil moisture retrievals. Both data assimilation schemes provided intermediate values between the SAR overestimation and the model underestimation. However, under the same SAR observational error conditions, the sequential ensembles approached a calibrated model showing the lowest Root Mean Square Error (RMSE), while the stationary ensemble converged towards the SAR observations exhibiting the highest RMSE. Compared to stationary ensembles, sequential ensembles have a better tolerance to SAR retrieval errors. This inherent nature of the EnKF suggests an operational merit as a satellite data assimilation system, given the limitations of the bias correction methods currently available.
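The analysis step that pulls a model ensemble towards an observation can be sketched for a single, directly observed soil moisture value. This is only an illustration of a generic stochastic (perturbed-observation) EnKF update, not the configuration used in the study; the ensemble size, the prior values, the observation and its error standard deviation are all hypothetical.

```python
import random


def variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def enkf_update(ensemble, observation, obs_error_std, rng):
    """Stochastic EnKF analysis for a directly observed scalar state.

    Each member is nudged towards its own perturbed copy of the observation,
    weighted by the Kalman gain computed from the ensemble variance.
    """
    prior_var = variance(ensemble)
    gain = prior_var / (prior_var + obs_error_std ** 2)
    analysis = [
        x + gain * (observation + rng.gauss(0.0, obs_error_std) - x)
        for x in ensemble
    ]
    return analysis, gain


# Hypothetical numbers: a dry-biased model ensemble and a wet-biased retrieval.
rng = random.Random(42)
prior = [rng.gauss(0.15, 0.05) for _ in range(200)]
sar_observation = 0.35
analysis, gain = enkf_update(prior, sar_observation, obs_error_std=0.05, rng=rng)

prior_mean = sum(prior) / len(prior)
analysis_mean = sum(analysis) / len(analysis)
print(f"gain={gain:.2f} prior={prior_mean:.3f} analysis={analysis_mean:.3f}")
```

In this sketch the gain is recomputed from the current ensemble at every call, which is the sequential behaviour attributed to the EnKF above; an EnOI-style variant would instead keep `prior_var` fixed from a stationary ensemble.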
Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

Directory of Open Access Journals (Sweden)

Jolly Emmitt R

2005-11-01

Background: A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results: Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion: We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

Science.gov (United States)

Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

2018-01-01

In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Application of Ensemble Sensitivity Analysis to Observation Targeting for Short-term Wind Speed Forecasting in the Tehachapi Region Winter Season

Energy Technology Data Exchange (ETDEWEB)

Zack, John [AWS Truepower, LLC, Albany, NY (United States); Natenberg, Eddie [AWS Truepower, LLC, Albany, NY (United States); Young, Steve [AWS Truepower, LLC, Albany, NY (United States); Van Knowe, Glenn [AWS Truepower, LLC, Albany, NY (United States); Waight, Ken [AWS Truepower, LLC, Albany, NY (United States); Manobainco, John [AWS Truepower, LLC, Albany, NY (United States); Kamath, Chandrika [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2010-10-20

This study extends the wind power forecast sensitivity work done by Zack et al. (2010a, b) in two prior research efforts. Zack et al. (2010a, b) investigated the relative predictive value and optimal combination of different variables/locations from correlated sensitivity patterns. Their work involved developing the Multiple Observation Optimization Algorithm (MOOA) and applying the algorithm to the results obtained from the Ensemble Sensitivity Analysis (ESA) method (Ancell and Hakim 2007; Torn and Hakim 2008).
Analyzing the impact of changing size and composition of a crop model ensemble

Science.gov (United States)

Rodríguez, Alfredo

2017-04-01

The use of an ensemble of crop growth simulation models is a practice recently adopted in order to quantify aspects of uncertainties in model simulations. Yet, while the climate modelling community has extensively investigated the properties of model ensembles and their implications, this has hardly been investigated for crop model ensembles (Wallach et al., 2016). In their ensemble of 27 wheat models, Martre et al. (2015) found that the accuracy of the multi-model ensemble-average only increases up to an ensemble size of ca. 10, but does not improve when including more models in the analysis. However, even when this number of members is reached, questions about the impact of the addition or removal of a member to/from the ensemble arise. When selecting ensemble members, identifying members with poor performance or giving implausible results can make a large difference on the outcome. The objective of this study is to set up a methodology that defines indicators to show the effects of changing the ensemble composition and size on simulation results, when a selection procedure of ensemble members is applied. Ensemble mean or median, and variance are measures used to depict ensemble results among other indicators. We are utilizing simulations from an ensemble of wheat models that have been used to construct impact response surfaces (Pirttioja et al., 2015) (IRSs). These show the response of an impact variable (e.g., crop yield) to systematic changes in two explanatory variables (e.g., precipitation and temperature). Using these, we compare different sub-ensembles in terms of the mean, median and spread, and also by comparing IRSs. The methodology developed here allows comparing an ensemble before and after applying any procedure that changes the ensemble composition and size by measuring the impact of this decision on the ensemble central tendency measures. The methodology could also be further developed to compare the effect of changing ensemble composition and size
Kinetic theory of nonequilibrium ensembles, irreversible thermodynamics, and generalized hydrodynamics

CERN Document Server

Eu, Byung Chan

2016-01-01

This book presents the fundamentals of irreversible thermodynamics for nonlinear transport processes in gases and liquids, as well as for generalized hydrodynamics extending the classical hydrodynamics of Navier, Stokes, Fourier, and Fick. Together with its companion volume on relativistic theories, it provides a comprehensive picture of the kinetic theory formulated from the viewpoint of nonequilibrium ensembles in both nonrelativistic and, in Vol. 2, relativistic contexts. Theories of macroscopic irreversible processes must strictly conform to the thermodynamic laws at every step and in all approximations that enter their derivation from the mechanical principles. Upholding this as the inviolable tenet, the author develops theories of irreversible transport processes in fluids (gases or liquids) on the basis of irreversible kinetic equations satisfying the H theorem. They apply regardless of whether the processes are near to or far removed from equilibrium, or whether they are linear or nonlinear with respe...
Multi-objective optimization for generating a weighted multi-model ensemble

Science.gov (United States)

Lee, H.

2017-12-01

Many studies have demonstrated that multi-model ensembles generally show better skill than each ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is a consensus on the assignment of weighting factors based on a single evaluation metric. When considering only one evaluation metric, the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, the approach confronts a big challenge when there are multiple metrics under consideration. When considering multiple evaluation metrics, it is obvious that a simple averaging of multiple performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies the multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combining multiple performance metrics for the global climate models and their dynamically downscaled regional climate simulations over North America and generating a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance. Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic
Impact of ensemble learning in the assessment of skeletal maturity.

Science.gov (United States)

Cunha, Pedro; Moura, Daniel C; Guevara López, Miguel Angel; Guerra, Conceição; Pinto, Daniela; Ramos, Isabel

2014-09-01

The assessment of bone age, or skeletal maturity, is an important task in pediatrics that measures the degree of maturation of children's bones. Nowadays, there is no standard clinical procedure for assessing bone age and the most widely used approaches are the Greulich and Pyle and the Tanner and Whitehouse methods. Computer methods have been proposed to automate the process; however, there has been little exploration of how to combine the features of the different parts of the hand, and how to take advantage of ensemble techniques for this purpose. This paper presents a study where the use of ensemble techniques for improving bone age assessment is evaluated. A new computer method was developed that extracts descriptors for each joint of each finger, which are then combined using different ensemble schemes to obtain a final bone age value. Three popular ensemble schemes are explored in this study: bagging, stacking and voting. The best results were achieved by bagging with a rule-based regression (M5P), scoring a mean absolute error of 10.16 months. Results show that ensemble techniques improve the prediction performance of most of the evaluated regression algorithms, always achieving the best results or results comparable to the best. Therefore, the success of the ensemble methods allows us to conclude that their use may improve computer-based bone age assessment, offering a scalable option for utilizing multiple regions of interest and combining their output.
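As a rough illustration of the bagging scheme named above, the sketch below trains several regressors on bootstrap resamples and averages their predictions, then scores the result by mean absolute error. It is not the method of the paper: the base learner is a plain least-squares line rather than M5P, and the data are synthetic stand-ins invented for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagging_fit(xs, ys, n_models, rng):
    """Fit one base model per bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all base models."""
    return sum(a + b * x for a, b in models) / len(models)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data: one hypothetical descriptor against an age in months.
rng = random.Random(0)
xs = [float(i) for i in range(20)]
ys = [12.0 * x + 30.0 + rng.gauss(0.0, 5.0) for x in xs]

models = bagging_fit(xs, ys, n_models=25, rng=rng)
preds = [bagging_predict(models, x) for x in xs]
mae = mean_absolute_error(ys, preds)
print(f"training MAE: {mae:.2f} months")
```

Voting and stacking differ only in the combination step: voting combines differently trained learners directly, while stacking trains a second-level model on the base predictions.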
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

Directory of Open Access Journals (Sweden)

Joeri Ruyssinck

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made

Construction of an integrated database to support genomic sequence analysis

Energy Technology Data Exchange (ETDEWEB)

Gilbert, W.; Overbeek, R.

1994-11-01

The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.
Ensemble of classifiers based network intrusion detection system performance bound

CSIR Research Space (South Africa)

Mkuzangwe, Nenekazi NP

2017-11-01

This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently, researchers rely on implementing the ensemble-of-classifiers-based NIDS before they can determine the performance...
Population Genomic Analysis of 1,777 Extended-Spectrum Beta-Lactamase-Producing Klebsiella pneumoniae Isolates, Houston, Texas: Unexpected Abundance of Clonal Group 307.

Science.gov (United States)

Long, S Wesley; Olsen, Randall J; Eagar, Todd N; Beres, Stephen B; Zhao, Picheng; Davis, James J; Brettin, Thomas; Xia, Fangfang; Musser, James M

2017-05-16

Klebsiella pneumoniae is a major human pathogen responsible for high morbidity and mortality rates. The emergence and spread of strains resistant to multiple antimicrobial agents and documented large nosocomial outbreaks are especially concerning. To develop new therapeutic strategies for K. pneumoniae, it is imperative to understand the population genomic structure of strains causing human infections. To address this knowledge gap, we sequenced the genomes of 1,777 extended-spectrum beta-lactamase-producing K. pneumoniae strains cultured from patients in the 2,000-bed Houston Methodist Hospital system between September 2011 and May 2015, representing a comprehensive, population-based strain sample. Strains of largely uncharacterized clonal group 307 (CG307) caused more infections than those of well-studied epidemic CG258. Strains varied markedly in gene content and had an extensive array of small and very large plasmids, often containing antimicrobial resistance genes. Some patients with multiple strains cultured over time were infected with genetically distinct clones. We identified 15 strains expressing the New Delhi metallo-beta-lactamase 1 (NDM-1) enzyme that confers broad resistance to nearly all beta-lactam antibiotics. Transcriptome sequencing analysis of 10 phylogenetically diverse strains showed that the global transcriptome of each strain was unique and highly variable. Experimental mouse infection provided new information about immunological parameters of host-pathogen interaction. We exploited the large data set to develop whole-genome sequence-based classifiers that accurately predict clinical antimicrobial resistance for 12 of the 16 antibiotics tested. We conclude that analysis of large, comprehensive, population-based strain samples can assist understanding of the molecular diversity of these organisms and contribute to enhanced translational research. IMPORTANCE Klebsiella pneumoniae causes human infections that are increasingly difficult to
An Adaptive Approach to Mitigate Background Covariance Limitations in the Ensemble Kalman Filter

KAUST Repository

Song, Hajoon

2010-07-01

A new approach is proposed to address the background covariance limitations arising from undersampled ensembles and unaccounted model errors in the ensemble Kalman filter (EnKF). The method enhances the representativeness of the EnKF ensemble by augmenting it with new members chosen adaptively to add missing information that prevents the EnKF from fully fitting the data to the ensemble. The vectors to be added are obtained by back projecting the residuals of the observation misfits from the EnKF analysis step onto the state space. The back projection is done using an optimal interpolation (OI) scheme based on an estimated covariance of the subspace missing from the ensemble. In the experiments reported here, the OI uses a preselected stationary background covariance matrix, as in the hybrid EnKF–three-dimensional variational data assimilation (3DVAR) approach, but the resulting correction is included as a new ensemble member instead of being added to all existing ensemble members. The adaptive approach is tested with the Lorenz-96 model. The hybrid EnKF–3DVAR is used as a benchmark to evaluate the performance of the adaptive approach. Assimilation experiments suggest that the new adaptive scheme significantly improves the EnKF behavior when it suffers from small size ensembles and neglected model errors. It was further found to be competitive with the hybrid EnKF–3DVAR approach, depending on ensemble size and data coverage.
Advances in snow cover distributed modelling via ensemble simulations and assimilation of satellite data

Science.gov (United States)

Revuelto, J.; Dumont, M.; Tuzet, F.; Vionnet, V.; Lafaysse, M.; Lecourt, G.; Vernay, M.; Morin, S.; Cosme, E.; Six, D.; Rabatel, A.

2017-12-01

Nowadays, snowpack models show a good capability in simulating the evolution of snow in mountain areas. However, singular deviations in the meteorological forcing and shortcomings in the modelling of snow physical processes, when accumulated over a snow season, can produce large deviations from the real snowpack state. These deviations are usually assessed with on-site observations from automatic weather stations. Nevertheless, the location of these stations can strongly influence the results of these evaluations, since local topography may have a marked influence on snowpack evolution. Although the evaluation of snowpack models with automatic weather stations usually reveals good results, there is a lack of large-scale evaluations of simulation results on heterogeneous alpine terrain subject to local topographic effects. This work first presents a complete evaluation of the detailed snowpack model Crocus over an extended mountain area, the Arve upper catchment (western European Alps). This catchment has a wide elevation range, with a large area above 2000 m a.s.l. and/or glaciated. The evaluation compares results obtained with distributed and semi-distributed simulations (the latter currently used in operational forecasting). Daily observations of the snow-covered area from the MODIS satellite sensor, the seasonal glacier surface mass balance evolution measured at more than 65 locations, and the glaciers' annual equilibrium line altitude from Landsat/Spot/Aster satellites have been used for model evaluation. Additionally, the latest advances in producing ensemble snowpack simulations for assimilating satellite reflectance data over extended areas will be presented. These advances comprise the generation of an ensemble of downscaled high-resolution meteorological forcing from meso-scale meteorological models and the application of a particle filter scheme for assimilating satellite observations. Although the results are preliminary, they show a good
arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

Directory of Open Access Journals (Sweden)

Moreau Yves

2005-05-01

Background: The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment. Results: We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion: ArrayCGHbase is a web-based and platform-independent arrayCGH data analysis tool that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at http://medgen.ugent.be/arrayCGHbase/.
Adaptive calibration of (u,v)‐wind ensemble forecasts

DEFF Research Database (Denmark)

Pinson, Pierre

2012-01-01

of sufficient reliability. The original framework introduced here allows for an adaptive bivariate calibration of these ensemble forecasts. The originality of this methodology lies in the fact that calibrated ensembles still consist of a set of (space–time) trajectories, after translation and dilation...... of translation and dilation factors are discussed. Copyright © 2012 Royal Meteorological Society...
Modeling polydispersive ensembles of diamond nanoparticles

International Nuclear Information System (INIS)

Barnard, Amanda S

2013-01-01

While significant progress has been made toward production of monodispersed samples of a variety of nanoparticles, in cases such as diamond nanoparticles (nanodiamonds) a significant degree of polydispersivity persists, so scaling-up of laboratory applications to industrial levels has its challenges. In many cases, however, monodispersivity is not essential for reliable application, provided that the inevitable uncertainties are just as predictable as the functional properties. As computational methods of materials design are becoming more widespread, there is a growing need for robust methods for modeling ensembles of nanoparticles, that capture the structural complexity characteristic of real specimens. In this paper we present a simple statistical approach to modeling of ensembles of nanoparticles, and apply it to nanodiamond, based on sets of individual simulations that have been carefully selected to describe specific structural sources that are responsible for scattering of fundamental properties, and that are typically difficult to eliminate experimentally. For the purposes of demonstration we show how scattering in the Fermi energy and the electronic band gap are related to different structural variations (sources), and how these results can be combined strategically to yield statistically significant predictions of the properties of an entire ensemble of nanodiamonds, rather than merely one individual ‘model’ particle or a non-representative sub-set. (paper)
The microcanonical ensemble of the ideal relativistic quantum gas with angular momentum conservation

International Nuclear Information System (INIS)

Becattini, F.; Ferroni, L.

2007-01-01

We derive the microcanonical partition function of the ideal relativistic quantum gas with fixed intrinsic angular momentum as an expansion over fixed multiplicities. We develop a group theoretical approach by generalizing known projection techniques to the Poincaré group. Our calculation is carried out in a quantum field framework and applies to particles with any spin. It extends known results in the literature in that it does not introduce any large volume approximation, and it takes particle spin fully into account. We provide expressions of the microcanonical partition function at fixed multiplicities in the limiting classical case of large volumes and large angular momenta and in the grand-canonical ensemble. We also derive the microcanonical partition function of the ideal relativistic quantum gas with fixed parity. (orig.)
Multivariate localization methods for ensemble Kalman filtering

KAUST Repository

Roh, S.

2015-12-03

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
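The Schur-product localization described above can be sketched for the well-studied single-variable case. The example is an assumption-laden illustration rather than the paper's multivariate construction: the Gaussian correlation shape, the length scale, the grid positions and the three-member ensemble are all made up for the purpose.

```python
import math


def gaussian_correlation(distance, length_scale):
    """Distance-dependent correlation; a Gaussian shape is assumed here."""
    return math.exp(-0.5 * (distance / length_scale) ** 2)


def sample_covariance(ensemble):
    """Sample covariance of an ensemble given as a list of state vectors."""
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    means = [sum(m[i] for m in ensemble) / n_members for i in range(n_state)]
    return [
        [
            sum((m[i] - means[i]) * (m[j] - means[j]) for m in ensemble)
            / (n_members - 1)
            for j in range(n_state)
        ]
        for i in range(n_state)
    ]


def localize(cov, positions, length_scale):
    """Schur (element-wise) product of cov with a correlation matrix."""
    n = len(cov)
    return [
        [
            cov[i][j]
            * gaussian_correlation(abs(positions[i] - positions[j]), length_scale)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical three-member ensemble on a four-point grid.
ensemble = [[1.0, 2.0, 0.5, 3.0], [2.0, 1.0, 1.5, 2.0], [0.0, 3.0, 1.0, 4.0]]
positions = [0.0, 1.0, 2.0, 3.0]

raw = sample_covariance(ensemble)
localized = localize(raw, positions, length_scale=1.0)
print(raw[0][3], localized[0][3])
```

Variances on the diagonal are left untouched because the correlation at zero distance is one, while covariances between distant grid points, which are the entries most affected by sampling noise in a small ensemble, are damped towards zero.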
Microcanonical ensemble extensive thermodynamics of Tsallis statistics

International Nuclear Information System (INIS)

Parvan, A.S.

2005-01-01

The microscopic foundation of the generalized equilibrium statistical mechanics based on the Tsallis entropy is given by using the Gibbs idea of statistical ensembles of classical and quantum mechanics. The equilibrium distribution functions are derived by the thermodynamic method based upon the use of the fundamental equation of thermodynamics and the statistical definition of the functions of the state of the system. It is shown that if the entropic index ξ = 1/(q - 1) in the microcanonical ensemble is an extensive variable of the state of the system, then in the thermodynamic limit z̄ = 1/((q - 1)N) = const the principle of additivity and the zero law of thermodynamics are satisfied. In particular, the Tsallis entropy of the system is extensive and the temperature is intensive. Thus, the Tsallis statistics completely satisfies all the postulates of equilibrium thermodynamics. Moreover, evaluation of the thermodynamic identities in the microcanonical ensemble is provided by the Euler theorem. The principle of additivity and the Euler theorem are explicitly proved by using the illustration of the classical microcanonical ideal gas in the thermodynamic limit.
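For readers unfamiliar with why additivity is a question at all for the Tsallis entropy, the sketch below evaluates the standard textbook form S_q = (1 - Σ p_i^q)/(q - 1), with the Boltzmann constant set to 1, and checks the well-known pseudo-additivity rule for two independent systems at fixed q. It is not the derivation of the paper, and the probability vectors and the value of q are arbitrary choices.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy with k = 1; reduces to the Gibbs-Shannon form at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** q for p in probs if p > 0.0)) / (q - 1.0)


def joint_distribution(pa, pb):
    """Joint probabilities of two statistically independent systems."""
    return [a * b for a in pa for b in pb]


# Arbitrary illustrative distributions and entropic parameter.
pa = [0.5, 0.3, 0.2]
pb = [0.6, 0.4]
q = 1.5

s_a = tsallis_entropy(pa, q)
s_b = tsallis_entropy(pb, q)
s_ab = tsallis_entropy(joint_distribution(pa, pb), q)

# Pseudo-additivity: S(A+B) = S(A) + S(B) + (1 - q) S(A) S(B).
pseudo_additive = s_a + s_b + (1.0 - q) * s_a * s_b
print(s_ab, pseudo_additive, s_a + s_b)
```

At fixed q the cross term keeps the entropy of the combined system from being the plain sum of the parts; the abstract's result concerns the different situation in which 1/(q - 1) itself scales with the system size.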
Multivariate localization methods for ensemble Kalman filtering

KAUST Repository

Roh, S.

2015-05-08

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
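The Schur-product localization described in this abstract can be sketched for the simpler single-variable case. The code below is an illustration under stated assumptions, not the multivariate construction proposed by the authors: the grid is one-dimensional and non-periodic, the distance-dependent correlation function is a Gaussian (operational systems often use other compactly supported functions), and all names and sizes are hypothetical.

```python
import math
import random


def sample_covariance(ensemble):
    """Sample covariance of an ensemble given as a list of state vectors."""
    m = len(ensemble)
    n = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / m for i in range(n)]
    return [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in ensemble) / (m - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


def gaussian_localization(n, length_scale):
    """Correlation matrix obtained by discretizing exp(-d^2 / (2 L^2))."""
    return [
        [math.exp(-0.5 * (abs(i - j) / length_scale) ** 2) for j in range(n)]
        for i in range(n)
    ]


def schur_product(a, b):
    """Element-wise (Schur) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


random.seed(0)
n_grid, n_members = 12, 5  # few members, so raw covariances are noisy
ensemble = [
    [random.gauss(0.0, 1.0) for _ in range(n_grid)] for _ in range(n_members)
]

raw = sample_covariance(ensemble)
localized = schur_product(raw, gaussian_localization(n_grid, length_scale=2.0))
```

The variances on the diagonal are left unchanged, while covariances between distant grid points, which are dominated by sampling noise for such a small ensemble, are damped toward zero.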
Multivariate localization methods for ensemble Kalman filtering

KAUST Repository

Roh, S.; Jun, M.; Szunyogh, I.; Genton, Marc G.

2015-01-01

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
Multivariate localization methods for ensemble Kalman filtering

Science.gov (United States)

Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.

2015-12-01

In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
Microcanonical ensemble extensive thermodynamics of Tsallis statistics

International Nuclear Information System (INIS)

Parvan, A.S.

2006-01-01

The microscopic foundation of the generalized equilibrium statistical mechanics based on the Tsallis entropy is given by using the Gibbs idea of statistical ensembles of classical and quantum mechanics. The equilibrium distribution functions are derived by the thermodynamic method based upon the use of the fundamental equation of thermodynamics and the statistical definition of the functions of the state of the system. It is shown that if the entropic index ξ = 1/(q - 1) in the microcanonical ensemble is an extensive variable of the state of the system, then in the thermodynamic limit z̄ = 1/((q - 1)N) = const the principle of additivity and the zeroth law of thermodynamics are satisfied. In particular, the Tsallis entropy of the system is extensive and the temperature is intensive. Thus, the Tsallis statistics completely satisfies all the postulates of equilibrium thermodynamics. Moreover, evaluation of the thermodynamic identities in the microcanonical ensemble is provided by the Euler theorem. The principle of additivity and the Euler theorem are explicitly proved by using the illustration of the classical microcanonical ideal gas in the thermodynamic limit.
Lessons from Climate Modeling on the Design and Use of Ensembles for Crop Modeling

Science.gov (United States)

Wallach, Daniel; Mearns, Linda O.; Ruane, Alexander C.; Roetter, Reimund P.; Asseng, Senthold

2016-01-01

Working with ensembles of crop models is a recent but important development in crop modeling which promises to lead to better uncertainty estimates for model projections and predictions, better predictions using the ensemble mean or median, and closer collaboration within the modeling community. There are numerous open questions about the best way to create and analyze such ensembles. Much can be learned from the field of climate modeling, given its much longer experience with ensembles. We draw on that experience to identify questions and make propositions that should help make ensemble modeling with crop models more rigorous and informative. The propositions include defining criteria for acceptance of models in a crop multi-model ensemble (MME), exploring criteria for evaluating the degree of relatedness of models in an MME, studying the effect of the number of models in the ensemble, development of a statistical model of model sampling, creation of a repository for MME results, studies of possible differential weighting of models in an ensemble, creation of single-model ensembles based on sampling from the uncertainty distribution of parameter values or inputs specifically oriented toward uncertainty estimation, the creation of super ensembles that sample more than one source of uncertainty, the analysis of super ensemble results to obtain information on total uncertainty and the separate contributions of different sources of uncertainty, and finally further investigation of the use of the multi-model mean or median as a predictor.
BGD: a database of bat genomes.

Science.gov (United States)

Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

2015-01-01

Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Some whole genome sequences of bats have now been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and be advantageous to bat sequence analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.
BGD: a database of bat genomes.

Directory of Open Access Journals (Sweden)

Jianfei Fang

Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Some whole genome sequences of bats have now been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and be advantageous to bat sequence analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.
Single-particle model of a strongly driven, dense, nanoscale quantum ensemble

Science.gov (United States)

DiLoreto, C. S.; Rangan, C.

2018-01-01

We study the effects of interatomic interactions on the quantum dynamics of a dense, nanoscale, atomic ensemble driven by a strong electromagnetic field. We use a self-consistent, mean-field technique based on the pseudospectral time-domain method and a full, three-directional basis to solve the coupled Maxwell-Liouville equations. We find that interatomic interactions generate a decoherence in the state of an ensemble on a much faster time scale than the excited-state lifetime of individual atoms. We present a single-particle model of the driven, dense ensemble by incorporating interactions into a dephasing rate. This single-particle model reproduces the essential physics of the full simulation and is an efficient way of rapidly estimating the collective dynamics of a dense ensemble.
Benefits of an ultra large and multiresolution ensemble for estimating available wind power

Science.gov (United States)

Berndt, Jonas; Hoppe, Charlotte; Elbern, Hendrik

2016-04-01

In this study we investigate the benefits of an ultra large ensemble with up to 1000 members including multiple nesting with a target horizontal resolution of 1 km. The ensemble shall be used as a basis to detect events of extreme errors in wind power forecasting. The forecast quantity is the wind vector at wind turbine hub height (~100 m) in the short range (1 to 24 hours). Current wind power forecast systems already rest on NWP ensemble models. However, only calibrated ensembles from meteorological institutions serve as input so far, with limited spatial resolution (~10-80 km) and member number (~50). Perturbations related to the specific merits of wind power production are still missing. Thus, single extreme error events which are not detected by such ensemble power forecasts occur infrequently. The numerical forecast model used in this study is the Weather Research and Forecasting Model (WRF). Model uncertainties are represented by stochastic parametrization of sub-grid processes via stochastically perturbed parametrization tendencies, in conjunction with the complementary stochastic kinetic-energy backscatter scheme already provided by WRF. We perform continuous ensemble updates by comparing each ensemble member with available observations using a sequential importance resampling filter to improve the model accuracy while maintaining ensemble spread. Additionally, we use different ensemble systems from global models (ECMWF and GFS) as input and boundary conditions to capture different synoptic conditions. Critical weather situations which are connected to extreme error events are located and corresponding perturbation techniques are applied. The demanding computational effort is overcome by utilising the supercomputer JUQUEEN at the Forschungszentrum Juelich.
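The sequential importance resampling step mentioned in this abstract can be illustrated with a scalar toy problem. This is a minimal sketch under assumptions, not the WRF-based system of the study: each member is a single wind-speed value, the observation error is Gaussian, and the function name and numbers are hypothetical. A practical system would also perturb the resampled members afterwards to maintain spread, which is omitted here.

```python
import math
import random


def sir_update(members, observation, obs_sigma, rng):
    """Weight members by a Gaussian likelihood, then resample with replacement."""
    weights = [
        math.exp(-0.5 * ((m - observation) / obs_sigma) ** 2) for m in members
    ]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Members close to the observation are duplicated, distant ones dropped.
    return rng.choices(members, weights=weights, k=len(members))


rng = random.Random(42)

# Hypothetical prior ensemble: 1001 wind speeds spread evenly over 0..20 m/s.
prior = [0.02 * i for i in range(1001)]

# Hypothetical observation of 15 m/s with 1 m/s error standard deviation.
posterior = sir_update(prior, observation=15.0, obs_sigma=1.0, rng=rng)

prior_mean = sum(prior) / len(prior)
posterior_mean = sum(posterior) / len(posterior)
```

The ensemble size is preserved, and the resampled ensemble concentrates around the observed value instead of the prior mean.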

Matrix-product-state simulation of an extended Brueschweiler bulk-ensemble database search

International Nuclear Information System (INIS)

SaiToh, Akira; Kitagawa, Masahiro

2006-01-01

Brueschweiler's database search in a spin Liouville space can be efficiently simulated on a conventional computer without error as long as the simulation cost of the internal circuit of an oracle function is polynomial. In true NMR experiments, by contrast, the search suffers from an exponential decrease in the variation of a signal intensity. With the simulation method using the matrix-product-state proposed by Vidal [G. Vidal, Phys. Rev. Lett. 91, 147902 (2003)], we perform such a simulation. We also show extensions of the algorithm that do not use the J-coupling or DD-coupling splitting of frequency peaks in observation: in one extension, searching can be completed with a single query in polynomial postoracle circuit complexities; in another, which finds all of the marked keys, multiple solutions of an oracle can be found with a query complexity that is linear in the key length and in the number of solutions. These extended algorithms are also simulated with the same simulation method.
GIGGLE: a search engine for large-scale integrated genome analysis.

Science.gov (United States)

Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

2018-02-01

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
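To make the underlying task concrete, the sketch below counts how many intervals in each of several interval files overlap a set of query intervals and orders the files by that count. It is deliberately naive and is not GIGGLE's index or its significance statistics: the class name, the half-open coordinate convention, and the example intervals are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


class IntervalFile:
    """Half-open intervals [start, end) stored as two sorted coordinate lists."""

    def __init__(self, intervals):
        self.starts = sorted(start for start, _ in intervals)
        self.ends = sorted(end for _, end in intervals)

    def count_overlaps(self, q_start, q_end):
        # Intervals that start before the query ends...
        started = bisect_left(self.starts, q_end)
        # ...minus those that already ended at or before the query start.
        finished = bisect_right(self.ends, q_start)
        return started - finished


# Hypothetical interval files and query features.
files = {
    "a": IntervalFile([(0, 10), (20, 30), (50, 60)]),
    "b": IntervalFile([(100, 200)]),
}
query = [(5, 25), (55, 58)]

hits = {
    name: sum(f.count_overlaps(start, end) for start, end in query)
    for name, f in files.items()
}
ranking = sorted(hits, key=hits.get, reverse=True)
```

Each query costs two binary searches per file, which is adequate for a handful of files but does not scale to thousands of files and billions of intervals in the way the abstract describes.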
Multi-model ensembles for assessment of flood losses and associated uncertainty

Science.gov (United States)

Figueiredo, Rui; Schröter, Kai; Weiss-Motz, Alexander; Martina, Mario L. V.; Kreibich, Heidi

2018-05-01

Flood loss modelling is a crucial part of risk assessments. However, it is subject to large uncertainty that is often neglected. Most models available in the literature are deterministic, providing only single point estimates of flood loss, and large disparities tend to exist among them. Adopting any one such model in a risk assessment context is likely to lead to inaccurate loss estimates and sub-optimal decision-making. In this paper, we propose the use of multi-model ensembles to address these issues. This approach, which has been applied successfully in other scientific fields, is based on the combination of different model outputs with the aim of improving the skill and usefulness of predictions. We first propose a model rating framework to support ensemble construction, based on a probability tree of model properties, which establishes relative degrees of belief between candidate models. Using 20 flood loss models in two test cases, we then construct numerous multi-model ensembles, based both on the rating framework and on a stochastic method, differing in terms of participating members, ensemble size and model weights. We evaluate the performance of ensemble means, as well as their probabilistic skill and reliability. Our results demonstrate that well-designed multi-model ensembles represent a pragmatic approach to consistently obtain more accurate flood loss estimates and reliable probability distributions of model uncertainty.
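Combining several deterministic model outputs into an ensemble estimate can be sketched as follows. The loss values and weights are hypothetical placeholders, not results from the paper, and the weighting function is an assumed stand-in for the degrees of belief that a rating framework would supply.

```python
import statistics


def weighted_mean(values, weights):
    """Mean of values weighted by (not necessarily normalized) weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical relative loss estimates from four flood loss models.
losses = [0.12, 0.18, 0.25, 0.40]

# Hypothetical relative degrees of belief in each model.
weights = [0.4, 0.3, 0.2, 0.1]

ensemble_mean = statistics.mean(losses)
ensemble_median = statistics.median(losses)
ensemble_weighted = weighted_mean(losses, weights)

# The spread across members gives a first, crude view of model uncertainty.
ensemble_spread = statistics.stdev(losses)
```

The equal-weight mean, the median, and the weighted mean generally differ, which is why the choice of members and weights matters when constructing such ensembles.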
Population Genomic Analysis of 1,777 Extended-Spectrum Beta-Lactamase-Producing Klebsiella pneumoniae Isolates, Houston, Texas: Unexpected Abundance of Clonal Group 307

Directory of Open Access Journals (Sweden)

S. Wesley Long

2017-05-01

Klebsiella pneumoniae is a major human pathogen responsible for high morbidity and mortality rates. The emergence and spread of strains resistant to multiple antimicrobial agents and documented large nosocomial outbreaks are especially concerning. To develop new therapeutic strategies for K. pneumoniae, it is imperative to understand the population genomic structure of strains causing human infections. To address this knowledge gap, we sequenced the genomes of 1,777 extended-spectrum beta-lactamase-producing K. pneumoniae strains cultured from patients in the 2,000-bed Houston Methodist Hospital system between September 2011 and May 2015, representing a comprehensive, population-based strain sample. Strains of largely uncharacterized clonal group 307 (CG307) caused more infections than those of well-studied epidemic CG258. Strains varied markedly in gene content and had an extensive array of small and very large plasmids, often containing antimicrobial resistance genes. Some patients with multiple strains cultured over time were infected with genetically distinct clones. We identified 15 strains expressing the New Delhi metallo-beta-lactamase 1 (NDM-1) enzyme that confers broad resistance to nearly all beta-lactam antibiotics. Transcriptome sequencing analysis of 10 phylogenetically diverse strains showed that the global transcriptome of each strain was unique and highly variable. Experimental mouse infection provided new information about immunological parameters of host-pathogen interaction. We exploited the large data set to develop whole-genome sequence-based classifiers that accurately predict clinical antimicrobial resistance for 12 of the 16 antibiotics tested. We conclude that analysis of large, comprehensive, population-based strain samples can assist understanding of the molecular diversity of these organisms and contribute to enhanced translational research.
Evaluation of LDA Ensembles Classifiers for Brain Computer Interface

International Nuclear Information System (INIS)

Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

2011-01-01

The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase BCI performance in decoding user intentions, the feature extraction and classification techniques must be improved. In this article the performance of an ensemble of three linear discriminant analysis (LDA) classifiers is studied. An ensemble-based system can theoretically achieve better classification results than its individual counterpart, depending on the algorithm used to generate the individual classifiers and the procedure used to combine their outputs. Classic ensemble algorithms such as bagging and boosting are discussed here. For the BCI application, it was concluded that the results obtained using ER and AUC as performance indices do not give enough information to establish which configuration is better.
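One common procedure for combining the outputs of several classifiers is a majority vote, sketched below. This is an assumed combination rule chosen for illustration; the abstract does not state which rule the authors used, and the label vectors are hypothetical stand-ins for the outputs of three trained LDA classifiers.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical binary outputs of three classifiers on four trials.
clf_a = [1, 0, 1, 1]
clf_b = [1, 1, 0, 1]
clf_c = [0, 0, 1, 1]

ensemble_labels = majority_vote([clf_a, clf_b, clf_c])
```

With an odd number of binary classifiers no ties can occur, so each trial receives the label chosen by at least two of the three members.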
Kohn-Sham Theory for Ground-State Ensembles

International Nuclear Information System (INIS)

Ullrich, C. A.; Kohn, W.

2001-01-01

An electron density distribution n(r) which can be represented by that of a single-determinant ground state of noninteracting electrons in an external potential v(r) is called pure-state v-representable (P-VR). Most physical electronic systems are P-VR. Systems which require a weighted sum of several such determinants to represent their density are called ensemble v-representable (E-VR). This paper develops formal Kohn-Sham equations for E-VR physical systems, using the appropriate coupling constant integration. It also derives local density and generalized gradient approximations, and conditions and corrections specific to ensembles.
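The distinction between P-VR and E-VR densities can be written compactly. The notation below (number of determinants M, weights w_i, and orbitals φ_k) is assumed here for illustration and may differ from the paper's; spin indices are suppressed.

```latex
n(\mathbf{r}) = \sum_{i=1}^{M} w_i \, n_i(\mathbf{r}), \qquad
n_i(\mathbf{r}) = \sum_{k \in \mathrm{occ}(i)} \left| \varphi_k(\mathbf{r}) \right|^2, \qquad
w_i \ge 0, \quad \sum_{i=1}^{M} w_i = 1 .
```

In this notation the P-VR case corresponds to M = 1, while an E-VR density requires M > 1 with each n_i taken from one of the noninteracting ground-state determinants.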
Verification of Ensemble Forecasts for the New York City Operations Support Tool

Science.gov (United States)

Day, G.; Schaake, J. C.; Thiemann, M.; Draijer, S.; Wang, L.

2012-12-01

The New York City water supply system operated by the Department of Environmental Protection (DEP) serves nine million people. It covers 2,000 square miles of portions of the Catskill, Delaware, and Croton watersheds, and it includes nineteen reservoirs and three controlled lakes. DEP is developing an Operations Support Tool (OST) to support its water supply operations and planning activities. OST includes historical and real-time data, a model of the water supply system complete with operating rules, and lake water quality models developed to evaluate alternatives for managing turbidity in the New York City Catskill reservoirs. OST will enable DEP to manage turbidity in its unfiltered system while satisfying its primary objective of meeting the City's water supply needs, in addition to considering secondary objectives of maintaining ecological flows, supporting fishery and recreation releases, and mitigating downstream flood peaks. The current version of OST relies on statistical forecasts of flows in the system based on recent observed flows. To improve short-term decision making, plans are being made to transition to National Weather Service (NWS) ensemble forecasts based on hydrologic models that account for short-term weather forecast skill, longer-term climate information, as well as the hydrologic state of the watersheds and recent observed flows. To ensure that the ensemble forecasts are unbiased and that the ensemble spread reflects the actual uncertainty of the forecasts, a statistical model has been developed to post-process the NWS ensemble forecasts to account for hydrologic model error as well as any inherent bias and uncertainty in initial model states, meteorological data and forecasts. The post-processor is designed to produce adjusted ensemble forecasts that are consistent with the DEP historical flow sequences that were used to develop the system operating rules. A set of historical hindcasts that is representative of the real-time ensemble
Distribution of the Largest Eigenvalues of the Levi-Smirnov Ensemble

International Nuclear Information System (INIS)

Wieczorek, W.

2004-01-01

We calculate the distribution of the k-th largest eigenvalue in the random matrix Levi-Smirnov Ensemble (LSE), using the spectral dualism between LSE and chiral Gaussian Unitary Ensemble (GUE). Then we reconstruct universal spectral oscillations and we investigate the asymptotic behavior of the spectral distribution. (author)
Products of random matrices from fixed trace and induced Ginibre ensembles

Science.gov (United States)

Akemann, Gernot; Cikovic, Milan

2018-05-01

We investigate the microcanonical version of the complex induced Ginibre ensemble, by introducing a fixed trace constraint for its second moment. As for the canonical Ginibre ensemble, its complex eigenvalues can be interpreted as a two-dimensional Coulomb gas, which is now subject to a constraint and a modified, collective confining potential. Despite the lack of determinantal structure in this fixed trace ensemble, we compute all its density correlation functions at finite matrix size and compare to a fixed trace ensemble of normal matrices, representing a different Coulomb gas. Our main tool of investigation is the Laplace transform, which maps the fixed trace ensemble back to the induced Ginibre ensemble. Products of random matrices have been used to study the Lyapunov and stability exponents for chaotic dynamical systems, where the latter are based on the complex eigenvalues of the product matrix. Because little is known about the universality of the eigenvalue distribution of such product matrices, we then study the product of m induced Ginibre matrices with a fixed trace constraint (which are clearly non-Gaussian) and M - m such Ginibre matrices without constraint. Using an m-fold inverse Laplace transform, we obtain a concise result for the spectral density of such a mixed product matrix at finite matrix size, for arbitrary fixed m and M. Very recently local and global universality was proven by the authors and their coworker for a more general, single elliptic fixed trace ensemble in the bulk of the spectrum. Here, we argue that the spectral density of mixed products is in the same universality class as the product of M independent induced Ginibre ensembles.
A Single-column Model Ensemble Approach Applied to the TWP-ICE Experiment

Science.gov (United States)

Davies, L.; Jakob, C.; Cheung, K.; DelGenio, A.; Hill, A.; Hume, T.; Keane, R. J.; Komori, T.; Larson, V. E.; Lin, Y.;

2013-01-01

Single-column models (SCM) are useful test beds for investigating the parameterization schemes of numerical weather prediction and climate models. The usefulness of SCM simulations is limited, however, by the accuracy of the best estimate large-scale observations prescribed. Errors in estimating the observations will result in uncertainty in modeled simulations. One method to address the modeled uncertainty is to simulate an ensemble where the ensemble members span observational uncertainty. This study first derives an ensemble of large-scale data for the Tropical Warm Pool International Cloud Experiment (TWP-ICE) based on an estimate of a possible source of error in the best estimate product. These data are then used to carry out simulations with 11 SCM and two cloud-resolving models (CRM). Best estimate simulations are also performed. All models show that moisture-related variables are close to observations and there are limited differences between the best estimate and ensemble mean values. The models, however, show different sensitivities to changes in the forcing, particularly when weakly forced. The ensemble simulations highlight important differences in the surface evaporation term of the moisture budget between the SCM and CRM. Differences are also apparent between the models in the ensemble mean vertical structure of cloud variables, while for each model, cloud properties are relatively insensitive to forcing. The ensemble is further used to investigate cloud variables and precipitation and identifies differences between CRM and SCM, particularly for relationships involving ice. This study highlights the additional analysis that can be performed using ensemble simulations and hence enables a more complete model investigation compared to using the more traditional single best estimate simulation only.

ENSEMBLE methods to reconcile disparate national long range dispersion forecasting

Energy Technology Data Exchange (ETDEWEB)

Mikkelsen, T.; Galmarini, S.; Bianconi, R.; French, S. (eds.)

2003-11-01

ENSEMBLE is a web-based decision support system for real-time exchange and evaluation of national long-range dispersion forecasts of nuclear releases with cross-boundary consequences. The system was developed to reconcile disparate national forecasts for long-range dispersion. ENSEMBLE addresses the problem of achieving a common coherent strategy across European national emergency management when national long-range dispersion forecasts differ from one another during an accidental atmospheric release of radioactive material. A series of new decision-making 'ENSEMBLE' procedures and Web-based software evaluation and exchange tools have been created for real-time reconciliation and harmonisation of real-time dispersion forecasts from meteorological and emergency centres across Europe during an accident. The new ENSEMBLE software tools are available to participating national emergency and meteorological forecasting centres, which may choose to integrate them directly into operational emergency information systems, or possibly use them as a basis for future system development. (au)
A study of fuzzy logic ensemble system performance on face recognition problem

Science.gov (United States)

Polyakova, A.; Lipinskiy, L.

2017-02-01

Some problems are difficult to solve by using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since an ensemble exploits the diversity of its components. A new method of designing intelligent information technology ensembles is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal components analysis (PCA) is used for feature selection.
Intelligent classification of electrocardiogram (ECG) signal using extended Kalman Filter (EKF) based neuro fuzzy system.

Science.gov (United States)

Meau, Yeong Pong; Ibrahim, Fatimah; Narainasamy, Selvanathan A L; Omar, Razali

2006-05-01

This study presents the development of a hybrid system consisting of an ensemble of Extended Kalman Filter (EKF) based Multi Layer Perceptron Network (MLPN) and a one-pass learning Fuzzy Inference System using Look-up Table Scheme for the recognition of electrocardiogram (ECG) signals. This system can distinguish various types of abnormal ECG signals such as Ventricular Premature Cycle (VPC), T wave inversion (TINV), ST segment depression (STDP), and Supraventricular Tachycardia (SVT) from normal sinus rhythm (NSR) ECG signal.
Tweet-based Target Market Classification Using Ensemble Method

Directory of Open Access Journals (Sweden)

Muhammad Adi Khairul Anshary

2016-09-01

Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end result of data mining is a set of learning models that can classify new data. Ensemble methods can improve the accuracy of the models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets in order to extract features. Classification models were constructed to manipulate the training data using two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras) and three categories of sentiments (positive, negative and neutral) were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results of the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies that approach their customers through social media, especially Twitter.
Phase Locking a Clock Oscillator to a Coherent Atomic Ensemble

Directory of Open Access Journals (Sweden)

R. Kohlhaas

2015-04-01

The sensitivity of an atomic interferometer increases when the phase evolution of its quantum superposition state is measured over a longer interrogation interval. In practice, a limit is set by the measurement process, which returns not the phase but its projection in terms of population difference on two energetic levels. The phase interval over which the relation can be inverted is thus limited to the interval [-π/2,π/2]; going beyond it introduces an ambiguity in the readout, hence a sensitivity loss. Here, we extend the unambiguous interval to probe the phase evolution of an atomic ensemble using coherence-preserving measurements and phase corrections, and demonstrate the phase lock of the clock oscillator to an atomic superposition state. We propose a protocol based on the phase lock to improve atomic clocks limited by local oscillator noise, and foresee the application to other atomic interferometers such as inertial sensors.
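The readout ambiguity described in this abstract can be illustrated numerically. The sketch below assumes, purely for illustration, that the measured population difference is the sine of the accumulated phase; the abstract does not give the exact form, and the function names are hypothetical.

```python
import math

def population_difference(phase):
    # Assumed readout model: the measurement returns a projection of the phase.
    return math.sin(phase)

def inferred_phase(signal):
    # Inverting the projection is unique only on [-pi/2, pi/2].
    return math.asin(signal)

def readout_error(phase):
    # Difference between the inferred and the true phase.
    return abs(inferred_phase(population_difference(phase)) - phase)

inside = 0.4 * math.pi    # within the unambiguous interval
outside = 0.7 * math.pi   # beyond it: read back as 0.3 * pi
```

Under this assumed model, a phase of 0.7π produces the same signal as 0.3π, so the inversion returns the wrong value once the phase leaves the interval.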
Programming in the Zone: Repertoire Selection for the Large Ensemble

Science.gov (United States)

Hopkins, Michael

2013-01-01

One of the great challenges ensemble directors face is selecting high-quality repertoire that matches the musical and technical levels of their ensembles. Thoughtful repertoire selection can lead to increased student motivation as well as greater enthusiasm for the music program from parents, administrators, teachers, and community members. Common…
The metastasis suppressor KISS1 is an intrinsically disordered protein slightly more extended than a random coil.

Science.gov (United States)

Ibáñez de Opakua, Alain; Merino, Nekane; Villate, Maider; Cordeiro, Tiago N; Ormaza, Georgina; Sánchez-Carbayo, Marta; Diercks, Tammo; Bernadó, Pau; Blanco, Francisco J

2017-01-01

The metastasis suppressor KISS1 is reported to be involved in the progression of several solid neoplasias, making it a promising molecular target for controlling their metastasis. The KISS1 sequence contains an N-terminal secretion signal and several dibasic sequences that are proposed to be the proteolytic cleavage sites. We present the first structural characterization of KISS1 by circular dichroism, multi-angle light scattering, small angle X-Ray scattering and NMR spectroscopy. An analysis of the KISS1 backbone NMR chemical shifts does not reveal any preferential conformation and deviation from a random coil ensemble. The backbone 15N transverse relaxation times indicate a mildly reduced mobility for two regions that are rich in bulky residues. The small angle X-ray scattering curve of KISS1 is likewise consistent with a predominantly random coil ensemble, although an ensemble optimization analysis indicates some preference for more extended conformations possibly due to positive charge repulsion between the abundant basic residues. Our results support the hypothesis that KISS1 mostly samples a random coil conformational space, which is consistent with its high susceptibility to proteolysis and the generation of Kisspeptin fragments.
Enabling grand-canonical Monte Carlo: extending the flexibility of GROMACS through the GromPy python interface module.

Science.gov (United States)

Pool, René; Heringa, Jaap; Hoefling, Martin; Schulz, Roland; Smith, Jeremy C; Feenstra, K Anton

2012-05-05

We report on a python interface to the GROMACS molecular simulation package, GromPy (available at https://github.com/GromPy). This application programming interface (API) uses the ctypes python module that allows function calls to shared libraries, for example, written in C. To the best of our knowledge, this is the first reported interface to the GROMACS library that uses direct library calls. GromPy can be used for extending the current GROMACS simulation and analysis modes. In this work, we demonstrate that the interface enables hybrid Monte-Carlo/molecular dynamics (MD) simulations in the grand-canonical ensemble, a simulation mode that is currently not implemented in GROMACS. For this application, the interplay between GromPy and GROMACS requires only minor modifications of the GROMACS source code, not affecting the operation, efficiency, and performance of the GROMACS applications. We validate the grand-canonical application against MD in the canonical ensemble by comparison of equations of state. The results of the grand-canonical simulations are in complete agreement with MD in the canonical ensemble. The python overhead of the grand-canonical scheme is only minimal. Copyright © 2012 Wiley Periodicals, Inc.
A Brief Tutorial on the Ensemble Kalman Filter

OpenAIRE

Mandel, Jan

2009-01-01

The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models. The EnKF originated as a version of the Kalman filter for large problems (essentially, the covariance matrix is replaced by the sample covariance), and it is now an important data assimilation component of ensemble forecasting. EnKF is related to the particle filter (in this context, a particle is the s...
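The replacement of the covariance matrix by the sample covariance can be sketched for the simplest possible case. The example below is a minimal illustration, not the tutorial's own code: it assumes a scalar state that is observed directly, uses the perturbed-observation (stochastic) form of the analysis step, and all names and numbers are made up.

```python
import random
import statistics

def enkf_update(ensemble, y_obs, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1)."""
    # The forecast error variance is replaced by the ensemble sample variance.
    p_f = statistics.variance(ensemble)
    gain = p_f / (p_f + obs_var)          # Kalman gain for H = 1
    obs_sd = obs_var ** 0.5
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (y_obs + rng.gauss(0.0, obs_sd) - x) for x in ensemble]

rng = random.Random(0)
forecast = [rng.gauss(0.0, 2.0) for _ in range(500)]   # prior ensemble
analysis = enkf_update(forecast, y_obs=3.0, obs_var=1.0, rng=rng)
```

With these assumed settings the analysis ensemble mean is pulled from the forecast mean towards the observation, and the ensemble spread shrinks.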

Holographic entanglement entropy and the extended phase structure of STU black holes

International Nuclear Information System (INIS)

Caceres, Elena; Nguyen, Phuc H.; Pedraza, Juan F.

2015-01-01

We study the extended thermodynamics, obtained by considering the cosmological constant as a thermodynamic variable, of STU black holes in 4-dimensions in the fixed charge ensemble. The associated phase structure is conjectured to be dual to an RG-flow on the space of field theories. We find that for some charge configurations the phase structure resembles that of a Van der Waals gas: the system exhibits a family of first order phase transitions ending in a second order phase transition at a critical temperature. We calculate the holographic entanglement entropy for several charge configurations and show that for the cases where the gravity background exhibits Van der Waals behavior, the entanglement entropy presents a transition at the same critical temperature. To further characterize the phase transition we calculate appropriate critical exponents and show that they coincide. Thus, the entanglement entropy successfully captures the information of the extended phase structure. Finally, we discuss the physical interpretation of the extended space in terms of the boundary QFT and construct various holographic heat engines dual to STU black holes.
A comparison of resampling schemes for estimating model observer performance with small ensembles

Science.gov (United States)

Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.

2017-09-01

In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when smaller ensemble size is available.
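The figure of merit used here, the area under the ROC curve, can be computed directly from two sets of observer outputs. The sketch below uses the standard Mann-Whitney form of the AUC; the score values are invented and the observers and resampling schemes of the study are not reproduced.

```python
def auc(scores_class0, scores_class1):
    """Area under the ROC curve via the Mann-Whitney (Wilcoxon) statistic."""
    wins = 0.0
    for s1 in scores_class1:
        for s0 in scores_class0:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5   # ties count as half
    return wins / (len(scores_class1) * len(scores_class0))

# Made-up observer outputs for a small ensemble of three samples per class.
class0 = [0.1, 0.4, 0.35]
class1 = [0.8, 0.3, 0.5]
value = auc(class0, class1)   # 7 of the 9 pairs are ranked correctly
```

With only a handful of samples per class the estimate moves in coarse steps (here multiples of 1/9), which is one way to see why small ensembles make observer rankings unstable.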
Weight Distribution for Non-binary Cluster LDPC Code Ensemble

Science.gov (United States)

Nozaki, Takayuki; Maehara, Masaki; Kasai, Kenta; Sakaniwa, Kohichi

In this paper, we derive the average weight distributions for the irregular non-binary cluster low-density parity-check (LDPC) code ensembles. Moreover, we give the exponential growth rate of the average weight distribution in the limit of large code length. We show that there exist $(2,d_c)$-regular non-binary cluster LDPC code ensembles whose normalized typical minimum distances are strictly positive.
GIGGLE: a search engine for large-scale integrated genome analysis

Science.gov (United States)

Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

2018-01-01

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061
Skill of Global Raw and Postprocessed Ensemble Predictions of Rainfall over Northern Tropical Africa

Science.gov (United States)

Vogel, Peter; Knippertz, Peter; Fink, Andreas H.; Schlueter, Andreas; Gneiting, Tilmann

2018-04-01

Accumulated precipitation forecasts are of high socioeconomic importance for agriculturally dominated societies in northern tropical Africa. In this study, we analyze the performance of nine operational global ensemble prediction systems (EPSs) relative to climatology-based forecasts for 1 to 5-day accumulated precipitation based on the monsoon seasons 2007-2014 for three regions within northern tropical Africa. To assess the full potential of raw ensemble forecasts across spatial scales, we apply state-of-the-art statistical postprocessing methods in form of Bayesian Model Averaging (BMA) and Ensemble Model Output Statistics (EMOS), and verify against station and spatially aggregated, satellite-based gridded observations. Raw ensemble forecasts are uncalibrated, unreliable, and underperform relative to climatology, independently of region, accumulation time, monsoon season, and ensemble. Differences between raw ensemble and climatological forecasts are large, and partly stem from poor prediction for low precipitation amounts. BMA and EMOS postprocessed forecasts are calibrated, reliable, and strongly improve on the raw ensembles, but - somewhat disappointingly - typically do not outperform climatology. Most EPSs exhibit slight improvements over the period 2007-2014, but overall have little added value compared to climatology. We suspect that the parametrization of convection is a potential cause for the sobering lack of ensemble forecast skill in a region dominated by mesoscale convective systems.
Internal Spin Control, Squeezing and Decoherence in Ensembles of Alkali Atomic Spins

Science.gov (United States)

Norris, Leigh Morgan

Large atomic ensembles interacting with light are one of the most promising platforms for quantum information processing. In the past decade, novel applications for these systems have emerged in quantum communication, quantum computing, and metrology. Essential to all of these applications is the controllability of the atomic ensemble, which is facilitated by a strong coupling between the atoms and light. Non-classical spin squeezed states are a crucial step in attaining greater ensemble control. The degree of entanglement present in these states, furthermore, serves as a benchmark for the strength of the atom-light interaction. Outside the broader context of quantum information processing with atomic ensembles, spin squeezed states have applications in metrology, where their quantum correlations can be harnessed to improve the precision of magnetometers and atomic clocks. This dissertation focuses upon the production of spin squeezed states in large ensembles of cold trapped alkali atoms interacting with optical fields. While most treatments of spin squeezing consider only the case in which the ensemble is composed of two level systems or qubits, we utilize the entire ground manifold of an alkali atom with hyperfine spin f greater than or equal to 1/2, a qudit. Spin squeezing requires non-classical correlations between the constituent atomic spins, which are generated through the atoms' collective coupling to the light. Either through measurement or multiple interactions with the atoms, the light mediates an entangling interaction that produces quantum correlations. Because the spin squeezing treated in this dissertation ultimately originates from the coupling between the light and atoms, conventional approaches of improving this squeezing have focused on increasing the optical density of the ensemble. The greater number of internal degrees of freedom and the controllability of the spin-f ground hyperfine manifold enable novel methods of enhancing squeezing. In
Regional interdependency of precipitation indices across Denmark in two ensembles of high-resolution RCMs

DEFF Research Database (Denmark)

Sunyer Pinya, Maria Antonia; Madsen, Henrik; Rosbjerg, Dan

2013-01-01

all these methods is that the climate models are independent. This study addresses the validity of this assumption for two ensembles of regional climate models (RCMs) from the Ensemble-Based Predictions of Climate Changes and their Impacts (ENSEMBLES) project based on the land cells covering Denmark. ... Daily precipitation indices from an ensemble of RCMs driven by the 40-yr ECMWF Re-Analysis (ERA-40) and an ensemble of the same RCMs driven by different general circulation models (GCMs) are analyzed. Two different methods are used to estimate the amount of independent information in the ensembles. ... These are based on different statistical properties of a measure of climate model error. Additionally, a hierarchical cluster analysis is carried out. Regardless of the method used, the effective number of RCMs is smaller than the total number of RCMs. The estimated effective number of RCMs varies depending...
Ensemble Clustering using Semidefinite Programming with Applications.

Science.gov (United States)

Singh, Vikas; Mukherjee, Lopamudra; Peng, Jiming; Xu, Jinhui

2010-05-01

In this paper, we study the ensemble clustering problem, where the input is in the form of multiple clustering solutions. The goal of ensemble clustering algorithms is to aggregate the solutions into one solution that maximizes the agreement in the input ensemble. We obtain several new results for this problem. Specifically, we show that the notion of agreement under such circumstances can be better captured using a 2D string encoding rather than a voting strategy, which is common among existing approaches. Our optimization proceeds by first constructing a non-linear objective function which is then transformed into a 0-1 Semidefinite program (SDP) using novel convexification techniques. This model can be subsequently relaxed to a polynomial time solvable SDP. In addition to the theoretical contributions, our experimental results on standard machine learning and synthetic datasets show that this approach leads to improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. In addition, we identify several new application scenarios for this problem. These include combining multiple image segmentations and generating tissue maps from multiple-channel Diffusion Tensor brain images to identify the underlying structure of the brain.
2 × 2 random matrix ensembles with reduced symmetry: from Hermitian to PT -symmetric matrices

International Nuclear Information System (INIS)

Gong Jiangbin; Wang Qinghai

2012-01-01

A possibly fruitful extension of conventional random matrix ensembles is proposed by imposing symmetry constraints on conventional Hermitian matrices or parity–time (PT)-symmetric matrices. To illustrate the main idea, we first study 2 × 2 complex Hermitian matrix ensembles with O(2)-invariant constraints, yielding novel level-spacing statistics such as singular distributions, the half-Gaussian distribution, distributions interpolating between the GOE (Gaussian orthogonal ensemble) distribution and half-Gaussian distributions, as well as the gapped-GOE distribution. Such a symmetry-reduction strategy is then used to explore 2 × 2 PT-symmetric matrix ensembles with real eigenvalues. In particular, PT-symmetric random matrix ensembles with U(2) invariance can be constructed, with the conventional complex Hermitian random matrix ensemble being a special case. In two examples of PT-symmetric random matrix ensembles, the level-spacing distributions are found to be the standard GUE (Gaussian unitary ensemble) statistics or the ‘truncated-GUE’ statistics. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Quantum physics with non-Hermitian operators’. (paper)
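As a point of reference for the level-spacing statistics mentioned above, the conventional 2 × 2 GOE spacing distribution can be sampled in a few lines. This sketch covers only the standard real symmetric ensemble, not the symmetry-reduced or PT-symmetric ensembles of the paper; the variance convention and sample size are assumptions.

```python
import math
import random

def goe_spacing(rng):
    """Eigenvalue spacing of one 2x2 real symmetric matrix [[a, b], [b, d]]."""
    a = rng.gauss(0.0, 1.0)
    d = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, math.sqrt(0.5))   # off-diagonal variance is half the diagonal one
    # Closed form for the gap between the two eigenvalues.
    return math.sqrt((a - d) ** 2 + 4.0 * b * b)

rng = random.Random(1)
raw = [goe_spacing(rng) for _ in range(20000)]
mean_raw = sum(raw) / len(raw)
spacings = [s / mean_raw for s in raw]            # normalise to unit mean spacing
small = sum(s < 0.1 for s in spacings) / len(spacings)
wigner = 1.0 - math.exp(-math.pi * 0.1 ** 2 / 4)  # Wigner surmise CDF at s = 0.1
```

The fraction of very small spacings stays below one percent, close to the Wigner surmise value: this is the level repulsion against which the singular, half-Gaussian and gapped distributions of the abstract are contrasted.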
New technologies for examining neuronal ensembles in drug addiction and fear

Science.gov (United States)

Cruz, Fabio C.; Koya, Eisuke; Guez-Barber, Danielle H.; Bossert, Jennifer M.; Lupica, Carl R.; Shaham, Yavin; Hope, Bruce T.

2015-01-01

Correlational data suggest that learned associations are encoded within neuronal ensembles. However, it has been difficult to prove that neuronal ensembles mediate learned behaviours because traditional pharmacological and lesion methods, and even newer cell type-specific methods, affect both activated and non-activated neurons. Additionally, previous studies on synaptic and molecular alterations induced by learning did not distinguish between behaviourally activated and non-activated neurons. Here, we describe three new approaches—Daun02 inactivation, FACS sorting of activated neurons and c-fos-GFP transgenic rats — that have been used to selectively target and study activated neuronal ensembles in models of conditioned drug effects and relapse. We also describe two new tools — c-fos-tTA mice and inactivation of CREB-overexpressing neurons — that have been used to study the role of neuronal ensembles in conditioned fear. PMID:24088811
A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles

Science.gov (United States)

Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.

2016-12-01

Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote"). Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty and effective number of independent models. Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly. Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size. Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
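The subset-selection idea can be sketched on toy data. The example below searches exhaustively over all subsets of a fixed size, which is feasible only for very small ensembles and is not claimed to be the authors' optimisation procedure; the function names and the numbers are invented for illustration.

```python
import math
from itertools import combinations

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def ensemble_mean(runs):
    # Cell-by-cell mean over the given runs.
    return [sum(vals) / len(vals) for vals in zip(*runs)]

def best_subset(runs, obs, size):
    """Exhaustively pick the `size` runs whose mean is closest to obs (RMSE)."""
    best = min(combinations(range(len(runs)), size),
               key=lambda idx: rmse(ensemble_mean([runs[i] for i in idx]), obs))
    return list(best)

# Toy climatologies on three grid cells (made-up numbers).
obs = [1.0, 2.0, 3.0]
runs = [
    [1.5, 2.5, 3.5],   # warm bias
    [0.5, 1.5, 2.5],   # cold bias that cancels run 0
    [1.6, 2.6, 3.6],   # near-duplicate of run 0
    [1.2, 2.2, 3.2],   # best single run
]
chosen = best_subset(runs, obs, size=2)
```

In this toy case the selected pair is runs 0 and 1, whose opposite biases cancel, rather than a pair containing the best single run. That mirrors the abstract's point that ranking runs by individual performance can be a poor selection rule.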
A Separation between Divergence and Holevo Information for Ensembles

OpenAIRE

Jain, Rahul; Nayak, Ashwin; Su, Yi

2007-01-01

The notion of divergence information of an ensemble of probability distributions was introduced by Jain, Radhakrishnan, and Sen in the context of the ``substate theorem''. Since then, divergence has been recognized as a more natural measure of information in several situations in quantum and classical communication. We construct ensembles of probability distributions for which divergence information may be significantly smaller than the more standard Holevo information. As a result, we establ...
Multimodel hydrological ensemble forecasts for the Baskatong catchment in Canada using the TIGGE database.

Science.gov (United States)

Tito Arandia Martinez, Fabian

2014-05-01

Adequate uncertainty assessment is an important issue in hydrological modelling. An important issue for hydropower producers is to obtain ensemble forecasts which truly grasp the uncertainty linked to upcoming streamflows. If properly assessed, this uncertainty can lead to optimal reservoir management and energy production (e.g. [1]). The meteorological inputs to the hydrological model account for an important part of the total uncertainty in streamflow forecasting. Since the creation of the THORPEX initiative and the TIGGE database, access to meteorological ensemble forecasts from nine agencies throughout the world has been made available. This allows for hydrological ensemble forecasts based on multiple meteorological ensemble forecasts. Consequently, both the uncertainty linked to the architecture of the meteorological model and the uncertainty linked to the initial condition of the atmosphere can be accounted for. The main objective of this work is to show that a weighted combination of meteorological ensemble forecasts based on different atmospheric models can lead to improved hydrological ensemble forecasts, for horizons from one to ten days. This experiment is performed for the Baskatong watershed, a head subcatchment of the Gatineau watershed in the province of Quebec, in Canada. The Baskatong watershed is of great importance for hydropower production, as it comprises the main reservoir for the Gatineau watershed, on which there are six hydropower plants managed by Hydro-Québec. Since the 1970s, they have been using pseudo-ensemble forecasts based on deterministic meteorological forecasts to which variability derived from past forecasting errors is added. We use a combination of meteorological ensemble forecasts from different models (precipitation and temperature) as the main inputs for hydrological model HSAMI ([2]). The meteorological ensembles from eight of the nine agencies available through TIGGE are weighted according to their individual performance and
Multimodel ensembles of wheat growth

DEFF Research Database (Denmark)

Martre, Pierre; Wallach, Daniel; Asseng, Senthold

2015-01-01

, but such studies are difficult to organize and have only recently begun. We report on the largest ensemble study to date, of 27 wheat models tested in four contrasting locations for their accuracy in simulating multiple crop growth and yield variables. The relative error averaged over models was 24...
Teaching Strategies for Specialized Ensembles.

Science.gov (United States)

Teaching Music, 1999

1999-01-01

Provides a strategy, from the book "Strategies for Teaching Specialized Ensembles," that addresses Standard 9A of the National Standards for Music Education. Explains that students will identify and describe the musical and historical characteristics of the classical era in music they perform and in audio examples. (CMK)
Limited-area short-range ensemble predictions targeted for heavy rain in Europe

Directory of Open Access Journals (Sweden)

K. Sattler

2005-01-01

Inherent uncertainties in short-range quantitative precipitation forecasts (QPF) from the high-resolution, limited-area numerical weather prediction model DMI-HIRLAM (LAM) are addressed using two different approaches to creating a small ensemble of LAM simulations, with focus on prediction of extreme rainfall events over European river basins. The first ensemble type is designed to represent uncertainty in the atmospheric state of the initial condition and at the lateral LAM boundaries. The global ensemble prediction system (EPS) from ECMWF serves as host model to the LAM and provides the state perturbations, from which a small set of significant members is selected. The significance is estimated on the basis of accumulated precipitation over a target area of interest, which contains the river basin(s) under consideration. The selected members provide the initial and boundary data for the ensemble integration in the LAM. A second ensemble approach tries to address a portion of the model-inherent uncertainty responsible for errors in the forecasted precipitation field by utilising different parameterisation schemes for condensation and convection in the LAM. Three periods around historical heavy rain events that caused or contributed to disastrous river flooding in Europe are used to study the performance of the LAM ensemble designs. The three cases exhibit different dynamic and synoptic characteristics and provide an indication of the ensemble qualities in different weather situations. Precipitation analyses from the Deutsche Wetterdienst (DWD) are used as the verifying reference and a comparison of daily rainfall amounts is referred to the respective river basins of the historical cases.
The canonical ensemble redefined - 3. Ideal Bose gas

International Nuclear Information System (INIS)

Venkataraman, R.

1984-12-01

The ideal Bose gas solved in the redefined ensemble formalism exhibits a discontinuity in the specific heat suggesting that Bose-Einstein condensation is a second order phase transition. The deviations from the classical ideal gas behaviour are larger than those predicted by Gibbs ensemble. Below $T_c$ the pressure is not independent of the volume. For a certain range of values of $VT^3$, the peak in black body radiation shows a shift in the frequency scale and this could be detected, at least in principle, experimentally. (author)
Analogies between random matrix ensembles and the one-component plasma in two-dimensions

Directory of Open Access Journals (Sweden)

Peter J. Forrester

2016-03-01

The eigenvalue PDF for some well known classes of non-Hermitian random matrices (the complex Ginibre ensemble, for example) can be interpreted as the Boltzmann factor for one-component plasma systems in two-dimensional domains. We address this theme in a systematic fashion, identifying the plasma system for the Ginibre ensemble of non-Hermitian Gaussian random matrices $G$, the spherical ensemble of the product of an inverse Ginibre matrix and a Ginibre matrix $G_1^{-1} G_2$, and the ensemble formed by truncating unitary matrices, as well as for products of such matrices. We do this when each has either real, complex or real quaternion elements. One consequence of this analogy is that the leading form of the eigenvalue density follows as a corollary. Another is that the eigenvalue correlations must obey sum rules known to characterise the plasma system, and this leads us to exhibit an integral identity satisfied by the two-particle correlation for real quaternion matrices in the neighbourhood of the real axis. Further random matrix ensembles investigated from this viewpoint are self dual non-Hermitian matrices, which a previous study has related to the one-component plasma system in a disk at inverse temperature β=4, and the ensemble formed by the single row and column of quaternion elements from a member of the circular symplectic ensemble.
A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

Directory of Open Access Journals (Sweden)

P.Balaji

2015-01-01

Given the large number of clustering algorithms and the large volume of gene expression data, it is difficult to select the most suitable and effective clustering algorithm for a given gene expression dataset. At present many researchers prefer hierarchical clustering in its various forms, but this is no longer optimal. Cluster ensemble research can address this problem by automatically merging multiple data partitions, drawn from a wide range of different clusterings of any dimension, to improve both the quality and the robustness of the clustering result. However, many existing ensemble approaches use an association matrix to condense sample-cluster and co-occurrence statistics, so relations within the ensemble are captured only at a raw level, while the relations existing among clusters are completely neglected. Finding these missing associations can greatly expand the capability of those ensemble methodologies for microarray data clustering. We propose a general K-means cluster ensemble approach for clustering general categorical data into the required number of partitions.
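The association matrix that condenses co-occurrence statistics can be sketched as follows. This is the plain pairwise co-association matrix, i.e. the raw-level summary that the abstract says existing approaches rely on; the link-based refinement among clusters is not shown, and the partitions are invented.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]

# Three base clusterings of five samples (label values are arbitrary per partition).
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
C = co_association(partitions)
```

Samples 0 and 1 are grouped together in every partition, while samples 0 and 3 never are. A consensus partition could then be obtained by clustering the rows of `C`, for instance with K-means, although that step is an assumption here and not a description of the proposed method.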
Towards quantum optics and entanglement with electron spin ensembles in semiconductors

NARCIS (Netherlands)

van der Wal, Caspar H.; Sladkov, Maksym

We discuss a technique and a material system that enable the controlled realization of quantum entanglement between spin-wave modes of electron ensembles in two spatially separated pieces of semiconductor material. The approach uses electron ensembles in GaAs quantum wells that are located inside

Preliminary Assessment of Tecplot Chorus for Analyzing Ensemble of CTH Simulations

Energy Technology Data Exchange (ETDEWEB)

Agelastos, Anthony Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Stevenson, Joel O. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Attaway, Stephen W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Peterson, David

2015-04-01

The exploration of large parameter spaces in search of problem solutions and uncertainty quantification produces very large ensembles of data. Processing ensemble data will continue to require more resources as simulation complexity and HPC platform throughput increase. More tools are needed to help provide rapid insight into these data sets to decrease manual processing time by the analyst and to increase knowledge the data can provide. One such tool is Tecplot Chorus, whose strengths are visualizing ensemble metadata and linked images. This report contains the analysis and conclusions from evaluating Tecplot Chorus with an example problem that is relevant to Sandia National Laboratories.
Atom lasers, coherent states, and coherence II. Maximally robust ensembles of pure states

International Nuclear Information System (INIS)

Wiseman, H.M.; Vaccaro, John A.

2002-01-01

As discussed in the preceding paper [Wiseman and Vaccaro, preceding paper, Phys. Rev. A 65, 043605 (2002)], the stationary state of an optical or atom laser far above threshold is a mixture of coherent field states with random phase, or, equivalently, a Poissonian mixture of number states. We are interested in which, if either, of these descriptions of $\rho_{ss}$ as a stationary ensemble of pure states, is more natural. In the preceding paper we concentrated upon the question of whether descriptions such as these are physically realizable (PR). In this paper we investigate another relevant aspect of these ensembles, their robustness. A robust ensemble is one for which the pure states that comprise it survive relatively unchanged for a long time under the system evolution. We determine numerically the most robust ensembles as a function of the parameters in the laser model: the self-energy χ of the bosons in the laser mode, and the excess phase noise ν. We find that these most robust ensembles are PR ensembles, or similar to PR ensembles, for all values of these parameters. In the ideal laser limit (ν=χ=0), the most robust states are coherent states. As the phase noise or phase dispersion is increased through ν or the self-interaction of the bosons χ, respectively, the most robust states become more and more amplitude squeezed. We find scaling laws for these states, and give analytical derivations for them. As the phase diffusion or dispersion becomes so large that the laser output is no longer quantum coherent, the most robust states become so squeezed that they cease to have a well-defined coherent amplitude. That is, the quantum coherence of the laser output is manifest in the most robust PR ensemble being an ensemble of states with a well-defined coherent amplitude. This lends support to our approach of regarding robust PR ensembles as the most natural description of the state of the laser mode. It also has interesting implications for atom lasers in particular
Ensemble perception of emotions in autistic and typical children and adolescents

Directory of Open Access Journals (Sweden)

Themelis Karaminis

2017-04-01

Ensemble perception, the ability to assess automatically the summary of large amounts of information presented in visual scenes, is available early in typical development. This ability might be compromised in autistic children, who are thought to present limitations in maintaining summary statistics representations for the recent history of sensory input. Here we examined ensemble perception of facial emotional expressions in 35 autistic children, 30 age- and ability-matched typical children and 25 typical adults. Participants received three tasks: (a) an ‘ensemble’ emotion discrimination task; (b) a baseline (single-face) emotion discrimination task; and (c) a facial expression identification task. Children performed worse than adults on all three tasks. Unexpectedly, autistic and typical children were, on average, indistinguishable in their precision and accuracy on all three tasks. Computational modelling suggested that, on average, autistic and typical children used ensemble-encoding strategies to a similar extent; but ensemble perception was related to non-verbal reasoning abilities in autistic but not in typical children. Eye-movement data also showed no group differences in the way children attended to the stimuli. Our combined findings suggest that the abilities of autistic and typical children for ensemble perception of emotions are comparable on average.
Model dependence and its effect on ensemble projections in CMIP5

Science.gov (United States)

Abramowitz, G.; Bishop, C.

2013-12-01

Conceptually, the notion of model dependence within climate model ensembles is relatively simple - modelling groups share a literature base, parametrisations, data sets and even model code - the potential for dependence in sampling different climate futures is clear. How though can this conceptual problem inform a practical solution that demonstrably improves the ensemble mean and ensemble variance as an estimate of system uncertainty? While some research has already focused on error correlation or error covariance as a candidate to improve ensemble mean estimates, a complete definition of independence must at least implicitly subscribe to an ensemble interpretation paradigm, such as the 'truth-plus-error', 'indistinguishable', or more recently 'replicate Earth' paradigm. Using a definition of model dependence based on error covariance within the replicate Earth paradigm, this presentation will show that accounting for dependence in surface air temperature gives cooler projections in CMIP5 - by as much as 20% globally in some RCPs - although results differ significantly for each RCP, especially regionally. The fact that the change afforded by accounting for dependence across different RCPs is different is not an inconsistent result. Different numbers of submissions to each RCP by different modelling groups mean that differences in projections from different RCPs are not entirely about RCP forcing conditions - they also reflect different sampling strategies.
A Diagnostics Tool to detect ensemble forecast system anomaly and guide operational decisions

Science.gov (United States)

Park, G. H.; Srivastava, A.; Shrestha, E.; Thiemann, M.; Day, G. N.; Draijer, S.

2017-12-01

The hydrologic community is moving toward using ensemble forecasts to take uncertainty into account during the decision-making process. The New York City Department of Environmental Protection (DEP) implements several types of ensemble forecasts in their decision-making process: ensemble products for a statistical model (Hirsch and enhanced Hirsch); the National Weather Service (NWS) Advanced Hydrologic Prediction Service (AHPS) forecasts based on the classical Ensemble Streamflow Prediction (ESP) technique; and the new NWS Hydrologic Ensemble Forecasting Service (HEFS) forecasts. To remove structural error and apply the forecasts to additional forecast points, the DEP post-processes both the AHPS and the HEFS forecasts. These ensemble forecasts provide mass quantities of complex data, and drawing conclusions from these forecasts is time-consuming and difficult. The complexity of these forecasts also makes it difficult to identify system failures resulting from poor data, missing forecasts, and server breakdowns. To address these issues, we developed a diagnostic tool that summarizes ensemble forecasts and provides additional information such as historical forecast statistics, forecast skill, and model forcing statistics. This additional information highlights the key information that enables operators to evaluate the forecast in real-time, dynamically interact with the data, and review additional statistics, if needed, to make better decisions. We used Bokeh, a Python interactive visualization library, and a multi-database management system to create this interactive tool. This tool compiles and stores data into HTML pages that allow operators to readily analyze the data with built-in user interaction features. This paper will present a brief description of the ensemble forecasts, forecast verification results, and the intended applications for the diagnostic tool.
Dynamical predictive power of the generalized Gibbs ensemble revealed in a second quench.

Science.gov (United States)

Zhang, J M; Cui, F C; Hu, Jiangping

2012-04-01

We show that a quenched and relaxed completely integrable system is hardly distinguishable from the corresponding generalized Gibbs ensemble in a dynamical sense. To be specific, the response of the quenched and relaxed system to a second quench can be accurately reproduced by using the generalized Gibbs ensemble as a substitute. Remarkably, as demonstrated with the transverse Ising model and the hard-core bosons in one dimension, not only the steady values but even the transient, relaxation dynamics of the physical variables can be accurately reproduced by using the generalized Gibbs ensemble as a pseudoinitial state. This result is an important complement to the previously established result that a quenched and relaxed system is hardly distinguishable from the generalized Gibbs ensemble in a static sense. The relevance of the generalized Gibbs ensemble in the nonequilibrium dynamics of completely integrable systems is then greatly strengthened.
Random matrix ensembles for PT-symmetric systems

International Nuclear Information System (INIS)

Graefe, Eva-Maria; Mudute-Ndumbe, Steve; Taylor, Matthew

2015-01-01

Recently much effort has been made towards the introduction of non-Hermitian random matrix models respecting PT-symmetry. Here we show that there is a one-to-one correspondence between complex PT-symmetric matrices and split-complex and split-quaternionic versions of Hermitian matrices. We introduce two new random matrix ensembles of (a) Gaussian split-complex Hermitian; and (b) Gaussian split-quaternionic Hermitian matrices, of arbitrary sizes. We conjecture that these ensembles represent universality classes for PT-symmetric matrices. For the case of 2 × 2 matrices we derive analytic expressions for the joint probability distributions of the eigenvalues, the one-level densities and the level spacings in the case of real eigenvalues. (fast track communication)
Ensemble singular vectors and their use as additive inflation in EnKF

Directory of Open Access Journals (Sweden)

Shu-Chih Yang

2015-07-01

Given an ensemble of forecasts, it is possible to determine the leading ensemble singular vector (ESV), that is, the linear combination of the forecasts that, given the choice of the perturbation norm and forecast interval, will maximise the growth of the perturbations. Because the ESV indicates the directions of the fastest growing forecast errors, we explore the potential of applying the leading ESVs in the ensemble Kalman filter (EnKF) for correcting fast-growing errors. The ESVs are derived based on a quasi-geostrophic multi-level channel model, and data assimilation experiments are carried out under the framework of the local ensemble transform Kalman filter. We confirm that even during the early spin-up starting with random initial conditions, the final ESVs of the first analysis with a 12-h window are strongly related to the background errors. Since initial ensemble singular vectors (IESVs) grow much faster than Lyapunov Vectors (LVs), and the final ensemble singular vectors (FESVs) are close to convergence to leading LVs, perturbations based on leading IESVs grow faster than those based on FESVs, and are therefore preferable as additive inflation. The IESVs are applied in the EnKF framework for constructing flow-dependent additive perturbations to inflate the analysis ensemble. Compared with using random perturbations as additive inflation, a positive impact from using ESVs is found especially in areas with large growing errors. When an EnKF is ‘cold-started’ from random perturbations and poor initial condition, results indicate that using the ESVs as additive inflation has the advantage of correcting large errors so that the spin-up of the EnKF can be accelerated.
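The idea of additive inflation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `additive_inflation`, the `scale` parameter, and the choice to centre the perturbations so that the ensemble mean is left unchanged are all assumptions made for illustration. In the paper the perturbations would come from ensemble singular vectors; here they are simply passed in as plain lists.

```python
from typing import List, Sequence


def additive_inflation(
    ensemble: Sequence[Sequence[float]],
    perturbations: Sequence[Sequence[float]],
    scale: float = 1.0,
) -> List[List[float]]:
    """Add one perturbation vector to each analysis member.

    The perturbations are centred (their mean is removed) so that the
    ensemble mean is unchanged and only the spread is inflated.
    `scale` is a hypothetical tuning factor, not taken from the paper.
    """
    n = len(perturbations)
    # Mean of the perturbations, computed per state variable.
    pert_mean = [sum(column) / n for column in zip(*perturbations)]
    return [
        [x + scale * (p - pm) for x, p, pm in zip(member, pert, pert_mean)]
        for member, pert in zip(ensemble, perturbations)
    ]


# Two members with two state variables each, and one perturbation per member.
analysis = [[0.0, 0.0], [0.0, 0.0]]
perts = [[1.0, 2.0], [3.0, 6.0]]
inflated = additive_inflation(analysis, perts)
```

Centring is one common design choice; a scheme that deliberately shifts the mean would skip that step.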
Spectral Diagonal Ensemble Kalman Filters

Czech Academy of Sciences Publication Activity Database

Kasanický, Ivan; Mandel, Jan; Vejmelka, Martin

2015-01-01

Roč. 22, č. 4 (2015), s. 485-497 ISSN 1023-5809 R&D Projects: GA ČR GA13-34856S Grant - others:NSF(US) DMS-1216481 Institutional support: RVO:67985807 Keywords : data assimilation * ensemble Kalman filter * spectral representation Subject RIV: DG - Athmosphere Sciences, Meteorology Impact factor: 1.321, year: 2015
Tridiagonal realization of the antisymmetric Gaussian β-ensemble

International Nuclear Information System (INIS)

Dumitriu, Ioana; Forrester, Peter J.

2010-01-01

The Householder reduction of a member of the antisymmetric Gaussian unitary ensemble gives an antisymmetric tridiagonal matrix with all independent elements. The random variables permit the introduction of a positive parameter β, and the eigenvalue probability density function of the corresponding random matrices can be computed explicitly, as can the distribution of (q i ), the first components of the eigenvectors. Three proofs are given. One involves an inductive construction based on bordering of a family of random matrices which are shown to have the same distributions as the antisymmetric tridiagonal matrices. This proof uses the Dixon-Anderson integral from Selberg integral theory. A second proof involves the explicit computation of the Jacobian for the change of variables between real antisymmetric tridiagonal matrices, its eigenvalues, and (q i ). The third proof maps matrices from the antisymmetric Gaussian β-ensemble to those realizing particular examples of the Laguerre β-ensemble. In addition to these proofs, we note some simple properties of the shooting eigenvector and associated Pruefer phases of the random matrices.
Non-Hermitian Extensions of Wishart Random Matrix Ensembles

International Nuclear Information System (INIS)

Akemann, G.

2011-01-01

We briefly review the solution of three ensembles of non-Hermitian random matrices generalizing the Wishart-Laguerre (also called chiral) ensembles. These generalizations are realized as Gaussian two-matrix models, where the complex eigenvalues of the product of the two independent rectangular matrices are sought, with the matrix elements of both matrices being either real, complex or quaternion real. We also present the more general case depending on a non-Hermiticity parameter, which allows us to interpolate between the corresponding three Hermitian Wishart ensembles with real eigenvalues and the maximally non-Hermitian case. All three symmetry classes are explicitly solved for finite matrix size N x M for all complex eigenvalue correlation functions (and real or mixed correlations for real matrix elements). These are given in terms of the corresponding kernels built from orthogonal or skew-orthogonal Laguerre polynomials in the complex plane. We then present the corresponding three Bessel kernels in the complex plane in the microscopic large-N scaling limit at the origin, both at weak and strong non-Hermiticity with M - N ≥ 0 fixed. (author)
Geometric integrator for simulations in the canonical ensemble

Energy Technology Data Exchange (ETDEWEB)

Tapias, Diego, E-mail: diego.tapias@nucleares.unam.mx [Departamento de Física, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México 04510 (Mexico); Sanders, David P., E-mail: dpsanders@ciencias.unam.mx [Departamento de Física, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México 04510 (Mexico); Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139 (United States); Bravetti, Alessandro, E-mail: alessandro.bravetti@iimas.unam.mx [Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México 04510 (Mexico)

2016-08-28

We introduce a geometric integrator for molecular dynamics simulations of physical systems in the canonical ensemble that preserves the invariant distribution in equations arising from the density dynamics algorithm, with any possible type of thermostat. Our integrator thus constitutes a unified framework that allows the study and comparison of different thermostats and of their influence on the equilibrium and non-equilibrium (thermo-)dynamic properties of a system. To show the validity and the generality of the integrator, we implement it with a second-order, time-reversible method and apply it to the simulation of a Lennard-Jones system with three different thermostats, obtaining good conservation of the geometrical properties and recovering the expected thermodynamic results. Moreover, to show the advantage of our geometric integrator over a non-geometric one, we compare the results with those obtained by using the non-geometric Gear integrator, which is frequently used to perform simulations in the canonical ensemble. The non-geometric integrator induces a drift in the invariant quantity, while our integrator has no such drift, thus ensuring that the system is effectively sampling the correct ensemble.
Thermodynamics and kinetics of a molecular motor ensemble.

Science.gov (United States)

Baker, J E; Thomas, D D

2000-10-01

If, contrary to conventional models of muscle, it is assumed that molecular forces equilibrate among rather than within molecular motors, an equation of state and an expression for energy output can be obtained for a near-equilibrium, coworking ensemble of molecular motors. These equations predict clear, testable relationships between motor structure, motor biochemistry, and ensemble motor function, and we discuss these relationships in the context of various experimental studies. In this model, net work by molecular motors is performed with the relaxation of a near-equilibrium intermediate step in a motor-catalyzed reaction. The free energy available for work is localized to this step, and the rate at which this free energy is transferred to work is accelerated by the free energy of a motor-catalyzed reaction. This thermodynamic model implicitly deals with a motile cell system as a dynamic network (not a rigid lattice) of molecular motors within which the mechanochemistry of one motor influences and is influenced by the mechanochemistry of other motors in the ensemble.
Multi-criterion model ensemble of CMIP5 surface air temperature over China

Science.gov (United States)

Yang, Tiantian; Tao, Yumeng; Li, Jingjing; Zhu, Qian; Su, Lu; He, Xiaojia; Zhang, Xiaoming

2018-05-01

Global circulation models (GCMs) are useful tools for simulating climate change, projecting future temperature changes, and therefore supporting the preparation of national climate adaptation plans. However, different GCMs are not always in agreement with each other over various regions. The reason is that GCMs' configurations, module characteristics, and dynamic forcings vary from one to another. Model ensemble techniques are extensively used to post-process the outputs from GCMs and improve the variability of model outputs. Root-mean-square error (RMSE), correlation coefficient (CC, or R) and uncertainty are commonly used statistics for evaluating the performances of GCMs. However, many model ensemble techniques cannot guarantee satisfactory values of all these statistics simultaneously. In this paper, we propose a multi-model ensemble framework, using a state-of-the-art evolutionary multi-objective optimization algorithm (termed MOSPD), to evaluate different characteristics of ensemble candidates and to provide comprehensive trade-off information for different model ensemble solutions. A case study of optimizing the surface air temperature (SAT) ensemble solutions over different geographical regions of China is carried out. The data cover the period from 1900 to 2100, and the projections of SAT are analyzed with regard to three different statistical indices (i.e., RMSE, CC, and uncertainty). Among the derived ensemble solutions, the trade-off information is further analyzed with a robust Pareto front with respect to different statistics. Comparison over the historical period (1900-2005) shows that the optimized solutions are superior to those obtained by simple model averaging, as well as to any single GCM output. The improvements in the statistics vary across the climatic regions of China. Future projection (2006-2100) with the proposed ensemble method identifies that the largest (smallest) temperature changes will happen in the
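Two of the statistics named in this abstract (RMSE and the correlation coefficient) and the notion of a Pareto trade-off between them can be illustrated with a minimal sketch. This is not the MOSPD algorithm from the paper; the function names and the two-objective setup are assumptions chosen for illustration, and the third index (uncertainty) is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between a simulated and an observed series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def correlation(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient (CC) between two series."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (spread_p * spread_o)


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated candidates.

    Each score is (rmse, cc): lower RMSE is better, higher CC is better.
    """
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        no_worse = a[0] <= b[0] and a[1] >= b[1]
        strictly_better = a[0] < b[0] or a[1] > b[1]
        return no_worse and strictly_better

    return [
        name
        for name, score in scores.items()
        if not any(
            dominates(other_score, score)
            for other, other_score in scores.items()
            if other != name
        )
    ]


# Hypothetical ensemble candidates scored as (RMSE, CC).
candidates = {"A": (1.0, 0.90), "B": (2.0, 0.95), "C": (2.5, 0.80)}
front = pareto_front(candidates)
```

Candidate C is worse than A on both criteria and drops out, while A and B remain because neither beats the other on both.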
Ensemble Kalman filtering with residual nudging

KAUST Repository

Luo, X.; Hoteit, Ibrahim

2012-01-01

Covariance inflation and localisation are two important techniques that are used to improve the performance of the ensemble Kalman filter (EnKF) by (in effect) adjusting the sample covariances of the estimates in the state space. In this work
Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

DEFF Research Database (Denmark)

Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H

2006-01-01

This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci...... with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST...
Hartree and Exchange in Ensemble Density Functional Theory: Avoiding the Nonuniqueness Disaster.

Science.gov (United States)

Gould, Tim; Pittalis, Stefano

2017-12-15

Ensemble density functional theory is a promising method for the efficient and accurate calculation of excitations of quantum systems, at least if useful functionals can be developed to broaden its domain of practical applicability. Here, we introduce a guaranteed single-valued "Hartree-exchange" ensemble density functional, E_{Hx}[n], in terms of the right derivative of the universal ensemble density functional with respect to the coupling constant at vanishing interaction. We show that E_{Hx}[n] is straightforwardly expressible using block eigenvalues of a simple matrix [Eq. (14)]. Specialized expressions for E_{Hx}[n] from the literature, including those involving superpositions of Slater determinants, can now be regarded as originating from the unifying picture presented here. We thus establish a clear and practical description for Hartree and exchange in ensemble systems.
Interpolation of property-values between electron numbers is inconsistent with ensemble averaging

Energy Technology Data Exchange (ETDEWEB)

Miranda-Quintana, Ramón Alain [Laboratory of Computational and Theoretical Chemistry, Faculty of Chemistry, University of Havana, Havana (Cuba); Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1 (Canada); Ayers, Paul W. [Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1 (Canada)

2016-06-28

In this work we explore the physical foundations of models that study the variation of the ground state energy with respect to the number of electrons (E vs. N models), in terms of general grand-canonical (GC) ensemble formulations. In particular, we focus on E vs. N models that interpolate the energy between states with integer number of electrons. We show that if the interpolation of the energy corresponds to a GC ensemble, it is not differentiable. Conversely, if the interpolation is smooth, then it cannot be formulated as any GC ensemble. This proves that interpolation of electronic properties between integer electron numbers is inconsistent with any form of ensemble averaging. This emphasizes the role of derivative discontinuities and the critical role of a subsystem’s surroundings in determining its properties.
On the distribution of eigenvalues of certain matrix ensembles

International Nuclear Information System (INIS)

Bogomolny, E.; Bohigas, O.; Pato, M.P.

1995-01-01

Invariant random matrix ensembles with weak confinement potentials of the eigenvalues, corresponding to indeterminate moment problems, are investigated. These ensembles are characterized by the fact that the mean density of eigenvalues tends to a continuous function with increasing matrix dimension contrary to the usual cases where it grows indefinitely. It is demonstrated that the standard asymptotic formulae are not applicable in these cases and that the asymptotic distribution of eigenvalues can deviate from the classical ones. (author)

Developing of Thai Classical Music Ensemble in Rattanakosin Period

OpenAIRE

Pansak Vandee

2013-01-01

The research titled “Developing of Thai Classical Music Ensemble in Rattanakosin Period” aimed 1) to study the history of the Thai classical music ensemble in the Rattanakosin period and 2) to analyze the changes in each period of the Rattanakosin era. This is historical and documentary research. The data were collected through in-depth interviews with musicians and academic music experts, and through field study. A focus group discussion was conducted to analyze and conclude the findings. The research found that t...
Can decadal climate predictions be improved by ocean ensemble dispersion filtering?

Science.gov (United States)

Kadow, C.; Illing, S.; Kröner, I.; Ulbrich, U.; Cubasch, U.

2017-12-01

Decadal predictions by Earth system models aim to capture the state and phase of the climate several years in advance. Atmosphere-ocean interaction plays an important role for such climate forecasts. While short-term weather forecasts represent an initial value problem and long-term climate projections represent a boundary condition problem, the decadal climate prediction falls in-between these two time scales. The ocean memory due to its heat capacity holds big potential skill on the decadal scale. In recent years, more precise initialization techniques of coupled Earth system models (incl. atmosphere and ocean) have improved decadal predictions. Ensembles are another important aspect. Applying slightly perturbed predictions results in an ensemble. Using and evaluating the whole ensemble or its ensemble average, instead of one prediction, improves a prediction system. However, climate models in general start losing the initialized signal and its predictive skill from one forecast year to the next. Here we show that the climate prediction skill of an Earth system model can be improved by a shift of the ocean state toward the ensemble mean of its individual members at seasonal intervals. We found that this procedure, called ensemble dispersion filter, results in more accurate results than the standard decadal prediction. Global mean and regional temperature, precipitation, and winter cyclone predictions show an increased skill up to 5 years ahead. Furthermore, the novel technique outperforms predictions with larger ensembles and higher resolution. Our results demonstrate how decadal climate predictions benefit from ocean ensemble dispersion filtering toward the ensemble mean. This study is part of MiKlip (fona-miklip.de) - a major project on decadal climate prediction in Germany. We focus on the Max-Planck-Institute Earth System Model using the low-resolution version (MPI-ESM-LR) and MiKlip's basic initialization strategy as in 2017 published decadal climate forecast: http
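The shift of each member toward the ensemble mean described in this abstract can be illustrated with a minimal sketch. It is not the MiKlip implementation: the relaxation factor `alpha` and the function names are assumptions, and a real filter would act on full ocean model states at seasonal intervals rather than on short lists of numbers.

```python
from typing import List, Sequence


def ensemble_mean(members: Sequence[Sequence[float]]) -> List[float]:
    """Mean over members, computed separately for each state variable."""
    return [sum(values) / len(members) for values in zip(*members)]


def shift_toward_mean(
    members: Sequence[Sequence[float]], alpha: float = 1.0
) -> List[List[float]]:
    """Move every member a fraction `alpha` of the way to the ensemble mean.

    alpha = 0 leaves the ensemble untouched; alpha = 1 collapses every
    member onto the mean. `alpha` is a hypothetical parameter.
    """
    mean = ensemble_mean(members)
    return [
        [x + alpha * (m - x) for x, m in zip(member, mean)]
        for member in members
    ]


# Two members, two state variables (for example, two ocean temperatures).
ocean_states = [[1.0, 4.0], [3.0, 8.0]]
half_filtered = shift_toward_mean(ocean_states, alpha=0.5)
fully_filtered = shift_toward_mean(ocean_states, alpha=1.0)
```

In this sketch the ensemble mean is unchanged by the shift, and the spread around it shrinks by the factor (1 - alpha).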
A genetic ensemble approach for gene-gene interaction identification

Directory of Open Access Journals (Sweden)

Ho Joshua WK

2010-10-01

Background: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though a considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has proven to be very challenging. Methods: In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm that combines a genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphism (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity. Conclusions: Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of
Ensemble modeling for aromatic production in Escherichia coli.

Directory of Open Access Journals (Sweden)

Matthew L Rizk

2009-09-01

Ensemble Modeling (EM) is a recently developed method for metabolic modeling, particularly for utilizing the effect of enzyme tuning data on the production of a specific compound to refine the model. This approach is used here to investigate the production of aromatic products in Escherichia coli. Instead of using dynamic metabolite data to fit a model, the EM approach uses phenotypic data (effects of enzyme overexpression or knockouts on the steady state production rate) to screen possible models. These data are routinely generated during strain design. An ensemble of models is constructed that all reach the same steady state and are based on the same mechanistic framework at the elementary reaction level. The behavior of the models spans the kinetics allowable by thermodynamics. Then by using existing data from the literature for the overexpression of genes coding for transketolase (Tkt), transaldolase (Tal), and phosphoenolpyruvate synthase (Pps) to screen the ensemble, we arrive at a set of models that properly describes the known enzyme overexpression phenotypes. This subset of models becomes more predictive as additional data are used to refine the models. The final ensemble of models demonstrates the characteristic of the cell that Tkt is the first rate controlling step, and correctly predicts that only after Tkt is overexpressed does an increase in Pps increase the production rate of aromatics. This work demonstrates that EM is able to capture the result of enzyme overexpression on aromatic producing bacteria by successfully utilizing routinely generated enzyme tuning data to guide model learning.
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.

Science.gov (United States)

Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen

2016-05-10

Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Data Pre-Analysis and Ensemble of Various Artificial Neural Networks for Monthly Streamflow Forecasting

Directory of Open Access Journals (Sweden)

Jianzhong Zhou

2018-05-01

This paper introduces three artificial neural network (ANN) architectures for monthly streamflow forecasting: a radial basis function network, an extreme learning machine, and the Elman network. Three ensemble techniques, a simple average ensemble, a weighted average ensemble, and an ANN-based ensemble, were used to combine the outputs of the individual ANN models. The objective was to highlight the performance of the general regression neural network-based ensemble technique (GNE) through an improvement of monthly streamflow forecasting accuracy. Before the construction of an ANN model, data preanalysis techniques, such as empirical wavelet transform (EWT), were exploited to eliminate the oscillations of the streamflow series. Additionally, a theory of chaos phase space reconstruction was used to select the most relevant and important input variables for forecasting. The proposed GNE ensemble model has been applied to the mean monthly streamflow observation data from the Wudongde hydrological station in the Jinsha River Basin, China. Comparisons and analysis of this study have demonstrated that the denoised streamflow time series was less disordered and unsystematic than was suggested by the original time series according to chaos theory. Thus, EWT can be adopted as an effective data preanalysis technique for the prediction of monthly streamflow. Concurrently, the GNE performed better when compared with other ensemble techniques.
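The simple and weighted average ensembles named in this abstract can be illustrated with a minimal sketch. The function names and the inverse-MSE weighting rule are assumptions made for illustration; the abstract does not state how the weights were chosen.

```python
from statistics import fmean


def simple_average(forecasts):
    """Average the member forecasts at each time step.

    forecasts: list of equal-length forecast series, one per model.
    """
    return [fmean(step) for step in zip(*forecasts)]


def inverse_mse_weights(val_forecasts, val_observed):
    """Hypothetical weighting rule: weight each member by 1 / MSE on validation data."""
    inverse_errors = []
    for member in val_forecasts:
        mse = fmean((f - o) ** 2 for f, o in zip(member, val_observed))
        # Small constant guards against division by zero for a perfect member.
        inverse_errors.append(1.0 / (mse + 1e-12))
    total = sum(inverse_errors)
    return [w / total for w in inverse_errors]


def weighted_average(forecasts, weights):
    """Combine member forecasts at each time step using fixed weights."""
    return [sum(w * f for w, f in zip(weights, step)) for step in zip(*forecasts)]
```

With weights computed on a held-out period, the member with the lowest validation error contributes most to the combined forecast, while the simple average treats all members equally.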
Distinct contributions of attention and working memory to visual statistical learning and ensemble processing.

Science.gov (United States)

Hall, Michelle G; Mattingley, Jason B; Dux, Paul E

2015-08-01

The brain exploits redundancies in the environment to efficiently represent the complexity of the visual world. One example of this is ensemble processing, which provides a statistical summary of elements within a set (e.g., mean size). Another is statistical learning, which involves the encoding of stable spatial or temporal relationships between objects. It has been suggested that ensemble processing over arrays of oriented lines disrupts statistical learning of structure within the arrays (Zhao, Ngo, McKendrick, & Turk-Browne, 2011). Here we asked whether ensemble processing and statistical learning are mutually incompatible, or whether this disruption might occur because ensemble processing encourages participants to process the stimulus arrays in a way that impedes statistical learning. In Experiment 1, we replicated Zhao and colleagues' finding that ensemble processing disrupts statistical learning. In Experiments 2 and 3, we found that statistical learning was unimpaired by ensemble processing when task demands necessitated (a) focal attention to individual items within the stimulus arrays and (b) the retention of individual items in working memory. Together, these results are consistent with an account suggesting that ensemble processing and statistical learning can operate over the same stimuli given appropriate stimulus processing demands during exposure to regularities. (c) 2015 APA, all rights reserved.
An engineering approach to extending lifespan in C. elegans.

Directory of Open Access Journals (Sweden)

Dror Sagi

We have taken an engineering approach to extending the lifespan of Caenorhabditis elegans. Aging stands out as a complex trait, because events that occur in old animals are not under strong natural selection. As a result, lifespan can be lengthened rationally using bioengineering to modulate gene expression or to add exogenous components. Here, we engineered longer lifespan by expressing genes from zebrafish encoding molecular functions not normally present in worms. Additionally, we extended lifespan by increasing the activity of four endogenous worm aging pathways. Next, we used a modular approach to extend lifespan by combining components. Finally, we used cell- and worm-based assays to analyze changes in cell physiology and as a rapid means to evaluate whether multi-component transgenic lines were likely to have extended longevity. Using engineering to add novel functions and to tune endogenous functions provides a new framework for lifespan extension that goes beyond the constraints of the worm genome.
Evaluation of bias-correction methods for ensemble streamflow volume forecasts

Directory of Open Access Journals (Sweden)

T. Hashino

2007-01-01

Ensemble prediction systems are used operationally to make probabilistic streamflow forecasts for seasonal time scales. However, hydrological models used for ensemble streamflow prediction often have simulation biases that degrade forecast quality and limit the operational usefulness of the forecasts. This study evaluates three bias-correction methods for ensemble streamflow volume forecasts. All three adjust the ensemble traces using a transformation derived with simulated and observed flows from a historical simulation. The quality of probabilistic forecasts issued when using the three bias-correction methods is evaluated using a distributions-oriented verification approach. Comparisons are made of retrospective forecasts of monthly flow volumes for a north-central United States basin (Des Moines River, Iowa), issued sequentially for each month over a 48-year record. The results show that all three bias-correction methods significantly improve forecast quality by eliminating unconditional biases and enhancing the potential skill. Still, subtle differences in the attributes of the bias-corrected forecasts have important implications for their use in operational decision-making. Diagnostic verification distinguishes these attributes in a context meaningful for decision-making, providing criteria to choose among bias-correction methods with comparable skill.
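One common transformation of the kind described, derived from simulated and observed historical flows, is empirical quantile mapping. The abstract does not name the three methods evaluated, so the sketch below is an illustrative assumption rather than the study's procedure; the function names are hypothetical.

```python
from bisect import bisect_right


def quantile_map(value, simulated_hist, observed_hist):
    """Map a simulated flow to the observed flow with the same empirical rank."""
    sim = sorted(simulated_hist)
    obs = sorted(observed_hist)
    # Count historical simulated flows not exceeding the value, clipped to [1, n].
    rank = min(max(bisect_right(sim, value), 1), len(sim))
    # Rescale the rank to an index into the sorted observed flows.
    idx = round((rank - 1) * (len(obs) - 1) / max(len(sim) - 1, 1))
    return obs[idx]


def correct_ensemble(traces, simulated_hist, observed_hist):
    """Apply the same mapping to every ensemble trace value."""
    return [quantile_map(v, simulated_hist, observed_hist) for v in traces]
```

Values outside the historical simulated range are clipped to the smallest or largest observed flow in this sketch; an operational method would need an explicit extrapolation rule.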
Cluster Ensemble-Based Image Segmentation

Directory of Open Access Journals (Sweden)

Xiaoru Wang

2013-07-01

Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional algorithms based on a single type of feature or on multiple types of features, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method also proves to be very competitive in comparison with other state-of-the-art segmentation algorithms.
Stacking Ensemble Learning for Short-Term Electricity Consumption Forecasting

Directory of Open Access Journals (Sweden)

Federico Divina

2018-04-01

The ability to predict short-term electric energy demand would provide several benefits, at both the economic and the environmental level. For example, it would allow for an efficient use of resources in order to meet the actual demand, reducing the costs associated with production as well as the emission of CO2. To this aim, in this paper we propose a strategy based on ensemble learning in order to tackle the short-term load forecasting problem. In particular, our approach is based on a stacking ensemble learning scheme, where the predictions produced by three base learning methods are used by a top level method in order to produce final predictions. We tested the proposed scheme on a dataset reporting the energy consumption in Spain over more than nine years. The obtained experimental results show that an approach for short-term electricity consumption forecasting based on ensemble learning can help in combining predictions produced by weaker learning methods in order to obtain superior results. In particular, the system produces a lower error with respect to the existing state-of-the-art techniques used on the same dataset. More importantly, this case study has shown that using an ensemble scheme can achieve very accurate predictions, and thus that it is a suitable approach for addressing the short-term load forecasting problem.
Exploiting ensemble learning for automatic cataract detection and grading.

Science.gov (United States)

Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing

2016-02-01

Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
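Of the two combination methods mentioned, majority voting is the simpler to illustrate. The sketch below assumes six base-model predictions per image (three feature sets times two learners, as the abstract describes); the tie-breaking rule and the function name are assumptions, not details from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base-model predictions for one image."""
    counts = Counter(predictions)
    # most_common orders equal counts by first occurrence,
    # so a tie is resolved in favour of the label seen first.
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical grades from six base models for a single fundus image.
grades = ["mild", "mild", "moderate", "mild", "severe", "moderate"]
final_grade = majority_vote(grades)
```

Stacking differs in that a further model is trained on the base-model outputs instead of counting votes.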
Evaluation of stability of k-means cluster ensembles with respect to random initialization.

Science.gov (United States)

Kuncheva, Ludmila I; Vetrov, Dmitry P

2006-11-01

Many clustering algorithms, including cluster ensembles, rely on a random component. Stability of the results across different runs is considered to be an asset of the algorithm. The cluster ensembles considered here are based on k-means clusterers. Each clusterer is assigned a random target number of clusters, k and is started from a random initialization. Here, we use 10 artificial and 10 real data sets to study ensemble stability with respect to random k, and random initialization. The data sets were chosen to have a small number of clusters (two to seven) and a moderate number of data points (up to a few hundred). Pairwise stability is defined as the adjusted Rand index between pairs of clusterers in the ensemble, averaged across all pairs. Nonpairwise stability is defined as the entropy of the consensus matrix of the ensemble. An experimental comparison with the stability of the standard k-means algorithm was carried out for k from 2 to 20. The results revealed that ensembles are generally more stable, markedly so for larger k. To establish whether stability can serve as a cluster validity index, we first looked at the relationship between stability and accuracy with respect to the number of clusters, k. We found that such a relationship strongly depends on the data set, varying from almost perfect positive correlation (0.97, for the glass data) to almost perfect negative correlation (-0.93, for the crabs data). We propose a new combined stability index to be the sum of the pairwise individual and ensemble stabilities. This index was found to correlate better with the ensemble accuracy. Following the hypothesis that a point of stability of a clustering algorithm corresponds to a structure found in the data, we used the stability measures to pick the number of clusters. The combined stability index gave best results.
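The pairwise stability measure defined in this abstract, the adjusted Rand index averaged over all pairs of clusterers, can be sketched in plain Python. The function names are illustrative, and the handling of the degenerate case is an assumption; each partition is given as a list of cluster labels over the same data points, with at least two points.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same data points."""
    n = len(labels_a)
    # Pairs of points placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both partitions have one cluster): treat as identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def pairwise_stability(partitions):
    """Average adjusted Rand index over all pairs of partitions in the ensemble."""
    pairs = list(combinations(partitions, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)
```

Because the index ignores label names, two runs that find the same grouping under different cluster numbering still score 1.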
Charge transfer excitations from exact and approximate ensemble Kohn-Sham theory

Science.gov (United States)

Gould, Tim; Kronik, Leeor; Pittalis, Stefano

2018-05-01

By studying the lowest excitations of an exactly solvable one-dimensional soft-Coulomb molecular model, we show that components of Kohn-Sham ensembles can be used to describe charge transfer processes. Furthermore, we compute the approximate excitation energies obtained by using the exact ensemble densities in the recently formulated ensemble Hartree-exchange theory [T. Gould and S. Pittalis, Phys. Rev. Lett. 119, 243001 (2017)]. Remarkably, our results show that triplet excitations are accurately reproduced across a dissociation curve in all cases tested, even in systems where ground state energies are poor due to strong static correlations. Singlet excitations exhibit larger deviations from exact results but are still reproduced semi-quantitatively.
A database and API for variation, dense genotyping and resequencing data

Directory of Open Access Journals (Sweden)

Flicek Paul

2010-05-01

Background: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data, necessitating the redesign of existing genome-wide bioinformatics resources. Results: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. Conclusions: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.
Genome edited animals: Learning from GM crops?

Science.gov (United States)

Bruce, Ann

2017-06-01

Genome editing of livestock is poised to become commercial reality, yet questions remain as to appropriate regulation, potential impact on the industry sector and public acceptability of products. This paper looks at how genome editing of livestock has attempted to learn some of the lessons from commercialisation of GM crops, and takes a systemic approach to explore some of the complexity and ambiguity in incorporating genome edited animals in a food production system. Current applications of genome editing are considered, viewed from the perspective of past technological applications. The questions of what genome editing is, and whether it can be considered natural, are examined. The implications of regulation on development of different sectors of livestock production systems are studied, with a particular focus on the veterinary sector. From an EU perspective, regulation of genome edited animals, although not necessarily the same as for GM crops, is advocated from a number of different perspectives. This paper aims to open up new avenues of research on genome edited animals, extending from the current primary focus on science and regulation, to engage with a wider range of food system actors.
‘Which-way’ collective atomic spin excitation among atomic ensembles by photon indistinguishability

International Nuclear Information System (INIS)

Zhang Guowan; Bian Chenglin; Chen, L Q; Ou, Z Y; Zhang Weiping

2012-01-01

In spontaneous Raman scattering in an atomic ensemble, a collective atomic spin wave is created in correlation with the Stokes field. When the Stokes photons from two or more such atomic ensembles are made indistinguishable, a ‘which-way’ collective atomic spin excitation is generated among the independent atomic ensembles. We demonstrate this phenomenon experimentally by reading out the atomic spin excitations and observing interference between the read-out beams. When a single-photon projective measurement is made on the indistinguishable Stokes photons, this simple scheme can be used to entangle independent atomic ensembles. Compared to other currently used methods, this scheme can be easily scaled up and has greater efficiency. (paper)
Potentialities of ensemble strategies for flood forecasting over the Milano urban area

Science.gov (United States)

Ravazzani, Giovanni; Amengual, Arnau; Ceppi, Alessandro; Homar, Víctor; Romero, Romu; Lombardi, Gabriele; Mancini, Marco

2016-08-01

Analysis of ensemble forecasting strategies, which can provide a tangible backing for flood early warning procedures and mitigation measures over the Mediterranean region, is one of the fundamental motivations of the international HyMeX programme. Here, we examine two severe hydrometeorological episodes that affected the Milano urban area and for which the complex flood protection system of the city did not completely succeed. Indeed, flood damage has increased exponentially during the last 60 years, due to industrial and urban developments. Thus, the improvement of the Milano flood control system needs a synergism between structural and non-structural approaches. First, we examine how land-use changes due to urban development have altered the hydrological response to intense rainfalls. Second, we test a flood forecasting system which comprises the Flash-flood Event-based Spatially distributed rainfall-runoff Transformation, including Water Balance (FEST-WB) and the Weather Research and Forecasting (WRF) models. Accurate forecasts of deep moist convection and extreme precipitation are difficult to obtain due to uncertainties arising from the numerical weather prediction (NWP) physical parameterizations and high sensitivity to misrepresentation of the atmospheric state; however, two hydrological ensemble prediction systems (HEPS) have been designed to explicitly cope with uncertainties in the initial and lateral boundary conditions (IC/LBCs) and physical parameterizations of the NWP model. No substantial differences in skill have been found between the two ensemble strategies when considering an enhanced diversity of IC/LBCs for the perturbed initial conditions ensemble. Furthermore, no additional benefits have been found by considering more frequent LBCs in a mixed physics ensemble, as ensemble spread seems to be reduced. These findings could help to design the most appropriate ensemble strategies before these hydrometeorological extremes, given the computational
Coherent Rabi Dynamics of a Superradiant Spin Ensemble in a Microwave Cavity

Science.gov (United States)

Rose, B. C.; Tyryshkin, A. M.; Riemann, H.; Abrosimov, N. V.; Becker, P.; Pohl, H.-J.; Thewalt, M. L. W.; Itoh, K. M.; Lyon, S. A.

2017-07-01

We achieve the strong-coupling regime between an ensemble of phosphorus donor spins in a highly enriched 28Si crystal and a 3D dielectric resonator. Spins are polarized beyond Boltzmann equilibrium using spin-selective optical excitation of the no-phonon bound exciton transition, resulting in N = 3.6 × 10^13 unpaired spins in the ensemble. We observe a normal mode splitting of the spin-ensemble-cavity polariton resonances of 2g√N = 580 kHz (where each spin is coupled with strength g) in a cavity with a quality factor of 75 000 (γ ≪ κ ≈ 60 kHz, where γ and κ are the spin dephasing and cavity loss rates, respectively). The spin ensemble has a long dephasing time (T2* = 9 μs), providing a wide window for viewing the dynamics of the coupled spin-ensemble-cavity system. The free-induction decay shows up to a dozen collapses and revivals, revealing a coherent exchange of excitations between the superradiant state of the spin ensemble and the cavity at the rate g√N. The ensemble is found to evolve as a single large pseudospin according to the Tavis-Cummings model due to minimal inhomogeneous broadening and uniform spin-cavity coupling. We demonstrate independent control of the total spin and the initial Z projection of the pseudospin using optical excitation and microwave manipulation, respectively. We vary the microwave excitation power to rotate the pseudospin on the Bloch sphere and observe a long delay in the onset of the superradiant emission as the pseudospin approaches full inversion. This delay is accompanied by an abrupt π-phase shift in the pseudospin microwave emission. The scaling of this delay with the initial angle and the sudden phase shift are explained by the Tavis-Cummings model.
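As a quick check of scale, the quoted splitting and spin number imply a single-spin coupling far below 1 Hz. The arithmetic below assumes the 580 kHz figure is an ordinary frequency in the same units as g; the abstract does not say whether an angular frequency is meant.

```latex
\begin{align*}
\sqrt{N} &= \sqrt{3.6\times10^{13}} = 6.0\times10^{6},\\
g &= \frac{580\ \text{kHz}}{2\sqrt{N}}
   = \frac{5.8\times10^{5}\ \text{Hz}}{1.2\times10^{7}}
   \approx 0.048\ \text{Hz},\\
g\sqrt{N} &= \frac{580\ \text{kHz}}{2} = 290\ \text{kHz}.
\end{align*}
```

Under this assumption the collective exchange rate of 290 kHz exceeds the cavity loss rate of about 60 kHz several times over, which is consistent with the strong-coupling regime the abstract reports.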
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case

Science.gov (United States)

Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann

2017-04-01

Short-term ocean analyses for Sea Surface Temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique called super-ensemble. This technique consists of a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is able to prevent overfitting problems, although the best performance is achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performance depends on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15-day training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.

Fidelity estimation between two finite ensembles of unknown pure equatorial qubit states

Energy Technology Data Exchange (ETDEWEB)

Siomau, Michael, E-mail: siomau@physi.uni-heidelberg.de [Physikalisches Institut, Heidelberg Universitaet, D-69120 Heidelberg (Germany); Department of Theoretical Physics, Belarussian State University, 220030 Minsk (Belarus)

2011-09-05

Suppose we are given two finite ensembles of pure qubit states, so that the qubits in each ensemble are prepared in identical (but unknown to us) states lying on the equator of the Bloch sphere. What is the best strategy to estimate fidelity between these two finite ensembles of qubit states? We discuss three possible strategies for the fidelity estimation. We show that the best strategy includes two stages: a specific unitary transformation on two ensembles and state estimation of the output states of this transformation. -- Highlights: → We search for the best strategy for the fidelity estimation. → A measurement-based, a cloning-based and a unified strategy are considered. → The last strategy includes a specific unitary transformation and state estimation. → The unified strategy is shown to be the best among the three.
A new deterministic Ensemble Kalman Filter with one-step-ahead smoothing for storm surge forecasting

KAUST Repository

Raboudi, Naila

2016-01-01

KF-OSA exploits the observation twice. The incoming observation is first used to smooth the ensemble at the previous time step. The resulting smoothed ensemble is then integrated forward to compute a "pseudo forecast" ensemble, which is again updated with the same
Quark ensembles with the infinite correlation length

Science.gov (United States)

Zinov'ev, G. M.; Molodtsov, S. V.

2015-01-01

A number of exactly integrable (quark) models of quantum field theory with the infinite correlation length have been considered. It has been shown that the standard vacuum quark ensemble—Dirac sea (in the case of the space-time dimension higher than three)—is unstable because of the strong degeneracy of a state, which is due to the character of the energy distribution. When the momentum cutoff parameter tends to infinity, the distribution becomes infinitely narrow, leading to large (unlimited) fluctuations. Various vacuum ensembles—Dirac sea, neutral ensemble, color superconductor, and BCS state—have been compared. In the case of the color interaction between quarks, the BCS state has been certainly chosen as the ground state of the quark ensemble.
Quark ensembles with the infinite correlation length

International Nuclear Information System (INIS)

Zinov’ev, G. M.; Molodtsov, S. V.

2015-01-01

A number of exactly integrable (quark) models of quantum field theory with the infinite correlation length have been considered. It has been shown that the standard vacuum quark ensemble—Dirac sea (in the case of the space-time dimension higher than three)—is unstable because of the strong degeneracy of a state, which is due to the character of the energy distribution. When the momentum cutoff parameter tends to infinity, the distribution becomes infinitely narrow, leading to large (unlimited) fluctuations. Various vacuum ensembles—Dirac sea, neutral ensemble, color superconductor, and BCS state—have been compared. In the case of the color interaction between quarks, the BCS state has been certainly chosen as the ground state of the quark ensemble
Quark ensembles with the infinite correlation length

Energy Technology Data Exchange (ETDEWEB)

Zinov’ev, G. M. [National Academy of Sciences of Ukraine, Bogoliubov Institute for Theoretical Physics (Ukraine); Molodtsov, S. V., E-mail: molodtsov@itep.ru [Joint Institute for Nuclear Research (Russian Federation)

2015-01-15

A number of exactly integrable (quark) models of quantum field theory with the infinite correlation length have been considered. It has been shown that the standard vacuum quark ensemble—Dirac sea (in the case of the space-time dimension higher than three)—is unstable because of the strong degeneracy of a state, which is due to the character of the energy distribution. When the momentum cutoff parameter tends to infinity, the distribution becomes infinitely narrow, leading to large (unlimited) fluctuations. Various vacuum ensembles—Dirac sea, neutral ensemble, color superconductor, and BCS state—have been compared. In the case of the color interaction between quarks, the BCS state has been certainly chosen as the ground state of the quark ensemble.
Statistical ensembles and molecular dynamics studies of anisotropic solids. II

International Nuclear Information System (INIS)

Ray, J.R.; Rahman, A.

1985-01-01

We have recently discussed how the Parrinello-Rahman theory can be brought into accord with the theory of the elastic and thermodynamic behavior of anisotropic media. This involves the isoenthalpic-isotension ensemble of statistical mechanics. Nosé has developed a canonical ensemble form of molecular dynamics. We combine Nosé's ideas with the Parrinello-Rahman theory to obtain a canonical form of molecular dynamics appropriate to the study of anisotropic media subjected to arbitrary external stress. We employ this isothermal-isotension ensemble in a study of an fcc → close-packed structural phase transformation in a Lennard-Jones solid subjected to uniaxial compression. Our interpretation of the Nosé theory does not involve a scaling of the time variable. This latter fact leads to simplifications when studying the time dependence of quantities.
Generation of Exotic Quantum States of a Cold Atomic Ensemble

DEFF Research Database (Denmark)

Christensen, Stefan Lund

Over the last decades quantum effects have become more and more controllable, leading to the implementations of various quantum information protocols. These protocols are all based on utilizing quantum correlation. In this thesis we consider how states of an atomic ensemble with such correlations...... can be created and characterized. First we consider a spin-squeezed state. This state is generated by performing quantum non-demolition measurements of the atomic population difference. We show a spectroscopically relevant noise reduction of -1.7dB, the ensemble is in a many-body entangled state...... — a nanofiber based light-atom interface. Using a dual-frequency probing method we measure and prepare an ensemble with a sub-Poissonian atom number distribution. This is a first step towards the implementation of more exotic quantum states....
Robustness of the far-field response of nonlocal plasmonic ensembles

DEFF Research Database (Denmark)

Tserkezis, Christos; Maack, Johan Rosenkrantz; Liu, Zhaowei

2016-01-01

Contrary to classical predictions, the optical response of few-nm plasmonic particles depends on particle size due to effects such as nonlocality and electron spill-out. Ensembles of such nanoparticles are therefore expected to exhibit a nonclassical inhomogeneous spectral broadening due to size...... distribution. For a normal distribution of free-electron nanoparticles, and within the simple nonlocal hydrodynamic Drude model, both the nonlocal blueshift and the plasmon linewidth are shown to be considerably affected by ensemble averaging. Size-variance effects tend however to conceal nonlocality...... to a lesser extent when the homogeneous size-dependent broadening of individual nanoparticles is taken into account, either through a local size-dependent damping model or through the Generalized Nonlocal Optical Response theory. The role of ensemble averaging is further explored in realistic distributions...
NCAR's Experimental Real-time Convection-allowing Ensemble Prediction System

Science.gov (United States)

Schwartz, C. S.; Romine, G. S.; Sobash, R.; Fossell, K.

2016-12-01

Since April 2015, the National Center for Atmospheric Research's (NCAR's) Mesoscale and Microscale Meteorology (MMM) Laboratory, in collaboration with NCAR's Computational Information Systems Laboratory (CISL), has been producing daily, real-time, 10-member, 48-hr ensemble forecasts with 3-km horizontal grid spacing over the conterminous United States (http://ensemble.ucar.edu). These computationally-intensive, next-generation forecasts are produced on the Yellowstone supercomputer, have been embraced by both amateur and professional weather forecasters, are widely used by NCAR and university researchers, and receive considerable attention on social media. Initial conditions are supplied by NCAR's Data Assimilation Research Testbed (DART) software and the forecast model is NCAR's Weather Research and Forecasting (WRF) model; both WRF and DART are community tools. This presentation will focus on cutting-edge research results leveraging the ensemble dataset, including winter weather predictability, severe weather forecasting, and power outage modeling. Additionally, the unique design of the real-time analysis and forecast system and computational challenges and solutions will be described.
Puzzling sequences: studying microbial genomes from 'Ötzi'

International Nuclear Information System (INIS)

Rattei, T.

2012-01-01

Ancient remains, and mummies in particular, are of central value for archaeological research. The Tyrolean iceman “Ötzi” was conserved in a glacier of the Ötztal Alps about 5000 years ago. Aside from morphological and phenotypical classification, DNA sequencing and subsequent genome analysis were first applied to mitochondrial DNA and then extended to genomic DNA. Ancient microbial DNA is typically sequenced as well. These sequences allow the identification of pathogens as well as the study of the evolution of microorganisms. The talk will explain the metagenomic aspects of the “Ötzi” genome project and discuss the first results. (author)
Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report

Energy Technology Data Exchange (ETDEWEB)

Kegelmeyer, Philip W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shead, Timothy M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Dunlavy, Daniel M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2013-09-01

This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods.
A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

DEFF Research Database (Denmark)

Have, Christian Theil; Mørk, Søren

We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding is to a large extent a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...
A Simple Ensemble Simulation Technique for Assessment of Future Variations in Specific High-Impact Weather Events

Science.gov (United States)

Taniguchi, Kenji

2018-04-01

To investigate future variations in high-impact weather events, numerous samples are required. For the detailed assessment in a specific region, a high spatial resolution is also required. A simple ensemble simulation technique is proposed in this paper. In the proposed technique, new ensemble members were generated from one basic state vector and two perturbation vectors, which were obtained by lagged average forecasting simulations. Sensitivity experiments with different numbers of ensemble members, different simulation lengths, and different perturbation magnitudes were performed. Experimental application to a global warming study was also implemented for a typhoon event. Ensemble-mean results and ensemble spreads of total precipitation and atmospheric conditions showed similar characteristics across the sensitivity experiments. The frequencies of the maximum total and hourly precipitation also showed similar distributions. These results indicate the robustness of the proposed technique. On the other hand, considerable ensemble spread was found in each ensemble experiment. In addition, the results of the application to a global warming study showed possible variations in the future. These results indicate that the proposed technique is useful for investigating various meteorological phenomena and the impacts of global warming. The results of the ensemble simulations also enable the stochastic evaluation of differences in high-impact weather events. In addition, the impacts of a spectral nudging technique were also examined. The tracks of a typhoon were quite different between cases with and without spectral nudging; however, the ranges of the tracks among ensemble members were comparable. This indicates that spectral nudging does not necessarily suppress ensemble spread.
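As a rough illustration of building members from one basic state vector and two perturbation vectors, the sketch below assumes a simple linear combination with random coefficients. The abstract does not state how the vectors are combined, so the combination rule, the function names and the `magnitude` parameter are all hypothetical.

```python
import random


def make_members(basic_state, pert_a, pert_b, n_members, magnitude, rng):
    """Return n_members state vectors of the form basic + a*pert_a + b*pert_b.

    The coefficients a and b are drawn uniformly from [-magnitude, magnitude];
    this linear rule is an assumption made for illustration only.
    """
    members = []
    for _ in range(n_members):
        a = rng.uniform(-magnitude, magnitude)
        b = rng.uniform(-magnitude, magnitude)
        members.append(
            [x + a * p + b * q for x, p, q in zip(basic_state, pert_a, pert_b)]
        )
    return members


def ensemble_mean_and_spread(members):
    """Component-wise ensemble mean and spread (population standard deviation)."""
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n for i in range(dim)]
    spread = [
        (sum((m[i] - mean[i]) ** 2 for m in members) / n) ** 0.5
        for i in range(dim)
    ]
    return mean, spread
```

Varying `n_members` and `magnitude` in such a sketch mirrors, in a very reduced form, the kind of sensitivity experiments the abstract describes.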
Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

Directory of Open Access Journals (Sweden)

Michael B Walker

Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features, and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets: InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
ENSEMBLE methods to reconcile disparate national long range dispersion forecasts

OpenAIRE

Mikkelsen, Torben; Galmarini, S.; Bianconi, R.; French, S.

2003-01-01

ENSEMBLE is a web-based decision support system for real-time exchange and evaluation of national long-range dispersion forecasts of nuclear releases with cross-boundary consequences. The system is developed to reconcile disparate national forecasts for long-range dispersion. ENSEMBLE addresses the problem of achieving a common coherent strategy across European national emergency management when national long-range dispersion forecasts differ from one another during an a...
Good and Bad Neighborhood Approximations for Outlier Detection Ensembles

DEFF Research Database (Denmark)

Kirner, Evelyn; Schubert, Erich; Zimek, Arthur

2017-01-01

Outlier detection methods have used approximate neighborhoods in filter-refinement approaches. Outlier detection ensembles have used artificially obfuscated neighborhoods to achieve diverse ensemble members. Here we argue that outlier detection models could be based on approximate neighborhoods...... in the first place, thus gaining in both efficiency and effectiveness. It depends, however, on the type of approximation, as only some seem beneficial for the task of outlier detection, while no (large) benefit can be seen for others. In particular, we argue that space-filling curves are beneficial...
A short-range ensemble prediction system for southern Africa

CSIR Research Space (South Africa)

Park, R

2012-10-01

R. Park, W.A. Landman and F. Engelbrecht, CSIR, PO Box 395, Pretoria, South Africa, 0001. Introduction: This research has been conducted in order to develop a short-range ensemble...
Post-processing of multi-model ensemble river discharge forecasts using censored EMOS

Science.gov (United States)

Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian

2014-05-01

When forecasting water levels and river discharge, ensemble weather forecasts are used as meteorological input to hydrologic process models. As hydrologic models are imperfect and the input ensembles tend to be biased and underdispersed, the output ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, statistical post-processing is required in order to achieve calibrated and sharp predictions. Standard post-processing methods such as Ensemble Model Output Statistics (EMOS) that have their origins in meteorological forecasting are now increasingly being used in hydrologic applications. Here we consider two sub-catchments of River Rhine, for which the forecasting system of the Federal Institute of Hydrology (BfG) uses runoff data that are censored below predefined thresholds. To address this methodological challenge, we develop a censored EMOS method that is tailored to such data. The censored EMOS forecast distribution can be understood as a mixture of a point mass at the censoring threshold and a continuous part based on a truncated normal distribution. Parameter estimates of the censored EMOS model are obtained by minimizing the Continuous Ranked Probability Score (CRPS) over the training dataset. Model fitting on Box-Cox transformed data allows us to take account of the positive skewness of river discharge distributions. In order to achieve realistic forecast scenarios over an entire range of lead-times, there is a need for multivariate extensions. To this end, we smooth the marginal parameter estimates over lead-times. In order to obtain realistic scenarios of discharge evolution over time, the marginal distributions have to be linked with each other. To this end, the multivariate dependence structure can either be adopted from the raw ensemble like in Ensemble Copula Coupling (ECC), or be estimated from observations in a training period. The censored EMOS model has been applied to multi-model ensemble forecasts issued on a
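The forecast distribution described above (a point mass at the censoring threshold plus a continuous normal part above it) can be illustrated by sampling a left-censored normal and scoring it with a sample-based CRPS estimate. This is a minimal sketch and not the authors' fitting procedure: the function names and parameter values are hypothetical, and the Box-Cox transformation and the CRPS minimization over a training set are not shown.

```python
import random


def sample_censored_normal(mu, sigma, threshold, n, rng):
    """Draw n values from a normal(mu, sigma) left-censored at threshold.

    Draws below the threshold are set to the threshold, which creates the
    point mass; draws above it follow the normal density.
    """
    return [max(threshold, rng.gauss(mu, sigma)) for _ in range(n)]


def crps_from_samples(samples, obs):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term_obs = sum(abs(x - obs) for x in samples) / n
    term_pairs = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return term_obs - term_pairs
```

Lower CRPS is better. A fitting routine would adjust `mu` and `sigma` (typically as functions of the raw ensemble) to minimize the average of this score over the training data.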
Distinguishing high and low flow domains in urban drainage systems 2 days ahead using numerical weather prediction ensembles

Science.gov (United States)

Courdent, Vianney; Grum, Morten; Mikkelsen, Peter Steen

2018-01-01

Precipitation constitutes a major contribution to the flow in urban storm- and wastewater systems. Forecasts of the anticipated runoff flows, created from radar extrapolation and/or numerical weather predictions, can potentially be used to optimize operation in both wet and dry weather periods. However, flow forecasts are inevitably uncertain and their use will ultimately require a trade-off between the value of knowing what will happen in the future and the probability and consequence of being wrong. In this study we examine how ensemble forecasts from the HIRLAM-DMI-S05 numerical weather prediction (NWP) model subject to three different ensemble post-processing approaches can be used to forecast flow exceedance in a combined sewer for a wide range of ratios between the probability of detection (POD) and the probability of false detection (POFD). We use a hydrological rainfall-runoff model to transform the forecasted rainfall into forecasted flow series and evaluate three different approaches to establishing the relative operating characteristics (ROC) diagram of the forecast, which is a plot of POD against POFD for each fraction of concordant ensemble members and can be used to select the weight of evidence that matches the desired trade-off between POD and POFD. In the first approach, the rainfall input to the model is calculated for each of 25 ensemble members as a weighted average of rainfall from the NWP cells over the catchment where the weights are proportional to the areal intersection between the catchment and the NWP cells. In the second approach, a total of 2825 flow ensembles are generated using rainfall input from the neighbouring NWP cells up to approximately 6 cells in all directions from the catchment. In the third approach, the first approach is extended spatially by successively increasing the area covered and for each spatial increase and each time step selecting only the cell with the highest intensity resulting in a total of 175 ensemble
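The ROC construction described above, with one (POFD, POD) point per fraction of concordant ensemble members, can be sketched as follows. The input layout (one list of per-member exceedance flags per time step) and the function name are assumptions made for illustration.

```python
def roc_points(member_exceeds, observed):
    """Return (fraction, POD, POFD) for each required number of agreeing members.

    member_exceeds: list over time steps; each entry is a list of booleans,
                    one per ensemble member (True = member forecasts exceedance).
    observed:       list of booleans, True where exceedance actually occurred.
    A warning is issued at a time step when at least k members agree.
    POD or POFD is None when its denominator is zero.
    """
    n_members = len(member_exceeds[0])
    points = []
    for k in range(1, n_members + 1):
        hits = misses = false_alarms = correct_negatives = 0
        for flags, event in zip(member_exceeds, observed):
            warned = sum(flags) >= k
            if event and warned:
                hits += 1
            elif event:
                misses += 1
            elif warned:
                false_alarms += 1
            else:
                correct_negatives += 1
        n_events = hits + misses
        n_non_events = false_alarms + correct_negatives
        pod = hits / n_events if n_events else None
        pofd = false_alarms / n_non_events if n_non_events else None
        points.append((k / n_members, pod, pofd))
    return points
```

Each returned fraction is a candidate "weight of evidence"; picking the one whose POD and POFD match the desired trade-off corresponds to the selection step mentioned in the abstract.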
SAChES: Scalable Adaptive Chain-Ensemble Sampling.

Energy Technology Data Exchange (ETDEWEB)

Swiler, Laura Painton [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ebeida, Mohamed Salah [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Huang, Maoyi [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hou, Zhangshuan [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bao, Jie [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Ren, Huiying [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

2017-08-01

We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted to Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden, enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently homes in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
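To make the differential-evolution stage concrete, the sketch below shows a plain serial Differential Evolution Monte Carlo update, in which each chain proposes a move along the difference of two other randomly chosen chains. It is not the SAChES code: the asynchronous coupling, the Adaptive Metropolis stage and chain re-spawning are omitted, and the function names, the jitter size and the toy target are assumptions.

```python
import math
import random


def log_std_normal(x):
    """Toy target: log-density of a standard normal, up to a constant."""
    return -0.5 * sum(v * v for v in x)


def de_mc(log_density, start_chains, n_steps, rng, jitter=1e-4):
    """Run DE-MC and return the chain states after every sweep.

    Proposal for chain i: x_i + gamma * (x_r1 - x_r2) + small noise, where
    r1 and r2 are two other chains and gamma = 2.38 / sqrt(2 * dim).
    At least three chains are required.
    """
    chains = [list(c) for c in start_chains]
    dim = len(chains[0])
    gamma = 2.38 / math.sqrt(2 * dim)
    logp = [log_density(c) for c in chains]
    history = []
    for _ in range(n_steps):
        for i in range(len(chains)):
            others = [j for j in range(len(chains)) if j != i]
            r1, r2 = rng.sample(others, 2)
            proposal = [
                chains[i][d]
                + gamma * (chains[r1][d] - chains[r2][d])
                + rng.gauss(0.0, jitter)
                for d in range(dim)
            ]
            lp = log_density(proposal)
            # Metropolis acceptance with probability min(1, p(new) / p(old)).
            if rng.random() < math.exp(min(0.0, lp - logp[i])):
                chains[i] = proposal
                logp[i] = lp
        history.append([list(c) for c in chains])
    return history
```

Because the proposal scale comes from the spread of the other chains, no per-parameter step size needs tuning, which is one reason large chain ensembles are attractive for this kind of sampler.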

Quasi-static ensemble variational data assimilation: a theoretical and numerical study with the iterative ensemble Kalman smoother

Science.gov (United States)

Fillion, Anthony; Bocquet, Marc; Gratton, Serge

2018-04-01

The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss-Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.
IMG: the integrated microbial genomes database and comparative analysis system

Science.gov (United States)

Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

2012-01-01

The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640
Breaking of ensembles of linear and nonlinear oscillators

International Nuclear Information System (INIS)

Buts, V.A.

2016-01-01

Some results concerning the study of the dynamics of ensembles of linear and nonlinear oscillators are stated. It is shown that, in general, a stable ensemble of linear oscillators has a limited number of oscillators. This number has been defined for some simple models. It is shown that the features of the dynamics of linear oscillators can be used for conversion of the low-frequency energy oscillations into high-frequency oscillations. The dynamics of coupled nonlinear oscillators is chaotic in most cases. For such a case, it is shown that the statistical characteristics (moments) of chaotic motion can significantly reduce potential barriers that keep the particles in the capture region.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

Directory of Open Access Journals (Sweden)

Guofeng Wang

2014-11-01

Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
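The difference between stacking and majority voting can be shown with a deliberately small sketch. Here the meta-learner is just a lookup table from the tuple of base-classifier outputs to the most frequent true label seen in training, falling back to a majority vote for unseen tuples. The paper's actual meta-learner is not described in the abstract, so this table-based stacker, the function names and the labels are all hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the label predicted by most base classifiers."""
    return Counter(outputs).most_common(1)[0][0]


def fit_stacker(train_outputs, train_labels):
    """Learn, for each pattern of base outputs, the most frequent true label."""
    by_pattern = {}
    for outputs, label in zip(train_outputs, train_labels):
        by_pattern.setdefault(tuple(outputs), Counter())[label] += 1
    return {
        pattern: counts.most_common(1)[0][0]
        for pattern, counts in by_pattern.items()
    }


def stacked_predict(table, outputs):
    """Predict with the learned table; unseen patterns fall back to voting."""
    return table.get(tuple(outputs), majority_vote(outputs))
```

For example, if one base classifier is reliably right about a "worn" state while the other two miss it, the stacker can learn to trust that pattern, whereas a majority vote would be outvoted every time.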
Path planning in uncertain flow fields using ensemble method

KAUST Repository

Wang, Tong

2016-08-20

An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.
An Effective and Novel Neural Network Ensemble for Shift Pattern Detection in Control Charts

Directory of Open Access Journals (Sweden)

Mahmoud Barghash

2015-01-01

Pattern recognition in control charts is critical to strike a balance between discovering faults as early as possible and reducing the number of false alarms. This work is devoted to designing a multistage neural network ensemble that achieves this balance, which reduces rework and scrap without reducing productivity. The ensemble under focus is composed of a series of neural network stages and a series of decision points. Initially, this work compared the effect of multidecision points and a single decision point on the performance of the ANN, which showed that multidecision points are highly preferable to single-decision points. This work also tested the effect of population percentages on the ANN and used this to optimize the ANN’s performance. This work also used optimized and nonoptimized ANNs in an ensemble and showed that using a nonoptimized ANN may reduce the performance of the ensemble. The ensemble that used only optimized ANNs has improved performance over individual ANNs and the three-sigma level rule. In that respect, using the designed ensemble can help in reducing the number of false stops and increasing productivity. It can also be used to discover even small shifts in the mean as early as possible.
Reliability of windstorm predictions in the ECMWF ensemble prediction system

Science.gov (United States)

Becker, Nico; Ulbrich, Uwe

2016-04-01

Windstorms caused by extratropical cyclones are one of the most dangerous natural hazards in the European region. Therefore, reliable predictions of such storm events are needed. Case studies have shown that ensemble prediction systems (EPS) are able to provide useful information about windstorms between two and five days prior to the event. In this work, ensemble predictions with the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS are evaluated over a four-year period. Within the 50 ensemble members, which are initialized every 12 hours and are run for 10 days, windstorms are identified and tracked in time and space. By using a clustering approach, different predictions of the same storm are identified in the different ensemble members and compared to reanalysis data. The occurrence probability of the predicted storms is estimated by fitting a bivariate normal distribution to the storm track positions. Our results show, for example, that predicted storm clusters with occurrence probabilities of more than 50% have a matching observed storm in 80% of all cases at a lead time of two days. The predicted occurrence probabilities are reliable up to 3 days lead time. At longer lead times the occurrence probabilities are overestimated by the EPS.
Improving wave forecasting by integrating ensemble modelling and machine learning

Science.gov (United States)

O'Donncha, F.; Zhang, Y.; James, S. C.

2017-12-01

Modern smart-grid networks use technologies to instantly relay information on supply and demand to support effective decision making. Integration of renewable-energy resources with these systems demands accurate forecasting of energy production (and demand) capacities. For wave-energy converters, this requires wave-condition forecasting to enable estimates of energy production. Current operational wave forecasting systems exhibit substantial errors with wave-height RMSEs of 40 to 60 cm being typical, which limits the reliability of energy-generation predictions thereby impeding integration with the distribution grid. In this study, we integrate physics-based models with statistical learning aggregation techniques that combine forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The Simulating Waves Nearshore physics-based model is used to compute wind- and currents-augmented waves in the Monterey Bay area. Ensembles are developed based on multiple simulations perturbing input data (wave characteristics supplied at the model boundaries and winds) to the model. A learning-aggregation technique uses past observations and past model forecasts to calculate a weight for each model. The aggregated forecasts are compared to observation data to quantify the performance of the model ensemble and aggregation techniques. The appropriately weighted ensemble model outperforms an individual ensemble member with regard to forecasting wave conditions.
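The idea of deriving one weight per model from past forecasts and past observations can be sketched with inverse mean-squared-error weighting. This is a stand-in chosen for simplicity, not the learning-aggregation technique used in the study, and the model names and numbers in the usage below are invented for illustration.

```python
def inverse_mse_weights(past_forecasts, past_obs, eps=1e-12):
    """Return normalized weights, larger for models with smaller past error.

    past_forecasts: {model_name: [forecast values]} aligned with past_obs.
    eps guards against division by zero for a model with no past error.
    """
    inverse = {}
    for name, forecasts in past_forecasts.items():
        mse = sum((f - o) ** 2 for f, o in zip(forecasts, past_obs)) / len(past_obs)
        inverse[name] = 1.0 / (mse + eps)
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def aggregate(weights, new_forecasts):
    """Weighted combination of the models' new forecasts."""
    return sum(weights[name] * new_forecasts[name] for name in weights)
```

With past wave heights of `[1.0, 2.0]` m, a member that forecast `[1.1, 1.9]` receives nearly all of the weight relative to one that forecast `[2.0, 3.0]`, so the aggregated forecast stays close to the better member.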
Plasticity of the Binding Site of Renin: Optimized Selection of Protein Structures for Ensemble Docking.

Science.gov (United States)

Strecker, Claas; Meyer, Bernd

2018-05-02

Protein flexibility poses a major challenge to docking of potential ligands in that the binding site can adopt different shapes. Docking algorithms usually keep the protein rigid and only allow the ligand to be treated as flexible. However, a wrong assessment of the shape of the binding pocket can prevent a ligand from adopting a correct pose. Ensemble docking is a simple yet promising method to solve this problem: Ligands are docked into multiple structures, and the results are subsequently merged. Selection of protein structures is a significant factor for this approach. In this work we perform a comprehensive and comparative study evaluating the impact of structure selection on ensemble docking. We perform ensemble docking with several crystal structures and with structures derived from molecular dynamics simulations of renin, an attractive target for antihypertensive drugs. Here, 500 ns of MD simulations revealed binding site shapes not found in any available crystal structure. We evaluate the importance of structure selection for ensemble docking by comparing binding pose prediction, ability to rank actives above nonactives (screening utility), and scoring accuracy. As a result, for ensemble definition k-means clustering appears to be better suited than hierarchical clustering with average linkage. The best performing ensemble consists of four crystal structures and is able to reproduce the native ligand poses better than any individual crystal structure. Moreover, this ensemble outperforms 88% of all individual crystal structures in terms of screening utility as well as scoring accuracy. Similarly, ensembles of MD-derived structures perform on average better than 75% of the individual crystal structures in terms of scoring accuracy at all inspected ensemble sizes.
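The merge step of ensemble docking (dock each ligand into several structures, then combine the results) can be sketched by keeping the best score per ligand. The abstract does not specify the merge rule, so "take the best score, lower is better" is an assumption here, as are the structure names, ligand names and scores in the usage below.

```python
def merge_ensemble_scores(scores_by_structure):
    """Keep, for each ligand, its best score and the structure that gave it.

    scores_by_structure: {structure_id: {ligand_id: docking_score}}.
    Lower scores are treated as better (an assumption; some scoring
    functions use the opposite convention).
    """
    best = {}
    for structure, ligand_scores in scores_by_structure.items():
        for ligand, score in ligand_scores.items():
            if ligand not in best or score < best[ligand][0]:
                best[ligand] = (score, structure)
    return best


def rank_ligands(best):
    """Return ligand ids ordered from best to worst merged score."""
    return sorted(best, key=lambda ligand: best[ligand][0])
```

Given scores from two hypothetical structures, `{"xtal_A": {"lig1": -7.2, "lig2": -5.0}, "xtal_B": {"lig1": -6.1, "lig2": -8.4}}`, the merged ranking places `lig2` first because its pose in `xtal_B` scores best, which is the kind of rescue a single rigid structure cannot provide.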
The thermal insulation difference of clothing ensembles on the dry and perspiration manikins

International Nuclear Information System (INIS)

Xiaohong, Zhou; Chunqin, Zheng; Yingming, Qiang; Holmér, Ingvar; Gao, Chuansi; Kuklane, Kalev

2010-01-01

There are about a hundred manikin users around the world. Some of them use manikins such as 'Walter' and 'Tore' to evaluate the comfort of clothing ensembles according to their thermal insulation and moisture resistance. A 'Walter' manikin is made of water and waterproof breathable fabric 'skin', which simulates the characteristics of human perspiration. So evaporation, condensation or sorption and desorption are always accompanied by heat transfer. A 'Tore' manikin only has dry heat exchange by conduction, radiation and convection from the manikin through clothing ensembles to environments. It is an ideal apparatus to measure the thermal insulation of the clothing ensemble and allows evaluation of thermal comfort. This paper compares thermal insulation measured with dry 'Tore' and sweating 'Walter' manikins. Clothing ensembles consisted of permeable and impermeable clothes. The results showed that the clothes covering the 'Walter' manikin absorbed the moisture evaporated from the manikin. When the moisture transferred through the permeable clothing ensembles, heat of condensation could be neglected. But it was observed that heavy condensation occurred if impermeable clothes were tested on the 'Walter' manikin. This resulted in a thermal insulation difference of clothing ensembles on the dry and perspiration manikins. The thermal insulation obtained from the 'Walter' manikin has to be modified when heavy condensation occurs. The modified equation is obtained in this study.
Self Organizing Maps to efficiently cluster and functionally interpret protein conformational ensembles

Directory of Open Access Journals (Sweden)

Fabio Stella

2013-09-01

An approach that combines Self-Organizing Maps, hierarchical clustering and network components is presented, aimed at comparing protein conformational ensembles obtained from multiple Molecular Dynamics simulations. As a first result, the original ensembles can be summarized by using only the representative conformations of the clusters obtained. In addition, the network components analysis allows one to discover and interpret the dynamic behavior of the conformations won by each neuron. The results showed the ability of this approach to efficiently derive a functional interpretation of the protein dynamics described by the original conformational ensemble, highlighting its potential as a support for protein engineering.
Performance of a multi-RCM ensemble for South Eastern South America

Energy Technology Data Exchange (ETDEWEB)

Carril, A.F.; Menendez, C.G.; Salio, P. [Ciudad Universitaria, Ciudad Autonoma de Buenos Aires, Centro de Investigaciones del Mar y la Atmosfera (CIMA), CONICET-UBA, Buenos Aires (Argentina); Universidad de Buenos Aires, Departamento de Ciencias de la Atmosfera y los Oceanos (DCAO), FCEN, Buenos Aires (Argentina); UMI IFAECI/CNRS, Buenos Aires (Argentina); Remedio, A.R.C.; Jacob, D.; Pfeifer, S. [Max Planck Institute for Meteorology (MPI-M), Hamburg (Germany); Robledo, F.; Tencer, B. [Universidad de Buenos Aires, Departamento de Ciencias de la Atmosfera y los Oceanos (DCAO), FCEN, Buenos Aires (Argentina); Soerensson, A.; Zaninelli, P. [Ciudad Universitaria, Ciudad Autonoma de Buenos Aires, Centro de Investigaciones del Mar y la Atmosfera (CIMA), CONICET-UBA, Buenos Aires (Argentina); UMI IFAECI/CNRS, Buenos Aires (Argentina); Boulanger, J.P. [LOCEAN, UMR CNRS/IRD/UPMC, Paris (France); Castro, M. de; Sanchez, E. [Universidad de Castilla-La Mancha (UCLM), Toledo (Spain); Le Treut, H.; Li, L.Z.X. [Sciences de l' Environnement en Ile de France, Laboratoire de Meteorologie Dynamique (LMD), Institut-Pierre-Simon-Laplace et Ecole Doctorale, Paris (France); Penalba, O.; Rusticucci, M. [Universidad de Buenos Aires, Departamento de Ciencias de la Atmosfera y los Oceanos (DCAO), FCEN, Buenos Aires (Argentina); UMI IFAECI/CNRS, Buenos Aires (Argentina); Samuelsson, P. [Swedish Meteorological and Hydrological Institute (SMHI), Norrkoeping (Sweden)

2012-12-15

The ability of four regional climate models to reproduce the present-day South American climate is examined with emphasis on La Plata Basin. Models were integrated for the period 1991-2000 with initial and lateral boundary conditions from ERA-40 Reanalysis. The ensemble sea level pressure, maximum and minimum temperatures and precipitation are evaluated in terms of seasonal means and extreme indices based on a percentile approach. Dispersion among the individual models and uncertainties when comparing the ensemble mean with different climatologies are also discussed. The ensemble mean is warmer than the observations in South Eastern South America (SESA), especially for minimum winter temperatures with errors increasing in magnitude towards the tails of the distributions. The ensemble mean reproduces the broad spatial pattern of precipitation, but overestimates the convective precipitation in the tropics and the orographic precipitation along the Andes and over the Brazilian Highlands, and underestimates the precipitation near the monsoon core region. The models overestimate the number of wet days and underestimate the daily intensity of rainfall for both seasons suggesting a premature triggering of convection. The skill of models to simulate the intensity of convective precipitation in summer in SESA and the variability associated with heavy precipitation events (the upper quartile daily precipitation) is far from satisfactory. Owing to the sparseness of the observing network, ensemble and observations uncertainties in seasonal means are comparable for some regions and seasons. (orig.)
Data on genome analysis of Bacillus velezensis LS69.

Science.gov (United States)

Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming

2017-08-01

The data presented in this article are related to the published article entitled "Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria" (Liu et al., 2017) [1]. Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
Data on genome analysis of Bacillus velezensis LS69

OpenAIRE

Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming

2017-01-01

The data presented in this article are related to the published article entitled “Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria” (Liu et al., 2017) [1]. Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
ABCD of Beta Ensembles and Topological Strings

CERN Document Server

Krefl, Daniel

2012-01-01

We study beta-ensembles with Bn, Cn, and Dn eigenvalue measure and their relation with refined topological strings. Our results generalize the familiar connections between local topological strings and matrix models leading to An measure, and illustrate that all those classical eigenvalue ensembles, and their topological string counterparts, are related one to another via various deformations and specializations, quantum shifts and discrete quotients. We review the solution of the Gaussian models via Macdonald identities, and interpret them as conifold theories. The interpolation between the various models is plainly apparent in this case. For general polynomial potential, we calculate the partition function in the multi-cut phase in a perturbative fashion, beyond tree-level in the large-N limit. The relation to refined topological string orientifolds on the corresponding local geometry is discussed along the way.
Scrutinizing virus genome termini by high-throughput sequencing.

Directory of Open Access Journals (Sweden)

Shasha Li

Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging because the protocols are time-consuming and inefficient, and critical details are easily lost. Recent advances in next generation sequencing (NGS) have made it a powerful tool for studying genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini and asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini and obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposes a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.
Big Data Analysis of Human Genome Variations

KAUST Repository

Gojobori, Takashi

2016-01-25

Since the human genome draft sequence was made public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total of 4,861 individuals (= 1,417+940+2,504 individuals). In fact, we successfully integrated these three data sets by use of information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, which was not possible by analyzing each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
Real-Time Ensemble Forecasting of Coronal Mass Ejections Using the WSA-ENLIL+Cone Model

Science.gov (United States)

Mays, M. L.; Taktakishvili, A.; Pulkkinen, A. A.; Odstrcil, D.; MacNeice, P. J.; Rastaetter, L.; LaSota, J. A.

2014-12-01

Ensemble forecasting of coronal mass ejections (CMEs) is valuable because it provides an estimate of the spread or uncertainty in CME arrival time predictions. Real-time ensemble modeling of CME propagation is performed by forecasters at the Space Weather Research Center (SWRC) using the WSA-ENLIL+cone model available at the Community Coordinated Modeling Center (CCMC). To estimate the effect of uncertainties in determining CME input parameters on arrival time predictions, a distribution of n (routinely n=48) CME input parameter sets is generated using the CCMC Stereo CME Analysis Tool (StereoCAT), which employs geometrical triangulation techniques. These input parameters are used to perform n different simulations yielding an ensemble of solar wind parameters at various locations of interest, including a probability distribution of CME arrival times (for hits), and geomagnetic storm strength (for Earth-directed hits). We present the results of ensemble simulations for a total of 38 CME events in 2013-2014. For 28 of the ensemble runs containing hits, the observed CME arrival was within the range of ensemble arrival time predictions for 14 runs (half). The average arrival time prediction was computed for each of the 28 ensembles predicting hits and, using the actual arrival time, an average absolute error of 10.0 hours (RMSE=11.4 hours) was found for all 28 ensembles, which is comparable to current forecasting errors. Some considerations for the accuracy of ensemble CME arrival time predictions include the importance of the initial distribution of CME input parameters, particularly the mean and spread. When the observed arrivals are not within the predicted range, this still allows the ruling out of prediction errors caused by tested CME input parameters. Prediction errors can also arise from ambient model parameters such as the accuracy of the solar wind background, and other limitations. Additionally the ensemble modeling system was used to
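The verification statistics quoted in this abstract (fraction of runs whose observed arrival falls inside the ensemble range, mean absolute error and RMSE of the ensemble-mean arrival time) can be sketched as follows. This is a minimal illustration, not the SWRC/CCMC procedure; the function name `verify_ensembles` and the arrival times are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def verify_ensembles(runs):
    """Summarise ensemble arrival-time forecasts.

    runs: list of (predicted_hours, observed_hour) pairs, where
    predicted_hours holds one arrival time per ensemble member.
    Returns (fraction of runs with the observation inside the ensemble
    range, mean absolute error, RMSE), the errors being those of the
    ensemble-mean arrival time.
    """
    errors = []
    in_range = 0
    for predicted, observed in runs:
        if min(predicted) <= observed <= max(predicted):
            in_range += 1
        errors.append(mean(predicted) - observed)
    mae = mean(abs(e) for e in errors)
    rmse = math.sqrt(mean(e * e for e in errors))
    return in_range / len(runs), mae, rmse


# Hypothetical arrival times, in hours after a common reference time.
runs = [
    ([40.0, 44.0, 48.0], 46.0),  # observation inside the ensemble range
    ([60.0, 62.0, 64.0], 70.0),  # observation later than every member
]
frac, mae, rmse = verify_ensembles(runs)
```

Because RMSE weights large misses more heavily than the mean absolute error, it is never smaller than the latter, which is consistent with the 11.4 h versus 10.0 h reported above.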
A new deterministic Ensemble Kalman Filter with one-step-ahead smoothing for storm surge forecasting

KAUST Repository

Raboudi, Naila

2016-11-01

The Ensemble Kalman Filter (EnKF) is a popular data assimilation method for state-parameter estimation. Following a sequential assimilation strategy, it breaks the problem into alternating cycles of forecast and analysis steps. In the forecast step, the dynamical model is used to integrate a stochastic sample approximating the state analysis distribution (called the analysis ensemble) to obtain a forecast ensemble. In the analysis step, the forecast ensemble is updated with the incoming observation using a Kalman-like correction, and the result is then used for the next forecast step. In realistic large-scale applications, EnKFs are implemented with limited ensembles and often poorly known model error statistics, leading to a crude approximation of the forecast covariance. This strongly limits the filter performance. Recently, a new EnKF was proposed in [1] following a one-step-ahead smoothing strategy (EnKF-OSA), which involves an OSA smoothing of the state between two successive analyses. At each time step, EnKF-OSA exploits the observation twice. The incoming observation is first used to smooth the ensemble at the previous time step. The resulting smoothed ensemble is then integrated forward to compute a "pseudo forecast" ensemble, which is again updated with the same observation. The idea of constraining the state with future observations is to add more information to the estimation process in order to mitigate the sub-optimal character of EnKF-like methods. The second EnKF-OSA "forecast" is computed from the smoothed ensemble and should therefore provide an improved background. In this work, we propose a deterministic variant of the EnKF-OSA, based on the Singular Evolutive Interpolated Ensemble Kalman (SEIK) filter. The motivation behind this is to avoid the observation perturbations of the EnKF in order to improve the scheme's behavior when assimilating big data sets with small ensembles. The new SEIK-OSA scheme is implemented and its efficiency is demonstrated
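The analysis step described above can be sketched for the simplest case of one scalar observation of a single state component. This is the standard stochastic (perturbed-observation) EnKF update, not the EnKF-OSA or SEIK-OSA schemes of the abstract; the function name `enkf_analysis` and the toy numbers are assumptions made for illustration.

```python
import random
from statistics import mean


def enkf_analysis(ensemble, obs, obs_var, obs_index, rng):
    """Stochastic EnKF update with one scalar observation.

    ensemble: list of state vectors (lists of floats), the forecast ensemble.
    obs: observed value of state component obs_index.
    obs_var: observation error variance.
    Returns the analysis ensemble as a new list of state vectors.
    """
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    state_mean = [mean(m[i] for m in ensemble) for i in range(n_state)]
    # Sample covariance of every state component with the observed one.
    cov_x_hx = [
        sum((m[i] - state_mean[i]) * (m[obs_index] - state_mean[obs_index])
            for m in ensemble) / (n_members - 1)
        for i in range(n_state)
    ]
    # Kalman gain for a scalar observation: cov(x, Hx) / (var(Hx) + R).
    gain = [c / (cov_x_hx[obs_index] + obs_var) for c in cov_x_hx]
    analysis = []
    for member in ensemble:
        # Each member sees its own perturbed copy of the observation.
        perturbed_obs = obs + rng.gauss(0.0, obs_var ** 0.5)
        innovation = perturbed_obs - member[obs_index]
        analysis.append([member[i] + gain[i] * innovation
                         for i in range(n_state)])
    return analysis


# Toy two-component state; only component 0 is observed.
rng = random.Random(0)
forecast = [[rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
analysis = enkf_analysis(forecast, obs=2.0, obs_var=0.25, obs_index=0, rng=rng)
```

The per-member observation perturbation is the part that deterministic variants such as SEIK are designed to avoid.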
Pre- and post-processing of hydro-meteorological ensembles for the Norwegian flood forecasting system in 145 basins.

Science.gov (United States)

Jahr Hegdahl, Trine; Steinsland, Ingelin; Merete Tallaksen, Lena; Engeland, Kolbjørn

2016-04-01

Probabilistic flood forecasting has an added value for decision making. The Norwegian flood forecasting service is based on a flood forecasting model that is run for 145 basins. Covering all of Norway, the basins differ in both size and hydrological regime. Currently the flood forecasting is based on deterministic meteorological forecasts, and an auto-regressive procedure is used to achieve probabilistic forecasts. An alternative approach is to use meteorological and hydrological ensemble forecasts to quantify the uncertainty in forecasted streamflow. The hydrological ensembles are based on forcing a hydrological model with meteorological ensemble forecasts of precipitation and temperature. However, the ensembles of precipitation are often biased and the spread is too small, especially for the shortest lead times, i.e. they are not calibrated. These properties will, to some extent, propagate to the hydrological ensembles, which most likely will be uncalibrated as well. Pre- and post-processing methods are commonly used to obtain calibrated meteorological and hydrological ensembles, respectively. Quantitative studies showing the effect of the combined processing of the meteorological (pre-processing) and the hydrological (post-processing) ensembles are, however, few. The aim of this study is to evaluate the influence of pre- and post-processing on the skill of streamflow predictions, and we will especially investigate whether the forecasting skill depends on lead time, basin size and hydrological regime. This aim is achieved by applying the 51-member medium-range ensemble forecasts of precipitation and temperature provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). These ensembles are used as input to the operational Norwegian flood forecasting model, both raw and pre-processed. Precipitation ensembles are calibrated using a zero-adjusted gamma distribution. Temperature ensembles are calibrated using a Gaussian distribution and altitude corrected by a constant gradient

Ensemble Kalman Filter data assimilation and storm surge experiments of tropical cyclone Nargis

Directory of Open Access Journals (Sweden)

Le Duc

2015-07-01

Data assimilation experiments on the Myanmar tropical cyclone (TC) Nargis, using the Local Ensemble Transform Kalman Filter (LETKF) method and the Japan Meteorological Agency (JMA) non-hydrostatic model (NHM), were performed to examine the impact of LETKF on analysis performance in real cases. Although the LETKF control experiment using NHM as its driving model (NHM–LETKF) produced a weak vortex, the subsequent 3-day forecast predicted Nargis’ track and intensity better than downscaling from JMA's global analysis. Some strategies to further improve the final analysis were considered. They were sea surface temperature (SST) perturbations and assimilation of TC advisories. To address SST uncertainty, SST analyses issued by operational forecast centres were used in the assimilation window. The use of a fixed source of SST analysis for each ensemble member was more effective in practice. SST perturbations were found to have a slightly positive impact on the track forecasts. Assimilation of TC advisories could have a positive impact with a reasonable choice of its free parameters. However, the TC track forecasts exhibited northward displacements when the observation error of intensities was underestimated in the assimilation of TC advisories. Assimilation of TC advisories was included in the final NHM–LETKF by choosing an appropriate set of free parameters. The extended forecast based on the final analysis provided meteorological forcings for a storm surge simulation using the Princeton Ocean Model. Probabilistic forecasts of the water levels at Irrawaddy and Yangon significantly improved on the results of previous studies.
Ensemble hydro-meteorological forecasting for early warning of floods and scheduling of hydropower production

Science.gov (United States)

Solvang Johansen, Stian; Steinsland, Ingelin; Engeland, Kolbjørn

2016-04-01

Running hydrological models with precipitation and temperature ensemble forcing to generate ensembles of streamflow is a commonly used method in operational hydrology. Evaluations of streamflow ensembles have, however, revealed that the ensembles are biased with respect to both mean and spread. Thus postprocessing of the ensembles is needed in order to improve the forecast skill. The aims of this study are (i) to evaluate how postprocessing of streamflow ensembles works for Norwegian catchments within different hydrological regimes and (ii) to demonstrate how postprocessed streamflow ensembles are used operationally by a hydropower producer. These aims were achieved by postprocessing forecasted daily discharge for 10 lead times for 20 catchments in Norway, using EPS forcing from ECMWF applied to the semi-distributed HBV model, which divides each catchment into 10 elevation zones. Statkraft Energi uses forecasts from these catchments for scheduling hydropower production. The catchments represent different hydrological regimes. Some catchments have stable winter conditions with winter low flow and a major flood event during spring or early summer caused by snow melting. Others have a more mixed snow-rain regime, often with a secondary flood season during autumn, and in the coastal areas the streamflow is dominated by rain, and the main flood season is autumn and winter. For postprocessing, a Bayesian model averaging (BMA) model close to that of Kleiber et al. (2011) is used. The model creates a predictive PDF that is a weighted average of PDFs centered on the individual bias-corrected forecasts. The weights are here equal since all ensemble members come from the same model, and thus have the same probability. For modeling streamflow, the gamma distribution is chosen as a predictive PDF. The bias correction parameters and the PDF parameters are estimated using a 30-day sliding window training period. Preliminary results show that the improvement varies between catchments depending
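The BMA predictive PDF described above, an equal-weight average of gamma PDFs centred on the bias-corrected members, can be sketched as follows. The linear bias correction (`bias_a`, `bias_b`), the single shared standard deviation `sd` and the member values are simplifying assumptions for illustration; the study estimates its parameters over a 30-day sliding window and may parameterise the model differently.

```python
import math


def gamma_pdf(x, mean, sd):
    """Gamma density parameterised by its mean and standard deviation."""
    if x <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def bma_pdf(x, member_forecasts, sd, bias_a=0.0, bias_b=1.0):
    """Equal-weight mixture of gamma PDFs, one per ensemble member.

    Each member forecast f is bias corrected to bias_a + bias_b * f
    (assumed linear form), which must stay positive for streamflow.
    """
    corrected = [bias_a + bias_b * f for f in member_forecasts]
    return sum(gamma_pdf(x, c, sd) for c in corrected) / len(corrected)


# Hypothetical streamflow forecasts (m^3/s) from three ensemble members.
members = [10.0, 12.0, 15.0]
density_at_12 = bma_pdf(12.0, members, sd=2.0)
```

Equal weights make the mixture a plain average, so each member contributes 1/K of the probability mass regardless of its past performance.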
Ensemble stacking mitigates biases in inference of synaptic connectivity

Directory of Open Access Journals (Sweden)

Brendan Chambers

2018-03-01

A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches. Mapping the routing of spikes through local circuitry is crucial for understanding neocortical computation. Under appropriate experimental conditions, these maps can be used to infer likely patterns of synaptic recruitment, linking activity to underlying anatomical connections. Such inferences help to reveal the synaptic implementation of population dynamics and computation. We compare a number of standard functional measures to infer underlying connectivity. We find that regularization impacts measures
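The idea of an ensemble prediction formed as a linear combination of several inference methods can be sketched as below. The z-score standardisation, the method names and the weights are assumptions introduced for illustration; the abstract does not state how the scores are scaled or how the weights are fitted.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise scores so that methods on different scales are comparable."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ensemble_scores(method_scores, weights):
    """Weighted linear combination of standardised per-method scores.

    method_scores: dict mapping method name to a list of scores, one per
    candidate connection (same ordering for every method).
    weights: dict mapping method name to its weight in the combination.
    """
    standardized = {name: zscore(vals) for name, vals in method_scores.items()}
    n_pairs = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in standardized)
        for i in range(n_pairs)
    ]


# Hypothetical scores for three candidate connections from two methods.
scores = {"mutual_info": [0.0, 1.0, 2.0], "lag_count": [2.0, 1.0, 0.0]}
combined = ensemble_scores(scores, {"mutual_info": 1.0, "lag_count": 1.0})
```

In practice the weights would be fitted against connections of known ground truth, for example in simulated networks, before being applied to new data.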
Stochastic Approaches Within a High Resolution Rapid Refresh Ensemble

Science.gov (United States)

Jankov, I.

2017-12-01

It is well known that global and regional numerical weather prediction (NWP) ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system is the use of stochastic physics. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), and Stochastic Perturbation of Physics Tendencies (SPPT). The focus of this study is to assess model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) using a variety of stochastic approaches. A single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model was utilized and ensemble members were produced by employing stochastic methods. Parameter perturbations (using SPP) for select fields were employed in the Rapid Update Cycle (RUC) land surface model (LSM) and Mellor-Yamada-Nakanishi-Niino (MYNN) Planetary Boundary Layer (PBL) schemes. Within MYNN, SPP was applied to sub-grid cloud fraction, mixing length, roughness length, mass fluxes and Prandtl number. In the RUC LSM, SPP was applied to hydraulic conductivity, and perturbing soil moisture at the initial time was also tested. First, iterative testing was conducted to assess the initial performance of several configuration settings (e.g. a variety of spatial and temporal de-correlation lengths). Upon selection of the most promising candidate configurations using SPP, a 10-day time period was run and more robust statistics were gathered. SKEB and SPPT were included in additional retrospective tests to assess the impact of using
Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

Science.gov (United States)

Evans, Teri; Johnson, Andrew D; Loose, Matthew

2018-01-12

Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline is available from https://github.com/LooseLab/iterassemble.
Improved validation of IDP ensembles by one-bond Cα–Hα scalar couplings

Energy Technology Data Exchange (ETDEWEB)

Gapsys, Vytautas [Max Planck Institute for Biophysical Chemistry, Computational Biomolecular Dynamics Group (Germany); Narayanan, Raghavendran L.; Xiang, ShengQi [Max Planck Institute for Biophysical Chemistry, Department for NMR-Based Structural Biology (Germany); Groot, Bert L. de [Max Planck Institute for Biophysical Chemistry, Computational Biomolecular Dynamics Group (Germany); Zweckstetter, Markus, E-mail: markus.zweckstetter@dzne.de [Max Planck Institute for Biophysical Chemistry, Department for NMR-Based Structural Biology (Germany)

2015-11-15

Intrinsically disordered proteins (IDPs) are best described by ensembles of conformations and a variety of approaches have been developed to determine IDP ensembles. Because of the large number of conformations, however, cross-validation of the determined ensembles by independent experimental data is crucial. The $^{1}J_{C\alpha H\alpha}$ coupling constant is particularly suited for cross-validation, because it has a large magnitude and mostly depends on the often less accessible dihedral angle ψ. Here, we reinvestigated the connection between $^{1}J_{C\alpha H\alpha}$ values and protein backbone dihedral angles. We show that accurate amino-acid specific random coil values of the $^{1}J_{C\alpha H\alpha}$ coupling constant, in combination with a reparameterized empirical Karplus-type equation, allow for reliable cross-validation of molecular ensembles of IDPs.
MACC regional multi-model ensemble simulations of birch pollen dispersion in Europe

NARCIS (Netherlands)

Sofiev, M.; Berger, U.; Prank, M.; Vira, J.; Arteta, J.; Belmonte, J.; Bergmann, K.C.; Chéroux, F.; Elbern, H.; Friese, E.; Galan, C.; Gehrig, R.; Khvorostyanov, D.; Kranenburg, R.; Kumar, U.; Marécal, V.; Meleux, F.; Menut, L.; Pessi, A.M.; Robertson, L.; Ritenberga, O.; Rodinkova, V.; Saarto, A.; Segers, A.; Severova, E.; Sauliene, I.; Siljamo, P.; Steensen, B.M.; Teinemaa, E.; Thibaudon, M.; Peuch, V.H.

2015-01-01

This paper presents the first ensemble modelling experiment in relation to birch pollen in Europe. The seven-model European ensemble of MACC-ENS, tested in trial simulations over the flowering season of 2010, was run through the flowering season of 2013. The simulations have been compared with
Genome sequence of Shigella flexneri strain SP1, a diarrheal isolate that encodes an extended-spectrum β-lactamase (ESBL).

Science.gov (United States)

Shen, Ping; Fan, Jianzhong; Guo, Lihua; Li, Jiahua; Li, Ang; Zhang, Jing; Ying, Chaoqun; Ji, Jinru; Xu, Hao; Zheng, Beiwen; Xiao, Yonghong

2017-05-12

Shigellosis is the most common cause of gastrointestinal infections in developing countries. In China, the species most frequently responsible for shigellosis is Shigella flexneri. S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on biochemical and serological properties. Moreover, increasing numbers of ESBL-producing Shigella strains have been isolated from clinical samples. Despite this, only a few cases of ESBL-producing Shigella have been described in China. Therefore, a better understanding of ESBL-producing Shigella from a genomic standpoint is required. In this study, an S. flexneri type 1a isolate, SP1, harboring bla CTX-M-14, which was recovered from a patient with diarrhea, was subjected to whole genome sequencing. The draft genome assembly of S. flexneri strain SP1 consisted of 4,592,345 bp with a G+C content of 50.46%. RAST analysis revealed that the genome contained 4798 coding sequences (CDSs) and 100 RNA-encoding genes. We detected one incomplete prophage and six candidate CRISPR loci in the genome. In vitro antimicrobial susceptibility testing demonstrated that strain SP1 is resistant to ampicillin, amoxicillin/clavulanic acid, cefazolin, ceftriaxone and trimethoprim. In silico analysis detected genes mediating resistance to aminoglycosides, β-lactams, phenicol, tetracycline, sulphonamides, and trimethoprim. The bla CTX-M-14 gene was located on an IncFII2 plasmid. A series of virulence factors were identified in the genome. In this study, we report the whole genome sequence of a bla CTX-M-14-encoding S. flexneri strain, SP1. Dozens of resistance determinants were detected in the genome and may be responsible for the multidrug resistance of this strain, although further confirmation studies are warranted. Numerous virulence factors identified in the strain suggest that isolate SP1 is potentially pathogenic. The availability of the genome sequence and comparative analysis with other S
Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.

Science.gov (United States)

Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S

2017-01-05

The magnitude and complexity of the structural and functional data available on nanomaterials requires data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.
The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

Science.gov (United States)

Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

2015-01-01

Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
A Pseudoproxy-Ensemble Study of Late-Holocene Climate Field Reconstructions Using CCA

Science.gov (United States)

Amrhein, D. E.; Smerdon, J. E.

2009-12-01

Recent evaluations of late-Holocene multi-proxy reconstruction methods have used pseudoproxy experiments derived from millennial General Circulation Model (GCM) integrations. These experiments assess the performance of a reconstruction technique by comparing pseudoproxy reconstructions, which use restricted subsets of model data, against complete GCM data fields. Most previous studies have tested methodologies using different pseudoproxy noise levels, but only with single realizations for each noise classification. A more robust evaluation of performance is to create an ensemble of pseudoproxy networks with distinct sets of noise realizations and a corresponding reconstruction ensemble that can be evaluated for consistency and sensitivity to random error. This work investigates canonical correlation analysis (CCA) as a late-Holocene climate field reconstruction (CFR) technique using ensembles of pseudoproxy experiments derived from the NCAR CSM 1.4 millennial integration. Three 200-member reconstruction ensembles are computed using pseudoproxies with signal-to-noise ratios (by standard deviation) of 1, 0.5, and 0.25 and locations that approximate the spatial distribution of real-world multiproxy networks. An important component of these ensemble calculations is the independent optimization of the three CCA truncation parameters for each ensemble member. This task is accomplished using an inexpensive discrete optimization algorithm that minimizes both RMS error in the calibration interval and the number of free parameters in the reconstruction model to avoid artificial skill. Within this framework, CCA is investigated for its sensitivity to the level of noise in the pseudoproxy network and the spatial distribution of the network. Warm biases, variance losses, and validation-interval error increase with noise level and vary spatially within the reconstructed fields. Reconstruction skill, measured as grid-point correlations during the validation interval, is lowest in
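The ensemble construction described in this abstract can be sketched as follows. This is a minimal illustration only: it assumes Gaussian white noise (the abstract does not state the noise model), and the function names and the synthetic `model_series` are hypothetical stand-ins for GCM output at a proxy location.

```python
import random
import statistics


def make_pseudoproxy(signal, snr, rng):
    """Return signal plus Gaussian white noise at the given SNR (by standard deviation)."""
    signal_sd = statistics.pstdev(signal)
    noise_sd = signal_sd / snr  # SNR = sd(signal) / sd(noise)
    return [x + rng.gauss(0.0, noise_sd) for x in signal]


def make_ensemble(signal, snr, n_members, seed=0):
    """Build n_members pseudoproxy series, each with an independent noise realization."""
    rng = random.Random(seed)
    return [make_pseudoproxy(signal, snr, rng) for _ in range(n_members)]


# Hypothetical "model temperature" series standing in for GCM output at one grid point.
model_series = [0.1 * (i % 10) - 0.45 for i in range(200)]

# One 200-member ensemble at SNR = 0.5, mirroring one of the three noise levels above.
ensemble = make_ensemble(model_series, snr=0.5, n_members=200)
```

Each ensemble member shares the same underlying signal but carries its own noise realization, so the spread of reconstructions across members reflects sensitivity to random error alone.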
Crossover between the Gaussian orthogonal ensemble, the Gaussian unitary ensemble, and Poissonian statistics.

Science.gov (United States)

Schweiner, Frank; Laturner, Jeanine; Main, Jörg; Wunner, Günter

2017-11-01

Within random matrix theory, analytical formulas for the level spacing distribution function have so far been derived only for specific crossovers between Poissonian statistics (P), the statistics of a Gaussian orthogonal ensemble (GOE), and the statistics of a Gaussian unitary ensemble (GUE). We investigate arbitrary crossovers in the triangle between all three statistics. To this end we propose a corresponding formula for the level spacing distribution function that depends on two parameters. Comparing the behavior of our formula for the special cases of P→GUE, P→GOE, and GOE→GUE with the results from random matrix theory, we show that these crossovers are described reasonably well. Recent investigations by F. Schweiner et al. [Phys. Rev. E 95, 062205 (2017), doi:10.1103/PhysRevE.95.062205] have shown that the Hamiltonian of magnetoexcitons in cubic semiconductors can exhibit all three statistics depending on the system parameters. Evaluating the numerical results for magnetoexcitons as a function of the excitation energy and of a parameter connected with the cubic valence band structure, and comparing the results with the proposed formula, allows us to distinguish between regular and chaotic behavior as well as between existent and broken antiunitary symmetries. On increasing one of the two parameters, transitions between different crossovers, e.g., from the P→GOE to the P→GUE crossover, are observed and discussed.
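For orientation, the three limiting statistics at the corners of the triangle are commonly written as the Poisson distribution and the Wigner surmises for GOE and GUE, with s the level spacing in units of the mean spacing. These are the standard textbook forms, not the two-parameter crossover formula proposed in the paper, which the abstract does not reproduce.

```latex
\begin{align}
P_{\mathrm{P}}(s)   &= e^{-s}, \\
P_{\mathrm{GOE}}(s) &= \frac{\pi}{2}\, s\, e^{-\pi s^{2}/4}, \\
P_{\mathrm{GUE}}(s) &= \frac{32}{\pi^{2}}\, s^{2}\, e^{-4 s^{2}/\pi}.
\end{align}
```

The behavior at small s separates the cases: no level repulsion for P, linear repulsion for GOE, and quadratic repulsion for GUE.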
Equipartition terms in transition path ensemble: Insights from molecular dynamics simulations of alanine dipeptide

Science.gov (United States)

Li, Wenjin

2018-02-01

The transition path ensemble consists of reactive trajectories and possesses all the information necessary for understanding the mechanism and dynamics of important condensed phase processes. However, a quantitative description of the properties of the transition path ensemble is far from being established. Here, with numerical calculations on a model system, the equipartition terms defined in thermal equilibrium were estimated in the transition path ensemble for the first time. As expected, the energy was not equally distributed among all the coordinates. However, the energies distributed on a pair of conjugate coordinates remained equal. Higher energies were observed on several coordinates that are highly coupled to the reaction coordinate, while the energy on the rest was almost equally distributed. In addition, the ensemble-averaged energy on each coordinate as a function of time was also quantified. These quantitative analyses of energy distributions provided new insights into the transition path ensemble.
Microbial comparative pan-genomics using binomial mixture models

Directory of Open Access Journals (Sweden)

Ussery David W

2009-08-01

Background: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. Results: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. Conclusion: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
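The capture-recapture idea behind this kind of estimate can be sketched in a few lines. The sketch assumes each gene family belongs to one of a few classes with its own detection probability, so that the number of genomes containing it follows a binomial mixture; families seen in zero genomes are unobserved, and the observed count is scaled up accordingly. All function names and parameter values are hypothetical and hand-picked for illustration; they are not fitted and not taken from the paper, where such parameters would be estimated from the data.

```python
from math import comb


def mixture_pmf(y, n_genomes, weights, probs):
    """P(a gene family is found in exactly y of n_genomes genomes) under a binomial mixture."""
    return sum(
        w * comb(n_genomes, y) * p**y * (1.0 - p) ** (n_genomes - y)
        for w, p in zip(weights, probs)
    )


def estimate_pan_genome(n_observed, n_genomes, weights, probs):
    """Scale the observed family count up by the probability of being seen at least once."""
    p_unseen = mixture_pmf(0, n_genomes, weights, probs)
    return n_observed / (1.0 - p_unseen)


# Hypothetical, hand-picked parameters (not fitted, not from the paper).
weights = [0.5, 0.3, 0.2]  # mixing proportions, sum to 1
probs = [0.05, 0.5, 1.0]   # detection probabilities; p = 1.0 plays the role of the core

pan_size = estimate_pan_genome(n_observed=5000, n_genomes=10, weights=weights, probs=probs)
core_size = weights[-1] * pan_size  # families in the always-detected class
```

A component with a small detection probability contributes most of the unseen families, which is one way to read the abstract's remark that rarely occurring genes drive the growth of the estimated pan-genome.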
The Effects of Classical Guitar Ensembles on Student Self-Perceptions and Acquisition of Music Skills

Science.gov (United States)

Kramer, John R.

2012-01-01

Classical guitar ensembles are increasing in the United States as popular alternatives to band, choir, and orchestra. Classical guitar ensembles are offered at many middle and high schools as fine arts electives as one of the only options for classical guitarists to participate in ensembles. The purpose of this study was to explore the development…
A Comparative Case Study of Non-Music Major Participation in Two Contrasting Collegiate Choral Ensembles

Science.gov (United States)

Jones, Sara K.

2018-01-01

The purpose of this comparative case study was to examine the motivation for participation in traditional and non-traditional vocal ensembles by students who are not pursuing a career in music and the perceived benefits of this participation. Participants were selected from a traditional mixed choral ensemble and a student-run a cappella ensemble.…
Ensemble streamflow assimilation with the National Water Model.

Science.gov (United States)

Rafieeinasab, A.; McCreight, J. L.; Noh, S.; Seo, D. J.; Gochis, D.

2017-12-01

Through case studies of flooding across the US, we compare the performance of the National Water Model (NWM) data assimilation (DA) scheme to that of a newly implemented ensemble Kalman filter approach. The NOAA NWM is an operational implementation of the community WRF-Hydro modeling system. Since August 2016, the NWM forecasts of distributed hydrologic states and fluxes (including soil moisture, snowpack, ET, and ponded water) over the contiguous United States have been publicly disseminated by the National Centers for Environmental Prediction (NCEP). The NWM also provides streamflow forecasts at more than 2.7 million river reaches up to 30 days in advance. It employs a nudging scheme to assimilate more than 6,000 USGS streamflow observations and provide initial conditions for its forecasts. A problem with nudging is that the forecasts relax quickly to the open-loop bias. This has been partially addressed by an experimental bias correction approach, which was found to have issues with phase errors during flooding events. In this work, we present an ensemble streamflow data assimilation approach combining new channel-only capabilities of the NWM and HydroDART (a coupling of the offline WRF-Hydro model and NCAR's Data Assimilation Research Testbed; DART). Our approach focuses on the single model state of discharge and incorporates error distributions on channel influxes (overland and groundwater) in the assimilation via an ensemble Kalman filter (EnKF). In order to avoid the filter degeneracy associated with a limited number of ensemble members at large scale, DART's covariance inflation (Anderson, 2009) and localization capabilities are implemented and evaluated. The current NWM data assimilation scheme is compared to preliminary results from the EnKF application for several flooding case studies across the US.
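To make the EnKF step concrete, the sketch below shows a generic textbook update for a single, directly observed scalar state such as discharge at one gauge. It is not the HydroDART implementation: it assumes the stochastic (perturbed-observation) variant of the filter, a simple multiplicative inflation factor, and hypothetical numbers throughout. Localization is omitted because it only matters when there are many spatially separated states.

```python
import random
import statistics


def enkf_update(ensemble, obs, obs_sd, inflation=1.0, rng=None):
    """Stochastic (perturbed-observation) EnKF update for a directly observed scalar state."""
    rng = rng or random.Random(0)
    mean = statistics.fmean(ensemble)
    # Multiplicative covariance inflation: widen the prior spread about its mean.
    prior = [mean + inflation * (x - mean) for x in ensemble]
    prior_var = statistics.variance(prior)
    gain = prior_var / (prior_var + obs_sd**2)  # Kalman gain with observation operator H = 1
    # Each member assimilates its own noisy copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_sd) - x) for x in prior]


# Hypothetical numbers: 50 discharge forecasts (m^3/s) and one gauge observation.
rng = random.Random(42)
forecast = [rng.gauss(120.0, 15.0) for _ in range(50)]
analysis = enkf_update(forecast, obs=100.0, obs_sd=5.0, inflation=1.05, rng=rng)
```

Because the gain is computed from the ensemble spread, a collapsed ensemble would give a gain near zero and ignore observations; inflation is one common guard against that kind of degeneracy.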
Demanding Epistemic Democracy and Indirect Civics Pedagogy: The Performance-Oriented Music Ensemble

Science.gov (United States)

Pyrcz, Greg; MacLean, Tessa; Hopkins, Mark

2017-01-01

The participation of young adults in performance-oriented music ensembles can be seen to enhance democratic capacities and virtues. Much, however, turns on the particular conception of democracy at work. Although contemporary currents in music education tend towards models of liberal and participatory democracy to govern music ensembles, this…
The Effect of Ensemble Performance Quality on the Evaluation of Conducting Expressivity

Science.gov (United States)

Silvey, Brian A.

2011-01-01

This study was designed to examine whether the presence of excellent or poor ensemble performances would influence the ratings assigned by ensemble members to conductors who demonstrated highly expressive conducting. Two conductors were videotaped conducting one of two excerpts from an arrangement of Frank Ticheli's "Loch Lomond." These videos…
Effective theory of the D = 3 center vortex ensemble

Science.gov (United States)

Oxman, L. E.; Reinhardt, H.

2018-03-01

By means of lattice calculations, center vortices have been established as the infrared dominant gauge field configurations of Yang-Mills theory. In this work, we investigate an ensemble of center vortices in D = 3 Euclidean space-time dimensions, where they form closed flux loops. To account for the properties of center vortices detected on the lattice, they are equipped with tension, stiffness and a repulsive contact interaction. The ensemble of oriented center vortices is then mapped onto an effective theory of a complex scalar field with a U(1) symmetry. For a positive tension, small vortex loops are favoured and the Wilson loop displays a perimeter law, while for a negative tension, large loops dominate the ensemble. In the latter case the U(1) symmetry of the effective scalar field theory is spontaneously broken and the Wilson loop shows an area law. To account for the large quantum fluctuations of the corresponding Goldstone modes, we use a lattice representation, which results in an XY model with frustration, for which we also study the Villain approximation.
