WorldWideScience

Sample records for accurate phylogenetic classification

  1. Accurate phylogenetic classification of DNA fragments based on sequence composition

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
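The abstract above describes classification from sequence composition. As a rough illustration of what a composition feature can look like, the sketch below computes normalized k-mer frequencies for a DNA fragment. It is not the PhyloPythia implementation: the function name `kmer_frequencies`, the choice of k = 4 as default, and the handling of ambiguous bases are all assumptions made for this example, and the classifier that would consume such vectors is not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for a DNA fragment.

    Hypothetical helper for illustration; k=4 is an assumed default,
    not a parameter taken from the abstract.
    """
    sequence = sequence.upper()
    # Enumerate every possible k-mer over the four unambiguous bases.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count all overlapping windows of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are simply ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


# Toy fragment, for illustration only.
profile = kmer_frequencies("ACGTACGTAC", k=2)
```

A vector like `profile` (16 entries for k = 2, 256 for k = 4) could then be passed to any supervised classifier trained on fragments of known taxonomic origin.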

  2. Concepts of Classification and Taxonomy. Phylogenetic Classification

    Fraix-Burnet, Didier

    2016-01-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians. But these techniques have applications in other fields, in particular in linguistics. Their main characteristic is to search for relationships between the objects or species under study, instead of grouping them by similarity. They are thus rather well suited for any kind of evolutionary objects. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects like galaxies or globular clusters. In this lesson we will learn how it works. 1 Why phylogenetic tools in astrophysics? 1.1 History of classification The need for classifying living organisms is very ancient, and the first classification system can be dated back to the Greeks. The goal was very practical since it was intended to distinguish between edible and toxic foods, or kind and dangerous animals. Simple resemblance was used and has been used for centuries. Basically, until the XVIIIth...

  3. Concepts of Classification and Taxonomy Phylogenetic Classification

    Fraix-Burnet, D.

    2016-05-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians. But these techniques have applications in other fields, in particular in linguistics. Their main characteristic is to search for relationships between the objects or species under study, instead of grouping them by similarity. They are thus rather well suited for any kind of evolutionary objects. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects like galaxies or globular clusters. In this lesson we will learn how it works.

  4. Classification and Phylogenetics of Myxozoa

    Fiala, Ivan; Bartošová-Sojková, Pavla; Whipps, C. M.

    Cham: Springer International Publishing, 2015 - (Okamura, B.; Gruhl, A.; Bartholomew, J.), pp. 85-110. ISBN 978-3-319-14752-9. Institutional support: RVO:60077344. Keywords: Taxonomy * Classification * Myxosporea * Actinosporea * Spore * Phylogeny. Subject RIV: EG - Zoology

  5. ACCURATE TIME SERIES CLASSIFICATION USING SHAPELETS

    M. Arathi; A. GOVARDHAN

    2014-01-01

    Time series data are sequences of values measured over time. One of the most recent approaches to classification of time series data is to find shapelets within a data set. Time series shapelets are time series subsequences which represent a class. In order to compare two time series sequences, existing work uses the Euclidean distance measure. The problem with Euclidean distance is that it requires data to be standardized if scales ...
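To make the role of standardization concrete, here is a minimal sketch of a common way to score how well a shapelet matches a series: slide the shapelet over the series, z-normalize each window, and keep the smallest Euclidean distance. The function names are hypothetical, and this illustrates only the baseline Euclidean comparison mentioned in the abstract; the alternative the authors propose is not visible in the truncated text and is not reproduced here.

```python
import math


def z_normalize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant window carries no shape information.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the z-normalized shapelet
    and any equally long z-normalized window of the series."""
    m = len(shapelet)
    target = z_normalize(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, target)))
        best = min(best, dist)
    return best


# The peak in the series has the same shape as the shapelet but ten times
# the amplitude; after standardization the two match almost exactly.
d = shapelet_distance([0, 0, 10, 20, 10, 0, 0], [1, 2, 1])
```

Without the `z_normalize` step the same comparison would report a large distance purely because of the difference in scale.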

  6. Accurate Reconstruction of Insertion-Deletion Histories by Statistical Phylogenetics

    Westesson, O; Lunter, G.; Paten, B; Holmes, I

    2012-01-01

    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of seve...

  7. Accurate molecular classification of cancer using simple rules

    Gotoh Osamu; Wang Xiaosheng

    2009-01-01

    Background: One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensionality gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often ...

  8. Accurate reconstruction of insertion-deletion histories by statistical phylogenetics.

    Oscar Westesson

    Full Text Available. The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.

  9. Phylogenetics, classification, and biogeography of the treefrogs (Amphibia: Anura: Arboranae).

    Duellman, William E; Marion, Angela B; Hedges, S Blair

    2016-01-01

    A phylogenetic analysis of sequences from 503 species of hylid frogs and four outgroup taxa resulted in 16,128 aligned sites of 19 genes. The molecular data were subjected to a maximum likelihood analysis that resulted in a new phylogenetic tree of treefrogs. A conservative new classification based on the tree has (1) three families composing an unranked taxon, Arboranae, (2) nine subfamilies (five resurrected, one new), and (3) six resurrected generic names and five new generic names. Using the results of a maximum likelihood timetree, times of divergence were determined. For the most part these times of divergence correlated well with historical geologic events. The arboranan frogs originated in South America in the Late Mesozoic or Early Cenozoic. The family Pelodryadidae diverged from its South American relative, Phyllomedusidae, in the Eocene and invaded Australia via Antarctica. There were two dispersals from South America to North America in the Paleogene. One lineage was the ancestral stock of Acris and its relatives, whereas the other lineage, subfamily Hylinae, differentiated into a myriad of genera in Middle America. PMID:27394762

  10. Accurate molecular classification of cancer using simple rules

    Gotoh Osamu

    2009-10-01

    Full Text Available. Background: One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensionality gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible. Methods: We screened a small number of informative single genes and gene pairs on the basis of their depended degrees proposed in rough sets. Applying the decision rules induced by the selected genes or gene pairs, we constructed cancer classifiers. We tested the efficacy of the classifiers by leave-one-out cross-validation (LOOCV) of training sets and classification of independent test sets. Results: We applied our methods to five cancerous gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification outcomes were obtained by utilizing just one or two genes. Some genes that correlated closely with the pathogenesis of relevant cancers were identified. In terms of both classification performance and algorithm simplicity, our approach outperformed or at least matched existing methods. Conclusion: In cancerous gene expression datasets, a small number of genes, even one or two if selected correctly, is capable of achieving an ideal cancer classification effect. This finding also means that very simple rules may perform well for cancerous class prediction.
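The evaluation scheme named in the abstract, leave-one-out cross-validation of a classifier built from a single gene, can be sketched as follows. The rule used here is a plain midpoint threshold between the two class means; it is a stand-in chosen for brevity, not the rough-set "depended degree" rule induction the authors describe. All function names and the expression values are invented for illustration and assume exactly two classes.

```python
def fit_threshold_rule(values, labels):
    """Learn a one-gene rule: predict the higher-mean class when the
    expression value exceeds the midpoint of the two class means.
    Hypothetical stand-in rule; assumes exactly two classes."""
    class_names = sorted(set(labels))
    means = {}
    for name in class_names:
        members = [v for v, lab in zip(values, labels) if lab == name]
        means[name] = sum(members) / len(members)
    low, high = sorted(class_names, key=lambda name: means[name])
    threshold = (means[low] + means[high]) / 2
    return threshold, low, high


def predict(rule, value):
    threshold, low, high = rule
    return high if value > threshold else low


def loocv_accuracy(values, labels):
    """Hold out each sample once, fit on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        rule = fit_threshold_rule(train_values, train_labels)
        if predict(rule, values[i]) == labels[i]:
            correct += 1
    return correct / len(values)


# Made-up toy expression values for one gene (not data from the paper).
expression = [1.0, 1.2, 0.9, 3.1, 2.8, 3.3]
sample_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
accuracy = loocv_accuracy(expression, sample_labels)
```

Because the rule is refit inside every fold, the held-out sample never influences the threshold used to classify it.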

  11. Automatic classification and accurate size measurement of blank mask defects

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    complexity of defects encountered. The variety arises due to factors such as defect nature, size, shape and composition; and the optical phenomena occurring around the defect. This paper focuses on preliminary characterization results, in terms of classification and size estimation, obtained by Calibre MDPAutoClassify tool on a variety of mask blank defects. It primarily highlights the challenges faced in achieving the results with reference to the variety of defects observed on blank mask substrates and the underlying complexities which make accurate defect size measurement an important and challenging task.

  12. Accurate mobile malware detection and classification in the cloud.

    Wang, Xiaolei; Yang, Yuexiang; Zeng, Yingzhi

    2015-01-01

    As the dominant smartphone operating system, Android has attracted the attention of malware authors and researchers alike. The number of types of Android malware is increasing rapidly regardless of the considerable number of proposed malware analysis systems. In this paper, by taking advantage of the low false-positive rate of misuse detection and the ability of anomaly detection to detect zero-day malware, we propose a novel hybrid detection system based on a new open-source framework, CuckooDroid, which enables the use of Cuckoo Sandbox's features to analyze Android malware through dynamic and static analysis. Our proposed system mainly consists of two parts: an anomaly detection engine performing abnormal app detection through dynamic analysis, and a signature detection engine performing known malware detection and classification with the combination of static and dynamic analysis. We evaluate our system using 5560 malware samples and 6000 benign samples. Experiments show that our anomaly detection engine with dynamic analysis is capable of detecting zero-day malware with a low false negative rate (1.16 %) and acceptable false positive rate (1.30 %); it is worth noting that our signature detection engine with hybrid analysis can accurately classify malware samples with an average positive rate of 98.94 %. Considering the intensive computing resources required by the static and dynamic analysis, our proposed detection system should be deployed off-device, such as in the Cloud. The app store markets and the ordinary users can access our detection system for malware detection through cloud service. PMID:26543718

  13. Phylogenetics.

    Sleator, Roy D

    2011-04-01

    The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects. PMID:21249334

  14. Phylogeny and phylogenetic classification of the antbirds, ovenbirds, woodcreepers, and allies (Aves: Passeriformes: Infraorder Furnariides)

    Moyle, R.G.; Chesser, R.T.; Brumfield, R.T.; Tello, J.G.; Marchese, D.J.; Cracraft, J.

    2009-01-01

    The infraorder Furnariides is a diverse group of suboscine passerine birds comprising a substantial component of the Neotropical avifauna. The included species encompass a broad array of morphologies and behaviours, making them appealing for evolutionary studies, but the size of the group (ca. 600 species) has limited well-sampled higher-level phylogenetic studies. Using DNA sequence data from the nuclear RAG-1 and RAG-2 exons, we undertook a phylogenetic analysis of the Furnariides sampling 124 (more than 88%) of the genera. Basal relationships among family-level taxa differed depending on phylogenetic method, but all topologies had little nodal support, mirroring the results from earlier studies in which discerning relationships at the base of the radiation was also difficult. In contrast, branch support for family-rank taxa and for many relationships within those clades was generally high. Our results support the Melanopareidae and Grallariidae as distinct from the Rhinocryptidae and Formicariidae, respectively. Within the Furnariides our data contradict some recent phylogenetic hypotheses and suggest that further study is needed to resolve these discrepancies. Of the few genera represented by multiple species, several were not monophyletic, indicating that additional systematic work remains within furnariine families and must include dense taxon sampling. We use this study as a basis for proposing a new phylogenetic classification for the group and in the process erect new family-group names for clades having high branch support across methods. © 2009 The Willi Hennig Society.

  15. Molecular phylogenetic perspectives for character classification and convergence: Framing some issues with nematode vulval appendages and telotylenchid tail termini

    Characters flagged as convergent based on newer molecular phylogenetic trees inform both practical identification and more esoteric classification. Nematode morphological characters such as lateral lines, bullae and laciniae are quite independent structures from those similarly named in other organi...

  16. A bootstrap based analysis pipeline for efficient classification of phylogenetically related animal miRNAs

    Gu Xun

    2007-03-01

    Full Text Available. Background: Phylogenetically related miRNAs (miRNA families) convey important information of the function and evolution of miRNAs. Due to the special sequence features of miRNAs, pair-wise sequence identity between miRNA precursors alone is often inadequate for unequivocally judging the phylogenetic relationships between miRNAs. Most of the current methods for miRNA classification rely heavily on manual inspection and lack measurements of the reliability of the results. Results: In this study, we designed an analysis pipeline (the Phylogeny-Bootstrap-Cluster (PBC) pipeline) to identify miRNA families based on branch stability in the bootstrap trees derived from overlapping genome-wide miRNA sequence sets. We tested the PBC analysis pipeline with the miRNAs from six animal species, H. sapiens, M. musculus, G. gallus, D. rerio, D. melanogaster, and C. elegans. The resulting classification was compared with the miRNA families defined in miRBase. The two classifications were largely consistent. Conclusion: The PBC analysis pipeline is an efficient method for classifying large numbers of heterogeneous miRNA sequences. It requires minimum human involvement and provides measurements of the reliability of the classification results.

  17. Correlation between the Chemotaxonomic Classifications of the essential oils of 48 Eucalyptus species harvested from Tunisia and their Phylogenetic Classification

    Elaissi Ameur

    2014-03-01

    Full Text Available. Various chemical classes (monoterpene hydrocarbons, oxygenated monoterpenes, sesquiterpene hydrocarbons, oxygenated sesquiterpenes, esters, ketones, non-classified compounds and non-identified compounds) and twenty-five of the main components from the essential oils of 48 Tunisian Eucalyptus species have been reported. The compounds include 1,8-cineole, torquatone, p-cymene, spathulenol, trans-pinocarveol, α-pinene, borneol, cryptone, 4-methyl-2-pentyl acetate, globulol, isoamyl isovalerate, α-terpineol, (E,E)-farnesol, viridiflorol, aromadendrene, terpinen-4-ol, β-eudesmol, α-eudesmol, limonene, D-piperitone, caryophyllene oxide, β-phellandrene, bicyclogermacrene, α-phellandrene and benzaldehyde, as principal components when analysed by GC-MS. The comparison of this classification to the phylogenetic classification showed a divergence for the majority of the species; however, some concordance was found.

  18. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Reddy, Rachamalla Maheedhar; Reddy, Chennareddy Venkata Siva Kumar; Singh, Nitin Kumar; Sharmila S Mande

    2011-01-01

    Background: Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types, similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups having modest computational resources. In this study, we present ...

  19. Molecular phylogenetic evaluation of classification and scenarios of character evolution in calcareous sponges (Porifera, Class Calcarea).

    Oliver Voigt

    Full Text Available. Calcareous sponges (Phylum Porifera, Class Calcarea) are known to be taxonomically difficult. Previous molecular studies have revealed many discrepancies between classically recognized taxa and the observed relationships at the order, family and genus levels; these inconsistencies question underlying hypotheses regarding the evolution of certain morphological characters. Therefore, we extended the available taxa and character set by sequencing the complete small subunit (SSU) rDNA and the almost complete large subunit (LSU) rDNA of additional key species and complemented this dataset by substantially increasing the length of available LSU sequences. Phylogenetic analyses provided new hypotheses about the relationships of Calcarea and about the evolution of certain morphological characters. We tested our phylogeny against competing phylogenetic hypotheses presented by previous classification systems. Our data reject the current order-level classification by again finding non-monophyletic Leucosolenida, Clathrinida and Murrayonida. In the subclass Calcinea, we recovered a clade that includes all species with a cortex, which is largely consistent with the previously proposed order Leucettida. Other orders that had been rejected in the current system were not found, but could not be rejected in our tests either. We found several additional families and genera polyphyletic: the families Leucascidae and Leucaltidae and the genus Leucetta in Calcinea, and in Calcaronea the family Amphoriscidae and the genus Ute. Our phylogeny also provided support for the vaguely suspected close relationship of several members of Grantiidae with giantortical diactines to members of Heteropiidae. Similarly, our analyses revealed several unexpected affinities, such as a sister group relationship between Leucettusa (Leucaltidae) and Leucettidae and between Leucascandra (Jenkinidae) and Sycon carteri (Sycettidae). According to our results, the taxonomy of Calcarea is in

  20. Phylogenetic systematics and a revised generic classification of anthidiine bees (Hymenoptera: Megachilidae).

    Litman, Jessica R; Griswold, Terry; Danforth, Bryan N

    2016-07-01

    The bee tribe Anthidiini (Hymenoptera: Megachilidae) is a large, cosmopolitan group of solitary bees that exhibit intriguing nesting behavior. We present the first molecular-based phylogenetic analysis of relationships within Anthidiini using model-based methods and a large, multi-locus dataset (five nuclear genes, 5081 base pairs), as well as a combined analysis using our molecular dataset in conjunction with a previously published morphological matrix. We discuss the evolution of nesting behavior in Anthidiini and the relationship between nesting material and female mandibular morphology. Following an examination of the morphological characters historically used to recognize anthidiine genera, we recommend the use of a molecular-based phylogenetic backbone to define taxonomic groups prior to the assignment of diagnostic morphological characters for these groups. Finally, our results reveal the paraphyly of numerous genera and have significant consequences for anthidiine classification. In order to promote a classification system based on stable, monophyletic clades, we hereby make the following changes to Michener's (2007) classification: The subgenera Afranthidium (Zosteranthidium) Michener and Griswold, 1994, Afranthidium (Branthidium) Pasteels, 1969 and Afranthidium (Immanthidium) Pasteels, 1969 are moved into the genus Pseudoanthidium, thus forming the new combinations Pseudoanthidium (Zosteranthidium), Pseudoanthidium (Branthidium), and Pseudoanthidium (Immanthidium). The genus Neanthidium Pasteels, 1969 is also moved into the genus Pseudoanthidium, thus forming the new combination Pseudoanthidium (Neanthidium). Based on morphological characters shared with our new definition of the genus Pseudoanthidium, the subgenus Afranthidium (Mesanthidiellum) Pasteels, 1969 and the genus Gnathanthidium Pasteels, 1969 are also moved into the genus Pseudoanthidium, thus forming the new combinations Pseudoanthidium (Mesanthidiellum) and Pseudoanthidium (Gnathanthidium

  1. Molecular phylogenetic evaluation of classification and scenarios of character evolution in calcareous sponges (Porifera, Class Calcarea).

    Voigt, Oliver; Wülfing, Eilika; Wörheide, Gert

    2012-01-01

    Calcareous sponges (Phylum Porifera, Class Calcarea) are known to be taxonomically difficult. Previous molecular studies have revealed many discrepancies between classically recognized taxa and the observed relationships at the order, family and genus levels; these inconsistencies question underlying hypotheses regarding the evolution of certain morphological characters. Therefore, we extended the available taxa and character set by sequencing the complete small subunit (SSU) rDNA and the almost complete large subunit (LSU) rDNA of additional key species and complemented this dataset by substantially increasing the length of available LSU sequences. Phylogenetic analyses provided new hypotheses about the relationships of Calcarea and about the evolution of certain morphological characters. We tested our phylogeny against competing phylogenetic hypotheses presented by previous classification systems. Our data reject the current order-level classification by again finding non-monophyletic Leucosolenida, Clathrinida and Murrayonida. In the subclass Calcinea, we recovered a clade that includes all species with a cortex, which is largely consistent with the previously proposed order Leucettida. Other orders that had been rejected in the current system were not found, but could not be rejected in our tests either. We found several additional families and genera polyphyletic: the families Leucascidae and Leucaltidae and the genus Leucetta in Calcinea, and in Calcaronea the family Amphoriscidae and the genus Ute. Our phylogeny also provided support for the vaguely suspected close relationship of several members of Grantiidae with giantortical diactines to members of Heteropiidae. Similarly, our analyses revealed several unexpected affinities, such as a sister group relationship between Leucettusa (Leucaltidae) and Leucettidae and between Leucascandra (Jenkinidae) and Sycon carteri (Sycettidae). According to our results, the taxonomy of Calcarea is in desperate need of a

  2. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

    Berendzen Joel

    2012-08-01

    Full Text Available. Background: Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers. Results: At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database. Conclusions: Classification by exact matching against a precomputed list of signature
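The exact-match idea in the abstract above can be sketched with a dictionary lookup. The example assumes the read has already been translated to a peptide string, represents each tree node as a tuple of names from the root, and resolves conflicting hits by falling back to their shared ancestor. That conflict rule is one possible reading of "most specific node ... consistent with" the observed signatures, not the authors' stated procedure, and the signature table, peptide strings and function names are all invented for illustration.

```python
def peptide_kmers(peptide, k=10):
    """All overlapping k-mers of a peptide string."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)}


def common_prefix(paths):
    """Longest shared root-to-node prefix of several lineage tuples."""
    prefix = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return tuple(prefix)


def classify(peptide, signatures, k=10):
    """Assign a translated read to a tree node via exact k-mer matches.

    `signatures` maps a signature peptide to the lineage (root first)
    of the node it marks. Hypothetical data layout for this sketch.
    """
    hits = [signatures[kmer] for kmer in peptide_kmers(peptide, k)
            if kmer in signatures]
    if not hits:
        return None
    deepest = max(hits, key=len)
    # If every hit lies on the lineage of the deepest one, that node is
    # the most specific assignment consistent with all hits.
    if all(deepest[:len(path)] == path for path in hits):
        return deepest
    # Otherwise fall back to the ancestor shared by all hits (assumption).
    return common_prefix(hits)


# Invented signature table; real tables would hold millions of 10-mers.
signatures = {
    "MKTAYIAKQR": ("Bacteria", "Proteobacteria"),
    "LVDEGHWSNP": ("Bacteria", "Firmicutes"),
    "QRSTVWYACD": ("Bacteria",),
}

consistent = classify("MKTAYIAKQRQRSTVWYACD", signatures)
conflicting = classify("MKTAYIAKQRLVDEGHWSNP", signatures)
unmatched = classify("AAAAAAAAAAAA", signatures)
```

Because each lookup is a constant-time hash probe, the cost per read grows only with read length, which is in keeping with the throughput advantage over BLASTX-based methods reported in the abstract.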

  3. Accurate crop classification using hierarchical genetic fuzzy rule-based systems

    Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.

    2014-10-01

    This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules, easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimum user interaction, since the most important learning parameters affecting the classification accuracy are determined by the learning algorithm automatically. HiRLiC is applied in a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis proves that HiRLiC compares favorably to other interpretable classifiers of the literature, both in terms of structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as the support vector machines (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC is characterized by higher generalization properties, providing more homogeneous classifications than the competitors. Moreover, the runtime requirements for producing the thematic map were orders of magnitude lower than those of the competitors.

  4. HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors

    Sun Yanni

    2011-05-01

    Full Text Available. Background: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors. Results: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families. Conclusions: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.

  5. The challenge of producing an accurate statewide land cover classification of digital satellite data

    A general land use/land cover data set for South Carolina produced from 1989/1990 SPOT multispectral data is presented. This data set incorporates eight categories: urban/built-up, agricultural/grass, scrub/shrub, forest, water, forested wetland, nonforested wetland, and barren. A statewide inventory of these land use/land cover 'associations' is prepared using integrated pcERDAS and prARC/INFO software by the South Carolina Land Resources Commission with unsupervised classification and reclassification routines, and subsequent air photo verification. Land cover data are produced by county and evaluated for reliability (88-percent average classification accuracy). Multiple applications are served by accurate and timely county land cover inventories for resource management and economic development at state and local government levels, specifically for purposes of land use planning and site location analysis. 6 refs

  6. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
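
    As an illustration of the string-distance idea mentioned in the abstract above, the following is a minimal sketch of a plain Levenshtein distance and a length-normalised variant. It is not the consonant-class procedure used in the study, whose details are not given here; the function names and the normalisation by the longer word are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string
    (an assumed normalisation; 0.0 means identical, 1.0 means no overlap)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

    A pairwise matrix of such distances between word forms could then be passed to a distance-based tree method such as NJ or UPGMA.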

  7. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods

    Leblois Raphael

    2009-11-01

    Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci - nuclear genes - improved the predictive performance of most methods. Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
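
    The "one nearest neighbour" rule named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: sequences are taken to be pre-aligned and of equal length, the distance is the simple proportion of mismatched sites (the study's actual distance measure is not given here), and the reference sequences and species labels are made up.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def one_nearest_neighbour(query: str, references: list[tuple[str, str]]) -> tuple[str, float]:
    """Assign the query to the species of its closest reference sequence.

    references is a list of (species_label, sequence) pairs.
    Returns the winning label and its distance to the query.
    """
    if not references:
        raise ValueError("at least one reference sequence is required")
    best_species, best_dist = None, float("inf")
    for species, seq in references:
        d = p_distance(query, seq)
        if d < best_dist:  # ties keep the first reference encountered
            best_species, best_dist = species, d
    return best_species, best_dist


# Hypothetical reference library (labels and sequences are illustrative only).
REFERENCES = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "TCGAACGGAC"),
    ("species_B", "TCGAACGGAT"),
]

result = one_nearest_neighbour("ACGTACGAAC", REFERENCES)
```

    Here the query differs from the first species_A reference at one site out of ten and from each species_B reference at three or more, so it is assigned to species_A.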

  8. A classification of the Chloridoideae (Poaceae) based on multi-gene phylogenetic trees.

    Peterson, Paul M; Romaschenko, Konstantin; Johnson, Gabriel

    2010-05-01

    We conducted a molecular phylogenetic study of the subfamily Chloridoideae using six plastid DNA sequences (ndhA intron, ndhF, rps16-trnK, rps16 intron, rps3, and rpl32-trnL) and a single nuclear ITS DNA sequence. Our large original data set includes 246 species (17.3%) representing 95 genera (66%) of the grasses currently placed in the Chloridoideae. The maximum likelihood and Bayesian analysis of DNA sequences provides strong support for the monophyly of the Chloridoideae; followed by, in order of divergence: a Triraphideae clade with Neyraudia sister to Triraphis; an Eragrostideae clade with the Cotteinae (includes Cottea and Enneapogon) sister to the Uniolinae (includes Entoplocamia, Tetrachne, and Uniola), and a terminal Eragrostidinae clade of Ectrosia, Harpachne, and Psammagrostis embedded in a polyphyletic Eragrostis; a Zoysieae clade with Urochondra sister to a Zoysiinae (Zoysia) clade, and a terminal Sporobolinae clade that includes Spartina, Calamovilfa, Pogoneura, and Crypsis embedded in a polyphyletic Sporobolus; and a very large terminal Cynodonteae clade that includes 13 monophyletic subtribes. The Cynodonteae includes, in alphabetical order: Aeluropodinae (Aeluropus); Boutelouinae (Bouteloua); Eleusininae (includes Apochiton, Astrebla with Schoenefeldia embedded, Austrochloris, Brachyachne, Chloris, Cynodon with Brachyachne embedded in part, Eleusine, Enteropogon with Eustachys embedded in part, Eustachys, Chrysochloa, Coelachyrum, Leptochloa with Dinebra embedded, Lepturus, Lintonia, Microchloa, Saugetia, Schoenefeldia, Sclerodactylon, Tetrapogon, and Trichloris); Hilariinae (Hilaria); Monanthochloinae (includes Distichlis, Monanthochloe, and Reederochloa); Muhlenbergiinae (Muhlenbergia with Aegopogon, Bealia, Blepharoneuron, Chaboissaea, Lycurus, Pereilema, Redfieldia, Schaffnerella, and Schedonnardus all embedded); Orcuttiinae (includes Orcuttia and Tuctoria); Pappophorinae (includes Neesiochloa and Pappophorum); Scleropogoninae (includes

  9. GPD: a graph pattern diffusion kernel for accurate graph classification with applications in cheminformatics.

    Smalter, Aaron; Huan, Jun Luke; Jia, Yi; Lushington, Gerald

    2010-01-01

    Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifier (e.g., support vector machine) in building highly accurate graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call "pattern diffusion" to label nodes in the graphs. Finally, we designed a graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm using a number of chemical structure data. The experimental results demonstrate that our method is significantly better than competing methods such as those kernel functions based on paths, cycles, and subgraphs. PMID:20431140

  10. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

    Leblois Raphael; Olteanu Madalina; Bleakley Kevin; Schaeffer Brigitte; David Olivier; Austerlitz Frederic; Veuille Michel; Laredo Catherine

    2009-01-01

    Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealo...

  11. Addition of wsp sequences to the Wolbachia phylogenetic tree and stability of the classification.

    Pintureau, B; Chaudier, S; Lassablière, F; Charles, H; Grenier, S

    2000-10-01

    Wolbachia are symbiotic bacteria altering reproductive characters of numerous arthropods. Their most recent phylogeny and classification are based on sequences of the wsp gene. We sequenced wsp gene from six Wolbachia strains infecting six Trichogramma species that live as egg parasitoids on many insects. This allows us to test the effect of the addition of sequences on the Wolbachia phylogeny and to check the classification of Wolbachia infecting Trichogramma. The six Wolbachia studied are classified in the B supergroup. They confirm the monophyletic structure of the B Wolbachia in Trichogramma but introduce small differences in the Wolbachia classification. Modifications include the definition of a new group, Sem, for Wolbachia of T. semblidis and the merging of the two closely related groups, Sib and Kay. Specific primers were determined and tested for the Sem group. PMID:11040288

  12. Towards a phylogenetic classification of reef corals: The Indo-Pacific genera Merulina, Goniastrea and Scapophyllia (Scleractinia, Merulinidae)

    Huang, Danwei

    2014-06-03

    Recent advances in scleractinian systematics and taxonomy have been achieved through the integration of molecular and morphological data, as well as rigorous analysis using phylogenetic methods. In this study, we continue in our pursuit of a phylogenetic classification by examining the evolutionary relationships between the closely related reef coral genera Merulina, Goniastrea, Paraclavarina and Scapophyllia (Merulinidae). In particular, we address the extreme polyphyly of Favites and Goniastrea that was discovered a decade ago. We sampled 145 specimens belonging to 16 species from a wide geographic range in the Indo-Pacific, focusing especially on type localities, including the Red Sea, western Indian Ocean and central Pacific. Tree reconstructions based on both nuclear and mitochondrial markers reveal a novel lineage composed of three species previously placed in Favites and Goniastrea. Morphological analyses indicate that this clade, Paragoniastrea Huang, Benzoni & Budd, gen. n., has a unique combination of corallite and subcorallite features observable with scanning electron microscopy and thin sections. Molecular and morphological evidence furthermore indicates that the monotypic genus Paraclavarina is nested within Merulina, and the former is therefore synonymised. © 2014 Royal Swedish Academy of Sciences.

  13. Chemical classification of cattle. 2. Phylogenetic tree and specific status of the Zebu.

    Manwell, C; Baker, C M

    1980-01-01

    Phylogenetic trees for the ten major breed groups of cattle were constructed by Farris's (1972) maximum parsimony method, or Fitch & Margoliash's (1967) method, which averages out the deviation over the entire assemblage. Both techniques yield essentially identical trees. The phylogenetic tree for the ten major cattle breed groups can be superimposed on a map of Europe and western Asia, the root of the tree being close to the 'fertile crescent' in Asia Minor, believed to be a primary centre of bovine domestication. For some but not all protein variants there is a cline of gene frequencies as one proceeds from the British Isles and northwest Europe towards southeast Europe and Asia Minor, with the most extreme gene frequencies in the Zebu breeds of India. It is not clear to what extent the observed clines are primary or secondary, i.e., consequent to the initial migrations of cattle towards the end of the Pleistocene or consequent to the many migrations of man with his domesticated cattle. Such clines as exist are not in themselves sufficient to prove either selection versus genetic drift or to establish taxonomic ranking. Contrary to some suggestions in the literature, the biochemical evidence supports Linnaeus's original conclusions: Bos taurus and Bos indicus are distinct species. PMID:7458002

  14. Archaeal-eubacterial mergers in the origin of Eukarya: phylogenetic classification of life.

    Margulis, L

    1996-01-01

    A symbiosis-based phylogeny leads to a consistent, useful classification system for all life. "Kingdoms" and "Domains" are replaced by biological names for the most inclusive taxa: Prokarya (bacteria) and Eukarya (symbiosis-derived nucleated organisms). The earliest Eukarya, anaerobic mastigotes, hypothetically originated from permanent whole-cell fusion between members of Archaea (e.g., Thermoplasma-like organisms) and of Eubacteria (e.g., Spirochaeta-like organisms). Molecular biology, life...

  15. Nucleotide sequence and phylogenetic classification of candidate human papilloma virus type 92

    From a basal cell carcinoma (BCC) the complete genome of candidate human papillomavirus (HPV) type 92 was characterized. Phylogenetically, the candidate HPV 92 was relatively distantly related to other cutaneous HPV types within the B1 group. By quantitative real time PCR, 94 viral copies were present per cell in the BCC and another BCC contained 1 viral copy per cell. Lower copy numbers were found in two solar keratoses (1 copy per 33 cells and 1 copy per 60 cells) and two squamous cell carcinomas (1 copy per 436 cells and 1 copy per 1143 cells). The high viral load of HPV 92 in two BCCs differs from the low amount of HPV DNA reported from nonmelanoma skin cancers

  16. A Highly Accurate Classification of TM Data through Correction of Atmospheric Effects

    Bill Smith; Frank Scarpace; Widad Elmahboub

    2009-01-01

    Atmospheric correction impacts on the accuracy of satellite image-based land cover classification are a growing concern among scientists. In this study, the principal objective was to enhance classification accuracy by minimizing contamination effects from aerosol scattering in Landsat TM images due to the variation in solar zenith angle corresponding to cloud-free earth targets. We have derived a mathematical model for aerosols to compute and subtract the aerosol scattering noise per pixel o...

  17. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for

  18. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Dawyndt Peter

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the

  19. Archaeal-eubacterial mergers in the origin of Eukarya: phylogenetic classification of life

    Margulis, L.

    1996-01-01

    A symbiosis-based phylogeny leads to a consistent, useful classification system for all life. "Kingdoms" and "Domains" are replaced by biological names for the most inclusive taxa: Prokarya (bacteria) and Eukarya (symbiosis-derived nucleated organisms). The earliest Eukarya, anaerobic mastigotes, hypothetically originated from permanent whole-cell fusion between members of Archaea (e.g., Thermoplasma-like organisms) and of Eubacteria (e.g., Spirochaeta-like organisms). Molecular biology, life-history, and fossil record evidence support the reunification of bacteria as Prokarya while subdividing Eukarya into uniquely defined subtaxa: Protoctista, Animalia, Fungi, and Plantae.

  20. Assignment of Calibration Information to Deeper Phylogenetic Nodes is More Effective in Obtaining Precise and Accurate Divergence Time Estimates.

    Mello, Beatriz; Schrago, Carlos G

    2014-01-01

    Divergence time estimation has become an essential tool for understanding macroevolutionary events. Molecular dating aims to obtain reliable inferences, which, within a statistical framework, means jointly increasing the accuracy and precision of estimates. Bayesian dating methods exhibit the property of a linear relationship between uncertainty and estimated divergence dates. This relationship occurs even if the number of sites approaches infinity and places a limit on the maximum precision of node ages. However, how the placement of calibration information may affect the precision of divergence time estimates remains an open question. In this study, relying on simulated and empirical data, we investigated how the location of calibration within a phylogeny affects the accuracy and precision of time estimates. We found that calibration priors set at median and deep phylogenetic nodes were associated with higher precision values compared to analyses involving calibration at the shallowest node. The results were independent of the tree symmetry. An empirical mammalian dataset produced results that were consistent with those generated by the simulated sequences. Assigning time information to the deeper nodes of a tree is crucial to guarantee the accuracy and precision of divergence times. This finding highlights the importance of the appropriate choice of outgroups in molecular dating. PMID:24855333

  1. A Highly Accurate Classification of TM Data through Correction of Atmospheric Effects

    Bill Smith

    2009-07-01

    Atmospheric correction impacts on the accuracy of satellite image-based land cover classification are a growing concern among scientists. In this study, the principal objective was to enhance classification accuracy by minimizing contamination effects from aerosol scattering in Landsat TM images due to the variation in solar zenith angle corresponding to cloud-free earth targets. We have derived a mathematical model for aerosols to compute and subtract the aerosol scattering noise per pixel of different vegetation classes from TM images of Nicolet in north-eastern Wisconsin. An algorithm in C++ has been developed with iterations to simulate, model, and correct for the solar zenith angle influences on scattering. Results from a supervised classification with corrected TM images showed increased class accuracy for land cover types over uncorrected images. The overall accuracy of the supervised classification was improved substantially (between 13% and 18%). The z-score shows a significant difference between the corrected data and the raw data (between 4.0 and 12.0). Therefore, the atmospheric correction was essential for enhancing the image classification.

  2. Molecular Phylogenetic Classification of Streptomycetes Isolated from the Rhizosphere of Tropical Legume (Paraserianthes falcataria (L.) Nielsen)

    LANGKAH SEMBIRING

    2009-09-01

    Intrageneric diversity of 556 streptomycetes isolated from the rhizosphere of a tropical legume was determined using a molecular taxonomic method based on 16S rDNA. A total of 46 isolates were taken to represent 37 colour groups of the isolates. The 16S rDNA was amplified and subsequently sequenced, and the sequence data were aligned with streptomycete sequences retrieved from the Ribosomal Database Project (RDP). Phylogenetic trees were generated using the PHYLIP software package, and the matrices of nucleotide similarity and nucleotide difference were generated using PHYDIT software. The results confirmed and extended the value of 16S rDNA sequencing in streptomycete systematics. The 16S rDNA sequence data showed that most of the tested colour group representatives formed new centers of taxonomic variation within the genus Streptomyces. The generic assignment of these organisms was underpinned by 16S rDNA sequence data, which also suggested that most of the strains represented new centers of taxonomic variation. The taxonomic data indicate that diverse populations of streptomycetes are associated with the roots of the tropical legume (P. falcataria). Therefore, the combination of selective isolation and molecular taxonomic procedures used in this study provides a powerful way of uncovering new centers of taxonomic variation within the genus Streptomyces.

  3. Deceptive desmas: molecular phylogenetics suggests a new classification and uncovers convergent evolution of lithistid demosponges.

    Astrid Schuster

    Reconciling the fossil record with molecular phylogenies to enhance the understanding of animal evolution is a challenging task, especially for taxa with a mostly poor fossil record, such as sponges (Porifera). 'Lithistida', a polyphyletic group of recent and fossil sponges, are an exception as they provide the richest fossil record among demosponges. Lithistids, currently encompassing 13 families, 41 genera and >300 recent species, are defined by the common possession of peculiar siliceous spicules (desmas) that characteristically form rigid articulated skeletons. Their phylogenetic relationships are to a large extent unresolved and there has been no (taxonomically) comprehensive analysis to formally reallocate lithistid taxa to their closest relatives. This study, based on the most comprehensive molecular and morphological investigation of 'lithistid' demosponges to date, corroborates some previous weakly-supported hypotheses, and provides novel insights into the evolutionary relationships of the previous 'order Lithistida'. Based on molecular data (partial mtDNA CO1 and 28S rDNA sequences), we show that 8 out of 13 'Lithistida' families belong to the order Astrophorida, whereas Scleritodermidae and Siphonidiidae form a separate monophyletic clade within Tetractinellida. Most lithistid astrophorids are dispersed between different clades of the Astrophorida and we propose to formally reallocate them, respectively. Corallistidae, Theonellidae and Phymatellidae are monophyletic, whereas the families Pleromidae and Scleritodermidae are polyphyletic. Family Desmanthidae is polyphyletic and groups within Halichondriidae--we formally propose a reallocation. The sister group relationship of the family Vetulinidae to Spongillida is confirmed and we propose here for the first time to include Vetulina into a new Order Sphaerocladina. Megascleres and microscleres possibly evolved and/or were lost several times independently in different 'lithistid' taxa, and

  4. Phylogenetic analysis and classification of the Brassica rapa SET-domain protein family

    Huang Yong

    2011-12-01

    Background: The SET (Su(var)3-9, Enhancer-of-zeste, Trithorax) domain is an evolutionarily conserved sequence of approximately 130-150 amino acids, and constitutes the catalytic site of lysine methyltransferases (KMTs). KMTs perform many crucial biological functions via histone methylation of chromatin. Histone methylation marks are interpreted differently depending on the histone type (i.e. H3 or H4), the lysine position (e.g. H3K4, H3K9, H3K27, H3K36 or H4K20) and the number of added methyl groups (i.e. me1, me2 or me3). For example, H3K4me3 and H3K36me3 are associated with transcriptional activation, but H3K9me2 and H3K27me3 are associated with gene silencing. The substrate specificity and activity of KMTs are determined by sequences within the SET domain and other regions of the protein. Results: Here we identified 49 SET-domain proteins from the recently sequenced Brassica rapa genome. We performed sequence similarity and protein domain organization analysis of these proteins, along with the SET-domain proteins from the dicot Arabidopsis thaliana, the monocots Oryza sativa and Brachypodium distachyon, and the green alga Ostreococcus tauri. We showed that plant SET-domain proteins can be grouped into 6 distinct classes, namely KMT1, KMT2, KMT3, KMT6, KMT7 and S-ET. Apart from the S-ET class, which has an interrupted SET domain and may be involved in methylation of nonhistone proteins, the other classes have characteristics of histone methyltransferases exhibiting different substrate specificities: KMT1 for H3K9, KMT2 for H3K4, KMT3 for H3K36, KMT6 for H3K27 and KMT7 also for H3K4. We also propose a coherent and rational nomenclature for plant SET-domain proteins. Comparisons of sequence similarity and synteny of B. rapa and A. thaliana SET-domain proteins revealed recent gene duplication events for some KMTs. Conclusion: This study provides the first characterization of the SET-domain KMT proteins of B. rapa. Phylogenetic analysis data

  5. GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics

    Smalter, Aaron; Huan, Jun; Jia, Yi; Lushington, Gerald

    2010-01-01

    Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called graph pattern diffusion (GPD) kernel. Our ide...

  6. Two fast and accurate heuristic RBF learning rules for data classification.

    Rouhani, Modjtaba; Javan, Dawood S

    2016-03-01

    This paper presents new Radial Basis Function (RBF) learning methods for classification problems. The proposed methods use heuristics to determine the spreads, the centers and the number of hidden neurons of the network in such a way that higher efficiency is achieved with fewer neurons, while the learning algorithm remains fast and simple. To keep the network size limited, neurons are added to the network recursively until a termination condition is met. Each neuron covers some of the training data. The termination condition is to cover all training data or to reach the maximum number of neurons. In each step, the center and spread of the new neuron are selected to maximize its coverage. Maximizing the coverage of the neurons leads to a network with fewer neurons and hence a lower VC dimension and better generalization. Using the power exponential distribution function as the activation function of the hidden neurons, and in the light of the new learning approaches, it is proved that all data become linearly separable in the space of hidden layer outputs, which implies that there exist linear output layer weights with zero training error. The proposed methods are applied to some well-known datasets and the simulation results, compared with SVM and some other leading RBF learning methods, show their satisfactory and comparable performance. PMID:26797472

  7. Protein clustering and RNA phylogenetic reconstruction of the influenza A [corrected] virus NS1 protein allow an update in classification and identification of motif conservation.

    Edgar E Sevilla-Reyes

    The non-structural protein 1 (NS1) of influenza A virus (IAV), coded by its third most diverse gene, interacts with multiple molecules within infected cells. NS1 is involved in host immune response regulation and is a potential contributor to the virus host range. Early phylogenetic analyses using 50 sequences led to the classification of NS1 gene variants into groups (alleles) A and B. We reanalyzed NS1 diversity using 14,716 complete NS IAV sequences, downloaded from public databases, without host bias. Removal of sequence redundancy and further structured clustering at 96.8% amino acid similarity produced 415 clusters that enhanced our capability to detect distinct subgroups and lineages, which were assigned a numerical nomenclature. Maximum likelihood phylogenetic reconstruction using RNA sequences indicated the previously identified deep branching separating group A from group B, with five distinct subgroups within A as well as two and five lineages within the A4 and A5 subgroups, respectively. Our classification model proposes that sequence patterns in thirteen amino acid positions are sufficient to fit >99.9% of all currently available NS1 sequences into the A subgroups/lineages or the B group. This classification reduces host and virus bias through the prioritization of NS1 RNA phylogenetics over host or virus phenetics. We found significant sequence conservation within the subgroups and lineages with characteristic patterns of functional motifs, such as the differential binding of CPSF30 and crk/crkL or the availability of a C-terminal PDZ-binding motif. To understand selection pressures and evolution acting on NS1, it is necessary to organize the available data. This updated classification may help to clarify and organize the study of NS1 interactions and pathogenic differences and allow the drawing of further functional inferences on sequences in each group, subgroup and lineage rather than on a strain-by-strain basis.

  8. Phylogenetic Classification and Species Identification of Dermatophyte Strains Based on DNA Sequences of Nuclear Ribosomal Internal Transcribed Spacer 1 Regions

    Makimura, Koichi; Tamura, Yoshiko; Mochizuki, Takashi; Hasegawa, Atsuhiko; Tajiri, Yoshito; Hanazawa, Ryo; Uchida, Katsuhisa; Saito, Hiuga; Yamaguchi, Hideyo

    1999-01-01

    The mutual phylogenetic relationships of dermatophytes of the genera Trichophyton, Microsporum, and Epidermophyton were demonstrated by using internal transcribed spacer 1 (ITS1) region ribosomal DNA sequences. Trichophyton spp. and Microsporum spp. form a cluster in the phylogenetic tree with Epidermophyton floccosum as an outgroup, and within this cluster, all Trichophyton spp. except Trichophyton terrestre form a nested cluster (100% bootstrap support). Members of dermatophytes in the clus...

  9. TIPP: taxonomic identification and phylogenetic profiling

    Nguyen, Nam-phuong; Mirarab, Siavash; Liu, Bo; Pop, Mihai; Warnow, Tandy

    2014-01-01

    Motivation: Abundance profiling (also called ‘phylogenetic profiling’) is a crucial step in understanding the diversity of a metagenomic sample, and one of the basic techniques used for this is taxonomic identification of the metagenomic reads. Results: We present taxon identification and phylogenetic profiling (TIPP), a new marker-based taxon identification and abundance profiling method. TIPP combines SATé-enabled phylogenetic placement, a phylogenetic placement method, with statistical techniques to control the classification precision and recall, and results in improved abundance profiles. TIPP is highly accurate even in the presence of high indel errors and novel genomes, and matches or improves on previous approaches, including NBC, mOTU, PhymmBL, MetaPhyler and MetaPhlAn. Availability and implementation: Software and supplementary materials are available at http://www.cs.utexas.edu/users/phylo/software/sepp/tipp-submission/. Contact: warnow@illinois.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25359891

  10. Revisiting the phylogeny of Bombacoideae (Malvaceae): Novel relationships, morphologically cohesive clades, and a new tribal classification based on multilocus phylogenetic analyses.

    Carvalho-Sobrinho, Jefferson G; Alverson, William S; Alcantara, Suzana; Queiroz, Luciano P; Mota, Aline C; Baum, David A

    2016-08-01

    Bombacoideae (Malvaceae) is a clade of deciduous trees with a marked dominance in many forests, especially in the Neotropics. The historical lack of a well-resolved phylogenetic framework for Bombacoideae hinders studies in this ecologically important group. We reexamined phylogenetic relationships in this clade based on a matrix of 6465 nuclear (ETS, ITS) and plastid (matK, trnL-trnF, trnS-trnG) DNA characters. We used maximum parsimony, maximum likelihood, and Bayesian inference to infer relationships among 108 species (∼70% of the total number of known species). We analyzed the evolution of selected morphological traits: trunk or branch prickles, calyx shape, endocarp type, seed shape, and seed number per fruit, using ML reconstructions of their ancestral states to identify possible synapomorphies for major clades. Novel phylogenetic relationships emerged from our analyses, including three major lineages marked by fruit or seed traits: the winged-seed clade (Bernoullia, Gyranthera, and Huberodendron), the spongy endocarp clade (Adansonia, Aguiaria, Catostemma, Cavanillesia, and Scleronema), and the Kapok clade (Bombax, Ceiba, Eriotheca, Neobuchia, Pachira, Pseudobombax, Rhodognaphalon, and Spirotheca). The Kapok clade, the most diverse lineage of the subfamily, includes sister relationships (i) between Pseudobombax and "Pochota fendleri" a historically incertae sedis taxon, and (ii) between the Paleotropical genera Bombax and Rhodognaphalon, implying just two bombacoid dispersals to the Old World, the other one involving Adansonia. This new phylogenetic framework offers new insights and a promising avenue for further evolutionary studies. In view of this information, we present a new tribal classification of the subfamily, accompanied by an identification key. PMID:27154210

  11. Phylogenetic Classification of Trichophyton mentagrophytes Complex Strains Based on DNA Sequences of Nuclear Ribosomal Internal Transcribed Spacer 1 Regions

    Makimura, Koichi; Mochizuki, Takashi; Hasegawa, Atsuhiko; Uchida, Katsuhisa; Saito, Hiuga; Yamaguchi, Hideyo

    1998-01-01

    Using internal transcribed spacer 1 (ITS1) region ribosomal DNA sequences from 37 stock strains and clinical isolates provisionally termed Trichophyton mentagrophytes complex in Japan, we demonstrated the mutual phylogenetic relationships of these strains. Members of this complex were classified into 3 ITS1-homologous groups and 13 ITS1-identical groups by their sequences. ITS1-homologous group I consists of Arthroderma vanbreuseghemii, T. mentagrophytes human isolates, and several strains of...

  12. Increasing the data size to accurately reconstruct the phylogenetic relationships between nine subgroups of the Drosophila melanogaster species group (Drosophilidae, Diptera).

    Yang, Yong; Hou, Zhuo-Cheng; Qian, Yuan-Huai; Kang, Han; Zeng, Qing-Tao

    2012-01-01

    Previous phylogenetic analyses of the melanogaster species group have led to conflicting hypotheses concerning their relationship; therefore the addition of new sequence data is necessary to discover the phylogeny of this species group. Here we present new data derived from 17 genes and representing 48 species to reconstruct the phylogeny of the melanogaster group. A variety of statistical tests, as well as maximum likelihood mapping analysis, were performed to estimate data quality, suggesting that all genes contributed strongly to resolving the phylogeny. Individual loci were analyzed using maximum likelihood (ML), and the concatenated dataset (12,988 bp) was analyzed using partitioned ML and Bayesian analyses. Separate analyses produced various phylogenetic relationships; however, the phylogenetic topologies from ML and Bayesian analysis of the concatenated dataset were, at the subgroup level, completely identical to each other with high levels of support. Our results recovered three major clades: the ananassae subgroup, followed by the montium subgroup, the melanogaster subgroup and the oriental subgroups form the third monophyletic clade, in which melanogaster (takahashii, suzukii) forms one subclade and ficusphila [eugracilis (elegans, rhopaloa)] forms another. However, more data are necessary to determine the phylogenetic position of Drosophila lucipennis, which proved difficult to place. PMID:21985965

  13. DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification

    Arran Schlosberg

    2014-05-01

    Improvements in speed and cost of genome sequencing are resulting in increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible and familial co-segregation with disease is not always possible. In-silico methods for classification or triage are thus utilised. A popular tool based on multiple-species sequence alignments (MSAs) and work by Grantham, Align-GVGD, has been shown to underestimate deleterious effects, particularly as sequence numbers increase. We utilised the DEFLATE compression algorithm to account for expected variation across a number of species. With the adjusted Grantham measure we derived a means of quantitatively clustering known neutral and deleterious nsSNPs from the same gene; this was then used to assign novel variants to the most appropriate cluster as a means of binary classification. Scaling of clusters allows for inter-gene comparison of variants through a single pathogenicity score. The approach improves upon the classification accuracy of Align-GVGD while correcting for sensitivity to large MSAs. Open-source code and a web server are made available at https://github.com/aschlosberg/CompressGV.
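
    The abstract does not say how DEFLATE is applied to the alignment, so the following is only a rough sketch of the general idea that compressed size can act as a proxy for variation. The functions `deflate_size` and `variability_ratio`, and the choice to score a single alignment column as a string, are hypothetical and are not taken from CompressGV.

```python
import zlib

def deflate_size(text):
    """Return the size in bytes of `text` after raw DEFLATE compression."""
    # A negative wbits value asks zlib for a raw DEFLATE stream (no header or checksum).
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return len(compressor.compress(text.encode("ascii")) + compressor.flush())

def variability_ratio(column):
    """Hypothetical score: compressed size over raw size (lower = more conserved)."""
    if not column:
        raise ValueError("column must not be empty")
    return deflate_size(column) / len(column)
```

    Under this assumed scoring, a column in which every species carries the same residue compresses to a few bytes, while a column with many different residues compresses far less, so its ratio is higher.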

  14. When proglottids and scoleces conflict: phylogenetic relationships and a family-level classification of the Lecanicephalidea (Platyhelminthes: Cestoda).

    Jensen, Kirsten; Caira, Janine N; Cielocha, Joanna J; Littlewood, D Timothy J; Waeschenbach, Andrea

    2016-05-01

    This study presents the first comprehensive phylogenetic analysis of the interrelationships of the morphologically diverse elasmobranch-hosted tapeworm order Lecanicephalidea, based on molecular sequence data. With almost half of current generic diversity having been erected or resurrected within the last decade, an apparent conflict between scolex morphology and proglottid anatomy has hampered the assignment of many of these genera to families. Maximum likelihood and Bayesian analyses of two nuclear markers (D1-D3 of lsrDNA and complete ssrDNA) and two mitochondrial markers (partial rrnL and partial cox1) for 61 lecanicephalidean species representing 22 of the 25 valid genera were conducted; new sequence data were generated for 43 species and 11 genera, including three undescribed genera. The monophyly of the order was confirmed in all but the analyses based on cox1 data alone. Sesquipedalapex placed among species of Anteropora and was thus synonymized with the latter genus. Based on analyses of the concatenated dataset, eight major groups emerged which are herein formally recognised at the familial level. Existing family names (i.e., Lecanicephalidae, Polypocephalidae, Tetragonocephalidae, and Cephalobothriidae) are maintained for four of the eight clades, and new families are proposed for the remaining four groups (Aberrapecidae n. fam., Eniochobothriidae n. fam., Paraberrapecidae n. fam., and Zanobatocestidae n. fam.). The four new families and the Tetragonocephalidae are monogeneric, while the Cephalobothriidae, Lecanicephalidae and Polypocephalidae comprise seven, eight and four genera, respectively. As a result of their unusual morphologies, the three genera not included here (i.e., Corrugatocephalum, Healyum and Quadcuspibothrium) are considered incertae sedis within the order until their familial affinities can be examined in more detail. All eight families are newly circumscribed based on morphological features and a key to the families is provided

  15. Classification

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  16. A non-contact method based on multiple signal classification algorithm to reduce the measurement time for accurately heart rate detection

    Bechet, P.; Mitran, R.; Munteanu, M.

    2013-08-01

    Non-contact methods for the assessment of vital signs are of great interest for specialists due to the benefits obtained in both medical and special applications, such as those for surveillance, monitoring, and search and rescue. This paper investigates the possibility of implementing a digital processing algorithm based on the MUSIC (Multiple Signal Classification) parametric spectral estimation in order to reduce the observation time needed to accurately measure the heart rate. It demonstrates that, by properly dimensioning the signal subspace, the MUSIC algorithm can be optimized in order to accurately assess the heart rate during an 8-28 s time interval. The validation of the processing algorithm performance was achieved by minimizing the mean error of the heart rate after performing simultaneous comparative measurements on several subjects. In order to calculate the error, the reference value of the heart rate was measured using a classic measurement system through direct contact.

  17. Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the 'Extreme Learning Machine' Algorithm.

    Mark D McDonnell

    Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the 'Extreme Learning Machine' (ELM) approach, which also enables a very rapid training time (∼10 minutes). Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems.
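
    The random 'receptive field' idea in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the uniform choice of patch size and position, the weight range, and the names `random_patch_mask` and `masked_input_weights` are all hypothetical, and the sketch will not necessarily reproduce the roughly 90% sparsity reported above.

```python
import random

def random_patch_mask(height, width, rng):
    """Return a flat 0/1 mask selecting one randomly sized, randomly placed rectangle."""
    patch_h = rng.randint(1, height)          # assumed: patch size drawn uniformly
    patch_w = rng.randint(1, width)
    top = rng.randint(0, height - patch_h)    # assumed: position drawn uniformly
    left = rng.randint(0, width - patch_w)
    return [1 if top <= r < top + patch_h and left <= c < left + patch_w else 0
            for r in range(height) for c in range(width)]

def masked_input_weights(n_hidden, height, width, seed=0):
    """Build one row of random input weights per hidden unit, zeroed outside its patch."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_hidden):
        mask = random_patch_mask(height, width, rng)
        weights.append([rng.uniform(-1.0, 1.0) * m for m in mask])
    return weights
```

    In an ELM these input weights stay fixed after sampling and only the output layer is fitted, typically by least squares; that fitting step is omitted here.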

  18. Accurate Classification of Books and Documents by Making Good Use of the Chinese Library Classification (5th Edition)

    汤彩霞

    2011-01-01

    This paper discusses, from three aspects, how to classify books and documents accurately by using the Chinese Library Classification (5th Edition) (hereafter referred to as CLC5): making good preliminary preparations for CLC5, including a comparison of the new and old classification schemes; understanding and mastering some of the general classification rules of CLC5; and establishing the library's own classification regulations for adopting CLC5.

  19. SpineAnalyzer™ is an accurate and precise method of vertebral fracture detection and classification on dual-energy lateral vertebral assessment scans

    Osteoporotic fractures of the spine are associated with significant morbidity, are highly predictive of hip fractures, but frequently do not present clinically. When there is a low to moderate clinical suspicion of vertebral fracture, which would not justify acquisition of a radiograph, vertebral fracture assessment (VFA) using Dual-energy X-ray Absorptiometry (DXA) offers a low-dose opportunity for diagnosis. Different approaches to the classification of vertebral fractures have been documented. The aim of this study was to measure the precision and accuracy of SpineAnalyzer™, a quantitative morphometry software program. Lateral vertebral assessment images of 64 men were analysed using SpineAnalyzer™ and standard GE Lunar software. The images were also analysed by two expert readers using a semi-quantitative approach. Agreement between groups ranged from 95.99% to 98.60%. The intra-rater precision for the application of SpineAnalyzer™ to vertebrae was poor in the upper thoracic regions, but good elsewhere. SpineAnalyzer™ is a reproducible and accurate method for measuring vertebral height and quantifying vertebral fractures from VFA scans. - Highlights: • Vertebral fracture assessment (VFA) using Dual-energy X-ray Absorptiometry (DXA) offers a low-dose opportunity for diagnosis. • Agreement between VFA software (SpineAnalyzer™) and expert readers is high. • Intra-rater precision of SpineAnalyzer™ applied to upper thoracic vertebrae is poor, but good elsewhere. • SpineAnalyzer™ is reproducible and accurate for vertebral height measurement and fracture quantification from VFA scans

  20. Didiscus verdensis spec. nov. (Porifera: Halichondrida) from the Cape Verde Islands, with a revision and phylogenetic classification of the genus Didiscus

    Hiemstra, F.; Soest, van R.W.M.

    1991-01-01

    A new species of the circumtropical/subtropical genus Didiscus Dendy, 1922 is described from the Cape Verde Islands. Based on a phylogenetic analysis of all known species of the genus, using morphological and microscopical (including SEM) characters, it was demonstrated that the new species is close

  1. Cyber infrastructure for Fusarium: three integrated platforms supporting strain identification, phylogenetics, comparative genomics and knowledge sharing

    Park, Bongsoo; Park, Jongsun; Cheong, Kyeong-Chae; Choi, Jaeyoung; Jung, Kyongyong; Kim, Donghan; Lee, Yong-Hwan; Ward, Todd J.; O'Donnell, Kerry; Geiser, David M.; Kang, Seogchan

    2010-01-01

    The fungal genus Fusarium includes many plant and/or animal pathogenic species and produces diverse toxins. Although accurate species identification is critical for managing such threats, it is difficult to identify Fusarium morphologically. Fortunately, extensive molecular phylogenetic studies, founded on well-preserved culture collections, have established a robust foundation for Fusarium classification. Genomes of four Fusarium species have been published with more being currently sequence...

  2. Molecular Phylogenetic: Organism Taxonomy Method Based on Evolution History

    N.L.P Indi Dharmayanti

    2011-01-01

    Phylogenetics is described as the taxonomic classification of an organism based on its evolutionary history, namely its phylogeny, and as a part of systematics whose objective is to determine the phylogeny of an organism according to its characteristics. Phylogenetic analysis of amino acid and protein sequences has become an important area of sequence analysis. Phylogenetic analysis can be used to follow the rapid change of a species such as a virus. The phylogenetic evolution tree is a two dimensional of a spec...

  3. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    Porter, Teresita M.; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J.; Golding, G. Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261)...

  4. The Revised Classification of Eukaryotes

    Adl, Sina M; Simpson, Alastair G.B.; Lane, Christopher E.; Lukeš, Julius; Bass, David; Bowser, Samuel S.; Brown, Matthew W.; Burki, Fabien; Dunthorn, Micah; Hampl, Vladimir; Heiss, Aaron; Hoppenrath, Mona; Lara, Enrique; Le Gall, Line; Lynn, Denis H.

    2013-01-01

    This revision of the classification of eukaryotes, which updates that of Adl et al. [J. Eukaryot. Microbiol. 52 (2005) 399], retains an emphasis on the protists and incorporates changes since 2005 that have resolved nodes and branches in phylogenetic trees. Whereas the previous revision was successful in re-introducing name stability to the classification, this revision provides a classification for lineages that were then still unresolved. The supergroups have withstood phylogenetic hypothes...

  5. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented.

    Saraswathi, Saras; Sundaram, Suresh; Sundararajan, Narasimhan; Zimmermann, Michael; Nilsen-Hamilton, Marit

    2011-01-01

    A combination of Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which is then used to build a classifier to develop an algorithm (ICGA_PSO_ELM) that can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified by this study that encode secreted proteins might provide important insights to the nature of the critical biological features in the microenvironment of each tumor type that allow these cells to thrive and proliferate. PMID:21233525

  6. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    Tsoka Sophia

    2009-10-01

    Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.

  7. Photometric brown-dwarf classification. II. A homogeneous sample of 1361 L and T dwarfs brighter than J = 17.5 with accurate spectral types

    Skrzypek, N.; Warren, S. J.; Faherty, J. K.

    2016-04-01

    We present a homogeneous sample of 1361 L and T dwarfs brighter than J = 17.5 (of which 998 are new), from an effective area of 3070 deg², classified by the photo-type method to an accuracy of one spectral sub-type using izYJHKW1W2 photometry from SDSS+UKIDSS+WISE. Other than a small bias in the early L types, the sample is shown to be effectively complete to the magnitude limit, for all spectral types L0 to T8. The nature of the bias is an incompleteness estimated at 3% because peculiar blue L dwarfs of type L4 and earlier are classified late M. There is a corresponding overcompleteness because peculiar red (likely young) late M dwarfs are classified early L. Contamination of the sample is confirmed to be small: so far spectroscopy has been obtained for 19 sources in the catalogue and all are confirmed to be ultracool dwarfs. We provide coordinates and izYJHKW1W2 photometry of all sources. We identify an apparent discontinuity, Δm ~ 0.4 mag, in the Y - K colour between spectral types L7 and L8. We present near-infrared spectra of nine sources identified by photo-type as peculiar, including a new low-gravity source ULAS J005505.68+013436.0, with spectroscopic classification L2γ. We provide revised izYJHKW1W2 template colours for late M dwarfs, types M7 to M9. The catalogue is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/589/A49

  8. The need for improved identification and accurate classification of stages 3-5 Chronic Kidney Disease in primary care: retrospective cohort study.

    Poorva Jain

    BACKGROUND: Around ten percent of the population have been reported as having Chronic Kidney Disease (CKD), which is associated with increased cardiovascular mortality. Few previous studies have ascertained the chronicity of CKD. In the UK, a payment for performance (P4P) initiative incentivizes CKD (stages 3-5) recognition and management in primary care, but the impact of this has not been assessed. METHODS AND FINDINGS: Using data from 426 primary care practices (population 2,707,130), the age standardised prevalence of stages 3-5 CKD was identified using two consecutive estimated Glomerular Filtration Rates (eGFRs) seven days apart. Additionally the accuracy of practice CKD registers and the relationship between accurate identification of CKD and the achievement of P4P indicators was determined. Between 2005 and 2009, the prevalence of stages 3-5 CKD increased from 0.3% to 3.9%. In 2009, 30,440 patients (1.1% unadjusted) fulfilled biochemical criteria for CKD but were not on a practice CKD register (uncoded CKD) and 60,705 patients (2.2% unadjusted) were included on a practice CKD register but did not fulfil biochemical criteria (miscoded CKD). For patients with confirmed CKD, inclusion in a practice register was associated with increasing age, male sex, diabetes, hypertension, cardiovascular disease and increasing CKD stage (p<0.0001). Uncoded CKD patients compared to miscoded patients were less likely to achieve performance indicators for blood pressure (OR 0.84, 95% CI 0.82-0.86, p<0.001) or recorded albumin-creatinine ratio (OR 0.73, 0.70-0.76, p<0.001). CONCLUSIONS: The prevalence of stages 3-5 CKD, using two laboratory reported eGFRs, was lower than estimates from previous studies. Clinically significant discrepancies were identified between biochemically defined CKD and appearance on practice registers, with misclassification associated with sub-optimal care for some people with CKD.

  9. Phylogenetic and phytogeographical relationships in Maloideae (Rosaceae) based on morphological and anatomical characters

    Aldasoro, J.J.; Aedo, C.; Navarro, C.

    2005-01-01

    Phylogenetic relationships among 24 genera of Rosaceae subfam. Maloideae and Spiraeoideae are explored by means of a cladistic analysis; 16 morphological and anatomical characters were included in the analysis. Published suprageneric classifications and characters used in these classifications are b

  10. Advances in phylogenetic studies of Nematoda

    2002-01-01

    Nematoda is a metazoan group with extremely high diversity, second only to Insecta. Caenorhabditis elegans is now a favored experimental model animal in modern developmental biology, genetics and genomics studies. However, the phylogeny of Nematoda and the phylogenetic position of the phylum within the animal kingdom have long been debated. Recent molecular phylogenetic studies have posed great challenges to the traditional nematode classification. The new phylogenies not only placed the Nematoda in the Ecdysozoa and divided the phylum into five clades, but also provided new insights into animal molecular identification and phylogenetic biodiversity studies. The present paper reviews major progress and remaining problems in the current molecular phylogenetic studies of Nematoda, and considers the likely future directions of this field.

  11. ClassyFlu: Classification of Influenza A Viruses with Discriminatively Trained Profile-HMMs

    Van der Auwera, Sandra; Bulla, Ingo; Ziller, Mario; Pohlmann, Anne; Harder, Timm; Stanke, Mario

    2014-01-01

    Accurate and rapid characterization of influenza A virus (IAV) hemagglutinin (HA) and neuraminidase (NA) sequences with respect to subtype and clade is at the basis of extended diagnostic services and implicit to molecular epidemiologic studies. ClassyFlu is a new tool and web service for the classification of IAV sequences of the HA and NA gene into subtypes and phylogenetic clades using discriminatively trained profile hidden Markov models (HMMs), one for each subtype or clade. ClassyFlu me...

  12. A genus-level classification of the family Thraupidae (Class Aves: Order Passeriformes).

    Burns, Kevin J; Unitt, Philip; Mason, Nicholas A

    2016-01-01

    The tanagers (Thraupidae) are a major component of the Neotropical avifauna, and vary in plumage colors, behaviors, morphologies, and ecologies. Globally, they represent nearly 4% of all avian species and are the largest family of songbirds. However, many currently used tanager genera are not monophyletic, based on analyses of molecular data that have accumulated over the past 25 years. Current genus-level classifications of tanagers have not been revised according to newly documented relationships of tanagers for various reasons: 1) the lack of a comprehensive phylogeny, 2) reluctance to lump existing genera into larger groups, and 3) the lack of available names for newly defined smaller groups. Here, we present two alternative classifications based on a newly published comprehensive phylogeny of tanagers. One of these classifications uses existing generic names, but defines them broadly. The other, which we advocate and follow here, provides new generic names for more narrowly defined groups. Under the latter, we propose eleven new genera (Asemospiza, Islerothraupis, Maschalethraupis, Chrysocorypha, Kleinothraupis, Castanozoster, Ephippiospingus, Chionodacryon, Pseudosaltator, Poecilostreptus, Stilpnia), and resurrect several generic names to form monophyletic taxa. Either of these classifications would allow taxonomic authorities to reconcile classification with current understanding of tanager phylogenetic relationships. Having a more phylogenetically accurate classification for tanagers will facilitate the study and conservation of this important Neotropical radiation of songbirds. PMID:27394344

  13. The evolution of HPV by means of a phylogenetic study.

    Isea, Raúl; Chaves, Juan L; Montes, Esther; Rubio-Montero, Antonio J; Mayo, Rafael

    2009-01-01

    In this work we demonstrate, for the case of human papillomavirus (HPV), that classification systems can adequately be revised on the basis of molecular phylogenetic calculations that admit an arbitrary number of taxa by taking advantage of high-performance computing platforms. To do so, we analysed several phylogenetic trees calculated with the PhyloGrid tool, a workflow developed in the framework of the EELA-2 Project. PMID:19593062

  14. Efficient multivariate sequence classification

    Kuksa, Pavel P.

    2014-01-01

    Kernel-based approaches for sequence classification have been successfully applied to a variety of domains, including text categorization, image classification, speech analysis, biological sequence analysis, time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subsequence kernels) are restricted to discrete univariate (i.e. one-dimensional) string data, such ...

  15. Phylogenetic effective sample size

    Bartoszek, Krzysztof

    2015-01-01

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes - the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an already present concept of effective sample size (the mean effective sample size). Through a simulation study I find...

  16. 'Araphid' diatom classification and the 'absolute standard'

    Williams, David M.

    2009-01-01

    'Araphid' diatom classification is discussed from the point of view of an 'absolute standard' for taxonomic rank. The 'absolute standard' is the phylogenetic tree, its nodes, the included monophyletic groups and sub-groups. To illustrate this point a few species from the genus Licmophora are re-analysed and the resulting phylogenetic tree is discussed in terms of a possible classification, the groups and sub-groups and their ranks.

  17. Phylogenetically resolving epidemiologic linkage

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-01-01

    Although the use of phylogenetic trees in epidemiological investigations has become commonplace, their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. We confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results. PMID:26903617

  18. Clustering with phylogenetic tools in astrophysics

    Fraix-Burnet, Didier

    2016-01-01

    Phylogenetic approaches are finding more and more applications outside the field of biology. Astrophysics is no exception since an overwhelming amount of multivariate data has appeared in the last twenty years or so. In particular, the diversification of galaxies throughout the evolution of the Universe quite naturally invokes phylogenetic approaches. We have demonstrated that Maximum Parsimony brings useful astrophysical results, and we now proceed toward the analyses of large datasets for galaxies. In this talk I present how we solve the major difficulties for this goal: the choice of the parameters, their discretization, and the analysis of a high number of objects with an unsupervised NP-hard classification technique like cladistics. 1. Introduction How do galaxies form, and when? How did galaxies evolve and transform themselves to create the diversity we observe? What are the progenitors of present-day galaxies? To answer these big questions, observations throughout the Universe and the physical mode...

  19. A Universal Phylogenetic Tree.

    Offner, Susan

    2001-01-01

    Presents a universal phylogenetic tree suitable for use in high school and college-level biology classrooms. Illustrates the antiquity of life and that all life is related, even if it dates back 3.5 billion years. Reflects important evolutionary relationships and provides an exciting way to learn about the history of life. (SAH)

  20. Charles Darwin, beetles and phylogenetics

    Beutel, Rolf G.; Friedrich, Frank; Leschen, Richard A. B.

    2009-11-01

    Here, we review Charles Darwin’s relation to beetles and developments in coleopteran systematics in the last two centuries. Darwin was an enthusiastic beetle collector. He used beetles to illustrate different evolutionary phenomena in his major works, and astonishingly, an entire sub-chapter is dedicated to beetles in “The Descent of Man”. During his voyage on the Beagle, Darwin was impressed by the high diversity of beetles in the tropics, and he remarked that, to his surprise, the majority of species were small and inconspicuous. However, despite his obvious interest in the group, he did not get involved in beetle taxonomy, and his theoretical work had little immediate impact on beetle classification. The development of taxonomy and classification in the late nineteenth and earlier twentieth century was mainly characterised by the exploration of new character systems (e.g. larval features and wing venation). In the mid-twentieth century, Hennig’s new methodology to group lineages by derived characters revolutionised systematics of Coleoptera and other organisms. As envisioned by Darwin and Ernst Haeckel, the new Hennigian approach enabled systematists to establish classifications truly reflecting evolution. Roy A. Crowson and Howard E. Hinton, who both made tremendous contributions to coleopterology, had an ambivalent attitude towards the Hennigian ideas. The Mickoleit school combined detailed anatomical work with a classical Hennigian character evaluation, with stepwise tree building, comparatively few characters and a priori polarity assessment without explicit use of the outgroup comparison method. The rise of cladistic methods in the 1970s had a strong impact on beetle systematics. Cladistic computer programs facilitated parsimony analyses of large data matrices, mostly morphological characters not requiring detailed anatomical investigations. Molecular studies on beetle phylogeny started in the 1990s with modest taxon sampling and limited DNA data

  1. Phylogenetic molecular function annotation

    Engelhardt, Barbara E; Jordan, Michael I.; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic ...

  2. Molecular phylogenetics before sequences

    Ragan, Mark A.; Bernard, Guillaume; Chan, Cheong Xin

    2014-01-01

    From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of micr...

  3. Canonical phylogenetic ordination.

    Giannini, Norberto P

    2003-10-01

    A phylogenetic comparative method is proposed for estimating historical effects on comparative data using the partitions that compose a cladogram, i.e., its monophyletic groups. Two basic matrices, Y and X, are defined in the context of an ordinary linear model. Y contains the comparative data measured over t taxa. X consists of an initial tree matrix that contains all the xj monophyletic groups (each coded separately as a binary indicator variable) of the phylogenetic tree available for those taxa. The method seeks to define the subset of groups, i.e., a reduced tree matrix, that best explains the patterns in Y. This definition is accomplished via regression or canonical ordination (depending on the dimensionality of Y) coupled with Monte Carlo permutations. It is argued here that unrestricted permutations (i.e., under an equiprobable model) are valid for testing this specific kind of groupwise hypothesis. Phylogeny is either partialled out or, more properly, incorporated into the analysis in the form of component variation. Direct extensions allow for testing ecomorphological data controlled by phylogeny in a variation partitioning approach. Currently available statistical techniques make this method applicable under most univariate/multivariate models and metrics; two-way phylogenetic effects can be estimated as well. The simplest case (univariate Y), tested with simulations, yielded acceptable type I error rates. Applications presented include examples from evolutionary ethology, ecology, and ecomorphology. Results showed that the new technique detected previously overlooked variation clearly associated with phylogeny and that many phylogenetic effects on comparative data may occur at particular groups rather than across the entire tree. PMID:14530135
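
The initial tree matrix X described in this abstract can be illustrated with a small sketch. The code below is only an assumption about one reasonable encoding, not the author's implementation: the nested-tuple tree, the taxon names and the function name `clade_indicator_matrix` are hypothetical, and dropping the root clade (whose column would be constant) is a choice made here for illustration.

```python
def clade_indicator_matrix(tree, taxa):
    """Return (clades, X), where X[i][j] is 1 if taxon i belongs to clade j.

    `tree` is a nested tuple; leaves are taxon names (strings).
    """
    clades = []

    def collect(node):
        # A leaf is a taxon name; an internal node is a tuple of children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        clades.append(members)
        return members

    collect(tree)
    # Assumption: drop the root clade, since it contains every taxon
    # and would give a constant (uninformative) column.
    clades = [c for c in clades if len(c) < len(taxa)]
    X = [[1 if taxon in clade else 0 for clade in clades] for taxon in taxa]
    return clades, X


# Hypothetical five-taxon cladogram: ((A, B), C) and (D, E).
tree = ((("A", "B"), "C"), ("D", "E"))
taxa = ["A", "B", "C", "D", "E"]
clades, X = clade_indicator_matrix(tree, taxa)
for taxon, row in zip(taxa, X):
    print(taxon, row)
```

Each column of X is one monophyletic group coded as a binary indicator variable; selecting the subset of columns that best explains Y would then be a separate regression or ordination step, which is not shown here.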

  4. Efficient segmentation by sparse pixel classification

    Dam, Erik B; Loog, Marco

    2008-01-01

    Segmentation methods based on pixel classification are powerful but often slow. We introduce two general algorithms, based on sparse classification, for optimizing the computation while still obtaining accurate segmentations. The computational costs of the algorithms are derived, and they are...

  5. Multiple sparse representations classification

    Plenge, Esben; Klein, Stefan; Niessen, Wiro; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small...

  6. Multiple Sparse Representations Classification

    Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surro...

  7. Nominal classification

    Senft, G.

    2007-01-01

    This handbook chapter summarizes some of the problems of nominal classification in language, presents and illustrates the various systems or techniques of nominal classification, and points out why nominal classification is one of the most interesting topics in Cognitive Linguistics.

  8. Associations of Leaf Spectra with Genetic and Phylogenetic Variation in Oaks: Prospects for Remote Detection of Biodiversity

    Jeannine Cavender-Bares

    2016-03-01

    Species and phylogenetic lineages have evolved to differ in the way that they acquire and deploy resources, with consequences for their physiological, chemical and structural attributes, many of which can be detected using spectral reflectance from leaves. Recent technological advances for assessing optical properties of plants offer opportunities to detect functional traits of organisms and differentiate levels of biological organization across the tree of life. Here, we connect leaf-level full range spectral data (400–2400 nm) of leaves to the hierarchical organization of plant diversity within the oak genus (Quercus) using field and greenhouse experiments in which environmental factors and plant age are controlled. We show that spectral data significantly differentiate populations within a species and that spectral similarity is significantly associated with phylogenetic similarity among species. We further show that hyperspectral information allows more accurate classification of taxa than spectrally-derived traits, which by definition are of lower dimensionality. Finally, model accuracy increases at higher levels in the hierarchical organization of plant diversity, such that we are able to better distinguish clades than species or populations. This pattern supports an evolutionary explanation for the degree of optical differentiation among plants and demonstrates potential for remote detection of genetic and phylogenetic diversity.

  9. Fast phylogenetic DNA barcoding

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Willerslev, Eske;

    2008-01-01

    We present a heuristic approach to the DNA assignment problem based on phylogenetic inferences using constrained neighbour joining and non-parametric bootstrapping. We show that this method performs as well as the more computationally intensive full Bayesian approach in an analysis of 500 insect DNA sequences obtained from GenBank. We also analyse a previously published dataset of environmental DNA sequences from soil from New Zealand and Siberia, and use these data to illustrate the fact that statistical approaches to the DNA assignment problem allow for more appropriate criteria for determining the taxonomic level at which a particular DNA sequence can be assigned.

  10. Phylogenetic comparative assembly

    Husemann Peter

    2010-01-01

    Background: Recent high-throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence. Results: Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible, and the best alternatives where necessary. Conclusions: Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while being much faster at the same time.

  11. Phylogenetic trees in bioinformatics

    Burr, Tom L [Los Alamos National Laboratory

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.

  12. CREST--classification resources for environmental sequence tags.

    Anders Lanzén

    Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interface program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.
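
The lowest common ancestor idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not CREST's or MEGAN's actual code: it assumes each alignment hit has already been mapped to a root-to-leaf lineage, the function name `lowest_common_ancestor` and the example lineages are hypothetical, and the rank similarity criteria described in the abstract are not modelled.

```python
def lowest_common_ancestor(lineages):
    """Return the longest prefix shared by all root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ragged input is handled.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical lineages of the best-scoring hits for one query sequence.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
     "Enterobacterales", "Enterobacteriaceae", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
     "Enterobacterales", "Enterobacteriaceae", "Salmonella"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
     "Pseudomonadales", "Pseudomonadaceae", "Pseudomonas"],
]
assignment = lowest_common_ancestor(hits)
print(" > ".join(assignment))
```

The query is assigned only as deep as all hits agree, which is why conflicting hits push the assignment toward higher taxonomic ranks.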

  13. Molecular Phylogenetic: Organism Taxonomy Method Based on Evolution History

    N.L.P Indi Dharmayanti

    2011-03-01

    Phylogenetics is described as the taxonomic classification of an organism based on its evolutionary history, that is, its phylogeny, and as the part of systematics whose objective is to determine the phylogeny of an organism from its characteristics. Phylogenetic analysis of amino acid and protein sequences has become an important area of sequence analysis. Phylogenetic analysis can be used to follow rapid change in a species, such as a virus. A phylogenetic tree is a two-dimensional graph that shows the relationships among organisms or, more specifically, among their gene sequences. The separate sequences are referred to as taxa (singular: taxon), defined as phylogenetically distinct units on the tree. The tree consists of outer branches, or leaves, which represent taxa, and of nodes and branches, which represent the relationships among taxa. When the nucleotide sequences of two different organisms are similar, the organisms are inferred to descend from a common ancestor. Three methods are used in phylogenetics: (1) maximum parsimony, (2) distance, and (3) maximum likelihood. These methods are generally applied to construct the evolutionary tree, or the best tree, that accounts for sequence variation within a group. Each method is usually used for different analyses and data.
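
As an illustration of the distance approach listed in this abstract, the sketch below computes the simplest pairwise measure, the p-distance (the proportion of aligned sites that differ). The sequences and the function name `p_distance` are hypothetical, and skipping gapped sites is an assumption; real analyses usually correct such distances with a substitution model before building a tree.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption: sites with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of three taxa.
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACGAACTA",
}
names = sorted(alignment)
matrix = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x in names for y in names
}
for x in names:
    print(x, [matrix[(x, y)] for y in names])
```

A distance matrix of this kind is the input that distance-based tree-building methods work from.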

  14. Modeling body size evolution in Felidae under alternative phylogenetic hypotheses

    José Alexandre Felizola Diniz-Filho

    2009-01-01

    The use of phylogenetic comparative methods in ecological research has advanced during the last twenty years, mainly due to accurate phylogenetic reconstructions based on molecular data and computational and statistical advances. We used phylogenetic correlograms and phylogenetic eigenvector regression (PVR) to model body size evolution in 35 worldwide Felidae (Mammalia, Carnivora) species using two alternative phylogenies and published body size data. The purpose was not to contrast the phylogenetic hypotheses but to evaluate how analyses of body size evolution patterns can be affected by the phylogeny used for comparative analyses (CA). Both phylogenies produced a strong phylogenetic pattern, with closely related species having similar body sizes and the similarity decreasing with increasing distances in time. The PVR explained 65% to 67% of body size variation and all Moran's I values for the PVR residuals were non-significant, indicating that both these models explained phylogenetic structures in trait variation. Even though our results did not suggest that any phylogeny can be used for CA with the same power, or that “good” phylogenies are unnecessary for the correct interpretation of the evolutionary dynamics of ecological, biogeographical, physiological or behavioral patterns, it does suggest that developments in CA can, and indeed should, proceed without waiting for perfect and fully resolved phylogenies.
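
Moran's I, the autocorrelation statistic referred to in this abstract, can be written out directly from its standard definition. The sketch below is generic and is not the authors' analysis: the trait values and the weight matrix (1 for sister species, 0 otherwise) are hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Moran's I for trait `values` under a pairwise `weights` matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in off_diagonal)          # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in off_diagonal)
    ss = sum(d * d for d in dev)                               # sum of squares
    return (n / s0) * (cross / ss)


# Hypothetical body sizes for four species forming two sister pairs.
body_size = [10.0, 11.0, 50.0, 52.0]
# Assumed weights: 1 if two species are sisters, 0 otherwise.
sister = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
i_value = morans_i(body_size, sister)
print(round(i_value, 4))
```

Values near 1 indicate that closely related species have similar trait values; applied to model residuals, values near 0 suggest that little phylogenetic structure remains unexplained.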

  15. Insights into the evolution of sorbitol metabolism: phylogenetic analysis of SDR196C family

    Sola Carvajal Agustín

    2012-08-01

    Background: Short chain dehydrogenases/reductases (SDR) are NAD(P)(H)-dependent oxidoreductases with a highly conserved 3D structure and of an early origin, which has allowed them to diverge into several families and enzymatic activities. The SDR196C family (http://www.sdr-enzymes.org) groups bacterial sorbitol dehydrogenases (SDH), which are of great industrial interest. In this study, we examine the phylogenetic relationship between the members of this family, and based on the findings and some sequence conserved blocks, a new and a more accurate classification is proposed. Results: The distribution of the 66 bacterial SDH species analyzed was limited to Gram-negative bacteria. Six different bacterial families were found, encompassing α-, β- and γ-proteobacteria. This broad distribution in terms of bacteria and niches agrees with that of SDR, which are found in all forms of life. A cluster analysis of sorbitol dehydrogenase revealed different types of gene organization, although with a common pattern in which the SDH gene is surrounded by sugar ABC transporter proteins, another SDR, a kinase, and several gene regulators. According to the obtained trees, six different lineages and three sublineages can be discerned. The phylogenetic analysis also suggested two different origins for SDH in β-proteobacteria and four origins for γ-proteobacteria. Finally, this subdivision was further confirmed by the differences observed in the sequence of the conserved blocks described for SDR and some specific blocks of SDH, and by a functional divergence analysis, which made it possible to establish new consensus sequences and specific fingerprints for the lineages and sublineages. Conclusion: SDH distribution agrees with that observed for SDR, indicating the importance of the polyol metabolism, as an alternative source of carbon and energy. The phylogenetic analysis pointed to six clearly defined lineages and three sublineages, and great variability in

  16. Ant-Based Phylogenetic Reconstruction (ABPR): A new distance algorithm for phylogenetic estimation based on ant colony optimization

    Karla Vittori

    2008-12-01

    We propose a new distance algorithm for phylogenetic estimation based on Ant Colony Optimization (ACO), named Ant-Based Phylogenetic Reconstruction (ABPR). ABPR joins two taxa iteratively based on evolutionary distance among sequences, while also accounting for the quality of the phylogenetic tree built according to the total length of the tree. Similar to optimization algorithms for phylogenetic estimation, the algorithm allows exploration of a larger set of nearly optimal solutions. We applied the algorithm to four empirical data sets of mitochondrial DNA ranging from 12 to 186 sequences, and from 898 to 16,608 base pairs, and covering taxonomic levels from populations to orders. We show that ABPR performs better than the commonly used Neighbor-Joining algorithm, except when sequences are too closely related (e.g., population-level sequences). The phylogenetic relationships recovered at and above species level by ABPR agree with conventional views. However, like other algorithms of phylogenetic estimation, the proposed algorithm failed to recover expected relationships when distances are too similar or when rates of evolution are very variable, leading to the problem of long-branch attraction. ABPR, as well as other ACO-based algorithms, is emerging as a fast and accurate alternative method of phylogenetic estimation for large data sets.

  17. CORE: a phylogenetically-curated 16S rDNA database of the core oral microbiome.

    Ann L Griffen

    Comparing bacterial 16S rDNA sequences to GenBank and other large public databases via BLAST often provides results of little use for identification and taxonomic assignment of the organisms of interest. The human microbiome, and in particular the oral microbiome, includes many taxa, and accurate identification of sequence data is essential for studies of these communities. For this purpose, a phylogenetically curated 16S rDNA database of the core oral microbiome, CORE, was developed. The goal was to include a comprehensive and minimally redundant representation of the bacteria that regularly reside in the human oral cavity with computationally robust classification at the level of species and genus. Clades of cultivated and uncultivated taxa were formed based on sequence analyses using multiple criteria, including maximum-likelihood-based topology and bootstrap support, genetic distance, and previous naming. A number of classification inconsistencies for previously named species, especially at the level of genus, were resolved. The performance of the CORE database for identifying clinical sequences was compared to that of three publicly available databases, GenBank nr/nt, RDP and HOMD, using a set of sequencing reads that had not been used in creation of the database. CORE offered improved performance compared to other public databases for identification of human oral bacterial 16S sequences by a number of criteria. In addition, the CORE database and phylogenetic tree provide a framework for measures of community divergence, and the focused size of the database offers advantages of efficiency for BLAST searching of large datasets. The CORE database is available as a searchable interface and for download at http://microbiome.osu.edu.

  18. Dengue virus type 3 in Brazil: a phylogenetic perspective

    Josélio Maria Galvão de Araújo

    2009-05-01

    Circulation of a new dengue virus (DENV-3) genotype was recently described in Brazil and Colombia, but the precise classification of this genotype has been controversial. Here we perform phylogenetic and nucleotide-distance analyses of the envelope gene, which support the subdivision of DENV-3 strains into five distinct genotypes (GI to GV) and confirm the classification of the new South American genotype as GV. The extremely low genetic distances between Brazilian GV strains and the prototype Philippines/L11423 GV strain isolated in 1956 raise important questions regarding the origin of GV in South America.

  19. Ultrafast Approximation for Phylogenetic Bootstrap

    Bui Quang Minh; Nguyen, Thi; von Haeseler, Arndt

    2013-01-01

    Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and

  20. Quartets and unrooted phylogenetic networks.

    Gambette, Philippe; Berry, Vincent; Paul, Christophe

    2012-08-01

    Phylogenetic networks were introduced to describe evolution in the presence of exchanges of genetic material between coexisting species or individuals. Split networks in particular were introduced as a special kind of abstract network to visualize conflicts between phylogenetic trees which may correspond to such exchanges. More recently, methods were designed to reconstruct explicit phylogenetic networks (whose vertices can be interpreted as biological events) from triplet data. In this article, we link abstract and explicit networks through their combinatorial properties, by introducing the unrooted analog of level-k networks. In particular, we give an equivalence theorem between circular split systems and unrooted level-1 networks. We also show how to adapt to quartets some existing results on triplets, in order to reconstruct unrooted level-k phylogenetic networks. These results give an interesting perspective on the combinatorics of phylogenetic networks and also raise algorithmic and combinatorial questions. PMID:22809417

  1. PSG-Based Classification of Sleep Phases

    Králík, M.

    2015-01-01

    This work focuses on the classification of sleep phases using an artificial neural network. An unconventional approach was used to calculate classification features from polysomnographic (PSG) data of real patients. This approach increases the time resolution of the analysis and thus yields more accurate classification results.

  2. Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN

    Dumont, Marc G.; Lüke, Claudia; Deng, Yongcui; Frenzel, Peter

    2014-01-01

    The classification of high-throughput sequencing data of protein-encoding genes is not as well established as for 16S rRNA. The objective of this work was to develop a simple and accurate method of classifying large datasets of pmoA sequences, a common marker for methanotrophic bacteria. A taxonomic system for pmoA was developed based on a phylogenetic analysis of available sequences. The taxonomy incorporates the known diversity of pmoA present in public databases, including both sequences f...
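
    The lowest common ancestor idea named in this record's title can be sketched as follows: a read is assigned to the deepest taxon shared by the lineages of all of its retained BLAST hits. This is a minimal sketch, not MEGAN's implementation; the lineage labels are placeholders, and hit filtering (for example by score) is assumed to have happened already.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf lineages."""
    if not lineages:
        return []
    lca = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca

# Placeholder lineages for three hits of one read (hypothetical labels).
hits = [
    ["root", "Group_I", "Clade_a", "Genus_x"],
    ["root", "Group_I", "Clade_a", "Genus_y"],
    ["root", "Group_I", "Clade_a", "Genus_x"],
]
assignment = lowest_common_ancestor(hits)
print(" > ".join(assignment))
```

    Because the hits disagree at the last rank, the read is placed one level higher, which trades resolution for a lower risk of misassignment.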

  3. High-resolution phylogenetic microbial community profiling

    Singer, Esther; Coleman-Derr, Devin; Bowman, Brett; Schwientek, Patrick; Clum, Alicia; Copeland, Alex; Ciobanu, Doina; Cheng, Jan-Fang; Gies, Esther; Hallam, Steve; Tringe, Susannah; Woyke, Tanja

    2014-03-17

    The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large percentage of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of GC content of microbial genomes and was higher when compared to Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences, e.g. full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.

  4. Phylogenetic and biogeographic analysis of sphaerexochine trilobites.

    Curtis R Congreve

    BACKGROUND: Sphaerexochinae is a speciose and widely distributed group of cheirurid trilobites. Their temporal range extends from the earliest Ordovician through the Silurian, and they survived the end Ordovician mass extinction event (the second largest mass extinction in Earth history). Prior to this study, the individual evolutionary relationships within the group had yet to be determined utilizing rigorous phylogenetic methods. Understanding these evolutionary relationships is important for producing a stable classification of the group, and will be useful in elucidating the effects the end Ordovician mass extinction had on the evolutionary and biogeographic history of the group. METHODOLOGY/PRINCIPAL FINDINGS: Cladistic parsimony analysis of cheirurid trilobites assigned to the subfamily Sphaerexochinae was conducted to evaluate phylogenetic patterns and produce a hypothesis of relationship for the group. This study utilized the program TNT, and the analysis included thirty-one taxa and thirty-nine characters. The results of this analysis were then used in a Lieberman-modified Brooks Parsimony Analysis to analyze biogeographic patterns during the Ordovician-Silurian. CONCLUSIONS/SIGNIFICANCE: The genus Sphaerexochus was found to be monophyletic, consisting of two smaller clades (one composed entirely of Ordovician species and another composed of Silurian and Ordovician species). By contrast, the genus Kawina was found to be paraphyletic. It is a basal grade that also contains taxa formerly assigned to Cydonocephalus. Phylogenetic patterns suggest Sphaerexochinae is a relatively distinctive trilobite clade because it appears to have been largely unaffected by the end Ordovician mass extinction. Finally, the biogeographic analysis yields two major conclusions about Sphaerexochus biogeography: Bohemia and Avalonia were close enough during the Silurian to exchange taxa; and during the Ordovician there was dispersal between Eastern Laurentia and

  5. [Foundations of the new phylogenetics].

    Pavlinov, I Ia

    2004-01-01

    The idea of evolution is at the core of modern biology. Because of this, phylogenetics, which deals with historical reconstructions in biology, holds a priority position among biological disciplines. The second half of the 20th century witnessed the growth of great interest in phylogenetic reconstructions at the macrotaxonomic level, which replaced the microevolutionary studies that dominated during the 1930s-1960s. This meant a shift from population thinking to phylogenetic thinking, but it was not a revival of classical phylogenetics; rather, a new approach emerged that was baptized The New Phylogenetics. It arose from the merging of three disciplines that had developed independently during the 1960s-1970s, namely cladistics, numerical phyletics, and molecular phylogenetics (now basically genophyletics). Thus, the new phylogenetics can be defined as a branch of evolutionary biology aimed at elaborating "parsimonious" cladistic hypotheses by means of numerical methods on the basis of mostly molecular data. Classical phylogenetics, the historical predecessor of the new one, emerged on the basis of the natural-philosophical worldview, which included a superorganismal idea of biota. According to that view, historical development (phylogeny) was thought to be analogous to individual development (ontogeny), so its most basic features were progressive parallel developments of "parts" (taxa), supplemented with the Darwinian concept of monophyly. Two predominant traditions diverged within classical phylogenetics according to their particular interpretation of the relation between these concepts. One of them (Cope, Severtzow) belittled monophyly and paid most attention to progressive parallel developments of morphological traits. Such an attitude turned this kind of phylogenetics into semogenetics, dealing primarily with the evolution of structures and not of taxa. Another tradition (Haeckel) considered both monophyletic and parallel origins of taxa jointly: in the middle of the 20th century it was split into

  6. A phylogenetic analysis of the myxobacteria: basis for their classification

    Shimkets, L.; Woese, C. R.

    1992-01-01

    The primary sequence and secondary structural features of the 16S rRNA were compared for 12 different myxobacteria representing all the known cultivated genera. Analysis of these data shows that the myxobacteria form a monophyletic grouping consisting of three distinct families, which lies within the delta subdivision of the purple bacterial phylum. The composition of the families is consistent with differences in cell and spore morphology, cell behavior, and pigment and secondary metabolite production but is not correlated with the morphological complexity of the fruiting bodies. The Nannocystis exedens lineage has evolved at an unusually rapid pace and its rRNA shows numerous primary and secondary structural idiosyncrasies.

  7. Bayesian Classification in Medicine: The Transferability Question

    Zagoria, Ronald J.; Reggia, James A.; Price, Thomas R.; Banko, Maryann

    1981-01-01

    Using probabilities derived from a geographically distant patient population, we applied Bayesian classification to categorize stroke patients by etiology. Performance was assessed both by error rate and with a new linear accuracy coefficient. This approach to patient classification was found to be surprisingly accurate when compared to classification by two neurologists and to classification by the Bayesian method using “low cost” local and subjective probabilities. We conclude that for some...
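
    The kind of Bayesian classification described here can be sketched as follows: each class receives a posterior proportional to its prior times the likelihood of the observed findings. This is a minimal sketch that assumes the findings are conditionally independent given the class (an assumption of this example, not a statement about the record's method); the class names, finding names, and all probabilities are invented for illustration.

```python
import math

def bayes_posteriors(priors, likelihoods, findings):
    """Posterior over classes, assuming conditionally independent binary findings."""
    log_scores = {}
    for cls, prior in priors.items():
        log_p = math.log(prior)
        for name, present in findings.items():
            p = likelihoods[cls][name]  # P(finding present | class)
            log_p += math.log(p if present else 1.0 - p)
        log_scores[cls] = log_p
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Hypothetical priors and likelihoods (not taken from any patient population).
priors = {"etiology_1": 0.6, "etiology_2": 0.4}
likelihoods = {
    "etiology_1": {"finding_a": 0.8, "finding_b": 0.1},
    "etiology_2": {"finding_a": 0.3, "finding_b": 0.7},
}
posterior = bayes_posteriors(priors, likelihoods, {"finding_a": True, "finding_b": False})
predicted = max(posterior, key=posterior.get)
print(predicted, round(posterior[predicted], 3))
```

    The transferability question raised in the title corresponds to asking how well such a classifier performs when `priors` and `likelihoods` come from a different population than the patients being classified.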

  8. Quantum Simulation of Phylogenetic Trees

    Ellinas, Demosthenes; Jarvis, Peter

    2011-01-01

    Quantum simulations constructing probability tensors of biological multi-taxa in phylogenetic trees are proposed, in terms of positive trace preserving maps, describing evolving systems of quantum walks with multiple walkers. Basic phylogenetic models applying on trees of various topologies are simulated following appropriate decoherent quantum circuits. Quantum simulations of statistical inference for aligned sequences of biological characters are provided in terms of a quantum pruning map o...

  9. Application of Data Mining in Protein Sequence Classification

    Suprativ Saha

    2012-11-01

    Protein sequence classification involves feature selection for accurate classification. Popular protein sequence classification techniques involve extraction of specific features from the sequences. Researchers apply well-known classification techniques such as neural networks, genetic algorithms, fuzzy ARTMAP, and rough set classifiers for accurate classification. This paper presents a review of three different classification models: the neural network model, the fuzzy ARTMAP model, and the rough set classifier model. This is followed by a new technique for classifying protein sequences. The proposed model is implemented with a tool designed by the authors and aims to reduce the computational overhead encountered by earlier approaches and to increase the accuracy of classification.

  10. Molecular systematics of Volvocales (Chlorophyceae, Chlorophyta) based on exhaustive 18S rRNA phylogenetic analyses.

    Nakada, Takashi; Misawa, Kazuharu; Nozaki, Hisayoshi

    2008-07-01

    The taxonomy of Volvocales (Chlorophyceae, Chlorophyta) was traditionally based solely on morphological characteristics. However, because recent molecular phylogeny largely contradicts the traditional subordinal and familial classifications, no classification system has yet been established that describes the subdivision of Volvocales in a manner consistent with the phylogenetic relationships. Towards development of a natural classification system at and above the generic level, identification and sorting of hundreds of sequences based on subjective phylogenetic definitions is a significant step. We constructed an 18S rRNA gene phylogeny based on 449 volvocalean sequences collected using exhaustive BLAST searches of the GenBank database. Many chimeric sequences, which can cause fallacious phylogenetic trees, were detected and excluded during data collection. The results revealed 21 strongly supported primary clades within phylogenetically redefined Volvocales. Phylogenetic classification following PhyloCode was proposed based on the presented 18S rRNA gene phylogeny along with the results of previous combined 18S and 26S rRNA and chloroplast multigene analyses. PMID:18430591

  11. The phylogenetic utility of chloroplast and nuclear DNA markers and the phylogeny of the Rubiaceae tribe Spermacoceae.

    Kårehed, Jesper; Groeninckx, Inge; Dessein, Steven; Motley, Timothy J; Bremer, Birgitta

    2008-12-01

    The phylogenetic utility of chloroplast (atpB-rbcL, petD, rps16, trnL-F) and nuclear (ETS, ITS) DNA regions was investigated for the tribe Spermacoceae of the coffee family (Rubiaceae). ITS was, despite often raised cautions of its utility at higher taxonomic levels, shown to provide the highest number of parsimony informative characters, in partitioned Bayesian analyses it yielded the fewest trees in the 95% credible set, it resolved the highest proportion of well resolved clades, and was the most accurate region as measured by the partition metric and the proportion of correctly resolved clades (well supported clades retrieved from a combined analysis regarded as "true"). For Hedyotis, the nuclear 5S-NTS was shown to be potentially as useful as ITS, despite its shorter sequence length. The chloroplast region being the most phylogenetically informative was the petD group II intron. We also present a phylogeny of Spermacoceae based on a Bayesian analysis of the four chloroplast regions, ITS, and ETS combined. Spermacoceae are shown to be monophyletic. Clades supported by high posterior probabilities are discussed, especially in respect to the current generic classification. Notably, Oldenlandia is polyphyletic, the two subgenera of Kohautia are not sister taxa, and Hedyotis should be treated in a narrow sense to include only Asian species. PMID:18950720

  12. Accurate classification of 17 AGNs detected with Swift/BAT

    Parisi, P; Jimenez-Bailon, E; Chavushyan, V; Malizia, A; Landi, R; Molina, M; Fiocchi, M; Palazzi, E; Bassani, L; Bazzano, A; Bird, A J; Dean, A J; Galaz, G; Mason, E; Minniti, D; Morelli, L; Stephen, J B; Ubertini, P

    2009-01-01

    Through an optical campaign performed at 5 telescopes located in the northern and the southern hemispheres, plus archival data from two on line sky surveys, we have obtained optical spectroscopy for 17 counterparts of suspected or poorly studied hard X-ray emitting active galactic nuclei (AGNs) detected with Swift/BAT in order to determine or better classify their nature. We find that 7 sources of our sample are Type 1 AGNs, 9 are Type 2 AGNs, and 1 object is an X-ray bright optically normal galaxy; the redshifts of these objects lie in a range between 0.012 and 0.286. For all these sources, X-ray data analysis was also performed to estimate their absorption column and to search for possible Compton thick candidates. Among our type 2 objects, we did not find any clear Compton thick AGN, but at least 6 out of 9 of them are highly absorbed (N_H > 10^23 cm^-2), while one does not require intrinsic absorption; i.e., it appears to be a naked Seyfert 2 galaxy.

  13. Accurate mobile malware detection and classification in the cloud

    Wang, Xiaolei; Yang, Yuexiang; Zeng, Yingzhi

    2015-01-01

    As the dominant smartphone operating system, Android has attracted the attention of malware authors and researchers alike. The number of types of Android malware is increasing rapidly despite the considerable number of proposed malware analysis systems. In this paper, by taking advantage of the low false-positive rate of misuse detection and the ability of anomaly detection to detect zero-day malware, we propose a novel hybrid detection system based on a new op...

  14. An Innovative Imputation and Classification Approach for Accurate Disease Prediction

    UshaRani, Yelipe; Sammulal, P.

    2016-01-01

    Imputation of missing attribute values in medical datasets, in order to extract hidden knowledge from them, is an interesting and very challenging research topic. One cannot eliminate missing values in medical records. Among the reasons, some tests may not have been conducted for cost reasons, values may have been missed when conducting clinical trials, and values may not have been recorded. Data mining researchers have been proposing various approa...

  15. Phylogenetic placement of the ectomycorrhizal genus Cenococcum in Gloniaceae (Dothideomycetes).

    Spatafora, Joseph W; Owensby, C Alisha; Douhan, Greg W; Boehm, Eric W A; Schoch, Conrad L

    2012-01-01

    Cenococcum is a genus of ectomycorrhizal Ascomycota that has a broad host range and geographic distribution. It is not known to produce either meiotic or mitotic spores and is known to exist only in the form of hyphae, sclerotia and host-colonized ectomycorrhizal root tips. Due to its lack of sexual and asexual spores and reproductive structures, it has proven difficult to incorporate into traditional classification within Ascomycota. Molecular phylogenetic studies of ribosomal RNA placed Cenococcum in Dothideomycetes, but the definitive identification of closely related taxa remained elusive. Here we report a phylogenetic analysis of five nuclear loci (SSU, LSU, TEF1, RPB1, RPB2) of Dothideomycetes that placed Cenococcum as a close relative of the genus Glonium of Gloniaceae (Pleosporomycetidae incertae sedis) with strong statistical support. Glonium is a genus of saprobic Dothideomycetes that produces darkly pigmented, carbonaceous, hysteriate apothecia and is not known to be biotrophic. Evolution of ectomycorrhizae, Cenococcum and Dothideomycetes is discussed. PMID:22453119

  16. Accurate Finite Difference Algorithms

    Goodrich, John W.

    1996-01-01

    Two families of finite difference algorithms for computational aeroacoustics are presented and compared. All of the algorithms are single step explicit methods, they have the same order of accuracy in both space and time, with examples up to eleventh order, and they have multidimensional extensions. One of the algorithm families has spectral like high resolution. Propagation with high order and high resolution algorithms can produce accurate results after O(10^6) periods of propagation with eight grid points per wavelength.

  17. Multiple Sparse Representations Classification.

    Plenge, Esben; Klein, Stefan S; Niessen, Wiro J; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provide an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and
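
    The conventional SRC decision rule described in this record (sparse code a patch with each class dictionary, then pick the class with the minimum residual energy) can be sketched as below. To stay short, the sketch restricts the sparse code to a single atom per dictionary, which is an assumption of this example; the hand-made dictionaries and class names are likewise hypothetical, and the multiple-representation extension (mSRC) is not reproduced.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def residual_energy(patch, dictionary):
    """Residual energy of the best 1-atom approximation (atoms must have unit norm)."""
    energy = sum(x * x for x in patch)
    # For a unit-norm atom d, the residual after projecting onto d is ||x||^2 - <x, d>^2.
    best = max(sum(p * a for p, a in zip(patch, atom)) ** 2 for atom in dictionary)
    return energy - best

def classify(patch, dictionaries):
    """Assign the patch to the class whose dictionary leaves the least residual energy."""
    return min(dictionaries, key=lambda cls: residual_energy(patch, dictionaries[cls]))

# Hypothetical per-class dictionaries for flattened 4-pixel patches.
dictionaries = {
    "smooth": [normalize([1, 1, 1, 1]), normalize([1, 2, 2, 1])],
    "edge": [normalize([1, 1, -1, -1]), normalize([1, -1, 1, -1])],
}
label = classify([2.0, 2.1, -1.9, -2.0], dictionaries)
print(label)
```

    With larger sparsity levels the single-atom projection would be replaced by a sparse coding routine such as orthogonal matching pursuit, while the minimum-residual decision stays the same.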

  18. Strategic Classification

    Hardt, Moritz; Megiddo, Nimrod; Papadimitriou, Christos; Wootters, Mary

    2015-01-01

    Machine learning relies on the assumption that unseen test instances of a classification problem follow the same distribution as observed training data. However, this principle can break down when machine learning is used to make important decisions about the welfare (employment, education, health) of strategic individuals. Knowing information about the classifier, such individuals may manipulate their attributes in order to obtain a better classification outcome. As a result of this behavior...

  19. HYBRID INTERNET TRAFFIC CLASSIFICATION TECHNIQUE

    Li Jun; Zhang Shunyi; Lu Yanqing; Yan Junrong

    2009-01-01

    Accurate and real-time classification of network traffic is significant to network operation and management such as QoS differentiation, traffic shaping and security surveillance. However, with many newly emerged P2P applications using dynamic port numbers, masquerading techniques, and payload encryption to avoid detection, traditional classification approaches turn out to be ineffective. In this paper, we present a layered hybrid system to classify current Internet traffic, motivated by the variety of network activities and their requirements for traffic classification. The proposed method can achieve fast and accurate traffic classification with low overhead and is robust enough to accommodate both known and unknown/encrypted applications. Furthermore, it is feasible to use in the context of real-time traffic classification. Our experimental results show the distinct advantages of the proposed classification system, compared with the one-step Machine Learning (ML) approach.

  20. Phylogenetic Position of Barbus lacerta Heckel, 1843

    Mustafa Korkmaz

    2015-11-01

    The phylogenetic reconstruction yielded five clades, and in the phylogenetic tree Barbus lacerta was determined to be the sister group of Barbus macedonicus, Barbus oligolepis and Barbus plebejus complex.

  1. DendroBlast: approximate phylogenetic trees in the absence of multiple sequence alignments

    Kelly, S.; Maini, P. K.

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realis...

  2. Phylogenetic Distribution of Fungal Sterols

    Weete, John D.; Abril, Maritza; Blackwell, Meredith

    2010-01-01

    Background: Ergosterol has been considered the “fungal sterol” for almost 125 years; however, additional sterol data superimposed on a recent molecular phylogeny of kingdom Fungi reveal a different and more complex situation. Methodology/Principal Findings: The interpretation of sterol distribution data in a modern phylogenetic context indicates that there is a clear trend from cholesterol and other Δ5 sterols in the earliest diverging fungal species to ergosterol in later diverging fungi. The...

  3. PHYLOGENETIC ANALYSIS AMONG FOUR SECTIONS OF GENUS DENDROBIUM SW. (ORCHIDACEAE) IN PENINSULAR MALAYSIA USING RBCL SEQUENCE DATA

    2013-01-01

    Phylogenetic analysis using chloroplast DNA, the ribulose-bisphosphate carboxylase gene (rbcL), was conducted to examine relationship among four sections of the genus Dendrobium (Orchidaceae): Aporum, Crumenata, Strongyle, and Bolbidium in Peninsular Malaysia. Classifications based on morphological characters have not been able to clearly divide these four sections, therefore deeper and detailed analyses are required to ascertain their status. In this study, the phylogenetic relationships amo...

  4. Combinatorial Approaches to Accurate Identification of Orthologous Genes

    Shi, Guanqun

    2011-01-01

    The accurate identification of orthologous genes across different species is a critical and challenging problem in comparative genomics and has a wide spectrum of biological applications including gene function inference, evolutionary studies and systems biology. During the past several years, many methods have been proposed for ortholog assignment based on sequence similarity, phylogenetic approaches, synteny information, and genome rearrangement. Although these methods share many commonly a...

  5. Text Classification using Data Mining

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relation i.e. association rules from these words is used to derive feature set from pre-classified text documents. The concept of Naive Bayes classifier is then used on derived features and finally only a single concept of Genetic Algorithm has been added for final classification. A system based on the...

  6. Text Classification using Artificial Intelligence

    Kamruzzaman, S M

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of using words, word relation i.e. association rules from these words is used to derive feature set from pre-classified text documents. The concept of naïve Bayes classifier is then used on derived features and finally only a single concept of genetic algorithm has been added for final classification. A syste...

  7. Transforming phylogenetic networks: Moving beyond tree space.

    Huber, Katharina T; Moulton, Vincent; Wu, Taoyang

    2016-09-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks. PMID:27224010
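
    The nearest-neighbor interchange (NNI) operation on trees that this record generalizes can be sketched as follows. Around any internal edge of an unrooted binary tree there are four subtrees, two on each side; an NNI swaps one subtree across the edge, giving exactly two alternative arrangements. The tuple representation and function names below are assumptions made for illustration, and the network operations defined in the paper are not reproduced.

```python
def nni_neighbors(split):
    """Given ((A, B), (C, D)) around an internal edge, return the two NNI rearrangements."""
    (a, b), (c, d) = split
    return [((a, c), (b, d)), ((a, d), (b, c))]

def canonical(split):
    """Order-independent form of the arrangement, for comparing results."""
    return frozenset(frozenset(side) for side in split)

# A, B, C, D stand for arbitrary subtrees; here they are simple leaf labels.
start = (("A", "B"), ("C", "D"))
neighbors = nni_neighbors(start)
for side1, side2 in neighbors:
    print(side1, "|", side2)
```

    Applying `nni_neighbors` at every internal edge enumerates the NNI neighborhood of a tree, which is the basic move used by many tree-search heuristics.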

  8. Functional and phylogenetic ecology in R

    Swenson, Nathan G

    2014-01-01

    Functional and Phylogenetic Ecology in R is designed to teach readers to use R for phylogenetic and functional trait analyses. Over the past decade, a dizzying array of tools and methods were generated to incorporate phylogenetic and functional information into traditional ecological analyses. Increasingly these tools are implemented in R, thus greatly expanding their impact. Researchers getting started in R can use this volume as a step-by-step entryway into phylogenetic and functional analyses for ecology in R. More advanced users will be able to use this volume as a quick reference to understand particular analyses. The volume begins with an introduction to the R environment and handling relevant data in R. Chapters then cover phylogenetic and functional metrics of biodiversity; null modeling and randomizations for phylogenetic and functional trait analyses; integrating phylogenetic and functional trait information; and interfacing the R environment with a popular C-based program. This book presents a uni...

  9. A phylogenetic re-analysis of groupers with applications for ciguatera fish poisoning.

    Charlotte Schoelinck

    Ciguatera fish poisoning (CFP) is a significant public health problem due to dinoflagellates. It is responsible for one of the highest reported incidences of seafood-borne illness, and groupers are commonly reported as a source of CFP due to their position in the food chain. With the role of recent climate change on harmful algal blooms, CFP cases might become more frequent and more geographically widespread. Since there is no appropriate treatment for CFP, the most efficient solution is to regulate fish consumption. Such a strategy can only work if the fish sold are correctly identified, and it has been repeatedly shown that misidentifications and species substitutions occur in fish markets. We provide here both a DNA-barcoding reference for groupers, and a new phylogenetic reconstruction based on five genes and a comprehensive taxonomical sampling. We analyse the correlation between geographic range of species and their susceptibility to ciguatera accumulation, and the co-occurrence of ciguatoxins in closely related species, using both character mapping and statistical methods. Misidentifications were encountered in public databases, precluding accurate species identifications. Epinephelinae now includes only twelve genera (vs. 15 previously). Comparisons with the ciguatera incidences show that in some genera most species are ciguateric, but statistical tests display only a moderate correlation with the phylogeny. Atlantic species were rarely contaminated, with ciguatera occurrences being restricted to the South Pacific. The recent changes in classification based on the reanalyses of the relationships within Epinephelidae have an impact on the interpretation of the ciguatera distribution in the genera. In this context and to improve the monitoring of fish trade and safety, we need to obtain extensive data on contamination at the species level. Accurate species identifications through DNA barcoding are thus an essential tool in controlling CFP since

  10. Transporter Classification Database (TCDB)

    U.S. Department of Health & Human Services — The Transporter Classification Database details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC)...

  11. HIV classification using coalescent theory

    Zhang, Ming [Los Alamos National Laboratory]; Letiner, Thomas K [Los Alamos National Laboratory]; Korber, Bette T [Los Alamos National Laboratory]

    2008-01-01

    Algorithms for subtype classification and breakpoint detection of HIV-1 sequences are based on a classification system of HIV-1. Hence, their quality depends highly on this system. Due to the history of creation of the current HIV-1 nomenclature, the current one contains inconsistencies like: The phylogenetic distance between the subtypes B and D is remarkably small compared with other pairs of subtypes. In fact, it is more like the distance of a pair of subsubtypes Robertson et al. (2000); Subtypes E and I do not exist any more since they were discovered to be composed of recombinants Robertson et al. (2000); It is currently discussed whether -- instead of CRF02 being a recombinant of subtype A and G -- subtype G should be designated as a circulating recombinant form (CRF) and CRF02 as a subtype Abecasis et al. (2007); There are 8 complete and over 400 partial HIV genomes in the LANL-database which belong neither to a subtype nor to a CRF (denoted by U). Moreover, the current classification system is somewhat arbitrary, like all complex classification systems that were created manually. To this end, it is desirable to deduce the classification system of HIV systematically by an algorithm. Of course, this problem is not restricted to HIV, but applies to all fast mutating and recombining viruses. Our work addresses the simpler subproblem of scoring classifications of given input sequences of some virus species (a classification denotes a partition of the input sequences into several subtypes and CRFs). To this end, we reconstruct ancestral recombination graphs (ARG) of the input sequences under restrictions determined by the given classification. These restrictions are imposed in order to ensure that the reconstructed ARGs do not contradict the classification under consideration. Then, we find the ARG with maximal probability by means of Markov Chain Monte Carlo methods. The probability of the most probable ARG is interpreted as a score for the classification. To our

  12. Classifying Classification

    Novakowski, Janice

    2009-01-01

    This article describes the experience of a group of first-grade teachers as they tackled the science process of classification, a targeted learning objective for the first grade. While the two-year process was not easy and required teachers to teach in a new, more investigation-oriented way, the benefits were great. The project helped teachers and…

  13. Tissue Classification

    Van Leemput, Koen; Puonti, Oula

    2015-01-01

    Computational methods for automatically segmenting magnetic resonance images of the brain have seen tremendous advances in recent years. So-called tissue classification techniques, aimed at extracting the three main brain tissue classes (white matter, gray matter, and cerebrospinal fluid), are no...... software packages such as SPM, FSL, and FreeSurfer....

  14. HoxPred: automated classification of Hox proteins using combinations of generalised profiles

    Leyns Luc

    2007-07-01

    Full Text Available Background: Correct identification of individual Hox proteins is an essential basis for their study in diverse research fields. Common methods to classify Hox proteins focus on the homeodomain that characterises homeobox transcription factors. Classification is hampered by the high conservation of this short domain. Phylogenetic tree reconstruction is a widely used but time-consuming classification method. Results: We have developed an automated procedure, HoxPred, that classifies Hox proteins in their groups of homology. The method relies on a discriminant analysis that classifies Hox proteins according to their scores for a combination of protein generalised profiles. 54 generalised profiles dedicated to each Hox homology group were produced de novo from a curated dataset of vertebrate Hox proteins. Several classification methods were investigated to select the most accurate discriminant functions. These functions were then incorporated into the HoxPred program. Conclusion: HoxPred shows a mean accuracy of 97%. Predictions on the recently sequenced stickleback fish proteome identified 44 Hox proteins, including HoxC1a, only found so far in zebrafish. Using the Uniprot databank, we demonstrate that HoxPred can efficiently contribute to large-scale automatic annotation of Hox proteins into their paralogous groups. As orthologous group predictions show a higher risk of misclassification, they should be corroborated by additional supporting evidence. HoxPred is accessible via SOAP and Web interface http://cege.vub.ac.be/hoxpred/. Complete datasets, results and source code are available at the same site.

  15. Phylogenetic placement of the Spirosomaceae

    Woese, C. R.; Maloy, S.; Mandelco, L.; Raj, H. D.

    1990-01-01

    Comparative analysis of 16S rRNA sequences shows that the family Spirosomaceae belongs within the eubacterial phylum defined by the flavobacteria and bacteroides. Its constituent genera, Spirosoma, Flectobacillus, and Runella, form a monophyletic grouping therein. The phylogenetic assignment is based not only upon evolutionary distance analysis, but also upon sequence signatures and higher order structural synapomorphies in 16S rRNA. Another genus peripherally associated with the Spirosomaceae, Ancylobacter ("Microcyclus"), does not cluster with the flavobacteria and their relatives, but rather belongs to the alpha subdivision of the purple bacteria.

  16. Phylogenetics of neotropical Platymiscium (Leguminosae

    Saslis-Lagoudakis, C. Haris; Chase, Mark W; Robinson, Daniel N;

    2008-01-01

    Platymiscium is a neotropical legume genus of forest trees in the Pterocarpus clade of the pantropical "dalbergioid" clade. It comprises 19 species (29 taxa), distributed from Mexico to southern Brazil. This study presents a molecular phylogenetic analysis of Platymiscium and allies inferred from...... nuclear ribosomal (nrITS) and plastid (trnL, trnL-F and matK) DNA sequence data using parsimony and Bayesian methods. Divergence times are estimated using a Bayesian method assuming a relaxed molecular clock (multidivtime). Within the Pterocarpus clade, new sister relationships are recovered: Pterocarpus...

  17. Making Mosquito Taxonomy Useful: A Stable Classification of Tribe Aedini that Balances Utility with Current Knowledge of Evolutionary Relationships.

    Wilkerson, Richard C; Linton, Yvonne-Marie; Fonseca, Dina M; Schultz, Ted R; Price, Dana C; Strickman, Daniel A

    2015-01-01

    The tribe Aedini (Family Culicidae) contains approximately one-quarter of the known species of mosquitoes, including vectors of deadly or debilitating disease agents. This tribe contains the genus Aedes, which is one of the three most familiar genera of mosquitoes. During the past decade, Aedini has been the focus of a series of extensive morphology-based phylogenetic studies published by Reinert, Harbach, and Kitching (RH&K). Those authors created 74 new, elevated or resurrected genera from what had been the single genus Aedes, almost tripling the number of genera in the entire family Culicidae. The proposed classification is based on subjective assessments of the "number and nature of the characters that support the branches" subtending particular monophyletic groups in the results of cladistic analyses of a large set of morphological characters of representative species. To gauge the stability of RH&K's generic groupings we reanalyzed their data with unweighted parsimony jackknife and maximum-parsimony analyses, with and without ordering 14 of the characters as in RH&K. We found that their phylogeny was largely weakly supported and their taxonomic rankings failed priority and other useful taxon-naming criteria. Consequently, we propose simplified aedine generic designations that 1) restore a classification system that is useful for the operational community; 2) enhance the ability of taxonomists to accurately place new species into genera; 3) maintain the progress toward a natural classification based on monophyletic groups of species; and 4) correct the current classification system that is subject to instability as new species are described and existing species more thoroughly defined. We do not challenge the phylogenetic hypotheses generated by the above-mentioned series of morphological studies. However, we reduce the ranks of the genera and subgenera of RH&K to subgenera or informal species groups, respectively, to preserve stability as new data become

  18. Making Mosquito Taxonomy Useful: A Stable Classification of Tribe Aedini that Balances Utility with Current Knowledge of Evolutionary Relationships.

    Richard C Wilkerson

    Full Text Available The tribe Aedini (Family Culicidae) contains approximately one-quarter of the known species of mosquitoes, including vectors of deadly or debilitating disease agents. This tribe contains the genus Aedes, which is one of the three most familiar genera of mosquitoes. During the past decade, Aedini has been the focus of a series of extensive morphology-based phylogenetic studies published by Reinert, Harbach, and Kitching (RH&K). Those authors created 74 new, elevated or resurrected genera from what had been the single genus Aedes, almost tripling the number of genera in the entire family Culicidae. The proposed classification is based on subjective assessments of the "number and nature of the characters that support the branches" subtending particular monophyletic groups in the results of cladistic analyses of a large set of morphological characters of representative species. To gauge the stability of RH&K's generic groupings we reanalyzed their data with unweighted parsimony jackknife and maximum-parsimony analyses, with and without ordering 14 of the characters as in RH&K. We found that their phylogeny was largely weakly supported and their taxonomic rankings failed priority and other useful taxon-naming criteria. Consequently, we propose simplified aedine generic designations that 1) restore a classification system that is useful for the operational community; 2) enhance the ability of taxonomists to accurately place new species into genera; 3) maintain the progress toward a natural classification based on monophyletic groups of species; and 4) correct the current classification system that is subject to instability as new species are described and existing species more thoroughly defined. We do not challenge the phylogenetic hypotheses generated by the above-mentioned series of morphological studies. However, we reduce the ranks of the genera and subgenera of RH&K to subgenera or informal species groups, respectively, to preserve stability as new

  19. Phycas: software for Bayesian phylogenetic analysis.

    Lewis, Paul O; Holder, Mark T; Swofford, David L

    2015-05-01

    Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length. PMID:25577605

  20. Neuromuscular disease classification system

    Sáez, Aurora; Acha, Begoña; Montero-Sánchez, Adoración; Rivas, Eloy; Escudero, Luis M.; Serrano, Carmen

    2013-06-01

    Diagnosis of neuromuscular diseases is based on subjective visual assessment of biopsies from patients by the pathologist specialist. A system for objective analysis and classification of muscular dystrophies and neurogenic atrophies through muscle biopsy images of fluorescence microscopy is presented. The procedure starts with an accurate segmentation of the muscle fibers using mathematical morphology and a watershed transform. A feature extraction step is carried out in two parts: 24 features that pathologists take into account to diagnose the diseases and 58 structural features that the human eye cannot see, based on the assumption that the biopsy is considered as a graph, where the nodes are represented by each fiber, and two nodes are connected if two fibers are adjacent. A feature selection using sequential forward selection and sequential backward selection methods, a classification using a Fuzzy ARTMAP neural network, and a study of grading the severity are performed on these two sets of features. A database consisting of 91 images was used: 71 images for the training step and 20 as the test. A classification error of 0% was obtained. It is concluded that the addition of features undetectable by the human visual inspection improves the categorization of atrophic patterns.

  1. Many-core algorithms for statistical phylogenetics

    Suchard, Marc A.; Rambaut, Andrew

    2009-01-01

    Motivation: Statistical phylogenetics is computationally intensive, resulting in considerable attention meted on techniques for parallelization. Codon-based models allow for independent rates of synonymous and replacement substitutions and have the potential to more adequately model the process of protein-coding sequence evolution with a resulting increase in phylogenetic accuracy. Unfortunately, due to the high number of codon states, computational burden has largely thwarted phylogenetic re...

  2. Phylogenetic diversity of freshwater picocyanobacteria

    Callieri, Cristiana; Coci, Manuela

    2012-01-01

    Picocyanobacteria are photosynthetic prokaryotes, coccoid or rod-shaped, with a cell diameter < 2 μm. They are common in lakes and oceans, and abundant across a wide spectrum of trophic conditions (Callieri et al 2012). The dominant genus of freshwater picocyanobacteria is Synechococcus. Analysis of 16S rRNA gene of freshwater Synechococcus showed its polyphyletic origin, requiring better insights in the present classification of the genus and possibly a revision. We isolated more than 40 pic...

  3. Phylogenetic organization of bacterial activity.

    Morrissey, Ember M; Mau, Rebecca L; Schwartz, Egbert; Caporaso, J Gregory; Dijkstra, Paul; van Gestel, Natasja; Koch, Benjamin J; Liu, Cindy M; Hayer, Michaela; McHugh, Theresa A; Marks, Jane C; Price, Lance B; Hungate, Bruce A

    2016-09-01

    Phylogeny is an ecologically meaningful way to classify plants and animals, as closely related taxa frequently have similar ecological characteristics, functional traits and effects on ecosystem processes. For bacteria, however, phylogeny has been argued to be an unreliable indicator of an organism's ecology owing to evolutionary processes more common to microbes such as gene loss and lateral gene transfer, as well as convergent evolution. Here we use advanced stable isotope probing with (13)C and (18)O to show that evolutionary history has ecological significance for in situ bacterial activity. Phylogenetic organization in the activity of bacteria sets the stage for characterizing the functional attributes of bacterial taxonomic groups. Connecting identity with function in this way will allow scientists to begin building a mechanistic understanding of how bacterial community composition regulates critical ecosystem functions. PMID:26943624

  4. DNA sequence analysis using hierarchical ART-based classification networks

    LeBlanc, C.; Hruska, S.I. [Florida State Univ., Tallahassee, FL (United States)]; Katholi, C.R.; Unnasch, T.R. [Univ. of Alabama, Birmingham, AL (United States)]

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  5. Progress, pitfalls and parallel universes: a history of insect phylogenetics.

    Kjer, Karl M; Simon, Chris; Yavorskaya, Margarita; Beutel, Rolf G

    2016-08-01

    The phylogeny of insects has been both extensively studied and vigorously debated for over a century. A relatively accurate deep phylogeny had been produced by 1904. It was not substantially improved in topology until recently when phylogenomics settled many long-standing controversies. Intervening advances came instead through methodological improvement. Early molecular phylogenetic studies (1985-2005), dominated by a few genes, provided datasets that were too small to resolve controversial phylogenetic problems. Adding to the lack of consensus, this period was characterized by a polarization of philosophies, with individuals belonging to either parsimony or maximum-likelihood camps; each largely ignoring the insights of the other. The result was an unfortunate detour in which the few perceived phylogenetic revolutions published by both sides of the philosophical divide were probably erroneous. The size of datasets has been growing exponentially since the mid-1980s accompanied by a wave of confidence that all relationships will soon be known. However, large datasets create new challenges, and a large number of genes does not guarantee reliable results. If history is a guide, then the quality of conclusions will be determined by an improved understanding of both molecular and morphological evolution, and not simply the number of genes analysed. PMID:27558853

  6. LABEL: fast and accurate lineage assignment with assessment of H5N1 and H9N2 influenza A hemagglutinins.

    Samuel S Shepard

    Full Text Available The evolutionary classification of influenza genes into lineages is a first step in understanding their molecular epidemiology and can inform the subsequent implementation of control measures. We introduce a novel approach called Lineage Assignment By Extended Learning (LABEL) to rapidly determine cladistic information for any number of genes without the need for time-consuming sequence alignment, phylogenetic tree construction, or manual annotation. Instead, LABEL relies on hidden Markov model profiles and support vector machine training to hierarchically classify gene sequences by their similarity to pre-defined lineages. We assessed LABEL by analyzing the annotated hemagglutinin genes of highly pathogenic (H5N1) and low pathogenicity (H9N2) avian influenza A viruses. Using the WHO/FAO/OIE H5N1 evolution working group nomenclature, the LABEL pipeline quickly and accurately identified the H5 lineages of uncharacterized sequences. Moreover, we developed an updated clade nomenclature for the H9 hemagglutinin gene and show a similarly fast and reliable phylogenetic assessment with LABEL. While this study was focused on hemagglutinin sequences, LABEL could be applied to the analysis of any gene and shows great potential to guide molecular epidemiology activities, accelerate database annotation, and provide a data sorting tool for other large-scale bioinformatic studies.

  7. Vehicle Classification by Lane Allowance

    Vishakha Gaikwad

    2014-12-01

    Full Text Available Classification of vehicles from video is used for traffic analysis, self-driving systems, or security systems. This analysis is based on the shape, size, velocity, and track of vehicles. These features characterize vehicles in background subtraction and feature extraction methods. Extraction is done by active contours and morphological operations. Extracted vehicles are classified by applying various classification techniques. The combination of features and classification techniques varies with the application. The proposed system uses a combination of K Nearest Neighbor (KNN) and Decision Tree techniques to overcome constraints. These constraints are instances of an object, overlapping of objects, and scaling factor. KNN is utilized to classify vehicles by size and lane. The decision tree manipulates the combination of these two features to classify accurately, which results in increased performance. This system classifies objects extracted from video into three classes: four wheeler, bikers, and heavy duty vehicle.
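    As an illustration of the KNN step mentioned in this record, the following is a minimal sketch only, not the authors' system. The feature encoding (vehicle length in metres, lane index), the training points, and the function name `knn_predict` are all assumptions made for the example; the decision tree stage described in the abstract is not reproduced.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (length in metres, lane index) -> class label.
train = [
    ((1.8, 1.0), "biker"),
    ((2.0, 1.0), "biker"),
    ((4.0, 2.0), "four wheeler"),
    ((4.2, 2.0), "four wheeler"),
    ((4.5, 2.0), "four wheeler"),
    ((11.0, 3.0), "heavy duty vehicle"),
    ((12.5, 3.0), "heavy duty vehicle"),
]

car_label = knn_predict(train, (4.3, 2.0))
truck_label = knn_predict(train, (12.0, 3.0))
```

    In practice the two features would need to be scaled to comparable ranges before distances are computed; the sketch skips that step for brevity.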

  8. Phylogenetic comparative approaches for studying niche conservatism

    Cooper, Natalie; Jetz, Walter; Freckleton, Rob P.

    2010-01-01

    Analyses of phylogenetic niche conservatism (PNC) are becoming increasingly common. However, each analysis makes subtly different assumptions about the evolutionary mechanism that generates patterns of niche conservatism. To understand PNC, analyses should be conducted with reference to a clear underlying model, using appropriate methods. Here, we outline five macroevolutionary models that may underlie patterns of PNC (drift, niche retention, phylogenetic inertia, niche filling/shifti...

  9. Demonstrating Biological Classification Using a Simulation of Natural Taxa.

    Vogt, Kenneth D.

    1995-01-01

    A review of introductory college level and high school biology texts reveals that concepts and theories behind classification are usually poorly discussed. Suggests ways in which card games can be used to teach differences between the phenetic and phylogenetic approaches. (LZ)

  10. Automatic web services classification based on rough set theory

    陈立; 张英; 宋自林; 苗壮

    2013-01-01

    With the development of web services technology, the number of existing services on the internet is growing day by day. In order to achieve automatic and accurate services classification, which can be beneficial for service-related tasks, a rough set theory based method for services classification was proposed. First, the services descriptions were preprocessed and represented as vectors. Inspired by the discernibility-matrix-based attribute reduction in rough set theory, and taking into account the characteristics of the decision table of services classification, a method based on continuous discernibility matrices was proposed for dimensionality reduction. Finally, services classification was performed automatically. In the experiment, the proposed method for services classification achieved satisfactory classification results in all five testing categories. The experimental result shows that the proposed method is accurate and could be used in practical web services classification.

  11. Multi-borders classification

    Mills, Peter

    2014-01-01

    The number of possible methods of generalizing binary classification to multi-class classification increases exponentially with the number of class labels. Often, the best method of doing so will be highly problem dependent. Here we present classification software in which the partitioning of multi-class classification problems into binary classification problems is specified using a recursive control language.
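    To make the idea of partitioning a multi-class problem into binary problems concrete, here is a small sketch. It does not use the control language of the software described in this record; the nested-tuple tree, the threshold deciders, and the name `classify` are hypothetical stand-ins chosen for illustration.

```python
def classify(tree, x):
    """Walk a recursive binary partition until a class label (a string) is reached.

    A tree is either a label, or a triple (decide, left, right) where
    decide(x) is a binary classifier returning True for the right branch.
    """
    while not isinstance(tree, str):
        decide, left, right = tree
        tree = right if decide(x) else left
    return tree


# Hypothetical 1-D problem with three classes split by two binary decisions:
# first {low, mid} versus {high}, then low versus mid.
tree = (
    lambda x: x > 10.0,
    (lambda x: x > 5.0, "low", "mid"),
    "high",
)

labels = [classify(tree, value) for value in (1.0, 7.0, 12.0)]
```

    A different nesting of the same three labels would give a different decomposition, which is the sense in which the best partitioning can depend on the problem.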

  12. Classification in Australia.

    McKinlay, John

    Despite some inroads by the Library of Congress Classification and short-lived experimentation with Universal Decimal Classification and Bliss Classification, Dewey Decimal Classification, with its ability in recent editions to be hospitable to local needs, remains the most widely used classification system in Australia. Although supplemented at…

  13. Classification in context

    Mai, Jens Erik

    2004-01-01

    This paper surveys classification research literature, discusses various classification theories, and shows that the focus has traditionally been on establishing a scientific foundation for classification research. This paper argues that a shift has taken place, and suggests that contemporary cla...... classification research focus on contextual information as the guide for the design and construction of classification schemes....

  14. Use of whole genome sequences to develop a molecular phylogenetic framework for Rhodococcus fascians and the Rhodococcus genus

    Allison L. Creason

    2014-08-01

    Full Text Available The accurate diagnosis of diseases caused by pathogenic bacteria requires a stable species classification. Rhodococcus fascians is the only documented member of its ill-defined genus that is capable of causing disease on a wide range of agriculturally important plants. Comparisons of genome sequences generated from isolates of Rhodococcus associated with diseased plants revealed a level of genetic diversity consistent with them representing multiple species. To test this, we generated a tree based on more than 1700 homologous sequences from plant-associated isolates of Rhodococcus, and obtained support from additional approaches that measure and cluster based on genome similarities. Results were consistent in supporting the definition of new Rhodococcus species within clades containing phytopathogenic members. We also used the genome sequences, along with other rhodococcal genome sequences to construct a molecular phylogenetic tree as a framework for resolving the Rhodococcus genus. Results indicated that Rhodococcus has the potential for having 20 species and also confirmed a need to revisit the taxonomic groupings within Rhodococcus.

  15. Phylogenetic analysis of Pectinidae (Bivalvia) based on the ribosomal DNA internal transcribed spacer region

    2007-01-01

    The ribosomal DNA internal transcribed spacer (ITS) region is a useful genomic region for understanding evolutionary and genetic relationships. In the current study, the molecular phylogenetic analysis of Pectinidae (Mollusca: Bivalvia) was performed using the nucleotide sequences of the nuclear ITS region in nine species of this family. The sequences were obtained from the scallop species Argopecten irradians, Mizuhopecten yessoensis, Amusium pleuronectes and Mimachlamys nobilis, and compared with the published sequences of Aequipecten opercularis, Chlamys farreri, C. distorta, M. varia, Pecten maximus, and an outgroup species Perna viridis. The molecular phylogenetic tree was constructed by the neighbor-joining and maximum parsimony methods. Phylogenetic analysis based on ITS1, ITS2, or their combination always yielded trees of similar topology. The results support the morphological classification of bivalves and are nearly consistent with the classification into two subfamilies (Chlamydinae and Pectininae) formulated by Waller. However, A. irradians and A. opercularis grouped with the genus Amusium, which suggests that they may belong to the subfamily Pectininae. This is incompatible with the conclusion of Waller, who placed them in Chlamydinae on the basis of morphological characteristics. These results provide new insights into the evolutionary relationships among scallop species and contribute to the improvement of existing classification systems.

  16. Interactive multiclass segmentation using superpixel classification

    Mathieu, Bérengère; Crouzil, Alain; Puel, Jean-Baptiste

    2015-01-01

    This paper addresses the problem of interactive multiclass segmentation. We propose a fast and efficient new interactive segmentation method called Superpixel Classification-based Interactive Segmentation (SCIS). From a few strokes drawn by a human user over an image, this method extracts relevant semantic objects. To get a fast calculation and an accurate segmentation, SCIS uses superpixel over-segmentation and support vector machine classification. In this paper, we demonstrate that SCIS sig...

  17. Phylogenetic placement of Hydra and relationships within Aplanulata (Cnidaria: Hydrozoa).

    Nawrocki, Annalise M; Collins, Allen G; Hirano, Yayoi M; Schuchert, Peter; Cartwright, Paulyn

    2013-04-01

    The model organism Hydra belongs to the hydrozoan clade Aplanulata. Despite being a popular model system for development, little is known about the phylogenetic placement of this taxon or the relationships of its closest relatives. Previous studies have been conflicting regarding sister group relationships and have been unable to resolve deep nodes within the clade. In addition, there are several putative Aplanulata taxa that have never been sampled for molecular data or analyzed using multiple markers. Here, we combine the fast-evolving cytochrome oxidase 1 (CO1) mitochondrial marker with mitochondrial 16S, nuclear small ribosomal subunit (18S, SSU) and large ribosomal subunit (28S, LSU) sequences to examine relationships within the clade Aplanulata. We further discuss the relative contribution of four different molecular markers to resolving phylogenetic relationships within Aplanulata. Lastly, we report morphological synapomorphies for some of the major Aplanulata genera and families, and suggest new taxonomic classifications for two species of Aplanulata, Fukaurahydra anthoformis and Corymorpha intermedia, based on a preponderance of molecular and morphological data that justify the designation of these species to different genera. PMID:23280366

  18. The space of ultrametric phylogenetic trees.

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. PMID:27188249
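    For readers unfamiliar with the term, a tree is ultrametric when its leaf-to-leaf distances satisfy the strong triangle inequality d(i, k) <= max(d(i, j), d(j, k)), which holds when all leaves are equally distant from the root. The sketch below only checks that property on a distance matrix; it is not one of the metric spaces introduced in the paper, and the function name and example matrices are assumptions made for illustration.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-9):
    """Check the strong triangle inequality for every ordered triple of leaves."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical leaf-to-leaf distances for three taxa A, B, C.
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
not_clock_like = [
    [0, 2, 5],
    [2, 0, 6],
    [5, 6, 0],
]

clock_like_ok = is_ultrametric(clock_like)
not_clock_like_ok = is_ultrametric(not_clock_like)
```

    In the second matrix the B to C distance (6) exceeds both the B to A (2) and A to C (5) distances, so the inequality fails.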

  19. Hazard classification methodology

    This document outlines the hazard classification methodology used to determine the hazard classification of the NIF LTAB, OAB, and the support facilities on the basis of radionuclides and chemicals. The hazard classification determines the safety analysis requirements for a facility.

  20. Remote Sensing Information Classification

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.
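
    The step from scalar to nominal data described in this record can be sketched as threshold binning. The function `to_nominal`, the cut-off values and the class names below are hypothetical and chosen only for illustration; the presentation itself does not specify them.

```python
from bisect import bisect_right


def to_nominal(value, thresholds, labels):
    """Map a scalar to a class label using ascending thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right counts how many thresholds are <= value.
    return labels[bisect_right(thresholds, value)]


# Hypothetical vegetation-index cut-offs; the numbers are illustrative only.
THRESHOLDS = [0.2, 0.5]
LABELS = ["bare", "sparse", "dense"]

pixels = [0.05, 0.31, 0.74, 0.5]
classes = [to_nominal(p, THRESHOLDS, LABELS) for p in pixels]
```

    A value that falls exactly on a threshold is assigned to the upper class, which is a design choice of this sketch rather than a rule taken from the source.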

  1. Classification and knowledge

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  2. Molecular systematics of the Amazonian genus Aldina, a phylogenetically enigmatic ectomycorrhizal lineage of papilionoid legumes.

    Ramos, Gustavo; de Lima, Haroldo Cavalcante; Prenner, Gerhard; de Queiroz, Luciano Paganucci; Zartman, Charles E; Cardoso, Domingos

    2016-04-01

    Aldina (Leguminosae) is among the very few ecologically successful ectomycorrhizal lineages in a family largely marked by the evolution of nodulating symbiosis. The genus comprises 20 species predominantly distributed in Amazonia and has been traditionally classified in the tribe Swartzieae because of its radial flowers with an entire calyx and numerous free stamens. The taxonomy of Aldina is complicated due to its poor representation in herbaria and the lack of a robust phylogenetic hypothesis of relationship. Recent phylogenetic analyses of matK and trnL sequences confirmed the placement of Aldina in the 50-kb inversion clade, although the genus remained phylogenetically isolated or unresolved in the context of the evolutionary history of the main early-branching papilionoid lineages. We performed maximum likelihood and Bayesian analyses of combined chloroplast datasets (matK, rbcL, and trnL) and explored the effect of incomplete taxa or missing data in order to shed light on the enigmatic phylogenetic position of Aldina. Unexpectedly, a sister relationship of Aldina with the Andira clade (Andira and Hymenolobium) is revealed. We suggest that a new tribal phylogenetic classification of the papilionoid legumes should place Aldina along with Andira and Hymenolobium. These results highlight yet another example of the independent evolution of radial floral symmetry within the early-branching Papilionoideae, a large collection of florally heterogeneous lineages dominated by papilionate or bilaterally symmetric flower morphology. PMID:26748266

  3. Texture Classification Based on Texton Features

    U Ravi Babu

    2012-08-01

    Texture analysis plays an important role in the interpretation, understanding and recognition of terrain, biomedical or microscopic images. To achieve high classification accuracy, the present paper proposes a new method based on textons. Each texture analysis method depends on how the selected texture features characterize the image. Whenever a new texture feature is derived, it is tested to determine whether it classifies the textures precisely. Not only the texture features themselves but also the way in which they are applied matters for precise and accurate texture classification and analysis. The paper proposes a new texton-based method for efficient, rotationally invariant texture classification. The proposed Texton Features (TF) evaluate the relationship between the values of neighboring pixels. The proposed classification algorithm evaluates histogram-based techniques on TF for precise classification. Experimental results on various stone textures indicate the efficacy of the proposed method compared with other methods.

  4. Discriminating the effects of phylogenetic hypothesis, tree resolution and clade age estimates on phylogenetic signal measurements.

    Seger, G D S; Duarte, L D S; Debastiani, V J; Kindel, A; Jarenkow, J A

    2013-09-01

    Understanding how species traits evolved over time is central to comprehending the assembly rules that govern the phylogenetic structure of communities. The measurement of phylogenetic signal (PS) in ecologically relevant traits is a first step to understand phylogenetically structured community patterns. The different methods available to estimate PS make it difficult to choose which is most appropriate. Furthermore, alternative phylogenetic tree hypotheses, node resolution and clade age estimates might influence PS measurements. In this study, we evaluated to what extent these parameters affect different methods of PS analysis, and discuss advantages and disadvantages when selecting which method to use. We measured fruit/seed traits and flowering/fruiting phenology of endozoochoric species occurring in Southern Brazilian Araucaria forests and evaluated their PS using Mantel regressions, phylogenetic eigenvector regressions (PVR) and K statistic. Mantel regressions always gave less significant results compared to PVR and K statistic in all combinations of phylogenetic trees constructed. Moreover, a better phylogenetic resolution affected PS, independently of the method used to estimate it. Morphological seed traits tended to show higher PS than diaspore traits, while PS in flowering/fruiting phenology depended mostly on the method used to estimate it. This study demonstrates that different PS estimates are obtained depending on the chosen method and the phylogenetic tree resolution. This finding has implications for inferences on phylogenetic niche conservatism or ecological processes determining phylogenetic community structure. PMID:23368095
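
    To make one of the compared approaches concrete, the following is a minimal sketch of a Mantel test: the Pearson correlation between the off-diagonal entries of a phylogenetic distance matrix and a trait distance matrix, with significance assessed by permuting species labels. The function names, the default of 999 permutations and the toy data are assumptions for illustration. The study's actual settings, and its PVR and K statistic analyses, are not reproduced here.

```python
import random
from itertools import combinations
from math import sqrt


def _pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d_phylo, d_trait, permutations=999, seed=0):
    """Return (r, p) for the Mantel correlation of two symmetric distance matrices."""
    n = len(d_phylo)
    pairs = list(combinations(range(n), 2))
    x = [d_phylo[i][j] for i, j in pairs]

    def stat(order):
        # Relabel the species of the trait matrix according to `order`.
        y = [d_trait[order[i]][order[j]] for i, j in pairs]
        return _pearson(x, y)

    observed = stat(list(range(n)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if stat(order) >= observed:
            hits += 1
    # One-sided p-value; the observed arrangement counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Toy example: four species in two clades, with one hypothetical trait value each.
D_PHYLO = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
TRAITS = [1.0, 1.2, 3.0, 3.5]
D_TRAIT = [[abs(a - b) for b in TRAITS] for a in TRAITS]

r, p = mantel(D_PHYLO, D_TRAIT)
```

    With only four species there are just 24 possible relabelings, so the p-value from this toy example is coarse. It is meant to show the mechanics, not to support any inference.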

  5. Classification and Analysis of Computer Network Traffic

    Bujlow, Tomasz

    2014-01-01

    of traffic for academic purposes. We define the objective of this thesis as finding a way to evaluate the performance of various applications in a high-speed Internet infrastructure. To satisfy the objective, we needed to answer a number of research questions. The biggest extent of them concern techniques...... classification (as by using transport layer port numbers, Deep Packet Inspection (DPI), statistical classification) and assessed their usefulness in particular areas. We found that the classification techniques based on port numbers are not accurate anymore as most applications use dynamic port numbers, while...

  6. Nudivirus Genomics: Diversity and Classification

    Yong-jie Wang; John P. Burand; Johannes A. Jehle

    2007-01-01

    Nudiviruses represent a diverse group of arthropod-specific, rod-shaped dsDNA viruses. Due to similarities in pathology and morphology to members of the family Baculoviridae, they have been previously classified as the so-called "non-occluded" baculoviruses. However, presently they are taxonomically orphaned and are not assigned to any virus family because of the lack of genetic relatedness to Baculoviridae. Here, we report on recent progress in the genomic analysis of Heliothis zea nudivirus 1 (HzNV-1), Oryctes rhinoceros nudivirus (OrNV), Gryllus bimaculatus nudivirus (GbNV) and Heliothis zea nudivirus 2 (HzNV-2). Gene content comparison and phylogenetic analyses indicated that the viruses share 15 core genes with baculoviruses and form a monophyletic sister group to them. Consequences of the genetic relationship are discussed for the classification of nudiviruses.

  7. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Steven Kelly

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  8. Marine turtle mitogenome phylogenetics and evolution

    Duchene, S.; Frey, A.; Alfaro-Núñez, A.;

    2012-01-01

    . Analyses of partial mitochondrial sequences and some nuclear markers have revealed phylogenetic inconsistencies within Cheloniidae, especially regarding the placement of the flatback. Population genetic studies based on D-Loop sequences have shown considerable structuring in species with broad geographic...

  9. Phylogenetic structure in tropical hummingbird communities

    Graham, Catherine H; Parra, Juan L; Rahbek, Carsten;

    2009-01-01

    sustaining an expensive means of locomotion at high elevations. We found that communities in the lowlands on opposite sides of the Andes tend to be phylogenetically similar despite their large differences in species composition, a pattern implicating the Andes as an important dispersal barrier. In contrast......How biotic interactions, current and historical environment, and biogeographic barriers determine community structure is a fundamental question in ecology and evolution, especially in diverse tropical regions. To evaluate patterns of local and regional diversity, we quantified the phylogenetic...... composition of 189 hummingbird communities in Ecuador. We assessed how species and phylogenetic composition changed along environmental gradients and across biogeographic barriers. We show that humid, low-elevation communities are phylogenetically overdispersed (coexistence of distant relatives), a pattern...

  10. A statistical approach to root system classification.

    Gernot Bodner

    2013-08-01

    Plant root systems have a key role in ecology and agronomy. Despite the rapid increase in root studies, there is still no classification that distinguishes among the characteristic features within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for plant functional type identification in ecology can be applied to the classification of root systems. We demonstrate that combining principal component and cluster analysis yields a meaningful classification of rooting types based on morphological traits. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. Biplot inspection is used to determine key traits and to ensure stability in cluster-based grouping. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture the most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphological traits for classification is supported by field data. Three rooting types emerged from measured data, distinguished by diameter/weight, density and spatial distribution, respectively. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-defined classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement

  11. On the analysis of phylogenetically paired designs

    Funk, Jennifer L.; Rakovski, Cyril S; Macpherson, J Michael

    2015-01-01

    As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed mod...

  12. Phylogenetic relationships in the family Alloherpesviridae

    Waltzek, T.B.; Kelley, G.O.; Alfaro, M.E.; Kurobe, T.; Davison, A J; Hedrick, R.P.

    2009-01-01

    Phylogenetic relationships among herpesviruses (HVs) of mammals, birds, and reptiles have been studied extensively, whereas those among other HVs are relatively unexplored. We have reconstructed the phylogenetic relationships among 13 fish and amphibian HVs using maximum likelihood and Bayesian analyses of amino acid sequences predicted from parts of the DNA polymerase and terminase genes. The relationships among 6 of these viruses were confirmed using the partial DNA polymerase data plus the...

  13. Consequences of recombination on traditional phylogenetic analysis

    Schierup, M H; Hein, J

    2000-01-01

    We investigate the shape of a phylogenetic tree reconstructed from sequences evolving under the coalescent with recombination. The motivation is that evolutionary inferences are often made from phylogenetic trees reconstructed from population data even though recombination may well occur (mtDNA or...... recombination leads to a large overestimation of the substitution rate heterogeneity and the loss of the molecular clock. These results are discussed in relation to viral and mtDNA data sets.

  14. Phylogenetic Position of Barbus lacerta Heckel, 1843

    Mustafa Korkmaz

    2015-01-01

    The genus Barbus is characterized by a complex taxonomic structure, owing to its high number of species and its morphological plasticity; it comprises more than 25 species in Europe, which display different ecological preferences. Twenty-one taxa from the genus Barbus, including Barbus lacerta, were used in the phylogenetic analysis. Cytochrome oxidase I (COI) gene sequence analysis of Barbus lacerta is presented for the first time in this study. A phylogenetic tree (neighbor-joining and maximum likelihood analysis) was reco...

  15. Phylogenetic niche conservatism in C4 grasses.

    Liu, Hui; Edwards, Erika J; Freckleton, Robert P; Osborne, Colin P

    2012-11-01

    Photosynthetic pathway is used widely to discriminate plant functional types in studies of global change. However, independent evolutionary lineages of C4 grasses with different variants of C4 photosynthesis show different biogeographical relationships with mean annual precipitation, suggesting phylogenetic niche conservatism (PNC). To investigate how phylogeny and photosynthetic type differentiate C4 grasses, we compiled a dataset of morphological and habitat information of 185 genera belonging to two monophyletic subfamilies, Chloridoideae and Panicoideae, which together account for 90% of the world's C4 grass species. We evaluated evolutionary variance and covariance of morphological and habitat traits. Strong phylogenetic signals were found in both morphological and habitat traits, arising mainly from the divergence of the two subfamilies. Genera in Chloridoideae had significantly smaller culm heights, leaf widths, 1,000-seed weights and stomata; they also appeared more in dry, open or saline habitats than those of Panicoideae. Controlling for phylogenetic structure showed significant covariation among morphological traits, supporting the hypothesis of phylogenetically independent scaling effects. However, associations between morphological and habitat traits showed limited phylogenetic covariance. Subfamily was a better explanation than photosynthetic type for the variance in most morphological traits. Morphology, habitat water availability, shading, and productivity are therefore all involved in the PNC of C4 grass lineages. This study emphasized the importance of phylogenetic history in the ecology and biogeography of C4 grasses, suggesting that divergent lineages need to be considered to fully understand the impacts of global change on plant distributions. PMID:22569558

  16. Phylogenetic distribution of fungal sterols.

    John D Weete

    BACKGROUND: Ergosterol has been considered the "fungal sterol" for almost 125 years; however, additional sterol data superimposed on a recent molecular phylogeny of kingdom Fungi reveals a different and more complex situation. METHODOLOGY/PRINCIPAL FINDINGS: The interpretation of sterol distribution data in a modern phylogenetic context indicates that there is a clear trend from cholesterol and other Delta(5) sterols in the earliest diverging fungal species to ergosterol in later diverging fungi. There are, however, deviations from this pattern in certain clades. Sterols of the diverse zoosporic and zygosporic forms exhibit structural diversity with cholesterol and 24-ethyl-Delta(5) sterols in zoosporic taxa, and 24-methyl sterols in zygosporic fungi. For example, each of the three monophyletic lineages of zygosporic fungi has distinctive major sterols: ergosterol in Mucorales, 22-dihydroergosterol in Dimargaritales, Harpellales, and Kickxellales (DHK clade), and 24-methyl cholesterol in Entomophthorales. Other departures from ergosterol as the dominant sterol include: 24-ethyl cholesterol in Glomeromycota, 24-ethyl cholest-7-enol and 24-ethyl-cholesta-7,24(28)-dienol in rust fungi, brassicasterol in Taphrinales and hypogeous pezizalean species, and cholesterol in Pneumocystis. CONCLUSIONS/SIGNIFICANCE: Five dominant end products of sterol biosynthesis (cholesterol, ergosterol, 24-methyl cholesterol, 24-ethyl cholesterol, brassicasterol), and intermediates in the formation of 24-ethyl cholesterol, are major sterols in 175 species of Fungi. Although most fungi in the most speciose clades have ergosterol as a major sterol, sterols are more varied than currently understood, and their distribution supports certain clades of Fungi in current fungal phylogenies. In addition to the intellectual importance of understanding evolution of sterol synthesis in fungi, there is practical importance because certain antifungal drugs (e.g., azoles) target reactions in

  17. Classification of the web

    Mai, Jens Erik

    2004-01-01

    This paper discusses the challenges faced by investigations into the classification of the Web and outlines inquiries that are needed to use principles for bibliographic classification to construct classifications of the Web. This paper suggests that the classification of the Web meets challenges...

  18. Towards accurate emergency response behavior

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail.

  19. Phylogenetic and functional assessment of orthologs inference projects and methods.

    Adrian M Altenhoff

    2009-01-01

    Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. 
Second, it introduces new methodology
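
    The bidirectional best-hit baseline mentioned in this record can be sketched in a few lines: two genes are called orthologs when each is the other's top-scoring hit in the opposite genome. The gene names, scores and function names below are hypothetical, and the study's actual scoring and filtering choices are not given in the abstract.

```python
def best_hits(scores):
    """For each query gene, keep the subject gene with the highest score."""
    best = {}
    for (query, subject), score in scores.items():
        # Ties are resolved in favour of the hit seen first.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical alignment scores between genes of genome A and genome B.
A_VS_B = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 200}
B_VS_A = {("b1", "a1"): 245, ("b2", "a3"): 205, ("b2", "a2"): 175}

orthologs = bidirectional_best_hits(A_VS_B, B_VS_A)
```

    In this toy input, `a2` finds `b2` as its best hit, but `b2` prefers `a3`, so only the pairs (`a1`, `b1`) and (`a3`, `b2`) are reported.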

  20. Hierarchical classification of glycoside hydrolases.

    Naumoff, D G

    2011-06-01

    This review deals with structural and functional features of glycoside hydrolases, a widespread group of enzymes present in almost all living organisms. Their catalytic domains are grouped into 120 amino acid sequence-based families in the international classification of the carbohydrate-active enzymes (CAZy database). At a higher hierarchical level some of these families are combined in 14 clans. Enzymes of the same clan have common evolutionary origin of their genes and share the most important functional characteristics such as composition of the active center, anomeric configuration of cleaved glycosidic bonds, and molecular mechanism of the catalyzed reaction (either inverting, or retaining). There are now extensive data in the literature concerning the relationship between glycoside hydrolase families belonging to different clans and/or included in none of them, as well as information on phylogenetic protein relationship within particular families. Summarizing these data allows us to propose a multilevel hierarchical classification of glycoside hydrolases and their homologs. It is shown that almost the whole variety of the enzyme catalytic domains can be brought into six main folds, large groups of proteins having the same three-dimensional structure and the supposed common evolutionary origin. PMID:21639842

  1. Scalable metagenomic taxonomy classification using a reference genome database

    Ames, Sasha K.; Hysom, David A.; Shea N. Gardner; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift...

  2. PHYLOGENETIC ANALYSIS AMONG FOUR SECTIONS OF GENUS DENDROBIUM SW. (ORCHIDACEAE) IN PENINSULAR MALAYSIA USING RBCL SEQUENCE DATA

    Maryam Moudi

    2013-06-01

    Phylogenetic analysis using chloroplast DNA, the ribulose-bisphosphate carboxylase gene (rbcL), was conducted to examine relationships among four sections of the genus Dendrobium (Orchidaceae): Aporum, Crumenata, Strongyle, and Bolbidium in Peninsular Malaysia. Classifications based on morphological characters have not been able to clearly divide these four sections; therefore, deeper and more detailed analyses are required to ascertain their status. In this study, the phylogenetic relationships among species of the four sections were investigated to clarify whether they should be lumped into one section or reduced to two.

  3. Barcoding and Phylogenetic Inferences in Nine Mugilid Species (Pisces, Mugiliformes)

    Neonila Polyakova

    2013-10-01

    Accurate identification of fish and fish products, from eggs to adults, is important in many areas. Grey mullets of the family Mugilidae are distributed worldwide and inhabit marine, estuarine, and freshwater environments in all tropical and temperate regions. Various Mugilid species are commercially important in the fisheries and aquaculture of many countries. For the present study we have chosen two Mugilid genes with different phylogenetic signals: the relatively variable mitochondrial cytochrome oxidase subunit I (COI) and the conservative nuclear rhodopsin (RHO). We examined their diversity within and among 9 Mugilid species belonging to 4 genera, many of which have been examined from multiple specimens, with the goal of determining whether DNA barcoding can achieve unambiguous species recognition of Mugilid species. The data obtained showed that information based on COI sequences was diagnostic not only for species-level identification but also for recognition of intraspecific units, e.g., allopatric populations of circumtropical Mugil cephalus, or even native and acclimatized specimens of Chelon haematocheila. All RHO sequences appeared strictly species specific. Based on the data obtained, we conclude that COI as well as RHO sequencing can be used to unambiguously identify fish species. Topologies of phylogeny based on RHO and COI sequences coincided with each other, while together they had a good phylogenetic signal.

  4. Phylogenetic analysis of the Trypanosoma genus based on the heat-shock protein 70 gene.

    Fraga, Jorge; Fernández-Calienes, Aymé; Montalvo, Ana Margarita; Maes, Ilse; Deborggraeve, Stijn; Büscher, Philippe; Dujardin, Jean-Claude; Van der Auwera, Gert

    2016-09-01

    Trypanosome evolution has so far been studied essentially on the basis of phylogenetic analyses of small subunit ribosomal RNA (SSU-rRNA) and glycosomal glyceraldehyde-3-phosphate dehydrogenase (gGAPDH) genes. We used for the first time the 70kDa heat-shock protein gene (hsp70) to investigate the phylogenetic relationships among 11 Trypanosoma species on the basis of 1380 nucleotides from 76 sequences corresponding to 65 strains. We also constructed a phylogeny based on combined datasets of SSU-rDNA, gGAPDH and hsp70 sequences. The obtained clusters can be correlated with the sections and subgenus classifications of mammal-infecting trypanosomes except for Trypanosoma theileri and Trypanosoma rangeli. Our analysis supports the classification of Trypanosoma species into clades rather than in sections and subgenera, some of which are polyphyletic. Nine clades were recognized: Trypanosoma carassi, Trypanosoma congolense, Trypanosoma cruzi, Trypanosoma grayi, Trypanosoma lewisi, T. rangeli, T. theileri, Trypanosoma vivax and Trypanozoon. These results are consistent with existing knowledge of the genus' phylogeny. Within the T. cruzi clade, three groups of T. cruzi discrete typing units could be clearly distinguished, corresponding to TcI, TcIII, and TcII+V+VI, while support for TcIV was lacking. Phylogenetic analyses based on hsp70 demonstrated that this molecular marker can be applied for discriminating most of the Trypanosoma species and clades. PMID:27180897

  5. PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis

    Benavente, Ernest D

    2015-05-13

    Background: Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting. Results: We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphism. Variant Call Format files can be uploaded to determine a sample position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates. Conclusion: PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenetic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).

  6. Accurate determination of antenna directivity

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power...

  7. Characterization of a branch of the phylogenetic tree

    We use a combination of analytic models and computer simulations to gain insight into the dynamics of evolution. Our results suggest that certain interesting phenomena should eventually emerge from the fossil record. For example, there should be a 'tortoise and hare effect': Those genera with the smallest species death rate are likely to survive much longer than genera with large species birth and death rates. A complete characterization of the behavior of a branch of the phylogenetic tree corresponding to a genus and accurate mathematical representations of the various stages are obtained. We apply our results to address certain controversial issues that have arisen in paleontology such as the importance of punctuated equilibrium and whether unique Cambrian phyla have survived to the present.

  8. Characterization of a branch of the phylogenetic tree.

    Samuel, Stuart A; Weng, Gezhi

    2003-02-21

    We use a combination of analytic models and computer simulations to gain insight into the dynamics of evolution. Our results suggest that certain interesting phenomena should eventually emerge from the fossil record. For example, there should be a "tortoise and hare effect": those genera with the smallest species death rate are likely to survive much longer than genera with large species birth and death rates. A complete characterization of the behavior of a branch of the phylogenetic tree corresponding to a genus and accurate mathematical representations of the various stages are obtained. We apply our results to address certain controversial issues that have arisen in paleontology such as the importance of punctuated equilibrium and whether unique Cambrian phyla have survived to the present. PMID:12623281

  9. Morphological and molecular convergences in mammalian phylogenetics.

    Zou, Zhengting; Zhang, Jianzhi

    2016-01-01

    Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor its underlying cause is known. Here, comparing thousands of characters of each type that have been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience many more convergences than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference. PMID:27585543

  10. Texture Classification based on Gabor Wavelet

    Amandeep Kaur

    2012-07-01

    This paper compares texture classification algorithms based on Gabor wavelets. The focus is on the feature extraction scheme for texture classification. The texture features of an image can be characterized using texture descriptors. We use the homogeneous texture descriptor, which is based on Gabor wavelets. For texture classification, we use an online texture database, Brodatz's database, and three well-known classifiers: support vector machine, k-nearest neighbor and decision tree induction. The results show that classification using support vector machines gives better results than the other classifiers. It can accurately discriminate between testing image data and training data.

  11. Phylogenetic and Structural Analysis of Polyketide Synthases in Aspergilli

    Bhetariya, Preetida J.; Prajapati, Madhvi; Bhaduri, Asani; Mandal, Rahul Shubhra; Varma, Anupam; Madan, Taruna; Singh, Yogendra; Sarma, P. Usha

    2016-01-01

    Polyketide synthases (PKSs) of Aspergillus species are multidomain and multifunctional megaenzymes that play an important role in the synthesis of diverse polyketide compounds. Putative PKS protein sequences from medically, agriculturally, and industrially important Aspergillus species were chosen and screened for in silico studies. Six candidate Aspergillus species, Aspergillus fumigatus Af293, Aspergillus flavus NRRL3357, Aspergillus niger CBS 513.88, Aspergillus terreus NIH2624, Aspergillus oryzae RIB40, and Aspergillus clavatus NRRL1, were selected to study the PKS phylogeny. Full-length PKS proteins and ketosynthase (KS) domain-only sequences were retrieved from the aforementioned species for independent phylogenetic analysis, and phylogenetic analysis was performed with characterized fungal PKS. This resulted in the grouping of Aspergilli PKSs into nonreducing (NR), partially reducing (PR), and highly reducing (HR) PKS enzymes. Eight distinct clades with unique domain arrangements were classified based on homology with functionally characterized PKS enzymes. Conserved motif signatures corresponding to each type of PKS were observed. Three proteins from the Protein Data Bank corresponding to NR, PR, and HR types of PKS (XP_002384329.1, XP_753141.2, and XP_001402408.2, respectively) were selected for mapping of conserved motifs onto three-dimensional structures of the KS domain. Structural variations were found at the active sites on modeled NR, PR, and HR enzymes of Aspergillus. It was observed that the number of iteration cycles was dependent on the size of the cavity in the active site of the PKS enzyme, correlating with a type with reducing or NR products, such as pigment, 6MSA, and lovastatin. The current study reports the grouping and classification of PKS proteins of Aspergilli for possible exploration of novel polyketides based on sequence homology; this information can be useful for selection of PKS for polyketide exploration and

  12. LC Classification as linked data

    Kevin Ford

    2013-01-01

    In 2009 and in 2011, the Library of Congress made two of its largest authority files – Subject Headings and Names – available as linked data via LC’s Linked Data Service, ID.LOC.GOV. Both are offered in MADS/RDF and SKOS. It is LC’s objective, in 2012, to publish another of its largest authority files as linked data: LC Classification. Whereas the source records for Subject Headings and Names are encoded in the MARC Authority format, from which there is a relatively straightforward mapping to MADS/RDF and SKOS, LC Classification records rely on the MARC Classification format. Mapping from LC Classification to MADS/RDF or SKOS has been a little more challenging. For example, records that represent classification ranges, which are not Concepts intended to be assigned, are not easily accommodated in SKOS. This presents additional problems when needing to accurately represent the relationships in RDF for LC Classification. With comparison to the publication of LCSH and Names at ID.LOC.GOV, this paper will examine issues encountered – and how those challenges were addressed – during the conversion of LC Classification to MADS/RDF and SKOS for release as linked data at ID.LOC.GOV.

  13. Typology, classification and systematization of innovative projects and initiatives in the company

    Baklanova Julia O.

    2012-04-01

    The author compares definitions of typology, classification and systematization, and illustrates them using the example of a company's innovative projects and initiatives. The typology and classification are based on the methodology of Benko and McFarlan. To obtain a more accurate result, the tasks of typology, classification and systematization need to be integrated.

  14. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium

    Mi, Huaiyu; Dong, Qing; Muruganujan, Anushya; Gaudet, Pascale; Lewis, Suzanna; Thomas, Paul D

    2009-01-01

    Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software system for inferring the functions of genes based on their evolutionary relationships. Phylogenetic trees of gene families form the basis for PANTHER and these trees are annotated with ontology terms describing the evolution of gene function from ancestral to modern day genes. One of the main applications of PANTHER is in accurate prediction of the functions of uncharacterized genes, based on their evolu...

  15. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    Kodner Robin B

    2010-10-01

    Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service.

  16. Phylogenetic position of Oryzolejeunea (Lejeuneaceae, Marchantiophyta): Evidence from molecular markers and morphology

    Wen YE; Yu-Mei WEI; Alfons SCHÄFER-VERWIMP; Rui-Liang ZHU

    2013-01-01

    The systematic position of the small neotropical genus Oryzolejeunea (three spp.) has long been controversial. Phylogenetic analyses of molecular data for the present study using DNA markers (trnL, psbA, and a nuclear ribosomal internal transcribed spacer [nrITS] region) show that the genus is nested in Lejeunea. The results not only reveal the phylogenetic position of Oryzolejeunea for the first time, but also challenge the taxonomic value of the proximal hyaline papilla as a key feature in Lejeunea. The present study shows the urgent need for a reassessment of the perimeters of the genus Lejeunea and its infrageneric classification. Three new combinations, namely Lejeunea saccatiloba, Lejeunea grolleana, and Lejeunea venezuelana, are proposed.

  17. The complete mitochondrial genome of Meriones libycus (Rodentia: Cricetidae) and its phylogenetic analysis.

    Luo, Guangjie; Liao, Jicheng

    2016-07-01

    Meriones libycus belongs to the genus Meriones in Gerbillinae; its complete mitochondrial genome is 16,341 bp in length. The heavy strand contains 32.8% A, 13.1% G, 25.3% C, and 28.8% T, with protein-coding genes accounting for approximately 69.54%. Phylogenetic analysis showed that M. libycus and Meriones unguiculatus clustered together, consistent with the primary morphological taxonomy. This study verifies the evolutionary status of M. libycus in Meriones at the molecular level. The mitochondrial genome would be a significant supplement to the gene pool of Rodentia, and the conclusion of the phylogenetic analysis could be important molecular evidence for the classification of Gerbillinae. PMID:26017047

  18. Hand eczema classification

    Diepgen, T L; Andersen, Klaus Ejner; Brandao, F M;

    2008-01-01

    the disease is rarely evidence based, and a classification system for different subdiagnoses of hand eczema is not agreed upon. Randomized controlled trials investigating the treatment of hand eczema are called for. For this, as well as for clinical purposes, a generally accepted classification system...... classification system for hand eczema is proposed. Conclusions It is suggested that this classification be used in clinical work and in clinical trials....

  19. Phylogenetic Relationships and Biogeographic History of Iriarteeae

    Bacon, Christine D.; Florez, Alexander; Balslev, Henrik;

    sequence data for 11 loci (5 chloroplast and 6 nuclear) to reconstruct a coalescent species tree and infer relationships amongst genera and species to, in turn, allow for tests of biogeography and community phylogenetics in the tribe. Our results define inter-generic relationships and resolve all genera as...

  20. DNA barcoding and phylogenetic relationships in Timaliidae.

    Huang, Z H; Ke, D H

    2015-01-01

    The Timaliidae, a diverse family of oscine passerine birds, has long been a subject of debate regarding its phylogeny. The mitochondrial cytochrome c oxidase subunit I (COI) gene has been used as a powerful marker for identification and phylogenetic studies of animal species. In the present study, we analyzed the COI barcodes of 71 species from 21 genera belonging to the family Timaliidae. Every bird species possessed a barcode distinct from that of other bird species. Kimura two-parameter (K2P) distances were calculated between barcodes. The average genetic distance between species was 18 times higher than the average genetic distance within species. The neighbor-joining method was used to construct a phylogenetic tree and all the species could be discriminated by their distinct clades within the phylogenetic tree. The results indicate that some currently recognized babbler genera might not be monophyletic, with the COI gene data supporting the hypothesis of polyphyly for Garrulax, Alcippe, and Minla. Thus, DNA barcoding is an effective molecular tool for Timaliidae species identification and phylogenetic inference. PMID:26125793
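    The Kimura two-parameter distance mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard K2P formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions between two aligned sequences. It is not the authors' pipeline; the function name k2p_distance and the rule of skipping gapped or ambiguous sites are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration. Sites where either sequence has
    a gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

    Averaging such distances over all between-species pairs and over all within-species pairs, then taking the ratio of the two means, gives the kind of comparison the abstract summarizes as an 18-fold difference.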

  1. Phylogenetic and phylogenomic overview of the Polyporales.

    Binder, Manfred; Justo, Alfredo; Riley, Robert; Salamov, Asaf; Lopez-Giraldez, Francesc; Sjökvist, Elisabet; Copeland, Alex; Foster, Brian; Sun, Hui; Larsson, Ellen; Larsson, Karl-Henrik; Townsend, Jeffrey; Grigoriev, Igor V; Hibbett, David S

    2013-01-01

    We present a phylogenetic and phylogenomic overview of the Polyporales. The newly sequenced genomes of Bjerkandera adusta, Ganoderma sp., and Phlebia brevispora are introduced and an overview of 10 currently available Polyporales genomes is provided. The new genomes are 39 500 000-49 900 000 bp and encode 12 910-16 170 genes. We searched available genomes for single-copy genes and performed phylogenetic informativeness analyses to evaluate their potential for phylogenetic systematics of the Polyporales. Phylogenomic datasets (25, 71, 356 genes) were assembled for the 10 Polyporales species with genome data and compared with the most comprehensive dataset of Polyporales to date (six-gene dataset for 373 taxa, including taxa with missing data). Maximum likelihood and Bayesian phylogenetic analyses of genomic datasets yielded identical topologies, and the corresponding clades also were recovered in the 373-taxa dataset although with different support values in some datasets. Three previously recognized lineages of Polyporales, antrodia, core polyporoid and phlebioid clades, are supported in most datasets, while the status of the residual polyporoid clade remains uncertain and certain taxa (e.g. Gelatoporia, Grifola, Tyromyces) apparently do not belong to any of the major lineages of Polyporales. The most promising candidate single-copy genes are presented, and nodes in the Polyporales phylogeny critical for the suprageneric taxonomy of the order are identified and discussed. PMID:23935031

  2. Phylogenetic Memory of Developing Mammalian Dentition

    Peterková, Renata; Lesot, H.; Peterka, Miroslav

    2006-01-01

    Vol. 306, No. 3 (2006), pp. 234-250. ISSN 1552-5007 R&D Projects: GA ČR GA304/05/2665; GA MŠk OC B23.002 Institutional research plan: CEZ:AV0Z50390512 Keywords: Phylogenetic Memory Subject RIV: EB - Genetics; Molecular Biology Impact factor: 2.756, year: 2006

  3. Genomic repeat abundances contain phylogenetic signal

    Dodsworth, S.; Chase, M.W.; Kelly, L.J.; Leitch, I.J.; Macas, Jiří; Novák, Petr; Piednoël, M.; Weiß-Schneeweiss, H.; Leitch, A.R.

    2015-01-01

    Vol. 64, No. 1 (2015), pp. 112-126. ISSN 1063-5157 R&D Projects: GA ČR GBP501/12/G090 Institutional support: RVO:60077344 Keywords: Repetitive DNA * continuous characters * genomics * next-generation sequencing * phylogenetics Subject RIV: EB - Genetics; Molecular Biology Impact factor: 14.387, year: 2014

  4. Classification of articulators.

    Rihani, A

    1980-03-01

    A simple classification in familiar terms with definite, clear characteristics can be adopted. This classification system is based on the number of records used and the adjustments necessary for the articulator to accept these records. The classification divides the articulators into nonadjustable, semiadjustable, and fully adjustable articulators (Table I). PMID:6928204

  5. Aircraft Operations Classification System

    Harlow, Charles; Zhu, Weihong

    2001-01-01

    Accurate data is important in the aviation planning process. In this project we consider systems for measuring aircraft activity at airports. This would include determining the type of aircraft such as jet, helicopter, single engine, and multiengine propeller. Some of the issues involved in deploying technologies for monitoring aircraft operations are cost, reliability, and accuracy. In addition, the system must be field portable and acceptable at airports. A comparison of technologies was conducted and it was decided that an aircraft monitoring system should be based upon acoustic technology. A multimedia relational database was established for the study. The information contained in the database consists of airport information, runway information, acoustic records, photographic records, a description of the event (takeoff, landing), aircraft type, and environmental information. We extracted features from the time signal and the frequency content of the signal. A multi-layer feed-forward neural network was chosen as the classifier. Training and testing results were obtained. We were able to obtain classification results of over 90 percent for training and testing for takeoff events.

  6. Phylogenetic relationships of some species of the family Echinostomatidae Odner, 1910 (Trematoda), inferred from nuclear rDNA sequences and karyological analysis

    Gražina Stanevičiūtė; Virmantas Stunžėnas; Romualda Petkevičiūtė

    2015-01-01

    The family Echinostomatidae Looss, 1899 exhibits substantial taxonomic diversity, and the morphological criteria adopted by different authors have resulted in its subdivision into an impressive number of subfamilies. The status of the subfamily Echinochasminae Odhner, 1910 has changed across classifications. Genetic characteristics and phylogenetic analysis of four Echinostomatidae species – Echinochasmus sp., Echinochasmus coaxatus Dietz, 1909, Stephanoprora pseudoechinata (Olsson, 18...

  7. Cirrhosis classification based on texture classification of random features.

    Liu, Hui; Shao, Ying; Guo, Dongmei; Zheng, Yuanjie; Zhao, Zuowei; Qiu, Tianshuang

    2014-01-01

    Accurate staging of hepatic cirrhosis is important in investigating the cause and slowing down the effects of cirrhosis. Computer-aided diagnosis (CAD) can provide doctors with a second opinion and help them choose a specific treatment based on an accurate cirrhosis stage. MRI has many advantages, including high resolution for soft tissue, no radiation, and multiparameter imaging modalities. In this paper, multisequence MRIs, including T1-weighted, T2-weighted, arterial, portal venous, and equilibrium phase, are therefore applied. However, CAD does not yet meet the clinical needs of cirrhosis staging, and few researchers are concerned with it at present. Cirrhosis is characterized by the presence of widespread fibrosis and regenerative nodules in the liver, leading to different texture patterns at different stages, so extracting texture features is the primary task. Compared with typical gray level cooccurrence matrix (GLCM) features, texture classification from random features provides an effective alternative; we adopt it and propose CCTCRF for three-class classification (normal, early, and middle and advanced stage). CCTCRF requires no strong assumptions beyond the sparse character of the image, contains sufficient texture information, involves a concise and effective process, and makes case decisions with high accuracy. Experimental results illustrate its satisfactory performance and are compared with a typical NN with GLCM. PMID:24707317

  8. Phyloclimatic modeling: combining phylogenetics and bioclimatic modeling.

    Yesson, C; Culham, A

    2006-10-01

    We investigate the impact of past climates on plant diversification by tracking the "footprint" of climate change on a phylogenetic tree. Diversity within the cosmopolitan carnivorous plant genus Drosera (Droseraceae) is focused within Mediterranean climate regions. We explore whether this diversity is temporally linked to Mediterranean-type climatic shifts of the mid-Miocene and whether climate preferences are conservative over phylogenetic timescales. Phyloclimatic modeling combines environmental niche (bioclimatic) modeling with phylogenetics in order to study evolutionary patterns in relation to climate change. We present the largest and most complete such example to date using Drosera. The bioclimatic models of extant species demonstrate clear phylogenetic patterns; this is particularly evident for the tuberous sundews from southwestern Australia (subgenus Ergaleium). We employ a method for establishing confidence intervals of node ages on a phylogeny using replicates from a Bayesian phylogenetic analysis. This chronogram shows that many clades, including subgenus Ergaleium and section Bryastrum, diversified during the establishment of the Mediterranean-type climate. Ancestral reconstructions of bioclimatic models demonstrate a pattern of preference for this climate type within these groups. Ancestral bioclimatic models are projected into palaeo-climate reconstructions for the time periods indicated by the chronogram. We present two such examples that each generate plausible estimates of ancestral lineage distribution, which are similar to their current distributions. This is the first study to attempt bioclimatic projections on evolutionary time scales. The sundews appear to have diversified in response to local climate development. Some groups are specialized for Mediterranean climates, others show wide-ranging generalism. This demonstrates that Phyloclimatic modeling could be repeated for other plant groups and is fundamental to the understanding of

  9. Proposal for a revised classification of the Demospongiae (Porifera)

    Morrow, Christine; Cardenas, Paco

    2015-01-01

    Background: Demospongiae is the largest sponge class including 81% of all living sponges with nearly 7,000 species worldwide. Systema Porifera (2002) was the result of a large international collaboration to update the Demospongiae higher taxa classification, essentially based on morphological data. Since then, an increasing number of molecular phylogenetic studies have considerably shaken this taxonomic framework, with numerous polyphyletic groups revealed or confirmed and new clades discover...

  10. Identification and classification of silks using infrared spectroscopy

    M. Boulet-Audet; Vollrath, F.; Holland, C.

    2015-01-01

    Lepidopteran silks number in the thousands and display a vast diversity of structures, properties and industrial potential. To map this remarkable biochemical diversity, we present an identification and screening method based on the infrared spectra of native silk feedstock and cocoons. Multivariate analysis of over 1214 infrared spectra obtained from 35 species allowed us to group silks into distinct hierarchies and a classification that agrees well with current phylogenetic data an...

  11. Phylogenetic Characterization of Transport Protein Superfamilies: Superiority of SuperfamilyTree Programs over Those Based on Multiple Alignments

    Chen, Jonathan S.; Reddy, Vamsee; Chen, Joshua H.; Shlykov, Maksim A; Zheng, Wei Hao; Cho, Jaehoon; Yen, Ming Ren; Saier, Milton H.

    2012-01-01

    Transport proteins function in the translocation of ions, solutes and macromolecules across cellular and organellar membranes. These integral membrane proteins fall into >600 families as tabulated in the Transporter Classification Database (www.tcdb.org). Recent studies, some of which are reported here, define distant phylogenetic relationships between families with the creation of superfamilies. Several of these are analyzed using a novel set of programs designed to allow reliable prediction...

  12. A new measure to study phylogenetic relations in the brown algal order Ectocarpales: The "codon impact parameter"

    Smarajit Das; Jayprokas Chakrabarti; Zhumur Ghosh; Satyabrata Sahoo; Bibekanand Mallick

    2005-12-01

    We analyse forty-seven chloroplast genes of the large subunit of RuBisCO, from the algal order Ectocarpales, sourced from GenBank. Codon-usage weighted by the nucleotide base-bias defines our score called the codon-impact-parameter. This score is used to obtain phylogenetic relations amongst the 47 Ectocarpales. We compare our classification with the ones done earlier.

  13. Strong phylogenetic signals and phylogenetic niche conservatism in ecophysiological traits across divergent lineages of Magnoliaceae

    Hui Liu; Qiuyuan Xu; Pengcheng He; Santiago, Louis S.; Keming Yang; Qing Ye

    2015-01-01

    The early diverged Magnoliaceae shows a historical temperate-tropical distribution among lineages indicating divergent evolution, yet which ecophysiological traits are phylogenetically conserved, and whether these traits are involved in correlated evolution remain unclear. Integrating phylogeny and 20 ecophysiological traits of 27 species, from the four largest sections of Magnoliaceae, we tested the phylogenetic signals of these traits and the correlated evolution between trait pairs. Phylog...

  14. Fast and accurate estimation for astrophysical problems in large databases

    Richards, Joseph W.

    2010-10-01

    A recent flood of astronomical data has created much demand for sophisticated statistical and machine learning tools that can rapidly draw accurate inferences from large databases of high-dimensional data. In this Ph.D. thesis, methods for statistical inference in such databases will be proposed, studied, and applied to real data. I use methods for low-dimensional parametrization of complex, high-dimensional data that are based on the notion of preserving the connectivity of data points in the context of a Markov random walk over the data set. I show how this simple parameterization of data can be exploited to: define appropriate prototypes for use in complex mixture models, determine data-driven eigenfunctions for accurate nonparametric regression, and find a set of suitable features to use in a statistical classifier. In this thesis, methods for each of these tasks are built up from simple principles, compared to existing methods in the literature, and applied to data from astronomical all-sky surveys. I examine several important problems in astrophysics, such as estimation of star formation history parameters for galaxies, prediction of redshifts of galaxies using photometric data, and classification of different types of supernovae based on their photometric light curves. Fast methods for high-dimensional data analysis are crucial in each of these problems because they all involve the analysis of complicated high-dimensional data in large, all-sky surveys. Specifically, I estimate the star formation history parameters for the nearly 800,000 galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 7 spectroscopic catalog, determine redshifts for over 300,000 galaxies in the SDSS photometric catalog, and estimate the types of 20,000 supernovae as part of the Supernova Photometric Classification Challenge. Accurate predictions and classifications are imperative in each of these examples because these estimates are utilized in broader inference problems

  15. A user's guide to a data base of the diversity of Pseudomonas syringae and its application to classifying strains in this phylogenetic complex.

    Odile Berge

    The Pseudomonas syringae complex is composed of numerous genetic lineages of strains from both agricultural and environmental habitats including habitats closely linked to the water cycle. The new insights from the discovery of this bacterial species in habitats outside of agricultural contexts per se have led to the revelation of a wide diversity of strains in this complex beyond what was known from agricultural contexts. Here, through Multi Locus Sequence Typing (MLST) of 216 strains, we identified 23 clades within 13 phylogroups among which the seven previously described P. syringae phylogroups were included. The phylogeny of the core genome of 29 strains representing nine phylogroups was similar to the phylogeny obtained with MLST thereby confirming the robustness of MLST-phylogroups. We show that phenotypic traits rarely provide a satisfactory means for classification of strains even if some combinations are highly probable in some phylogroups. We demonstrate that the citrate synthase (cts) housekeeping gene can accurately predict the phylogenetic affiliation for more than 97% of strains tested. We propose a list of cts sequences to be used as a simple tool for quickly and precisely classifying new strains. Finally, our analysis leads to predictions about the diversity of P. syringae that is yet to be discovered. We present here an expandable framework mainly based on cts genetic analysis into which more diversity can be integrated.

  16. A user's guide to a data base of the diversity of Pseudomonas syringae and its application to classifying strains in this phylogenetic complex.

    Berge, Odile; Monteil, Caroline L; Bartoli, Claudia; Chandeysson, Charlotte; Guilbaud, Caroline; Sands, David C; Morris, Cindy E

    2014-01-01

    The Pseudomonas syringae complex is composed of numerous genetic lineages of strains from both agricultural and environmental habitats including habitats closely linked to the water cycle. The new insights from the discovery of this bacterial species in habitats outside of agricultural contexts per se have led to the revelation of a wide diversity of strains in this complex beyond what was known from agricultural contexts. Here, through Multi Locus Sequence Typing (MLST) of 216 strains, we identified 23 clades within 13 phylogroups among which the seven previously described P. syringae phylogroups were included. The phylogeny of the core genome of 29 strains representing nine phylogroups was similar to the phylogeny obtained with MLST thereby confirming the robustness of MLST-phylogroups. We show that phenotypic traits rarely provide a satisfactory means for classification of strains even if some combinations are highly probable in some phylogroups. We demonstrate that the citrate synthase (cts) housekeeping gene can accurately predict the phylogenetic affiliation for more than 97% of strains tested. We propose a list of cts sequences to be used as a simple tool for quickly and precisely classifying new strains. Finally, our analysis leads to predictions about the diversity of P. syringae that is yet to be discovered. We present here an expandable framework mainly based on cts genetic analysis into which more diversity can be integrated. PMID:25184292

  17. A User's Guide to a Data Base of the Diversity of Pseudomonas syringae and Its Application to Classifying Strains in This Phylogenetic Complex

    Berge, Odile; Monteil, Caroline L.; Bartoli, Claudia; Chandeysson, Charlotte; Guilbaud, Caroline; Sands, David C.; Morris, Cindy E.

    2014-01-01

    The Pseudomonas syringae complex is composed of numerous genetic lineages of strains from both agricultural and environmental habitats including habitats closely linked to the water cycle. The new insights from the discovery of this bacterial species in habitats outside of agricultural contexts per se have led to the revelation of a wide diversity of strains in this complex beyond what was known from agricultural contexts. Here, through Multi Locus Sequence Typing (MLST) of 216 strains, we identified 23 clades within 13 phylogroups among which the seven previously described P. syringae phylogroups were included. The phylogeny of the core genome of 29 strains representing nine phylogroups was similar to the phylogeny obtained with MLST thereby confirming the robustness of MLST-phylogroups. We show that phenotypic traits rarely provide a satisfactory means for classification of strains even if some combinations are highly probable in some phylogroups. We demonstrate that the citrate synthase (cts) housekeeping gene can accurately predict the phylogenetic affiliation for more than 97% of strains tested. We propose a list of cts sequences to be used as a simple tool for quickly and precisely classifying new strains. Finally, our analysis leads to predictions about the diversity of P. syringae that is yet to be discovered. We present here an expandable framework mainly based on cts genetic analysis into which more diversity can be integrated. PMID:25184292

  18. Phylogenetic and structural analysis of merkel cell polyomavirus VP1 in Brazilian samples.

    Baez, Camila F; Diaz, Nuria C; Venceslau, Marianna T; Luz, Flávio B; Guimarães, Maria Angelica A M; Zalis, Mariano G; Varella, Rafael B

    2016-08-01

    Our understanding of the phylogenetic and structural characteristics of the Merkel Cell Polyomavirus (MCPyV) is increasing but still scarce, especially in samples originating from South America. In order to investigate the properties of MCPyV circulating in the continent in more detail, MCPyV Viral Protein 1 (VP1) sequences from five basal cell carcinoma (BCC) and four saliva samples from Brazilian individuals were evaluated from the phylogenetic and structural standpoint, along with all complete MCPyV VP1 sequences available at Genbank database so far. The VP1 phylogenetic analysis confirmed the previously reported pattern of geographic distribution of MCPyV genotypes and the complexity of the South-American clade. The nine Brazilian samples were equally distributed in the South-American (3 saliva samples); North American/European (2 BCC and 1 saliva sample); and in the African clades (3 BCC). The classification of mutations according to the functional regions of VP1 protein revealed a differentiated pattern for South-American sequences, with higher number of mutations on the neutralizing epitope loops and lower on the region of C-terminus, responsible for capsid formation, when compared to other continents. In conclusion, the phylogenetic analysis showed that the distribution of Brazilian VP1 sequences agrees with the ethnic composition of the country, indicating that VP1 can be successfully used for MCPyV phylogenetic studies. Finally, the structural analysis suggests that some mutations could have impact on the protein folding, membrane binding or antibody escape, and therefore they should be further studied. PMID:27173789

  19. Accurate Image Super-Resolution Using Very Deep Convolutional Networks

    Kim, Jiwon; Lee, Jung Kwon; Lee, Kyoung Mu

    2015-01-01

    We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by the VGG-net used for ImageNet classification (Simonyan and Zisserman, 2015). We find that increasing the network depth yields a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, ho...

  20. Automatic classification of blank substrate defects

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

    Mask preparation stages are crucial in mask manufacturing, since the mask later acts as a template for a considerable number of dies on the wafer. Defects on the initial blank substrate, and on subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical defects or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and the separation of defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. Using this automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment compared to the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of Automatic Defect Classification (ADC) product at MP Mask

  1. Quality-Oriented Classification of Aircraft Material Based on SVM

    Hongxia Cai

    2014-01-01

    Existing material classification schemes are intended to improve inventory management. However, different materials have different quality-related attributes, especially in the aircraft industry. In order to reduce cost without sacrificing quality, we propose a quality-oriented material classification system that considers material quality characteristics, quality cost, and quality influence. The Analytic Hierarchy Process supports feature selection and classification decisions. We use the improved Kraljic Portfolio Matrix to establish the three-dimensional classification model. The aircraft materials can be divided into eight types, including general type, key type, risk type, and leveraged type. To improve the classification accuracy for the various materials, a Support Vector Machine (SVM) algorithm is introduced. Finally, we compare the SVM and a BP neural network in the application. The results show that the SVM algorithm is more efficient and accurate and that the quality-oriented material classification is valuable.

  2. Accurate Modeling of Advanced Reflectarrays

    Zhou, Min

    of the incident field, the choice of basis functions, and the technique to calculate the far-field. Based on accurate reference measurements of two offset reflectarrays carried out at the DTU-ESA Spherical Near-Field Antenna Test Facility, it was concluded that the three latter factors are particularly important...... to the conventional phase-only optimization technique (POT), the geometrical parameters of the array elements are directly optimized to fulfill the far-field requirements, thus maintaining a direct relation between optimization goals and optimization variables. As a result, better designs can be obtained compared...... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility...

  3. Accurate ab initio spin densities

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  4. Accurate thickness measurement of graphene

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  5. Statistical assignment of DNA sequences using Bayesian phylogenetics

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P;

    2008-01-01

    We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that...

  6. Phylogenetic Analysis of PRRSV from Danish Pigs

    Hjulsager, Charlotte Kristiane; Breum, Solvej Østergaard; Larsen, Lars Erik

    visualized with NJ-plot software. Genbank entries of Danish PRRSV sequences from the 1990s were included in the phylogenetic analysis. Translated sequences were aligned with current vaccine isolates. Results Both PRRSV EU and US type viruses were isolated from material submitted from Danish pigs in the...... phylogenetic analysis, in order to assess the applicability of vaccines currently used to control PRRSV infection in Danish pig herds. Materials and methods Lung tissue from samples submitted to the National Veterinary Institute during 2003-2008 for PRRSV diagnosis were screened for PRRSV by real-time RT......-PCR, essentially as described by Egli et al. 2001, on RNA extracted with RNeasy Mini Kit (QIAGEN). Complete open reading frames (ORF) ORF5 and ORF7 were PCR amplified as described (Oleksiewicz et al. 1998) and sequenced. Sequences were aligned and Neighbour-Joining trees were constructed with ClustalX. Trees were...

  7. The phylogenetics of succession can guide restoration

    Shooner, Stephanie; Chisholm, Chelsea Lee; Davies, T. Jonathan

    2015-01-01

    Phylogenetic tools have increasingly been used in community ecology to describe the evolutionary relationships among co-occurring species. In studies of succession, such tools may allow us to identify the evolutionary lineages most suited for particular stages of succession and habitat...... phylogenetically random subset of species from the local species pool. Over time, there appears to be selection for particular lineages that come to be filtered across space and environment. The species most appropriate for mine site restoration might, therefore, depend on the successional stage of the community and the local species composition. For example, in later succession, it could be more beneficial to facilitate establishment of more distant relatives. Our findings can improve management practices by...

  8. A phylogenetic analysis of Aquifex pyrophilus

    Burggraf, S.; Olsen, G. J.; Stetter, K. O.; Woese, C. R.

    1992-01-01

    The 16S rRNA of the bacterium Aquifex pyrophilus, a microaerophilic, oxygen-reducing hyperthermophile, has been sequenced directly from the PCR-amplified gene. Phylogenetic analyses show the Aq. pyrophilus lineage to be probably the deepest (earliest) in the (eu)bacterial tree. The addition of this deep branching to the bacterial tree further supports the argument that the Bacteria are of thermophilic ancestry.

  9. A Consistent Phylogenetic Backbone for the Fungi

    Ebersberger, Ingo; de Matos Simoes, Ricardo; Kupczok, Anne; Gube, Matthias; Kothe, Erika; Voigt, Kerstin; von Haeseler, Arndt

    2011-01-01

    The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved, can individual results from fungal research be integrated into a holistic picture of biology. However, and despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen...

  10. Phylogenetic invariants for stationary base composition

    Allman, Elizabeth S.; Rhodes, John A.

    2004-01-01

    Changing base composition during the evolution of biological sequences can mislead some of the phylogenetic inference techniques in current use. However, detecting whether such a process has occurred may be difficult, since convergent evolution may lead to similar base frequencies emerging from different lineages. To study this situation, algebraic models of biological sequence evolution are introduced in which the base composition is fixed throughout evolution. Basic properties of the associ...

  11. Phylogenetic tree shapes resolve disease transmission patterns

    Colijn, Caroline; Gardy, Jennifer

    2014-01-01

    Background and Objectives: Whole-genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterized infectious periods, epidemiological and clinical metadata which may not always be available, and typically require computatio...

  12. Phylogenetic conservatism of functional traits in microorganisms

    Martiny, Adam C.; Treseder, Kathleen; Pusch, Gordon

    2012-01-01

    A central question in biology is how biodiversity influences ecosystem functioning. Underlying this is the relationship between organismal phylogeny and the presence of specific functional traits. The relationship is complicated by gene loss and convergent evolution, resulting in the polyphyletic distribution of many traits. In microorganisms, lateral gene transfer can further distort the linkage between phylogeny and the presence of specific functional traits. To identify the phylogenetic co...

  13. Security classification of information

    Quist, A.S.

    1993-04-01

    This document is the second of a planned four-volume work that comprehensively discusses the security classification of information. The main focus of Volume 2 is on the principles for classification of information. Included herein are descriptions of the two major types of information that governments classify for national security reasons (subjective and objective information), guidance to use when determining whether information under consideration for classification is controlled by the government (a necessary requirement for classification to be effective), information disclosure risks and benefits (the benefits and costs of classification), standards to use when balancing information disclosure risks and benefits, guidance for assigning classification levels (Top Secret, Secret, or Confidential) to classified information, guidance for determining how long information should be classified (classification duration), classification of associations of information, classification of compilations of information, and principles for declassifying and downgrading information. Rules or principles of certain areas of our legal system (e.g., trade secret law) are sometimes mentioned to provide added support to some of those classification principles.

  14. Recursive heuristic classification

    Wilkins, David C.

    1994-01-01

    The author will describe a new problem-solving approach called recursive heuristic classification, whereby a subproblem of heuristic classification is itself formulated and solved by heuristic classification. This allows the construction of more knowledge-intensive classification programs in a way that yields a clean organization. Further, standard knowledge acquisition and learning techniques for heuristic classification can be used to create, refine, and maintain the knowledge base associated with the recursively called classification expert system. The method of recursive heuristic classification was used in the Minerva blackboard shell for heuristic classification. Minerva recursively calls itself every problem-solving cycle to solve the important blackboard scheduler task, which involves assigning a desirability rating to alternative problem-solving actions. Knowing these ratings is critical to the use of an expert system as a component of a critiquing or apprenticeship tutoring system. One innovation of this research is a method called dynamic heuristic classification, which allows selection among dynamically generated classification categories instead of requiring them to be pre-enumerated.

  15. Incongruencies in Vaccinia Virus Phylogenetic Trees

    Chad Smithson

    2014-10-01

    Over the years, as more complete poxvirus genomes have been sequenced, phylogenetic studies of these viruses have become more prevalent. In general, the results show similar relationships between the poxvirus species; however, some inconsistencies are notable. Previous analyses of the viral genomes contained within the vaccinia virus (VACV)-Dryvax vaccine revealed that their phylogenetic relationships were sometimes clouded by low bootstrapping confidence. To analyze the VACV-Dryvax genomes in detail, a new tool-set was developed and integrated into the Base-By-Base bioinformatics software package. Analyses showed that fewer unique positions were present in each VACV-Dryvax genome than expected. A series of patterns, each containing several single nucleotide polymorphisms (SNPs), were identified that were counter to the results of the phylogenetic analysis. The VACV genomes were found to contain short DNA sequence blocks that matched more distantly related clades. Additionally, similar non-conforming SNP patterns were observed in (1) the variola virus clade; (2) some cowpox clades; and (3) VACV-CVA, the direct ancestor of VACV-MVA. Thus, traces of past recombination events are common in the various orthopoxvirus clades, including those associated with smallpox and cowpox viruses.

  16. A Consistent Phylogenetic Backbone for the Fungi

    Ebersberger, Ingo; de Matos Simoes, Ricardo; Kupczok, Anne; Gube, Matthias; Kothe, Erika; Voigt, Kerstin; von Haeseler, Arndt

    2012-01-01

    The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved, can individual results from fungal research be integrated into a holistic picture of biology. However, and despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets analyzed with four different tree reconstruction methods shed light from different angles on the fungal tree of life. Eleven additional data sets address specifically the phylogenetic position of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes, respectively. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST encoded data—a common practice in phylogenomic analyses—introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, the generalization of phylogenomic data sets as collections of randomly selected genes cannot be taken for granted. A thorough characterization of the data to assess possible influences on the tree reconstruction should therefore become a standard in phylogenomic analyses. PMID:22114356

  17. The phylogenetic affinities of the extinct glyptodonts.

    Delsuc, Frédéric; Gibb, Gillian C; Kuch, Melanie; Billet, Guillaume; Hautier, Lionel; Southon, John; Rouillard, Jean-Marie; Fernicola, Juan Carlos; Vizcaíno, Sergio F; MacPhee, Ross D E; Poinar, Hendrik N

    2016-02-22

    Among the fossils of hitherto unknown mammals that Darwin collected in South America between 1832 and 1833 during the Beagle expedition were examples of the large, heavily armored herbivores later known as glyptodonts. Ever since, glyptodonts have fascinated evolutionary biologists because of their remarkable skeletal adaptations and seemingly isolated phylogenetic position even within their natural group, the cingulate xenarthrans (armadillos and their allies). In possessing a carapace comprised of fused osteoderms, the glyptodonts were clearly related to other cingulates, but their precise phylogenetic position as suggested by morphology remains unresolved. To provide a molecular perspective on this issue, we designed sequence-capture baits using in silico reconstructed ancestral sequences and successfully assembled the complete mitochondrial genome of Doedicurus sp., one of the largest glyptodonts. Our phylogenetic reconstructions establish that glyptodonts are in fact deeply nested within the armadillo crown-group, representing a distinct subfamily (Glyptodontinae) within family Chlamyphoridae. Molecular dating suggests that glyptodonts diverged no earlier than around 35 million years ago, in good agreement with their fossil record. Our results highlight the derived nature of the glyptodont morphotype, one aspect of which is a spectacular increase in body size until their extinction at the end of the last ice age. PMID:26906483

  18. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.

    Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E; Thomas, Paul D

    2011-09-01

    The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods. PMID:21873635

  19. Sequence exploration reveals information bias among molecular markers used in phylogenetic reconstruction for Colletotrichum species.

    Rampersad, Sephra N; Hosein, Fazeeda N; Carrington, Christine Vf

    2014-01-01

    The Colletotrichum gloeosporioides species complex is among the most destructive fungal plant pathogens in the world; however, identification of isolates of quarantine importance to the intra-specific level is confounded by a number of factors that affect phylogenetic reconstruction. Information bias and quality parameters were investigated to determine whether nucleotide sequence alignments and phylogenetic trees accurately reflect the genetic diversity and phylogenetic relatedness of individuals. Sequence exploration of GAPDH, ACT, TUB2 and ITS markers indicated that the query sequences had different patterns of nucleotide substitution but were without evidence of base substitution saturation. Regions of high entropy were much more dispersed in the ACT and GAPDH marker alignments than for the ITS and TUB2 markers. A discernible bimodal gap in the genetic distance frequency histograms was produced for the ACT and GAPDH markers, which indicated successful separation of intra- and inter-specific sequences in the data set. Overall, analyses indicated clear differences in the ability of these markers to phylogenetically separate individuals to the intra-specific level, which coincided with information bias. PMID:25392785

  20. Ant-Based Phylogenetic Reconstruction (ABPR): A new distance algorithm for phylogenetic estimation based on ant colony optimization

    Karla Vittori; Alexandre C B Delbem; Pereira, Sérgio L

    2008-01-01

    We propose a new distance algorithm for phylogenetic estimation based on Ant Colony Optimization (ACO), named Ant-Based Phylogenetic Reconstruction (ABPR). ABPR joins two taxa iteratively based on evolutionary distance among sequences, while also accounting for the quality of the phylogenetic tree built according to the total length of the tree. Similar to optimization algorithms for phylogenetic estimation, the algorithm allows exploration of a larger set of nearly optimal solutions. We appl...

  1. A More Accurate Fourier Transform

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric CO2 concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
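
    The explicit-integral (EI) idea in this abstract can be sketched as follows. This is a minimal illustration assuming uniformly sampled data; the function names, the test signal, and the scan range are hypothetical choices, not the paper's code. It evaluates the Fourier integral with the rectangle rule at arbitrary frequencies, so the peak search is not tied to the FFT bin spacing of 1/(N*dt).

```python
import cmath
import math


def rectangle_fourier(samples, dt, freq):
    """Rectangle-rule estimate of the integral of x(t) * exp(-2*pi*i*freq*t) dt."""
    total = 0j
    for n, x in enumerate(samples):
        total += x * cmath.exp(-2j * math.pi * freq * n * dt)
    return total * dt


def peak_frequency(samples, dt, f_lo, f_hi, steps):
    """Scan evenly spaced frequencies in [f_lo, f_hi]; return the one with the largest magnitude."""
    best_f, best_mag = f_lo, -1.0
    for k in range(steps + 1):
        f = f_lo + (f_hi - f_lo) * k / steps
        mag = abs(rectangle_fourier(samples, dt, f))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


# Illustrative signal (an assumption): 10 s of a cosine sampled every 0.01 s.
# An FFT of this record would have bins 0.1 Hz apart, so 3.14 Hz falls between bins.
dt = 0.01
n_samples = 1000
true_freq = 3.14
signal = [math.cos(2 * math.pi * true_freq * n * dt) for n in range(n_samples)]

# Scan 3.0 to 3.3 Hz in 0.001 Hz steps, much finer than the 0.1 Hz bin spacing.
estimate = peak_frequency(signal, dt, 3.0, 3.3, 300)
print(f"estimated peak frequency: {estimate:.3f} Hz")
```

    The cost is one full pass over the data per trial frequency, which is the speed trade-off against the FFT that the abstract mentions.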

  2. Text Classification Using Sentential Frequent Itemsets

    Shi-Zhu Liu; He-Ping Hu

    2007-01-01

    Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially the specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including frequently co-occurring words in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora are carried out, which validate the practicability of the proposed system.
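
    The "sentence as transaction" view in this abstract can be illustrated with a short sketch. It assumes whitespace tokenization and brute-force enumeration of small itemsets; the function name, the toy sentences, and the support threshold are hypothetical. It does not reproduce the paper's mining algorithm or its variable precision rough set weighting.

```python
from collections import Counter
from itertools import combinations


def sentential_frequent_itemsets(sentences, min_support, max_size=2):
    """Count word sets that co-occur within one sentence; keep those seen in >= min_support sentences."""
    counts = Counter()
    for sentence in sentences:
        # Each sentence is one transaction; duplicate words inside it count once.
        words = sorted(set(sentence.lower().split()))
        for size in range(1, max_size + 1):
            for itemset in combinations(words, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Toy sentences (assumed data, for illustration only).
sentences = [
    "interest rates rise",
    "bank raises interest rates",
    "team wins match",
]
frequent = sentential_frequent_itemsets(sentences, min_support=2)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

    Itemsets found this way could then serve as features for a classifier; how they are weighted is left open here.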

  3. AGN Zoo and Classifications of Active Galaxies

    Mickaelian, Areg M.

    2015-07-01

    We review the variety of Active Galactic Nuclei (AGN) classes (so-called "AGN zoo") and classification schemes of galaxies by activity types based on their optical emission-line spectrum, as well as other parameters and other than optical wavelength ranges. A historical overview of discoveries of various types of active galaxies is given, including Seyfert galaxies, radio galaxies, QSOs, BL Lacertae objects, Starbursts, LINERs, etc. Various kinds of AGN diagnostics are discussed. All known AGN types and subtypes are presented and described to have a homogeneous classification scheme based on the optical emission-line spectra and in many cases, also other parameters. Problems connected with accurate classifications and open questions related to AGN and their classes are discussed and summarized.

  4. An Ensemble Classification Algorithm for Hyperspectral Images

    K.Kavitha

    2014-04-01

    Hyperspectral image analysis has been used for many purposes in environmental monitoring, remote sensing, vegetation research and also for land cover classification. A hyperspectral image consists of many layers, each of which represents a specific wavelength. The layers stack on top of one another, making a cube-like image for the entire spectrum. This work aims to classify hyperspectral images and to produce an accurate thematic map. Spatial information of the hyperspectral images is collected by applying morphological profiles and local binary patterns. The support vector machine is an efficient classification algorithm for classifying hyperspectral images. A genetic algorithm is used to select the best features for classification. The selected features are classified to obtain the classes and to produce a thematic map. Experiments are carried out with the AVIRIS Indian Pines and ROSIS Pavia University data sets. The proposed method achieves an accuracy of 93% for Indian Pines and 92% for Pavia University.

  5. Efficient Pairwise Multilabel Classification

    Loza Mencía, Eneldo

    2013-01-01

    Multilabel classification learning is the task of learning a mapping between objects and sets of possibly overlapping classes and has gained increasing attention in recent times. A prototypical application scenario for multilabel classification is the assignment of a set of keywords to a document, a frequently encountered problem in the text classification domain. With upcoming Web 2.0 technologies, this domain is extended by a wide range of tag suggestion tasks and the trend definitely...

  6. Classiology and soil classification

    Rozhkov, V. A.

    2012-03-01

    Classiology can be defined as a science studying the principles and rules of classification of objects of any nature. The development of the theory of classification and the particular methods for classifying objects are the main challenges of classiology; to a certain extent, they are close to the challenges of pattern recognition. The methodology of classiology integrates a wide range of methods and approaches: from expert judgment to formal logic, multivariate statistics, and informatics. Soil classification assumes generalization of available data and practical experience, formalization of our notions about soils, and their representation in the form of an information system. As an information system, soil classification is designed to predict the maximum number of a soil's properties from the position of this soil in the classification space. The existing soil classification systems do not completely satisfy the principles of classiology. Violation of the logical basis, poor structuring, low integrity, and an inadequate level of formalization make these systems verbal schemes rather than classification systems sensu stricto. The concept of classification as listing (enumeration) of objects makes it possible to introduce the notion of the information base of classification. For soil objects, this is the database of soil indices (properties) that might be applied for generating a target-oriented soil classification system. Mathematical methods enlarge the prognostic capacity of classification systems; they can be applied to assess the quality of these systems and to recognize new soil objects to be included in the existing systems. The application of particular principles and rules of classiology for soil classification purposes is discussed in this paper.

  7. Classifier in Age classification

    B. Santhi; R.Seethalakshmi

    2012-01-01

    The face is an important feature of human beings. We can derive various properties of a person by analyzing the face. The objective of the study is to design a classifier for age using facial images. Age classification is essential in many applications like crime detection, employment and face detection. The proposed algorithm contains four phases: preprocessing, feature extraction, feature selection and classification. The classification employs two class labels, namely child and old. This st...

  8. Aspects de la classification

    Mari, Jean-François; Napoli, Amedeo

    1996-01-01

    Numerical classification techniques have always been present in pattern recognition. Neural networks demonstrate their (very?) good classification properties every day, and classification is increasingly present in knowledge representation. This report therefore presents, purely by way of introduction, the mathematical, statistical, neuromimetic and cognitive aspects of classification.

  9. Ontologies vs. Classification Systems

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing metadata sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and metadata taxonomies, should be based on ontologies.

  10. A Note on Encodings of Phylogenetic Networks of Bounded Level

    Gambette, Philippe

    2009-01-01

    Driven by the need for better models that shed light on the question of how life's diversity has evolved, phylogenetic networks have now joined phylogenetic trees at the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets, respectively. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i.e. uniquely describe) the network that induces it? In this note, we present a complete answer to this question for the special case of a level-1 (phylogenetic) network by characterizing those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. Given that this type of network forms the first layer of the rich hierarchy of lev...