WorldWideScience

Sample records for repeats keywords bioinformatics

  1. Bioinformatics

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren

…and medicine will be particularly affected by the new results and the increased understanding of life at the molecular level. Bioinformatics is the development and application of computer methods for analysis, interpretation, and prediction, as well as for the design of experiments. It has emerged as a strategic frontier between biology and computer science. Machine learning approaches (e.g. neural networks, hidden Markov models, and belief networks) are ideally suited for areas in which there is a lot of data but little theory. The goal in machine learning is to extract useful information from a body of data by building good probabilistic models. The particular twist behind machine learning, however, is to automate the process as much as possible. In this book, the authors present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological…

  2. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

It has become highly desirable to provide users with flexible ways to query and search information in databases as simply as with keyword search in Google. This book surveys recent developments in keyword search over databases, focusing on finding structural information among objects in a database using a set of keywords. The structural information to be returned can be either trees or subgraphs representing how the objects that contain the required keywords are interconnected in a relational or XML database. Structural keyword search is completely different from…

  3. The Mnemonic Keyword Method.

    Science.gov (United States)

    Pressley, Michael; And Others

    1982-01-01

    Available experimental evidence is reviewed concerning the keyword method, a two-stage procedure for remembering materials having an associative component. The review examines subjects' memory for definitions, given vocabulary words; subjects' learning of other aspects of vocabulary, given definitions; group-administered keyword studies; and…

  4. Searching Databases with Keywords

    Institute of Scientific and Technical Information of China (English)

    Shan Wang; Kun-Long Zhang

    2005-01-01

Traditionally, the SQL query language is used to search data in databases. However, it is inappropriate for end-users, since it is complex and hard to learn. End-users need to search databases with keywords, as in web search engines. This paper presents a survey of work on keyword search in databases. It also includes a brief introduction to the SEEKER system, which has been developed.

  5. Collective spatial keyword querying

    DEFF Research Database (Denmark)

Cao, Xin; Cong, Gao; Jensen, Christian S.

    2011-01-01

With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query…

  6. Moving Spatial Keyword Queries

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Jensen, Christian S.

    2013-01-01

Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We…

  7. Spatial Keyword Querying

    DEFF Research Database (Denmark)

Cao, Xin; Chen, Lisi; Cong, Gao

    2012-01-01

The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user-sup… different kinds of functionality as well as the ideas underlying their definition…

  8. Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Chen, Lisi; Jensen, Christian S.; Wu, Dingming

    2013-01-01

…an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index…

  9. PTO: The New Keyword

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

On November 5, the State Council made public the new paid-time-off (PTO) regulations in order to get feedback. As a result, "PTO" has become the most-searched keyword in China; shortly after the announcement, more than 140,000 people had participated in a survey on Sina.com, with nearly 60% showing their support for the new policies.

  10. Contents, Abstracts and Keywords

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

Transnationalism and Its Theoretical Contributions…… Ding Yueya (1) Abstract: The theory of transnationalism and its development over the past twenty years have brought great contributions to the studies of immigration, transnationality and transculturality. The paper first presents the origin of transnationalism and its definition; then, based on a review of the relevant literature, it illustrates three core concepts of transnationalism: transnational practices, transnational social space, and transnational identity; finally, it identifies the contributions of transnationalism, namely the innovative influence of its logic, perspectives, structure, and method on the academia. Keywords: transnationalism; transnational practices; transnational social space; transnational identity.

  11. Standardization of Keyword Search Mode

    Science.gov (United States)

    Su, Di

    2010-01-01

    In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…

  12. Contents, Abstracts and Keywords

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

The Social Competition and the Construction of Ethnic Group: A Reflection on the Western Theory of Resource Competition Guan Kai (1) Abstract: The theory of resource competition claims that the ethnic group, as a symbol, is a kind of tool in social competition, and that under certain social conditions, people can be organized by ethnic group to strive for social resources; hence the boundary among ethnic groups is drawn. So the need for social competition, and the strategies of individuals and collectives to meet it, become the basic power to construct the ethnic group. The paper reviews the origin and development of the theory of resource competition among ethnic groups, and the relevant cases; on that basis, it responds to the theory and analyzes its limitations. At the same time, the paper also tries to introduce a new variable, the state, into the theory; that is, it adds the viewpoint of the state to a theory of resource competition whose focus is on society, and it highlights the important role of the state, as a competition coordinator, and its function and effect in the construction of the ethnic group. Keywords: resource competition; ethnic group; state; construction of ethnic group; competition of ethnic group.

  13. Hybrid Keyword Search Auctions

    CERN Document Server

    Goel, Ashish

    2008-01-01

    Search auctions have become a dominant source of revenue generation on the Internet. Such auctions have typically used per-click bidding and pricing. We propose the use of hybrid auctions where an advertiser can make a per-impression as well as a per-click bid, and the auctioneer then chooses one of the two as the pricing mechanism. We assume that the advertiser and the auctioneer both have separate beliefs (called priors) on the click-probability of an advertisement. We first prove that the hybrid auction is truthful, assuming that the advertisers are risk-neutral. We then show that this auction is superior to the existing per-click auction in multiple ways: 1) It takes into account the risk characteristics of the advertisers. 2) For obscure keywords, the auctioneer is unlikely to have a very sharp prior on the click-probabilities. In such situations, the hybrid auction can result in significantly higher revenue. 3) An advertiser who believes that its click-probability is much higher than the auctioneer's es...

  14. Hybrid keyword search auctions

    KAUST Repository

    Goel, Ashish

    2009-01-01

Search auctions have become a dominant source of revenue generation on the Internet. Such auctions have typically used per-click bidding and pricing. We propose the use of hybrid auctions where an advertiser can make a per-impression as well as a per-click bid, and the auctioneer then chooses one of the two as the pricing mechanism. We assume that the advertiser and the auctioneer both have separate beliefs (called priors) on the click-probability of an advertisement. We first prove that the hybrid auction is truthful, assuming that the advertisers are risk-neutral. We then show that this auction is superior to the existing per-click auction in multiple ways: 1. We show that risk-seeking advertisers will choose only a per-impression bid whereas risk-averse advertisers will choose only a per-click bid, and argue that both kinds of advertisers arise naturally. Hence, the ability to bid in a hybrid fashion is important to account for the risk characteristics of the advertisers. 2. For obscure keywords, the auctioneer is unlikely to have a very sharp prior on the click-probabilities. In such situations, we show that having the extra information from the advertisers in the form of a per-impression bid can result in significantly higher revenue. 3. An advertiser who believes that its click-probability is much higher than the auctioneer's estimate can use per-impression bids to correct the auctioneer's prior without incurring any extra cost. 4. The hybrid auction can allow the advertiser and auctioneer to implement complex dynamic programming strategies to deal with the uncertainty in the click-probability using the same basic auction. The per-click and per-impression bidding schemes can only be used to implement two extreme cases of these strategies. As Internet commerce matures, we need more sophisticated pricing models to exploit all the information held by each of the participants. We believe that hybrid auctions could be an important step in this direction.
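
The pricing choice at the heart of the hybrid mechanism can be sketched as follows. This is a toy illustration only: the selection rule shown, picking whichever bid yields the larger expected per-impression revenue under the auctioneer's prior, is an assumption made for exposition, not the paper's exact mechanism.

```python
# Toy sketch of a hybrid auction's pricing choice (assumed rule: the
# auctioneer adopts whichever bid has higher expected per-impression
# revenue under its own click-probability prior).
def choose_pricing(per_impression_bid, per_click_bid, prior_ctr):
    """Return which of the two bids the auctioneer adopts, and its
    expected per-impression value."""
    expected_per_click = per_click_bid * prior_ctr  # expected revenue per impression
    if per_impression_bid >= expected_per_click:
        return "per-impression", per_impression_bid
    return "per-click", expected_per_click

# An advertiser confident its CTR beats the auctioneer's prior can
# shift value into the per-impression bid:
mode, value = choose_pricing(0.05, 1.00, 0.02)  # → ("per-impression", 0.05)
```

Under this rule, a per-impression bid of 0.05 dominates a per-click bid worth only 1.00 × 0.02 = 0.02 per impression, matching the abstract's point that per-impression bids let an advertiser act on a click-probability estimate higher than the auctioneer's.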

  15. Keywords

    African Journals Online (AJOL)

    Femi

a health facility is a determinant of patient's choice of provider and willingness to pay for the services. This paper discusses … Journal of Community Medicine and Primary Health Care. 28(1) 25-30 … on Health Workers' Training. Daily Trust.

  16. Keywords

    African Journals Online (AJOL)

    Femi

Introduction: Most patients in Rivers State seek health care from primary health centres which recently had … Outcomes measured were patients' satisfaction with doctors and nurses' … to subject selection and measurement instrument.

  17. Keywords

    African Journals Online (AJOL)

    Femi

found to be as long as over 2 hours in Malaysia, to about 42.89 … guide the Department's management of the PHC. The specific … The required sample size was calculated using the … flow analysis chart was given to the parents or … health resources to improve service delivery and … "Waiting Room Issues" (2012).

  18. Keywords

    African Journals Online (AJOL)

    Femi

Background: Domestic violence is a serious, preventable public health problem that affects millions of people. The abuse of women … Conclusion: Prevalence of domestic violence in the studied group was high. … Violence in North India.

  19. Keywords

    African Journals Online (AJOL)

    Femi

family planning methods is for men while 41.7% would communicate with their wives about the need for either partner to use family planning … Journal of Community Medicine and Primary Health Care. … Nursing Clinics of North America.

  20. Keywords

    African Journals Online (AJOL)

    Femi

Male gender and not taking alcohol regularly were significant … adults in Lagos and the level of knowledge of harmful health effects of tobacco is low … Footnote: All the students were unemployed and studying at the tertiary level of education.

  1. Keywords

    African Journals Online (AJOL)

    Femi

Methods: The study examined several aspects of diabetes-related knowledge, attitude and … (39.6%) 10.1% and 2.8% knew that diabetes can present with fast breathing and abdominal pain respectively. … Pakistan and the other in Iran.

  2. Keywords

    African Journals Online (AJOL)

    2003-05-25

May 25, 2003 … propensity to spread to healthcare workers and household members … phase, by hyaline membranes, interstitial and … Respiratory protection and barrier nursing are advised for … first time in the history of the World Health.

  3. An Extended Keyword Extraction Method

    Science.gov (United States)

    Hong, Bao; Zhen, Deng

Among numerous Chinese keyword extraction methods, Chinese linguistic characteristics have seldom been considered, which works against improving the precision of Chinese keyword extraction. An extended term-frequency-based method (Extended TF) is proposed in this paper, which combines Chinese linguistic characteristics with the basic TF method. Unary, binary and ternary grams for candidate keyword extraction, as well as other linguistic features, are all taken into account. The method establishes a classification model using a support vector machine. Tests show that the proposed extraction method improves keyword precision and recall significantly. We applied the keywords extracted by the Extended TF method to text file classification. Results show that the keywords extracted by the proposed method contribute greatly to raising the precision of text file classification.
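
The unary/binary/ternary candidate generation described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: tokens are assumed to be pre-segmented, and only a plain frequency count is shown; the paper additionally combines linguistic features and an SVM classifier.

```python
# Sketch of 1-, 2- and 3-gram candidate keyword generation (assumption:
# the input has already been segmented into tokens; the Extended TF
# method feeds richer linguistic features into an SVM on top of this).
from collections import Counter

def ngram_candidates(tokens, max_n=3):
    """Collect unary, binary and ternary candidates with term frequencies."""
    counts = Counter()
    for n in range(1, max_n + 1):          # unary, binary, ternary grams
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

The resulting frequency table would serve as the TF component of the candidate features before classification.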

  4. Keywords in musical free improvisation

    DEFF Research Database (Denmark)

    Bergstrøm-Nielsen, Carl

    2017-01-01

This article presents some keywords and concepts concerning free improvised music and its recent developments, drawing on ongoing bibliographical research. A radical pluralism stems from musicians' backgrounds and from the mixtures and fusions of styles and idioms that result. Seemingly very different "performance-driven" and "play-driven" attitudes exist, even among musicians who share the practice of performing at concerts. New models of musical analysis aiming specifically at free improvised music provide strategic observations of interaction and structure.

  5. Effective Approaches For Extraction Of Keywords

    Directory of Open Access Journals (Sweden)

    Jasmeen Kaur

    2010-11-01

Keywords are index terms that contain the most important information. Automatic keyword extraction is the task of identifying a small set of words, keyphrases or keywords from a document that can describe its meaning. Keyword extraction is considered a core technology for all automatic processing of text materials. In this paper, a survey of keyword extraction techniques is presented that can be applied to extract effective keywords that uniquely identify a document.

  6. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Science.gov (United States)

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  8. Visualization According To Research Paper Keywords

    OpenAIRE

    Isenberg, Petra; Isenberg, Tobias; Sedlmair, Michael; Chen, Jian; Möller, Torsten

    2014-01-01

We analyzed visualization paper keywords supplied for 4,366 papers accepted to three main visualization conferences. We describe main keywords, topic areas, and 10-year historic trends from author-chosen keywords for papers published in the IEEE Visualization conference series (now called IEEE VIS) since 2004. Furthermore, we present the KeyVis Web application that allows visualization researchers to easily browse the 2,600+ keywords used for IEEE VIS papers o…

  9. Transcriptional and Bioinformatic Analysis Provide a Relationship between Host Response Changes to Marek’s Disease Viruses Infection and an Integrated Long Terminal Repeat

    Directory of Open Access Journals (Sweden)

    Ning eCui

    2016-04-01

GX0101, a Marek's disease virus (MDV) strain with a long terminal repeat (LTR) insert of reticuloendotheliosis virus (REV), was isolated from CVI988/Rispens-vaccinated birds showing tumors. We constructed an LTR-deleted strain, GX0101∆LTR, in our previous study. To compare the host responses to GX0101 and GX0101∆LTR, chicken embryo fibroblast (CEF) cells were infected with the two MDV strains, and a gene chip containing the chicken genome was employed to examine gene transcription changes in host cells in the present study. Of the 42,368 chicken transcripts on the chip, 2,199 genes were significantly differentially expressed in CEF infected with GX0101 compared to GX0101∆LTR. Differentially expressed genes were distributed across 25 possible gene networks according to their intermolecular connections and were annotated to 56 pathways. The insertion of the REV LTR showed the greatest influence on cancer formation and metastasis, followed by immune changes, atherosclerosis and nervous system disorders in MDV-infected CEF cells. Based on these biofunctions, GX0101 infection was predicted to produce greater growth and survival inhibition but lower oncogenicity in chickens than GX0101∆LTR, at least in the acute phase of infection. In summary, the insertion of the REV LTR altered the expression of host genes in response to MDV infection, possibly resulting in novel phenotypic properties in chickens. Our study has provided the first evidence of retroviral insertional changes in host responses to herpesvirus infection, which will help elucidate the possible relationship between the LTR insertion and the observed phenotypes.

  10. Automatic Keyword Extraction from Individual Documents

    Energy Technology Data Exchange (ETDEWEB)

    Rose, Stuart J.; Engel, David W.; Cramer, Nicholas O.; Cowley, Wendy E.

    2010-05-03

    This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.

  11. An Introduction to Bioinformatics

    Institute of Scientific and Technical Information of China (English)

    SHENG Qi-zheng; De Moor Bart

    2004-01-01

    As a newborn interdisciplinary field, bioinformatics is receiving increasing attention from biologists, computer scientists, statisticians, mathematicians and engineers. This paper briefly introduces the birth, importance, and extensive applications of bioinformatics in the different fields of biological research. A major challenge in bioinformatics - the unraveling of gene regulation - is discussed in detail.

  12. Precision and Recall in Title Keyword Searches.

    Science.gov (United States)

    McJunkin, Monica Cahill

    This study examines precision and recall for title and keyword searches performed in the "FirstSearch" WorldCat database when keywords are used with and without adjacency of terms specified. A random sample of 68 titles in economics were searched in the OCLC (Online Computer Library Center) Online Union Catalog in order to obtain their…
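
For reference, the two measures compared in such studies are computed as follows. These are the standard definitions; the record identifiers below are made-up examples, not data from the study.

```python
# Standard precision/recall over a retrieved set and a relevant set
# (illustrative only; the record IDs are invented examples).
def precision_recall(retrieved, relevant):
    hits = len(set(retrieved) & set(relevant))       # relevant records actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# e.g. a keyword search returning 4 records, 2 of them relevant,
# out of 3 relevant records in the catalog:
p, r = precision_recall(["r1", "r2", "r3", "r4"], ["r2", "r4", "r9"])
# p = 2/4 = 0.5, r = 2/3 ≈ 0.667
```

Specifying term adjacency typically raises precision (fewer spurious matches) at the possible cost of recall, which is the trade-off such title-keyword studies quantify.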

  13. Keyword analysis of community planning documents

    Data.gov (United States)

    U.S. Environmental Protection Agency — This file contains total hits per keyword expressed as percentage of total hits for the eight domains of the human well-being index. Additional categorical data is...

  14. Processing keyword queries under access limitations

    OpenAIRE

    Calì, Andrea; Martinenghi, D.; Torlone, R.

    2015-01-01

    The Deep Web is constituted by data accessible through Web pages, but not readily indexable by search engines, as they are returned in dynamic pages. In this paper we propose a framework for accessing Deep Web sources, represented as relational tables with so-called access limitations, with keyword-based queries. We formalize the notion of optimal answer and propose methods for query processing. To the best of our knowledge, ours is the first systematic approach to keyword search in such cont...

  15. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords

    Directory of Open Access Journals (Sweden)

    Shun Koyabu

    2015-01-01

For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. A classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword directly related to interaction, such as “bind” or “interact”, plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is tentatively assumed to be a dominant keyword. The classifiers are then trained separately on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of the proposed method with dominant keyword prediction.

  16. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.

    Science.gov (United States)

    Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao

    2015-01-01

For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. A classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword directly related to interaction, such as "bind" or "interact", plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is tentatively assumed to be a dominant keyword. The classifiers are then trained separately on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of the proposed method with dominant keyword prediction.

  17. Deep Learning in Bioinformatics

    OpenAIRE

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2016-01-01

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current res...

  18. String Mining in Bioinformatics

    Science.gov (United States)

    Abouelhoda, Mohamed; Ghanem, Moustafa

Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has not been explicitly embraced in either the data mining or the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  19. Stability-mutation feature identification of Web search keywords based on keyword concentration change ratio

    Institute of Scientific and Technical Information of China (English)

Hongtao LU; Guanghui YE; Gang LI

    2014-01-01

Purpose: The aim of this paper is to discuss how the keyword concentration change ratio (KCCR) is used in identifying the stability-mutation feature of Web search keywords during information analyses and predictions.

    Design/methodology/approach: After introducing the stability-mutation feature of keywords and its significance, the paper describes the function of the KCCR in identifying keyword stability-mutation features. Using Ginsberg's influenza keywords, the paper shows how the KCCR can be used to identify the keyword stability-mutation feature effectively.

    Findings: The keyword concentration ratio is closely and positively correlated with the rate of change of the research objects retrieved by users, so from the stability-mutation characteristic of keywords we can understand the relationship between these keywords and certain information. In general, keywords representing mutation fit objects that change in the short term, while those representing stability are suitable for objects that change over the long term.

    Research limitations: It is difficult to acquire the frequency of keywords, so indexes or parameters closely related to the true search volume were chosen for this study.

    Practical implications: The stability-mutation feature identification of Web search keywords can be applied to predict and analyze information about unknown public events by observing trends in the keyword concentration ratio.

    Originality/value: The stability-mutation feature of Web search could be quantitatively described by the keyword concentration change ratio (KCCR). Using the KCCR, the authors took advantage of Ginsberg's influenza epidemic data and demonstrated how accurate and effective the proposed method is in information analyses and predictions.

  20. A bioinformatics approach to marker development

    NARCIS (Netherlands)

    Tang, J.

    2008-01-01

The thesis focuses on two bioinformatics research topics: the development of tools for an efficient and reliable identification of single nucleotide polymorphisms (SNPs) and polymorphic simple sequence repeats (SSRs) from expressed sequence tags (ESTs) (Chapters 2, 3 and 4), and the subsequent imple…

  1. Keyword Extraction from Arabic Legal Texts

    Science.gov (United States)

    Rammal, Mahmoud; Bahsoun, Zeinab; Al Achkar Jabbour, Mona

    2015-01-01

    Purpose: The purpose of this paper is to apply local grammar (LG) to develop an indexing system which automatically extracts keywords from titles of Lebanese official journals. Design/methodology/approach: To build LG for our system, the first word that plays the determinant role in understanding the meaning of a title is analyzed and grouped as…

  2. Access to Periodicals: Search Key versus Keyword.

    Science.gov (United States)

    Golden, Susan U.; Golden, Gary A.

    1983-01-01

The retrievability of the titles of 152 periodicals was compared in a fixed-length algorithmic search (Library Computer System) and a keyword search (Washington Library Network) to determine which type of search algorithm is more successful with titles of varying lengths. Three references and a list of English-language stopwords are appended. (EJS)

  3. What Makes an Automatic Keyword Classification Effective?

    Science.gov (United States)

    Jones, K. Sparck; Barber, E. O.

    1971-01-01

    The substitution information contained in automatically obtained keyword classification is most effectively exploited when: (1) strong similarity connectives only are utilized, (2) grouping is confined to non-frequent terms, (3) term groups are used to provide additional and not alternative descriptive items and (4) descriptor collection frequency…

  4. A keyword history of Marketing Science

    NARCIS (Netherlands)

    C.F. Mela (Carl); J.M.T. Roos (Jason); Y. Deng (Yanhui)

    2013-01-01

    This paper considers the history of keywords used in Marketing Science to develop insights on the evolution of marketing science. Several findings emerge. First, "pricing" and "game theory" are the most ubiquitous words. More generally, the three C's and four P's predominate, suggesting

  5. Rapid automatic keyword extraction for information retrieval and analysis

    Science.gov (United States)

    Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]

    2012-03-06

    Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
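
    The scoring pipeline described in this record can be sketched in a few lines of Python. This is a hedged simplification, not the reference implementation: the stop-word list is an illustrative assumption, and delimiter handling is reduced to stop words and punctuation only.

    ```python
    import re
    from collections import defaultdict

    # Assumed minimal stop-word list; the real method uses a fuller list.
    STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in",
                  "is", "of", "on", "or", "the", "to", "with"}

    def rake_keywords(text, top_n=3):
        """Rank candidate keywords: split on stop words/punctuation, score
        each word by co-occurrence degree / frequency, sum per phrase."""
        words = re.findall(r"[a-zA-Z]+", text.lower())
        phrases, current = [], []
        for w in words:
            if w in STOP_WORDS:          # stop words delimit candidates
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(w)
        if current:
            phrases.append(current)

        # degree counts co-occurrences within a phrase (self included)
        freq, degree = defaultdict(int), defaultdict(int)
        for phrase in phrases:
            for w in phrase:
                freq[w] += 1
                degree[w] += len(phrase)

        # keyword score = sum of member word scores (degree / frequency)
        scored = [(" ".join(p), sum(degree[w] / freq[w] for w in p))
                  for p in phrases]
        scored.sort(key=lambda kv: -kv[1])
        return scored[:top_n]
    ```

    On the title of this record, rake_keywords("rapid automatic keyword extraction for information retrieval and analysis") ranks the four-word phrase "rapid automatic keyword extraction" first, since longer phrases accumulate higher degree scores.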

  6. Efficient Spatial Keyword Search in Trajectory Databases

    CERN Document Server

    Cong, Gao; Ooi, Beng Chin; Zhang, Dongxiang; Zhang, Meihui

    2012-01-01

    An increasing amount of trajectory data is being annotated with text descriptions to better capture the semantics associated with locations. The fusion of spatial locations and text descriptions in trajectories engenders a new type of top-$k$ queries that take into account both aspects. Each trajectory in consideration consists of a sequence of geo-spatial locations associated with text descriptions. Given a user location $\lambda$ and a keyword set $\psi$, a top-$k$ query returns $k$ trajectories whose text descriptions cover the keywords $\psi$ and that have the shortest match distance. To the best of our knowledge, previous research on querying trajectory databases has focused on trajectory data without any text description, and no existing work has studied this kind of top-$k$ query on trajectories. This paper proposes a novel method for efficiently computing top-$k$ trajectories. The method is developed based on a new hybrid index, the cell-keyword conscious B$^+$-tree, denoted by \cellbtree, which enabl...

  7. Bioinformatics and Cancer

    Science.gov (United States)

    Researchers take on challenges and opportunities to mine "Big Data" for answers to complex biological questions. Learn how bioinformatics uses advanced computing, mathematics, and technological platforms to store, manage, analyze, and understand data.

  8. INDEXING WORKSHOP: HOW TO ASSIGN KEYWORDS

    Energy Technology Data Exchange (ETDEWEB)

    Sternberg, Virginia

    1979-09-01

    You have heard about issues surrounding indexing and retrieval of nuclear records and automation and micrographics of these records. Now we are going to get each of you involved in indexing and assigning keywords. The first part of this hands-on workshop will be a very basic, elementary step-by-step introduction, concentrating on how to assign keywords. It is a workshop for beginners, people who have never done it before. It is planned to demonstrate what an analyst has to do to index and assign keywords to a document. Then I will take some pages of a report and demonstrate how I choose keywords for it. Then each of you will have a chance to do the same thing with similar pages from another report. Then we will discuss the variations in the keywords you individually assigned. There are many systems that can be used. In this particular workshop we will cover only a system of building your own keyword listing as you index your documents. We will be discussing keywords or descriptors or subject words, but first I want to point out a few other critical points about indexing. When developing an indexing project the most important thing to do first is to decide what elements you want to retrieve by. Whether you go into a large computer retrieval system or a small three-by-five card system, you have to decide in advance what you want to retrieve. Then you can go on from there. If you only need to search by equipment number or by purchase order or by contract number, then you can use a very simple retrieval system. But if you want to be able to retrieve a record by any combination of elements, then you have to consistently input these into your system. For example, if you want to be able to ask for the drawings of the piping in the secondary cooling system, level 3, manufactured by a certain vendor, then you must have put the information into the index by a retrieval file point, in advance.
I want to stress that the time spent in deciding what has to be retrievable is never

  9. Deep learning in bioinformatics.

    Science.gov (United States)

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2016-07-29

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.

  10. KRBKSS: a keyword relationship based keyword-set search system for peer-to-peer networks

    Institute of Scientific and Technical Information of China (English)

    ZHANG Liang; ZOU Fu-tai; MA Fan-yuan

    2005-01-01

    Distributed inverted index technology is used in many peer-to-peer (P2P) systems to help rapidly find the documents in which a given word appears. A distributed inverted index partitioned by keywords may incur significant bandwidth when executing more complicated search queries such as multiple-attribute queries. In order to reduce query overhead, KSS (keyword-set search) by Gnawali partitions the index by sets of keywords. However, a KSS index is considerably larger than a standard inverted index, since there are more word sets than there are individual words, and the insert and storage overheads are clearly unacceptable for full-text search on a collection of documents, even when KSS uses the distance window technique. In this paper, we extract relationship information between query keywords from websites' query logs to improve the performance of the KSS system. Experimental results clearly demonstrate that the improved keyword-set search system based on keyword relationships (KRBKSS) incurs lower insert and storage overhead than the KSS index, and lower communication costs for queries than a standard inverted index.
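
    To see why a keyword-set index trades storage for query locality, a minimal sketch (a hypothetical two-document corpus, not the KSS or KRBKSS implementation) can build an index over every sorted keyword pair alongside a standard inverted index:

    ```python
    from itertools import combinations

    def inverted_index(docs):
        """Standard inverted index: word -> set of doc ids."""
        index = {}
        for doc_id, text in docs.items():
            for word in set(text.split()):
                index.setdefault(word, set()).add(doc_id)
        return index

    def keyword_set_index(docs, set_size=2):
        """KSS-style index: every sorted keyword pair -> set of doc ids,
        so a two-keyword query hits a single posting list."""
        index = {}
        for doc_id, text in docs.items():
            for pair in combinations(sorted(set(text.split())), set_size):
                index.setdefault(pair, set()).add(doc_id)
        return index

    # Toy corpus (illustrative assumption)
    docs = {1: "peer to peer keyword search",
            2: "distributed inverted index search"}
    single = inverted_index(docs)
    pairs = keyword_set_index(docs)
    # A two-keyword query is one lookup in the pair index...
    hits = pairs.get(("keyword", "search"), set())
    # ...but the pair index holds many more entries than the word index,
    # which is exactly the storage overhead the abstract describes.
    ```

    Even on this two-document toy corpus, the pair index contains 12 entries against 7 for the word index; the gap grows combinatorially with vocabulary size.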

  11. Keyword search in the Deep Web

    OpenAIRE

    Calì, Andrea; Martinenghi, D.; Torlone, R.

    2015-01-01

    The Deep Web is constituted by data accessible through Web pages, but not readily indexable by search engines, as they are returned in dynamic pages. In this paper we propose a framework for accessing Deep Web sources, represented as relational tables with so-called access limitations, with keyword-based queries. We formalize the notion of optimal answer and investigate methods for query processing. To our knowledge, this problem has never been studied in a systematic way.

  12. Automatic keywording of High Energy Physics

    CERN Document Server

    Dallman, David Peter

    1999-01-01

    Bibliographic databases were developed from the traditional library card catalogue in order to enable users to access library documents via various types of bibliographic information, such as title, author, series or conference date. In addition these catalogues sometimes contained some form of indexation by subject, such as the Universal (or Dewey) Decimal Classification used for books. With the introduction of the eprint archives, set up by the High Energy Physics (HEP) Community in the early 90s, huge collections of documents in several fields have been made available on the World Wide Web. These developments however have not yet been followed up from a keywording point of view. We will see in this paper how important it is to attribute keywords to all documents in the area of HEP Grey Literature. As libraries are facing a future with less and less manpower available and more and more documents, we will explore the possibility of being helped by automatic classification software. We will specifically menti...

  13. Chemistry in Bioinformatics

    Science.gov (United States)

    Murray-Rust, Peter; Mitchell, John BO; Rzepa, Henry S

    2005-01-01

    Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols. PMID:15941476

  14. Chemistry in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Mitchell John

    2005-06-01

    Full Text Available Abstract Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols.

  15. Towards a career in bioinformatics.

    Science.gov (United States)

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, founded in 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the life sciences and systems biology, as well as clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in an undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  16. Bioinformatics for Exploration

    Science.gov (United States)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  17. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for the tasks at hand, and we demonstrate this with some real-world data.
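
    As an illustration of the filter-style feature selection this record alludes to (the abstract does not name its techniques, so this sketch is an assumed baseline, not the paper's method), a simple signal-to-noise ranking over a toy two-class dataset can be written as:

    ```python
    import numpy as np

    def rank_features(X, y):
        """Rank features by a two-class filter criterion:
        |mean difference| / pooled standard deviation, a t-score-like
        ratio commonly used as a baseline in gene selection."""
        X, y = np.asarray(X, float), np.asarray(y)
        a, b = X[y == 0], X[y == 1]
        score = np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-9)
        return np.argsort(-score)   # feature indices, most informative first

    # Toy data (illustrative): feature 1 separates the classes,
    # features 0 and 2 are noise.
    X = [[0.1, 5.0, 1.0], [0.2, 5.1, 0.9],   # class 0
         [0.1, 0.0, 1.1], [0.2, 0.1, 1.0]]   # class 1
    y = [0, 0, 1, 1]
    order = rank_features(X, y)
    ```

    Keeping only the top-ranked features mirrors the paper's point: a very small subset of inputs can suffice for classification.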

  18. Distributed computing in bioinformatics.

    Science.gov (United States)

    Jain, Eric

    2002-01-01

    This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.

  19. Advance in structural bioinformatics

    CERN Document Server

    Wei, Dongqing; Zhao, Tangzhen; Dai, Hao

    2014-01-01

    This text examines in detail mathematical and physical modeling, computational methods and systems for obtaining and analyzing biological structures, using pioneering research cases as examples. As such, it emphasizes programming and problem-solving skills. It provides information on structure bioinformatics at various levels, with individual chapters covering introductory to advanced aspects, from fundamental methods and guidelines on acquiring and analyzing genomics and proteomics sequences, the structures of protein, DNA and RNA, to the basics of physical simulations and methods for conform

  20. Phylogenetic trees in bioinformatics

    Energy Technology Data Exchange (ETDEWEB)

    Burr, Tom L [Los Alamos National Laboratory

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.

  1. Public-key Encryption with Registered Keyword Search

    NARCIS (Netherlands)

    Tang, Qiang; Chen, Liqun

    Public-key Encryption with Keyword Search (PEKS) enables a server to test whether a tag from a sender and a trapdoor from a receiver contain the same keyword. In this paper, we highlight a potential security concern, i.e. a curious server is able to answer whether any selected keyword is

  2. Flow cytometry bioinformatics.

    Directory of Open Access Journals (Sweden)

    Kieran O'Neill

    Full Text Available Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods.
Open standards, data
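
    One of the preprocessing steps this record names, compensating for spectral overlap, reduces to a linear-algebra operation: if observed signals are the true signals mixed through a spillover matrix, compensation inverts that mixing. A minimal sketch with an assumed two-channel spillover matrix:

    ```python
    import numpy as np

    # Hypothetical 2-channel spillover matrix: row i gives the fraction
    # of dye i's signal registered in each detector.
    spillover = np.array([[1.00, 0.15],
                          [0.05, 1.00]])

    def compensate(observed, spillover):
        """Recover true fluorescence by inverting the spillover matrix:
        observed = true @ spillover  =>  true = observed @ inv(spillover)."""
        return observed @ np.linalg.inv(spillover)

    # Two synthetic events, each dominated by one dye.
    true = np.array([[1000.0, 0.0],
                     [0.0, 500.0]])
    observed = true @ spillover          # simulate spectral overlap
    recovered = compensate(observed, spillover)
    ```

    Real cytometry software estimates the spillover matrix from single-stained controls; the matrix above is purely illustrative.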

  3. Bioinformatics of prokaryotic RNAs.

    Science.gov (United States)

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes, and the need to characterize their non-coding components as well, depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state-of-the-art of RNA bioinformatics, focusing on applications to prokaryotes.

  4. Pattern recognition in bioinformatics.

    Science.gov (United States)

    de Ridder, Dick; de Ridder, Jeroen; Reinders, Marcel J T

    2013-09-01

    Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained.

  5. Bayesian Framework for Automatic Image Annotation Using Visual Keywords

    Science.gov (United States)

    Agrawal, Rajeev; Wu, Changhua; Grosky, William; Fotouhi, Farshad

    In this paper, we propose a Bayesian probability based framework, which uses visual keywords and already available text keywords to automatically annotate the images. Taking the cue from document classification, an image can be considered as a document and objects present in it as words. Using this concept, we can create visual keywords by dividing an image into tiles based on a certain template size. Visual keywords are simple vector quantization of small-sized image tiles. We estimate the conditional probability of a text keyword in the presence of visual keywords, described by a multivariate Gaussian distribution. We demonstrate the effectiveness of our approach by comparing predicted text annotations with manual annotations and analyze the effect of text annotation length on the performance.
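
    A toy sketch of the annotation step described above (not the authors' implementation: the histograms, the keyword labels, and the diagonal-covariance simplification are illustrative assumptions) fits one Gaussian per text keyword over visual-keyword histograms and annotates a new image by maximum likelihood:

    ```python
    import numpy as np

    def fit_gaussians(histograms, labels):
        """Fit one diagonal-covariance Gaussian per text keyword over the
        visual-keyword histograms of its training images."""
        models = {}
        for kw in set(labels):
            X = np.array([h for h, l in zip(histograms, labels) if l == kw])
            mean = X.mean(axis=0)
            var = X.var(axis=0) + 1e-3          # variance floor for stability
            models[kw] = (mean, var)
        return models

    def annotate(histogram, models):
        """Return the text keyword maximizing the Gaussian log-likelihood
        of the image's visual-keyword histogram."""
        h = np.asarray(histogram, dtype=float)
        def loglik(mv):
            mean, var = mv
            return -0.5 * np.sum(np.log(2 * np.pi * var)
                                 + (h - mean) ** 2 / var)
        return max(models, key=lambda kw: loglik(models[kw]))
    ```

    In practice the histograms would come from vector-quantizing image tiles into visual keywords, as the abstract describes; here they are hand-made two-bin counts.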

  6. Experiences in automatic keywording of particle physics literature

    CERN Document Server

    Montejo Ráez, Arturo

    2001-01-01

    Attributing keywords can assist in the classification and retrieval of documents in the particle physics literature. As information services face a future with less available manpower and more and more documents being written, the possibility of keyword attribution being assisted by automatic classification software is explored. A project being carried out at CERN (the European Laboratory for Particle Physics) for the development and integration of automatic keywording is described.

  7. Toward a deeper understanding of Visualization through keyword analysis

    OpenAIRE

    2014-01-01

    We present the results of a comprehensive analysis of visualization paper keywords supplied for 4366 papers submitted to five main visualization conferences. We describe main keywords, topic areas, and 10-year historic trends from two datasets: (1) the standardized PCS taxonomy keywords in use for paper submissions for IEEE InfoVis, IEEE Vis-SciVis, IEEE VAST, EuroVis, and IEEE PacificVis since 2009 and (2) the author-chosen keywords for papers published in the IEEE Visualization conference s...

  8. Incremental Training for SVM-Based Classification with Keyword Adjusting

    Institute of Scientific and Technical Information of China (English)

    SUN Jin-wen; YANG Jian-wu; LU Bin; XIAO Jian-guo

    2004-01-01

    This paper analyzed the theory of incremental learning for SVMs (support vector machines) and pointed out a shortcoming of present research on SVM incremental learning: only support vector optimization is considered. Based on the significance of keywords in training, a new incremental training method incorporating keyword adjusting was proposed, which eliminates the difference between incremental learning and batch learning through the keyword adjusting. The experimental results show that the improved method outperforms the method without keyword adjusting and achieves the same precision as the batch method.

  9. Emergent Computation Emphasizing Bioinformatics

    CERN Document Server

    Simon, Matthew

    2005-01-01

    Emergent Computation is concerned with recent applications of Mathematical Linguistics or Automata Theory. This subject has a primary focus upon "Bioinformatics" (the Genome and arising interest in the Proteome), but the closing chapter also examines applications in Biology, Medicine, Anthropology, etc. The book is composed of an organized examination of DNA, RNA, and the assembly of amino acids into proteins. Rather than examine these areas from a purely mathematical viewpoint (that excludes much of the biochemical reality), the author uses scientific papers written mostly by biochemists based upon their laboratory observations. Thus while DNA may exist in its double stranded form, triple stranded forms are not excluded. Similarly, while bases exist in Watson-Crick complements, mismatched bases and abasic pairs are not excluded, nor are Hoogsteen bonds. Just as there are four bases naturally found in DNA, the existence of additional bases is not ignored, nor amino acids in addition to the usual complement of...

  10. Bioinformatics meets parasitology.

    Science.gov (United States)

    Cantacessi, C; Campbell, B E; Jex, A R; Young, N D; Hall, R S; Ranganathan, S; Gasser, R B

    2012-05-01

    The advent and integration of high-throughput '-omics' technologies (e.g. genomics, transcriptomics, proteomics, metabolomics, glycomics and lipidomics) are revolutionizing the way biology is done, allowing the systems biology of organisms to be explored. These technologies are now providing unique opportunities for global, molecular investigations of parasites. For example, studies of a transcriptome (all transcripts in an organism, tissue or cell) have become instrumental in providing insights into aspects of gene expression, regulation and function in a parasite, which is a major step to understanding its biology. The purpose of this article was to review recent applications of next-generation sequencing technologies and bioinformatic tools to large-scale investigations of the transcriptomes of parasitic nematodes of socio-economic significance (particularly key species of the order Strongylida) and to indicate the prospects and implications of these explorations for developing novel methods of parasite intervention.

  11. Engineering BioInformatics

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    With the completion of human genome sequencing, a new era of bioinformatics starts. On one hand, due to the advance of high-throughput DNA microarray technologies, functional genomics data such as gene expression information has increased exponentially and will continue to do so for the foreseeable future. Conventional means of storing, analysing and comparing related data are already overburdened. Moreover, the rich information in genes, their functions and their wide biological implications requires new technologies for analysing data that employ sophisticated statistical and machine learning algorithms, powerful computers and intensive interaction among different data sources such as sequence data, gene expression data, proteomics data and metabolic pathway information, to discover complex genomic structures and functional patterns in other biological processes and so gain a comprehensive understanding of cell physiology.

  12. Bioinformatics and moonlighting proteins

    Directory of Open Access Journals (Sweden)

    Sergio eHernández

    2015-06-01

    Full Text Available Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are revealed experimentally by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially given the large amounts of sequences from genome projects. In the present work, we analyse and describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are: (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein-protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (i.e., algorithms such as PISITE), (e) mutation correlation analysis between amino acids by algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail in the detection of the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations (it requires the existence of multiple aligned family protein sequences), but it can suggest how the evolutionary process of second-function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses.

  13. Virtual Bioinformatics Distance Learning Suite

    Science.gov (United States)

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  15. Virtual bioinformatics distance learning suite*.

    Science.gov (United States)

    Tolvanen, Martti; Vihinen, Mauno

    2004-05-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material over the Internet. Currently, we provide two fully computer-based courses, "Introduction to Bioinformatics" and "Bioinformatics in Functional Genomics." Here we will discuss the application of distance learning in bioinformatics training and our experiences gained during the 3 years that we have run the courses, with about 400 students from a number of universities. The courses are available at bioinf.uta.fi.

  16. Output Keywords in Context in an HTML File with Python

    Directory of Open Access Journals (Sweden)

    William J. Turkel

    2012-07-01

    Full Text Available This lesson builds on Keywords in Context (Using n-grams), where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.

  17. Tag cloud generation for results of multiple keywords queries

    DEFF Research Database (Denmark)

    2013-01-01

    In this paper we study tag cloud generation for the retrieved results of multiple-keyword queries. It is motivated by many real-world scenarios such as personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords. We adjust the state-of-the-art tag cloud...

  18. Attribute-Based Proxy Re-Encryption with Keyword Search

    Science.gov (United States)

    Shi, Yanfeng; Liu, Jiqiang; Han, Zhen; Zheng, Qingji; Zhang, Rui; Qiu, Shuo

    2014-01-01

    Keyword search on encrypted data allows one to issue a search token and conduct search operations on encrypted data while still preserving keyword privacy. In the present paper, we consider the keyword search problem further and introduce a novel notion called attribute-based proxy re-encryption with keyword search, which introduces a promising feature: in addition to supporting keyword search on encrypted data, it enables data owners to delegate the keyword search capability to other data users complying with a specific access control policy. To be specific, the scheme allows (i) the data owner to outsource his encrypted data to the cloud and then ask the cloud to conduct keyword search on the outsourced encrypted data with a given search token, and (ii) the data owner to delegate the keyword search capability to other data users in a fine-grained access control manner by allowing the cloud to re-encrypt the stored encrypted data (embedding some form of access control policy). We formalize the syntax and security definitions, and propose two concrete constructions: key-policy and ciphertext-policy. In a nutshell, our constructions can be treated as the integration of technologies from the fields of attribute-based cryptography and proxy re-encryption cryptography. PMID:25549257
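As background, the basic mechanic of keyword search over encrypted data (minus the attribute-based and proxy re-encryption layers the paper adds) can be illustrated with deterministic HMAC search tokens: the server stores keyed hashes of keywords and matches a submitted token against them without learning the keyword. A minimal sketch, not the paper's construction:

```python
import hmac
import hashlib

# Simplified illustration of searchable-encryption tokens (NOT the
# attribute-based proxy re-encryption scheme from the paper): both the
# stored index entries and the search token are HMACs under the data
# owner's secret key, so the server matches tokens without seeing keywords.

def keyword_token(key: bytes, keyword: str) -> bytes:
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).digest()

def build_index(key: bytes, doc_keywords: dict) -> dict:
    """Map document id -> set of keyword tokens."""
    return {doc: {keyword_token(key, w) for w in words}
            for doc, words in doc_keywords.items()}

def search(index: dict, token: bytes) -> list:
    return sorted(doc for doc, tokens in index.items() if token in tokens)

key = b"owner-secret-key"          # hypothetical secret key
index = build_index(key, {"doc1": {"cloud", "privacy"},
                          "doc2": {"privacy"},
                          "doc3": {"graphs"}})
print(search(index, keyword_token(key, "privacy")))   # doc1 and doc2 match
```

A token produced under a different key matches nothing, which is the property the re-encryption machinery in the paper generalizes to delegated users.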

  19. Fuzzy Keyword Search Over Encrypted Data in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Yogesh K. Gedam

    2014-07-01

    Full Text Available As cloud computing becomes prevalent, more and more sensitive information is being centralized in the cloud. To protect data privacy, sensitive data usually have to be encrypted before outsourcing, which makes effective data utilization a very challenging task. Although traditional searchable encryption schemes allow a user to securely search over encrypted data through keywords and selectively retrieve files of interest, these techniques support only exact keyword search. This significant drawback makes existing techniques unsuitable in cloud computing, as it greatly affects system usability, rendering user search experiences very frustrating and system efficiency very low. In this paper, for the first time, we formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy. In our solution, we exploit edit distance to quantify keyword similarity and develop a new advanced technique for constructing fuzzy keyword sets, which greatly reduces the storage and representation overheads. In this way, we show that our proposed solution is secure and privacy preserving, while realizing the goal of fuzzy keyword search.

  20. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    Science.gov (United States)

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  1. Tag cloud generation for results of multiple keywords queries

    DEFF Research Database (Denmark)

    2013-01-01

    In this paper we study tag cloud generation for the retrieved results of multiple-keyword queries. It is motivated by many real-world scenarios such as personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords. We adjust the state-of-the-art tag cloud...... generation techniques for multiple-keyword query results. We then conduct an extensive evaluation on top of three distinct collaborative tagging systems. The graph-based methods perform significantly better for the Movielens and Bibsonomy datasets. Tag cloud generation based on maximal coverage......

  2. An approach for discovering keywords from Spanish tweets using Wikipedia

    Directory of Open Access Journals (Sweden)

    Daniel AYALA

    2016-05-01

    Full Text Available Most approaches to keyword discovery when analyzing microblogging messages (among them those from Twitter) are based on statistical and lexical information about the words that compose the text. The lack of context in short messages can be problematic due to the low co-occurrence of words. In this paper, we present a new approach for keyword discovery from Spanish tweets based on the addition of context information using Wikipedia as a knowledge base. We present four different ways to use Wikipedia and two ways to rank the new keywords. We have tested these strategies using more than 60,000 Spanish tweets, measuring performance and analyzing the particularities of each strategy.

  3. Bayesian estimation of keyword confidence in Chinese continuous speech recognition

    Institute of Scientific and Technical Information of China (English)

    HAO Jie; LI Xing

    2003-01-01

    In a syllable-based, speaker-independent Chinese continuous speech recognition system based on the classical Hidden Markov Model (HMM), a Bayesian approach to keyword confidence estimation is studied which utilizes both acoustic-layer scores and a syllable-based statistical language model (LM) score. The maximum a posteriori (MAP) confidence measure is proposed, and the forward-backward algorithm for calculating the MAP confidence scores is deduced. The performance of the MAP confidence measure is evaluated in a keyword spotting application, and the experimental results show that the MAP confidence scores provide high discriminability for keyword candidates. Furthermore, the MAP confidence measure can be applied to various speech recognition applications.

  4. Genome Exploitation and Bioinformatics Tools

    Science.gov (United States)

    de Jong, Anne; van Heel, Auke J.; Kuipers, Oscar P.

    Bioinformatic tools can greatly improve the efficiency of bacteriocin screening efforts by limiting the number of strains to screen. Different classes of bacteriocins can be detected in genomes by looking at different features. Finding small bacteriocins can be especially challenging due to low homology and because small open reading frames (ORFs) are often omitted from annotations. In this chapter, several bioinformatic tools and strategies to identify bacteriocins in genomes are discussed.

  5. Clustering Techniques in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Muhammad Ali Masood

    2015-01-01

    Full Text Available Dealing with data means grouping information into a set of categories, either in order to learn new artifacts or to understand new domains. For this purpose, researchers have always looked for hidden patterns in data that can be defined and compared with other known notions, based on the similarity or dissimilarity of their attributes according to well-defined rules. Data mining, with its tools for data classification and data clustering, is one of the most powerful techniques for handling data in such a manner that it can help researchers identify the required information. As a step forward to address this challenge, experts have utilized clustering techniques as a means of exploring hidden structure and patterns in underlying data. With improved stability, robustness and accuracy of unsupervised data classification in many fields, including pattern recognition, machine learning, information retrieval, image analysis and bioinformatics, clustering has proven itself a reliable tool. To identify the clusters in a dataset, algorithms are utilized to partition the data into several groups based on the similarity within a group. There is no single clustering algorithm; various algorithms are utilized depending on the domain of the data that constitutes a cluster and the level of efficiency required. Clustering techniques are categorized based upon different approaches. This paper is a survey of a few of the many clustering techniques used in data mining. Five of the most common techniques are discussed: K-medoids, K-means, Fuzzy C-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Self-Organizing Map (SOM) clustering.
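As a minimal illustration of the partitional approach the survey describes, here is a small pure-Python K-means sketch on 1-D data (a toy example, not from the paper):

```python
# Toy K-means on 1-D points: repeatedly assign each point to the nearest
# centroid, then move each centroid to the mean of its assigned points.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: nearest centroid index for every point
        groups = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in groups.items()]
    return centroids, groups

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]      # two obvious clusters
centroids, groups = kmeans_1d(points, [0.0, 10.0])
print(sorted(round(c, 2) for c in centroids))   # converges near [1.0, 8.0]
```

K-medoids, Fuzzy C-means and DBSCAN replace the assignment/update rules above with their own membership and density criteria, but share this iterative partitioning skeleton.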

  6. An introduction to XML query processing and keyword search

    CERN Document Server

    Lu, Jiaheng

    2013-01-01

    This book systematically and comprehensively covers the latest advances in XML data searching. It presents an extensive overview of the current query processing and keyword search techniques on XML data.

  7. Boolean Burritos: How the Faculty Ate Up Keyword Searching.

    Science.gov (United States)

    York, Sherry

    1999-01-01

    Describes an activity that librarians can use to acquaint teachers with keyword searching and Boolean operators to more successfully use the library's online catalog. Uses food ingredients to represent various possible combinations. (LRW)
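The Boolean combinations the activity teaches map directly onto set operations; a tiny, hypothetical catalog sketch:

```python
# Hypothetical mini-catalog: Boolean operators as set operations over
# the document ids matched by each keyword.
index = {
    "beans":  {1, 2, 5},
    "cheese": {2, 3, 5},
    "salsa":  {3, 4},
}

print(sorted(index["beans"] & index["cheese"]))   # AND: both ingredients
print(sorted(index["beans"] | index["salsa"]))    # OR: either ingredient
print(sorted(index["beans"] - index["cheese"]))   # NOT: beans without cheese
```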

  8. Manifold-Ranking-Based Keyword Propagation for Image Retrieval

    Directory of Open Access Journals (Sweden)

    Li Mingjing

    2006-01-01

    Full Text Available A novel keyword propagation method is proposed for image retrieval based on a recently developed manifold-ranking algorithm. In contrast to existing methods which train a binary classifier for each keyword, our keyword model is constructed in a straightforward manner by exploring the relationship among all images in the feature space in the learning stage. In relevance feedback, the feedback information can be naturally incorporated to refine the retrieval result by additional propagation processes. In order to speed up the convergence of the query concept, we adopt two active learning schemes to select images during relevance feedback. Furthermore, by means of keyword model update, the system can be self-improved constantly. The updating procedure can be performed online during relevance feedback without extra offline training. Systematic experiments on a general-purpose image database consisting of 5 000 Corel images validate the effectiveness of the proposed method.

  9. Keyword Query over Error-Tolerant Knowledge Bases

    Institute of Scientific and Technical Information of China (English)

    Yu-Rong Cheng; Ye Yuan; Jia-Yu Li; Lei Chen; Guo-Ren Wang

    2016-01-01

    With more and more knowledge provided by the WWW, querying and mining knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, the keyword query is the most widely used one. Although the problem of keyword queries over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, cause the results of traditionally defined keyword queries to fall short of users' expectations. Thus, in this paper, we define a new keyword query, called the confident r-clique, specific to knowledge bases and based on the r-clique definition for keyword queries on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining after the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy users' requirements better than the traditional r-clique definition, and that our algorithms are efficient.

  10. Keywords in Context (Using n-grams) with Python

    Directory of Open Access Journals (Sweden)

    William J. Turkel

    2012-07-01

    Full Text Available As in Output Data as HTML File, this lesson takes the frequency pairs collected in Counting Frequencies and outputs them in HTML. This time the focus is on keywords in context (KWIC), which creates n-grams from the original document content, in this case a trial transcript from the Old Bailey Online. You can use your program to select a keyword, and the computer will output all instances of that keyword, along with the words to the left and right of it, making it easy to see at a glance how the keyword is used. Once the KWICs have been created, they are wrapped in HTML and sent to the browser, where they can be viewed. This reinforces what was learned in Output Data as HTML File, opting for a slightly different output. At the end of this lesson, you will be able to extract all possible n-grams from a text. In the next lesson, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.
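The KWIC idea the lesson describes can be sketched in a few lines (a generic illustration, not the lesson's own code; the sample text is invented):

```python
# Keywords in context: for every occurrence of `keyword`, emit the n-gram
# consisting of the keyword plus `window` words on each side.

def kwic(text, keyword, window=2):
    words = text.lower().split()
    hits = []
    for i, w in enumerate(words):
        if w == keyword:
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            hits.append(" ".join(left + [w.upper()] + right))
    return hits

transcript = "the prisoner took the watch and the prisoner ran away"
for line in kwic(transcript, "prisoner"):
    print(line)
# the PRISONER took the
# and the PRISONER ran away
```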

  11. USING GOOGLE’S KEYWORD RELATION IN MULTIDOMAIN DOCUMENT CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    Ping-I Chen

    2013-07-01

    Full Text Available People can collect all kinds of knowledge from search engines to improve the quality of decision making, and use document classification systems to manage the knowledge repository. Document classification systems need to construct a keyword vector, which often contains thousands of words, to represent a knowledge domain, so the computational complexity of the classification algorithm is very high. Users also need to download all the documents before extracting the keywords and classifying the documents. In our previous work, we described a new algorithm called "Word AdHoc Network" (WANET) and used it to extract the most important sequences of keywords for each document. In this paper, we adapt the WANET system to make it more precise. We also use a new similarity measurement algorithm, called "Google Purity," to calculate the similarity between the extracted keyword sequences in order to classify similar documents together. Using this system, we can easily classify information in different knowledge domains at the same time, and all executions proceed without any pre-established keyword repository. Our experiments show that the classification results are very accurate and useful. This new system can improve the efficiency of document classification and make it more usable in Web-based information management.

  12. Development of a taxonomy of keywords for engineering education research

    Science.gov (United States)

    Finelli, Cynthia J.; Borrego, Maura; Rasoulifar, Golnoosh

    2016-05-01

    The diversity of engineering education research provides an opportunity for cross-fertilisation of ideas and creativity, but it also can result in fragmentation of the field and duplication of effort. One solution is to establish a standardised taxonomy of engineering education terms to map the field and communicate and connect research initiatives. This report describes the process for developing such a taxonomy, the EER Taxonomy. Although the taxonomy focuses on engineering education research in the United States, inclusive efforts have engaged 266 individuals from 149 cities in 30 countries during one multiday workshop, 7 conference sessions, and several other virtual and in-person activities. The resulting taxonomy comprises 455 terms arranged in 14 branches and 6 levels. This taxonomy was found to satisfy four criteria for validity and reliability: (1) keywords assigned to a set of abstracts were reproducible by multiple researchers, (2) the taxonomy comprised terms that could be selected as keywords to fully describe 243 articles in 3 journals, (3) the keywords for those 243 articles were evenly distributed across the branches of the taxonomy, and (4) the authors of 31 conference papers agreed with 90% of researcher-assigned keywords. This report also describes guidelines developed to help authors consistently assign keywords for their articles by encouraging them to choose terms from three categories: (1) context/focus/topic, (2) purpose/target/motivation, and (3) research approach.

  13. An answer summarization method based on keyword extraction

    Directory of Open Access Journals (Sweden)

    Fan Qiaoqing

    2017-01-01

    Full Text Available In order to reduce the redundancy of answer summaries generated from community Q&A datasets without topic tags, we propose an answer summarization algorithm based on keyword extraction. We combine tf-idf with word vectors to change the influence-transfer ratio equation in TextRank. Then, during summarization, we take the ratio of the number of sentences containing any keyword to the total number of candidate sentences as an adaptive factor for AMMR. Meanwhile, we reuse the keyword scores generated by TextRank as a weight factor for computing sentence similarity. Experimental results show that the proposed answer summarization is better than the traditional MMR and AMMR.
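A minimal sketch of the TextRank backbone the method modifies: build a word co-occurrence graph and run the PageRank-style power iteration. Uniform edge weights are used here; the paper reweights the influence transfer with tf-idf and word vectors.

```python
# TextRank-style keyword scoring: words are nodes, words co-occurring
# within a sliding window are connected, and scores come from the
# PageRank power iteration. Uniform edge weights only; the paper's
# tf-idf / word-vector reweighting is omitted.
from collections import defaultdict

def textrank_keywords(words, window=2, d=0.85, iterations=50):
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {w: (1 - d) + d * sum(scores[u] / len(graph[u])
                                       for u in graph[w])
                  for w in graph}
    return sorted(scores, key=scores.get, reverse=True)

words = ("keyword extraction ranks keyword candidates by graph "
         "centrality so keyword scores reward central words").split()
print(textrank_keywords(words)[0])   # the most central word wins
```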

  14. Word Spotting Based on a posterior Measure of Keyword Confidence

    Institute of Scientific and Technical Information of China (English)

    郝杰; 李星

    2002-01-01

    In this paper, an approach to keyword confidence estimation is developed that combines acoustic-layer scores and syllable-based statistical language model (LM) scores. An a posteriori (AP) confidence measure and its forward-backward calculation algorithm are deduced. A zero false alarm (ZFA) assumption is proposed for evaluating relative confidence measures in a word spotting task. In a word spotting experiment with a vocabulary of 240 keywords, the keyword accuracy under the AP measure is above 94%, which closely approaches its theoretical upper limit. In addition, a syllable lattice Hidden Markov Model (SLHMM) is formulated, and a unified view of confidence estimation, word spotting, optimal path search, and N-best syllable re-scoring is presented. The proposed AP measure can easily be applied to various speech recognition systems as well.

  15. Two Decades of Research Collaboration: A Keyword Scopus Evaluation

    Directory of Open Access Journals (Sweden)

    Alexandru Amarioarei

    2016-12-01

    Full Text Available One issue that has become more important over the years is evaluating the capability for worldwide research networks in different areas of research, especially in areas identified as globally significant. The study investigated the research output, citation impact and collaborations on publications listed in Scopus authored by researchers all over the world, published between 1999 and 2014 and selected by a group of keywords identified by the authors. The analysis identified an increasing trend in scientific publications starting in 2006, especially for three of the analyzed keywords. We also found differences in the citation patterns for the Black Sea and Danube Delta keywords across the contributing countries. The results of this study revealed a steady increase in collaboration output and an increasing trend in collaboration behavior, both at the European and national level. Additionally, at the national level the study identified the collaboration network between Romanian institutions by county.

  16. Bioinformatics: perspectives for the future.

    Science.gov (United States)

    Costa, Luciano da Fontoura

    2004-12-30

    I give here a very personal perspective on bioinformatics and its future, starting by discussing the origin of the term (and area) of bioinformatics and proceeding by trying to foresee the development of related issues, including pattern recognition/data mining, the need to reintegrate biology, the potential of complex networks as a powerful and flexible framework for bioinformatics, and the interplay between bio- and neuroinformatics. Human resource formation and market perspectives are also addressed. Given the complexity and vastness of these issues and concepts, as well as the limited size of a scientific article and the finite patience of the reader, these perspectives are surely incomplete and biased. However, it is expected that some of the questions and trends identified here will motivate discussions during the IcoBiCoBi round table (with the same name as this article) and perhaps provide a more ample perspective among the participants of that conference and the readers of this text.

  17. Bioinformatics/biostatistics: microarray analysis.

    Science.gov (United States)

    Eichler, Gabriel S

    2012-01-01

    The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiment. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner, Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).

  18. Joint Top-K Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Cong, Gao

    2012-01-01

    Web users and content are increasingly being geopositioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study...... keyword queries. Empirical studies show that the proposed solution is efficient on real data sets. We also offer analytical studies on synthetic data sets to demonstrate the efficiency of the proposed solution.

  19. Training Experimental Biologists in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Pedro Fernandes

    2012-01-01

    Full Text Available Bioinformatics, by its very nature, is devoted to a set of targets that constantly evolve. Training is probably the best response to the constant need for the acquisition of bioinformatics skills. It is interesting to assess the effects of training on the different sets of researchers that make use of it. While training bench experimentalists in the life sciences, we have observed instances of changes in their attitudes towards research that, if well exploited, can have beneficial impacts on the dialogue with professional bioinformaticians and influence the conduct of the research itself.

  20. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    OpenAIRE

    He Cui; Xi Lan; Shemin Lu; Fujun Zhang; Wanggang Zhang

    2017-01-01

    Our previous study demonstrated that the human KIAA0100 gene is a novel acute monocytic leukemia-associated antigen (MLAA) gene, but the functional characterization of the human KIAA0100 gene has remained unknown to date. Here, firstly, bioinformatic prediction of the human KIAA0100 gene was carried out using online software; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells...

  1. Hybrid Recommendation System Memanfaatkan Penggalian Frequent Itemset dan Perbandingan Keyword

    OpenAIRE

    Suka Parwita, Wayan Gede; Winarko, Edi

    2015-01-01

    Abstract: Recommendation systems are often built using item rating data and user identity data. Item rating data are scarce in newly deployed systems, while supplying identity data to a recommendation system can raise concerns about the misuse of that data. A hybrid recommendation system using a frequent itemset mining algorithm and keyword comparison can produce a recommendation list without using identity data...

  2. AN EFFECTIVE INFORMATION RETRIEVAL SYSTEM USING KEYWORD SEARCH TECHNIQUE

    Directory of Open Access Journals (Sweden)

    Dhananjay A. Gholap

    2015-10-01

    Full Text Available Keyword search is a technique used for retrieving data or information. In information retrieval, keyword search is a type of search method that looks for matching documents which contain one or more keywords specified by a user. Applying keyword search to relational databases has become an interesting area of research within the IR and relational database communities. The inference and analysis of user search goals can be very valuable in improving search engine relevance and the user experience. When a user searches for a query on the Internet, the search engine returns a large number of results related to that query. These results can depend on metadata or on full-text indexing; because of this, the user needs to spend a lot of time finding the information of interest. Therefore, this project infers user search goals by analyzing search engine query logs. The system uses a framework to discover different user search goals for a query by clustering the proposed feedback sessions.
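The matching step described above, finding the documents that contain one or more user-specified keywords, is classically implemented with an inverted index (a generic sketch, not the paper's system; documents and queries are invented):

```python
# Inverted index: map each keyword to the set of document ids containing
# it, then answer a query by intersecting the per-keyword posting sets.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def keyword_search(index, keywords):
    """Ids of documents containing ALL the given keywords."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "keyword search over relational databases",
    2: "clustering feedback sessions from query logs",
    3: "keyword search using query logs",
}
index = build_inverted_index(docs)
print(keyword_search(index, ["keyword", "search"]))   # [1, 3]
print(keyword_search(index, ["query", "logs"]))       # [2, 3]
```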

  3. Understanding the Delayed-Keyword Effect on Metacomprehension Accuracy

    Science.gov (United States)

    Thiede, Keith W.; Dunlosky, John; Griffin, Thomas D.; Wiley, Jennifer

    2005-01-01

    The typical finding from research on metacomprehension is that accuracy is quite low. However, recent studies have shown robust accuracy improvements when judgments follow certain generation tasks (summarizing or keyword listing) but only when these tasks are performed at a delay rather than immediately after reading (K. W. Thiede & M. C. M.…

  4. Keyword Searching vs. Authority Control in an Online Catalog.

    Science.gov (United States)

    Jamieson, Alexis J.; And Others

    1986-01-01

    Study conducted at the University of Western Ontario explored whether use of keywords of an online catalog would be a satisfactory alternative to self cross referencing for locating variant subject heading forms. Random comparison of machine readable cataloging with Library of Congress authorities demonstrated the desirability of cross reference…

  5. World coordinate system keywords for FITS files from Lick Observatory

    Science.gov (United States)

    Allen, Steven L.; Gates, John; Kibrick, Robert I.

    2010-07-01

    Every bit of metadata added at the time of acquisition increases the value of image data, facilitates automated processing of those data, and decreases the effort required during subsequent data curation activities. In 2002 the FITS community completed a standard for World Coordinate System (WCS) information which describes the celestial coordinates of pixels in astronomical image data. Most of the instruments in use at Lick Observatory and Keck Observatory predate this standard. None of them was designed to produce FITS files with celestial WCS information. We report on the status of WCS keywords in the FITS files of various astronomical detectors at Lick and Keck. These keywords combine the information from sources which include the telescope pointing system, the optics of the telescope and instrument, a description of the pixel layout of the detector focal plane, and the hardware and software mappings between the silicon pixels of the detector and the pixels in the data array of the FITS file. The existing WCS keywords include coordinates which refer to the detector structure itself (for locating defects and artifacts), but not celestial coordinates. We also present proof-of-concept from the first data acquisition system at Lick Observatory which inserts the WCS keywords for a celestial coordinate system.
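For concreteness, a celestial WCS for a simple tangent-plane image is expressed in a FITS header with keywords like the following. The keyword names come from the FITS WCS standard; all values below are illustrative, not from any Lick or Keck instrument.

```text
CTYPE1  = 'RA---TAN'           / Axis 1: right ascension, gnomonic projection
CTYPE2  = 'DEC--TAN'           / Axis 2: declination, gnomonic projection
CRPIX1  =               1024.5 / Reference pixel along axis 1
CRPIX2  =               1024.5 / Reference pixel along axis 2
CRVAL1  =            202.46957 / RA at reference pixel [deg]
CRVAL2  =             47.19526 / Dec at reference pixel [deg]
CD1_1   =           -0.0000750 / Pixel-to-sky transformation matrix [deg/pix]
CD1_2   =            0.0000000
CD2_1   =            0.0000000
CD2_2   =            0.0000750
RADESYS = 'ICRS'               / Celestial reference frame
EQUINOX =               2000.0 / Equinox of coordinates
```

Populating these values automatically is exactly the task described above: CRVAL comes from the telescope pointing system, the CD matrix from the optics and detector geometry, and CRPIX from the mapping between detector pixels and the FITS data array.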

  6. Interdisciplinarity of Nano Research Fields : A Keyword Mining Approach

    NARCIS (Netherlands)

    Wang, L.; Notten, A.; Surpatean, A.

    2012-01-01

    Using a keyword mining approach, this paper explores the interdisciplinary and integrative dynamics in five nano research fields. We argue that the general trend of integration in nano research fields is converging in the long run, although the degree of this convergence depends greatly on the indic

  8. Collocations of High Frequency Noun Keywords in Prescribed Science Textbooks

    Science.gov (United States)

    Menon, Sujatha; Mukundan, Jayakaran

    2012-01-01

    This paper analyses the discourse of science through the study of collocational patterns of high frequency noun keywords in science textbooks used by upper secondary students in Malaysia. Research has shown that one of the areas of difficulty in science discourse concerns lexis, especially that of collocations. This paper describes a corpus-based…

  9. Enjoying Vocabulary Learning in Junior High: The Keyword Method

    Science.gov (United States)

    Singer, Gail

    1977-01-01

    The keyword method is a mnemonic device limited to teaching vocabulary items. It involves association of a bizarre image with the meaning of the word and can take on the attractive qualities of a game. Results indicate that motivation and interest are stimulated and vocabulary skills improved. (AMH)

  10. Bioinformatics and the Undergraduate Curriculum

    Science.gov (United States)

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  11. Visualising "Junk" DNA through Bioinformatics

    Science.gov (United States)

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  12. Reproducible Bioinformatics Research for Biologists

    Science.gov (United States)

    This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...

  13. Bioinformatics interoperability: all together now !

    NARCIS (Netherlands)

    Meganck, B.; Mergen, P.; Meirte, D.

    2009-01-01

    The following text presents some personal ideas about the way (bio)informatics is heading, along with some examples of how our institution – the Royal Museum for Central Africa (RMCA) – is gearing up for these new times ahead. It tries to find the important trends amongst the buzzwords, and to demo

  14. Bioinformatics and the Undergraduate Curriculum

    Science.gov (United States)

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  15. The secondary metabolite bioinformatics portal

    DEFF Research Database (Denmark)

    Weber, Tilmann; Kim, Hyun Uk

    2016-01-01

    . In this context, this review gives a summary of tools and databases that currently are available to mine, identify and characterize natural product biosynthesis pathways and their producers based on ‘omics data. A web portal called Secondary Metabolite Bioinformatics Portal (SMBP at http...

  16. Virginia Bioinformatics Institute awards Transdisciplinary Team Science

    OpenAIRE

    Bland, Susan

    2009-01-01

    The Virginia Bioinformatics Institute at Virginia Tech, in collaboration with Virginia Tech's Ph.D. program in genetics, bioinformatics, and computational biology, has awarded three fellowships in support of graduate work in transdisciplinary team science.

  17. Application of bioinformatics in tropical medicine

    Institute of Scientific and Technical Information of China (English)

    Wiwanitkit V

    2008-01-01

    Bioinformatics is the use of information technology to help solve biological problems by designing novel and incisive algorithms and methods of analysis. Bioinformatics has become a vital discipline in the era of post-genomics. In this review article, the application of bioinformatics in tropical medicine is presented and discussed.

  18. No-boundary thinking in bioinformatics research.

    Science.gov (United States)

    Huang, Xiuzhen; Bruce, Barry; Buchan, Alison; Congdon, Clare Bates; Cramer, Carole L; Jennings, Steven F; Jiang, Hongmei; Li, Zenglu; McClure, Gail; McMullen, Rick; Moore, Jason H; Nanduri, Bindu; Peckham, Joan; Perkins, Andy; Polson, Shawn W; Rekepalli, Bhanu; Salem, Saeed; Specker, Jennifer; Wunsch, Donald; Xiong, Donghai; Zhang, Shuzhong; Zhao, Zhongming

    2013-11-06

    Currently there are definitions from many agencies and research societies defining "bioinformatics" as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT).

  19. Keyword-based Ciphertext Search Algorithm under Cloud Storage

    Directory of Open Access Journals (Sweden)

    Ren Xunyi

    2016-01-01

    Full Text Available With the development of network storage services, cloud storage offers high scalability, low cost, access without location limits, and easy management. These advantages lead more and more small and medium enterprises to outsource large quantities of data to a third party, freeing them from the costs of construction and maintenance, so the approach has broad market prospects. However, many cloud storage service providers cannot yet protect data security, which results in leakage of user data, so many users fall back on traditional storage methods. This has become one of the important factors hindering the development of cloud storage. In this article, a keyword index is established by extracting keywords from the ciphertext data. The encrypted data and the encrypted index are then uploaded to the cloud server together. Users retrieve the relevant ciphertext by searching the encrypted index, which addresses the data leakage problem.
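
    A minimal sketch of the idea, assuming an HMAC-based trapdoor scheme: the client derives a keyed hash ("trapdoor") per keyword, so the server can match queries against the index without learning the keywords themselves. The scheme, key handling, and document names here are illustrative assumptions, not the algorithm from the paper.

```python
import hmac
import hashlib

def trapdoor(key: bytes, keyword: str) -> bytes:
    # Deterministic keyed hash: the server can compare trapdoors
    # without learning the underlying keyword.
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()

def build_index(key: bytes, docs: dict) -> dict:
    """Map each keyword trapdoor to the ids of documents containing it.
    In a real system the document bodies would be encrypted separately
    and uploaded alongside this index."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(trapdoor(key, word), set()).add(doc_id)
    return index

def search(index: dict, key: bytes, keyword: str) -> set:
    # The client computes the trapdoor; the server only sees the digest.
    return index.get(trapdoor(key, keyword.lower()), set())

key = b"client-secret-key"
docs = {"d1": "cloud storage security", "d2": "keyword search in cloud"}
index = build_index(key, docs)
print(search(index, key, "cloud"))  # matches both documents
```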

  20. Visualization as Seen through its Research Paper Keywords.

    Science.gov (United States)

    Isenberg, Petra; Isenberg, Tobias; Sedlmair, Michael; Chen, Jian; Moller, Torsten

    2017-01-01

    We present the results of a comprehensive multi-pass analysis of visualization paper keywords supplied by authors for their papers published in the IEEE Visualization conference series (now called IEEE VIS) between 1990-2015. From this analysis we derived a set of visualization topics that we discuss in the context of the current taxonomy that is used to categorize papers and assign reviewers in the IEEE VIS reviewing process. We point out missing and overemphasized topics in the current taxonomy and start a discussion on the importance of establishing common visualization terminology. Our analysis of research topics in visualization can, thus, serve as a starting point to (a) help create a common vocabulary to improve communication among different visualization sub-groups, (b) facilitate the process of understanding differences and commonalities of the various research sub-fields in visualization, (c) provide an understanding of emerging new research trends, (d) facilitate the crucial step of finding the right reviewers for research submissions, and (e) it can eventually lead to a comprehensive taxonomy of visualization research. One additional tangible outcome of our work is an online query tool (http://keyvis.org/) that allows visualization researchers to easily browse the 3952 keywords used for IEEE VIS papers since 1990 to find related work or make informed keyword choices.

  1. Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) in the Genomes of Bacillus cereus Group

    Institute of Scientific and Technical Information of China (English)

    王琰; 喻婵; 王阶平; 邱宁; 何进; 孙明; 张青叶

    2011-01-01

    CRISPR is a novel type of microbial defense system, unique in that it is invader-specific, adaptive, and heritable; it represents a recent breakthrough in understanding host-virus interactions. Bioinformatics methods including BLAST, multiple sequence alignment, and RNA structure prediction were used to analyze the CRISPR structures of 24 Bacillus cereus group genomes. CRISPR existed in 42% of the strains. Two types of RNA secondary structures derived from the repeat sequences were predicted, suggesting that the stem-loop secondary structure might function in mediating the interaction between foreign genetic elements and CAS-encoded proteins. The sequence homology among 31% of the spacers and phages, plasmids, and genomes of the Bacillus cereus group further verified that spacers likely originate from exogenous mobile genetic elements. As most Bacillus cereus group strains contain multiple plasmids and prophages, the CRISPR analysis in this study should help reveal the relationships between host strains and their plasmids and phages.
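
    As a toy illustration of the repeat-spacer structure such analyses look for, the sketch below lists the spacers that sit between occurrences of a known direct repeat. All sequences are invented, and real CRISPR detection tools handle approximate repeats and unknown repeat sequences; this only shows the array layout.

```python
def find_spacers(genome: str, repeat: str):
    """Given a known direct-repeat sequence, return the spacers between
    its consecutive (exact) occurrences in the genome string."""
    positions, start = [], 0
    while (i := genome.find(repeat, start)) != -1:
        positions.append(i)
        start = i + len(repeat)
    # spacer = stretch between the end of one repeat and the next repeat
    return [genome[a + len(repeat):b] for a, b in zip(positions, positions[1:])]

repeat = "GTTTTAGAGC"                      # invented direct repeat
genome = repeat + "ACGTACGTAC" + repeat + "TTGACCTTGA" + repeat
print(find_spacers(genome, repeat))       # -> ['ACGTACGTAC', 'TTGACCTTGA']
```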

  2. A Bioinformatics Facility for NASA

    Science.gov (United States)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  3. Undergraduate Bioinformatics Workshops Provide Perceived Skills

    Directory of Open Access Journals (Sweden)

    Robin Herlands Cresiski

    2014-07-01

    Full Text Available Bioinformatics is becoming an important part of undergraduate curriculum, but expertise and well-evaluated teaching materials may not be available on every campus. Here, a guest speaker was utilized to introduce bioinformatics, and web-available exercises were adapted for student investigation. Students used web-based nucleotide comparison tools to examine the medical and evolutionary relevance of an unidentified genetic sequence. Based on pre- and post-workshop surveys, there were significant gains in the students' understanding of bioinformatics, as well as in their perceived skills in using bioinformatics tools. The relevance of bioinformatics to a student's career seemed dependent on career aspirations.

  4. An introduction to proteome bioinformatics.

    Science.gov (United States)

    Jones, Andrew R; Hubbard, Simon J

    2010-01-01

    This book is part of the Methods in Molecular Biology series, and provides a general overview of computational approaches used in proteome research. In this chapter, we give an overview of the scope of the book in terms of current proteomics experimental techniques and the reasons why computational approaches are needed. We then give a summary of each chapter, which together provide a picture of the state of the art in proteome bioinformatics research.

  5. Statistics of co-occurring keywords on Twitter

    CERN Document Server

    Mathiesen, Joachim; Jensen, Mogens H

    2014-01-01

    Online social media such as the micro-blogging site Twitter has become a rich source of real-time data on online human behaviors. Here we analyze the occurrence and co-occurrence frequency of keywords in user posts on Twitter. From the occurrence rate of major international brand names, we provide examples on predictions of brand-user behaviors. From the co-occurrence rates, we further analyze the user-perceived relationships between international brand names and construct the corresponding relationship networks. In general the user activity on Twitter is highly intermittent and we show that the occurrence rate of brand names forms a highly correlated time signal.
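
    The occurrence and co-occurrence counts underlying such an analysis can be gathered in a few lines, as sketched below. The brand names and tweets are made-up examples, not the paper's dataset, and real text would need tokenization beyond whitespace splitting.

```python
from collections import Counter
from itertools import combinations

brands = {"acme", "globex", "initech"}   # illustrative brand names
tweets = [
    "just compared acme and globex phones",
    "acme keyboard broke again",
    "globex vs initech earnings today",
    "acme and globex and initech all trending",
]

occur = Counter()     # how often each brand appears across tweets
cooccur = Counter()   # how often each unordered brand pair appears together
for tweet in tweets:
    present = sorted(brands & set(tweet.split()))
    occur.update(present)
    cooccur.update(combinations(present, 2))

print(occur["acme"], cooccur[("acme", "globex")])  # -> 3 2
```

The co-occurrence counter's keys are the edges of the brand relationship network described in the abstract; edge weights are the pair counts.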

  6. Keyword Search over Data Service Integration for Accurate Results

    CERN Document Server

    Zemleris, Vidmantas; Gwadera, Robert

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  7. Corpus analysis and automatic detection of emotion-inducing keywords

    Science.gov (United States)

    Yuan, Bo; He, Xiangqing; Liu, Ying

    2013-12-01

    Emotion words play a vital role in many sentiment analysis tasks. Previous research uses sentiment dictionary to detect the subjectivity or polarity of words. In this paper, we dive into Emotion-Inducing Keywords (EIK), which refers to the words in use that convey emotion. We first analyze an emotion corpus to explore the pragmatic aspects of EIK. Then we design an effective framework for automatically detecting EIK in sentences by utilizing linguistic features and context information. Our system outperforms traditional dictionary-based methods dramatically in increasing Precision, Recall and F1-score.

  8. Efficient Continuously Moving Top-K Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Wu, Dinming; Yiu, Man Lung; Jensen, Christian Søndergaard

    2011-01-01

    keyword data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within a zone. However, existing safe zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing...... safe zones that guarantee correct results at any time and that aim to optimize the computation on the server as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. Empirical studies...

  9. Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers

    DEFF Research Database (Denmark)

    Schneider, Maria V.; Walter, Peter; Blatter, Marie-Claude

    2012-01-01

    Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response...... and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs and review...

  10. Bioinformatics

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren

    as a strategic frontier between biology and computer science. Machine learning approaches (e.g. neural networks, hidden Markov models, and belief networsk) are ideally suited for areas in which there is a lot of data but little theory. The goal in machine learning is to extract useful information from a body...... of data by building good probabilistic models. The particular twist behind machine learning, however, is to automate the process as much as possible.In this book, the authors present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological...

  11. PKIS: practical keyword index search on cloud datacenter

    Directory of Open Access Journals (Sweden)

    Park Jae Hyun

    2011-01-01

    Full Text Available Abstract This paper highlights the importance of the interoperability of the encrypted DB in terms of the characteristics of DB and efficient schemes. Although most prior research has developed efficient algorithms under provable security, it does not focus on the interoperability of the encrypted DB. In order to address this lack of practical aspects, we conduct two practical approaches--efficiency and group search--in a cloud datacenter. The process of this paper is as follows: first, we create two schemes for efficiency and group search--practical keyword index search I and II; second, we define and analyze group search secrecy and keyword index search privacy in our schemes; third, we experiment on efficient performance over our proposed encrypted DB. As the result, we summarize two major results: (1) our proposed schemes can support a secure group search without re-encrypting all documents under the group-key update and (2) our experiments show that our scheme is approximately 935 times faster than Golle's scheme and about 16 times faster than Song's scheme for 10,000 documents. Based on our experiments and results, this paper makes the following contributions: (1) in the current cloud computing environments, our schemes provide practical, realistic, and secure solutions over the encrypted DB and (2) this paper identifies the importance of interoperability with the database management system for designing efficient schemes.

  12. Formalizing An Approach to Curate the Global Change Master Directory (GCMD)'s Controlled Vocabularies (Keywords) Through a Keyword Governance Process and Community Involvement

    Science.gov (United States)

    Stevens, T.

    2016-12-01

    NASA's Global Change Master Directory (GCMD) curates a hierarchical set of controlled vocabularies (keywords) covering Earth sciences and associated information (data centers, projects, platforms, and instruments). The purpose of the keywords is to describe Earth science data and services in a consistent and comprehensive manner, allowing for precise metadata search and subsequent retrieval of data and services. The keywords are accessible in a standardized SKOS/RDF/OWL representation and are used as an authoritative taxonomy, as a source for developing ontologies, and to search and access Earth Science data within online metadata catalogs. The keyword curation approach involves: (1) receiving community suggestions; (2) triaging community suggestions; (3) evaluating keywords against a set of criteria coordinated by the NASA Earth Science Data and Information System (ESDIS) Standards Office; (4) implementing the keywords; and (5) publication/notification of keyword changes. This approach emphasizes community input, which helps ensure a high quality, normalized, and relevant keyword structure that will evolve with users' changing needs. The Keyword Community Forum, which promotes a responsive, open, and transparent process, is an area where users can discuss keyword topics and make suggestions for new keywords. Others could potentially use this formalized approach as a model for keyword curation.
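
    GCMD science keywords are conventionally written as hierarchical paths such as "EARTH SCIENCE > ATMOSPHERE > CLOUDS". As an illustration of working with such a hierarchy, the sketch below nests path strings into a tree; the sample values follow that convention but are chosen for illustration, and the real keyword set is distributed in SKOS/RDF/OWL form.

```python
def build_tree(paths):
    """Nest 'A > B > C' keyword path strings into a dict-of-dicts tree."""
    tree = {}
    for path in paths:
        node = tree
        for term in (t.strip() for t in path.split(">")):
            node = node.setdefault(term, {})
    return tree

keywords = [
    "EARTH SCIENCE > ATMOSPHERE > CLOUDS",
    "EARTH SCIENCE > ATMOSPHERE > AEROSOLS",
    "EARTH SCIENCE > OCEANS > SEA SURFACE TEMPERATURE",
]
tree = build_tree(keywords)
print(sorted(tree["EARTH SCIENCE"]["ATMOSPHERE"]))  # -> ['AEROSOLS', 'CLOUDS']
```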

  13. Bioinformatics in Africa: The Rise of Ghana?

    Science.gov (United States)

    Karikari, Thomas K.

    2015-01-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921

  14. Bioinformatics in Africa: The Rise of Ghana?

    Directory of Open Access Journals (Sweden)

    Thomas K Karikari

    2015-09-01

    Full Text Available Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics.

  15. Bioinformatics analysis of differentially expressed proteins in prostate cancer based on proteomics data

    Directory of Open Access Journals (Sweden)

    Chen C

    2016-03-01

    Full Text Available Chen Chen,1 Li-Guo Zhang,1 Jian Liu,1 Hui Han,1 Ning Chen,1 An-Liang Yao,1 Shao-San Kang,1 Wei-Xing Gao,1 Hong Shen,2 Long-Jun Zhang,1 Ya-Peng Li,1 Feng-Hong Cao,1 Zhi-Guo Li3 1Department of Urology, North China University of Science and Technology Affiliated Hospital, 2Department of Modern Technology and Education Center, 3Department of Medical Research Center, International Science and Technology Cooperation Base of Geriatric Medicine, North China University of Science and Technology, Tangshan, People’s Republic of China Abstract: We mined the literature for proteomics data to examine the occurrence and metastasis of prostate cancer (PCa) through a bioinformatics analysis. We divided the differentially expressed proteins (DEPs) into two groups: the group consisting of PCa and benign tissues (P&b) and the group presenting both high and low PCa metastatic tendencies (H&L). In the P&b group, we found 320 DEPs, 20 of which were reported more than three times, and DES was the most commonly reported. Among these DEPs, the expression levels of FGG, GSN, SERPINC1, TPM1, and TUBB4B have not yet been correlated with PCa. In the H&L group, we identified 353 DEPs, 13 of which were reported more than three times. Among these DEPs, MDH2 and MYH9 have not yet been correlated with PCa metastasis. We further confirmed that DES was differentially expressed between 30 cancer and 30 benign tissues. In addition, DEPs associated with protein transport, regulation of actin cytoskeleton, and the extracellular matrix (ECM)–receptor interaction pathway were prevalent in the H&L group and have not yet been studied in detail in this context. Proteins related to homeostasis, the wound-healing response, focal adhesions, and the complement and coagulation pathways were overrepresented in both groups. Our findings suggest that the repeatedly reported DEPs in the two groups may function as potential biomarkers for detecting PCa and predicting its aggressiveness. Furthermore

  16. Management of information for mission operations using automated keyword referencing

    Science.gov (United States)

    Davidson, Roger A.; Curran, Patrick S.

    1993-01-01

    Although millions of dollars have helped to improve the operability and technology of ground data systems for mission operations, almost all mission documentation remains bound in printed volumes. This form of documentation is difficult and time-consuming to use, may be out-of-date, and is usually not cross-referenced with other related volumes of mission documentation. A more effective, automated method of mission information access is needed. A new method of information management for mission operations using automated keyword referencing is proposed. We expound on the justification for and the objectives of this concept. The results of a prototype tool for mission information access that uses a hypertext-like user interface and existing mission documentation are shared. Finally, the future directions and benefits of our proposed work are described.
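
    The core structure behind automated keyword referencing is an inverted index from keywords to the documents that mention them. A minimal sketch follows; the document names and stopword list are illustrative assumptions, not the prototype described in the abstract.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "to", "is"}  # illustrative stoplist

def cross_reference(documents: dict) -> dict:
    """Invert document text into a keyword -> {document ids} mapping."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            if word not in STOPWORDS:
                index[word].add(doc_id)
    return index

# Hypothetical mission-documentation volumes.
docs = {
    "ops-manual": "telemetry downlink procedures",
    "flight-rules": "telemetry limits and command procedures",
}
index = cross_reference(docs)
print(sorted(index["telemetry"]))  # -> ['flight-rules', 'ops-manual']
```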

  17. Finding keywords amongst noise: automatic text classification without parsing

    Science.gov (United States)

    Allison, Andrew G.; Pearce, Charles E. M.; Abbott, Derek

    2007-06-01

    The amount of text stored on the Internet, and in our libraries, continues to expand at an exponential rate. There is a great practical need to locate relevant content. This requires quick automated methods for classifying textual information, according to subject. We propose a quick statistical approach, which can distinguish between 'keywords' and 'noisewords', like 'the' and 'a', without the need to parse the text into its parts of speech. Our classification is based on an F-statistic, which compares the observed Word Recurrence Interval (WRI) with a simple null hypothesis. We also propose a model to account for the observed distribution of WRI statistics and we subject this model to a number of tests.
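
    A toy version of the idea can be written down directly: measure the gaps between successive occurrences of a word and compare their variance with the roughly exponential gaps a randomly scattered 'noiseword' would show. The ratio below is a simplified stand-in for the paper's F-statistic, used here only to show why recurrence intervals separate keywords from noisewords.

```python
def recurrence_intervals(tokens, word):
    """Gaps (in token positions) between successive occurrences of word."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a for a, b in zip(positions, positions[1:])]

def burstiness(tokens, word):
    """Ratio of interval variance to squared mean interval.
    Randomly scattered noisewords give roughly exponential gaps (ratio
    near 1); topical keywords cluster, giving a larger ratio."""
    gaps = recurrence_intervals(tokens, word)
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var / mean ** 2

# 'topic' occurs in two tight bursts; 'the' is spread evenly throughout.
text = (["the", "x", "topic", "topic", "topic"] + ["the", "x"] * 20
        + ["topic", "topic", "topic", "the"])
print(burstiness(text, "topic") > burstiness(text, "the"))  # -> True
```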

  18. Establishing bioinformatics research in the Asia Pacific

    Directory of Open Access Journals (Sweden)

    Tammi Martti

    2006-12-01

    Full Text Available Abstract In 1998, the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, was set up to champion the advancement of bioinformatics in the Asia Pacific. By 2002, APBioNet was able to gain sufficient critical mass to initiate the first International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2006 Conference was organized as the 5th annual conference of the Asia-Pacific Bioinformatics Network, on Dec. 18–20, 2006 in New Delhi, India, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand) and Busan (South Korea). This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. It exemplifies a typical snapshot of the growing research excellence in bioinformatics of the region as we embark on a trajectory of establishing a solid bioinformatics research culture in the Asia Pacific that is able to contribute fully to the global bioinformatics community.

  19. MOWServ: a web client for integration of bioinformatic resources.

    Science.gov (United States)

    Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J; Claros, M Gonzalo; Trelles, Oswaldo

    2010-07-01

    The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.

  20. MOWServ: a web client for integration of bioinformatic resources

    Science.gov (United States)

    Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo

    2010-01-01

    The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794

  1. HotSwap for bioinformatics: A STRAP tutorial

    Directory of Open Access Journals (Sweden)

    Robinson Peter N

    2006-02-01

    Full Text Available Abstract Background Bioinformatics applications are now routinely used to analyze large amounts of data. Application development often requires many cycles of optimization, compiling, and testing. Repeatedly loading large datasets can significantly slow down the development process. We have incorporated HotSwap functionality into the protein workbench STRAP, allowing developers to create plugins using the Java HotSwap technique. Results Users can load multiple protein sequences or structures into the main STRAP user interface, and simultaneously develop plugins using an editor of their choice such as Emacs. Saving changes to the Java file causes STRAP to recompile the plugin and automatically update its user interface without requiring recompilation of STRAP or reloading of protein data. This article presents a tutorial on how to develop HotSwap plugins. STRAP is available at http://strapjava.de and http://www.charite.de/bioinf/strap. Conclusion HotSwap is a useful and time-saving technique for bioinformatics developers. HotSwap can be used to efficiently develop bioinformatics applications that require loading large amounts of data into memory.

  2. HotSwap for bioinformatics: a STRAP tutorial.

    Science.gov (United States)

    Gille, Christoph; Robinson, Peter N

    2006-02-09

    Bioinformatics applications are now routinely used to analyze large amounts of data. Application development often requires many cycles of optimization, compiling, and testing. Repeatedly loading large datasets can significantly slow down the development process. We have incorporated HotSwap functionality into the protein workbench STRAP, allowing developers to create plugins using the Java HotSwap technique. Users can load multiple protein sequences or structures into the main STRAP user interface, and simultaneously develop plugins using an editor of their choice such as Emacs. Saving changes to the Java file causes STRAP to recompile the plugin and automatically update its user interface without requiring recompilation of STRAP or reloading of protein data. This article presents a tutorial on how to develop HotSwap plugins. STRAP is available at http://strapjava.de and http://www.charite.de/bioinf/strap. HotSwap is a useful and time-saving technique for bioinformatics developers. HotSwap can be used to efficiently develop bioinformatics applications that require loading large amounts of data into memory.
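    The two records above describe Java's HotSwap mechanism. As a language-neutral analogy (not STRAP's implementation), the sketch below hot-reloads a plugin module in Python with `importlib.reload`, so edits to the plugin file take effect without restarting the host program or reloading its data. The file name and `run` API are invented for illustration.

```python
# Sketch: hot-reloading a plugin without restarting the host application.
# Analogous in spirit to Java HotSwap as used by STRAP; names are invented.
import importlib
import pathlib
import sys
import tempfile

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
plugin_file = pathlib.Path(tmp) / "plugin_demo.py"

# Initial version of the plugin.
plugin_file.write_text("def run(data):\n    return len(data)\n")
mod = importlib.import_module("plugin_demo")
print(mod.run("PROTEIN"))  # 7

# Developer edits the plugin source; a reload picks up the change while the
# host process (and any large loaded datasets) stays alive.
plugin_file.write_text("def run(data):\n    return data[::-1]\n")
mod = importlib.reload(mod)
print(mod.run("PROTEIN"))  # NIETORP
plugin_file.unlink()
```

    In STRAP the recompile-and-swap step is triggered automatically on save; here it is explicit.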

  3. The Aspergillus Mine - publishing bioinformatics

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla; Rasmussen, Jane Lind Nybo; Theobald, Sebastian

    so with no computational specialist. Here we present a setup for analysis and publication of genome data of 70 species of Aspergillus fungi. The platform is based on R, Python and uses the RShiny framework to create interactive web‐applications. It allows all participants to create interactive...... analysis which can be shared with the team and in connection with publications. We present analysis for investigation of genetic diversity, secondary and primary metabolism and general data overview. The platform, the Aspergillus Mine, is a collection of analysis tools based on data from collaboration...... with the Joint Genome Institute. The Aspergillus Mine is not intended as a genomic data sharing service but instead focuses on creating an environment where the results of bioinformatic analysis is made available for inspection. The data and code is public upon request and figures can be obtained directly from...

  4. A Mathematical Optimization Problem in Bioinformatics

    Science.gov (United States)

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
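    The dynamic-programming formulation the article describes can be sketched as a minimal global alignment (Needleman-Wunsch) score computation. The scoring scheme (match=1, mismatch=-1, gap=-2) is an illustrative assumption, not the article's exercise values.

```python
# Minimal global sequence alignment (Needleman-Wunsch) score, as a sketch of
# the dynamic-programming approach. Scoring parameters are assumptions.
def align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[m][n]

# GATTACA vs GAT--CA: five matches and two gaps -> 5*1 + 2*(-2) = 1
print(align_score("GATTACA", "GATCA"))  # 1
```

    Recovering the alignment itself requires a traceback through the same table, which the article's exercises develop.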

  5. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Science.gov (United States)

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  6. A Mathematical Optimization Problem in Bioinformatics

    Science.gov (United States)

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…

  7. Bioinformatics clouds for big data manipulation

    KAUST Repository

    Dai, Lin

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. 2012 Dai et al.; licensee BioMed Central Ltd.

  8. BIOINFORMATICS FOR UNDERGRADUATES OF LIFE SCIENCE COURSES

    Directory of Open Access Journals (Sweden)

    J.F. De Mesquita

    2007-05-01

    Full Text Available In recent years, Bioinformatics has emerged as an important research tool. The ability to mine large databases for relevant information has become essential for different life science fields. On the other hand, providing education in bioinformatics to undergraduates is challenging from this multidisciplinary perspective. Therefore, it is important to introduce undergraduate students to the available information and current methodologies in Bioinformatics. Here we report the results of a course using a computer-assisted and problem-based learning model. The syllabus was comprised of theoretical lectures covering different topics within bioinformatics and practical activities. For the latter, we developed a set of step-by-step tutorials based on case studies. The course was applied to undergraduate students of biological and biomedical courses. At the end of the course, the students were able to build up a step-by-step tutorial covering a bioinformatics issue.

  9. Bioinformatics clouds for big data manipulation

    Directory of Open Access Journals (Sweden)

    Dai Lin

    2012-11-01

    Full Text Available Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers: This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  10. A Keyword Analysis for Human Resource Management Factors

    Directory of Open Access Journals (Sweden)

    Muhammed Kürşad ÖZLEN

    2014-05-01

    Full Text Available With the constant increase in technology and education, the development of multinational corporations, and frequent changes in economic status and structures, Human Resources has become the most crucial, most reliable and most necessary department. Moreover, in many companies, the Human Resource Department is the most important department. The main purpose of this research is to mark off the top-rated factors related to Human Resource Management by analyzing all the abstracts of the published papers of a Human Resource Management journal for the period between the first issue of 2005 and the first issue of 2013. We identified the most frequent categories of the articles during this analyzed period. The literature is reviewed according to the identified factors related to Human Resource Management. If the keywords about Human Resources (35.7%) are not considered, it is observed that the research, for the selected period, takes an organizational approach (39.2%) (management, organizational strategy, organizational performance, organizational culture, contextual issues, technical issues and location) and an individual approach (24.4%) (individual performance, training and education, employee rights, and behavioral issues). Furthermore, it is also observed that the researchers (a) mainly give importance to practice more than theory and (b) consider the organization more than the individual.

  11. Defining Smart City. A Conceptual Framework Based on Keyword Analysis

    Directory of Open Access Journals (Sweden)

    Farnaz Mosannenzadeh

    2014-05-01

    Full Text Available “Smart city” is a concept that has been the subject of increasing attention in urban planning and governance during recent years. The first step to creating Smart Cities is to understand the concept. However, a brief review of the literature shows that the concept of Smart City is the subject of controversy. Thus, the main purpose of this paper is to provide a conceptual framework to define Smart City. To this aim, an extensive literature review was done. Then, a keyword analysis of the literature was held against the main research questions (why, what, who, when, where, how) and based on three main domains involved in the policy decision-making process and Smart City plan development: academic, industrial and governmental. This resulted in a conceptual framework for Smart City. The result clarifies the definition of Smart City, while providing a framework to define each of Smart City’s sub-systems. Moreover, urban authorities can apply this framework in Smart City initiatives in order to recognize their main goals, main components, and key stakeholders.

  12. Keywords Review of IT Security Literature in Recent 20 Years

    Directory of Open Access Journals (Sweden)

    WANG Lidong

    2012-10-01

    Full Text Available The volume of published scientific literature available on the Internet has been increasing exponentially. Some of it reflects the latest achievements of specific research domains. In recent years, many projects have been funded aiming at online scientific literature mining, especially in biomedical research. Scientific literature covers most of the hot topics in the research field and has a very large domain-specific vocabulary. The exploitation of domain knowledge and specialized vocabulary can dramatically improve the results of literature text processing. The purpose of this paper is to identify the frequently used keywords in the IT security literature. The result can then be utilized to improve the performance of automatic IT security document retrieval, identification and classification. Our method is to query CiteSeerX to retrieve paper description information as source data and build a manually annotated corpus. Over the corpus, we perform word-frequency statistics, word co-occurrence analysis, discrimination index computation and retrieval efficiency analysis, and thus build a lexicon of IT security based on our experimental results. The lexicon can further be used to improve retrieval performance and to assist new-word discovery and document classification.
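    The first step the paper describes, word-frequency statistics over paper descriptions to seed a domain lexicon, can be sketched as below. The corpus and stopword list are toy stand-ins, not the authors' CiteSeerX data.

```python
# Sketch of word-frequency statistics for lexicon building.
# Corpus and stopword list are invented for illustration.
from collections import Counter
import re

STOPWORDS = {"the", "of", "a", "and", "for", "in", "to", "is", "with"}

def keyword_frequencies(docs):
    """Count non-stopword tokens across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

corpus = [
    "Intrusion detection for network security",
    "A survey of malware detection techniques",
    "Network intrusion detection with machine learning",
]
print(keyword_frequencies(corpus).most_common(2))
```

    The paper layers co-occurrence analysis and a discrimination index on top of these raw counts before admitting a term to the lexicon.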

  13. Imagined Affordance: Reconstructing a Keyword for Communication Theory

    Directory of Open Access Journals (Sweden)

    Peter Nagy

    2015-09-01

    Full Text Available In this essay, we reconstruct a keyword for communication—affordance. Affordance, adopted from ecological psychology, is now widely used in technology studies, yet the term lacks a clear definition. This is especially problematic for scholars grappling with how to theorize the relationship between technology and sociality for complex socio-technical systems such as machine-learning algorithms, pervasive computing, the Internet of Things, and other such “smart” innovations. Within technology studies, emerging theories of materiality, affect, and mediation all necessitate a richer and more nuanced definition for affordance than the field currently uses. To solve this, we develop the concept of imagined affordance. Imagined affordances emerge between users’ perceptions, attitudes, and expectations; between the materiality and functionality of technologies; and between the intentions and perceptions of designers. We use imagined affordance to evoke the importance of imagination in affordances—expectations for technology that are not fully realized in conscious, rational knowledge. We also use imagined affordance to distinguish our process-oriented, socio-technical definition of affordance from the “imagined” consensus of the field around a flimsier use of the term. We also use it in order to better capture the importance of mediation, materiality, and affect. We suggest that imagined affordance helps to theorize the duality of materiality and communication technology: namely, that people shape their media environments, perceive them, and have agency within them because of imagined affordances.

  14. An effective suggestion method for keyword search of databases

    KAUST Repository

    Huang, Hai

    2016-09-09

    This paper solves the problem of providing high-quality suggestions for user keyword queries over databases. With the assumption that the returned suggestions are independent, existing query suggestion methods over databases score candidate suggestions individually and return the top-k best of them. However, the top-k suggestions have high redundancy with respect to the topics. To provide informative suggestions, the returned k suggestions are expected to be diverse, i.e., simultaneously maximizing the relevance to the user query and the diversity with respect to topics that the user might be interested in. In this paper, an objective function considering both factors is defined for evaluating a suggestion set. We show that maximizing the objective function is a submodular function maximization problem subject to n matroid constraints, which is an NP-hard problem. A greedy approximation algorithm with an approximation ratio O((Formula presented.)) is also proposed. Experimental results show that our method outperforms other methods in providing relevant and diverse suggestions. © 2016 Springer Science+Business Media New York
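    The greedy strategy for a relevance-plus-diversity objective can be sketched as follows. The scoring function (relevance plus a topic-coverage bonus) is an illustrative submodular-style objective, not the paper's exact formula, and the candidates are invented.

```python
# Hedged sketch of greedy selection for relevant-but-diverse suggestions.
# The objective below is an illustrative stand-in for the paper's formula.
def pick_suggestions(candidates, k):
    """candidates: list of (suggestion, relevance, set_of_topics)."""
    chosen, covered = [], set()
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        # Marginal gain: relevance plus number of topics not yet covered.
        best = max(pool, key=lambda c: c[1] + len(c[2] - covered))
        pool.remove(best)
        chosen.append(best[0])
        covered |= best[2]
    return chosen

cands = [
    ("jaguar car", 0.9, {"cars"}),
    ("jaguar price", 0.8, {"cars"}),
    ("jaguar animal", 0.7, {"wildlife"}),
]
print(pick_suggestions(cands, 2))  # ['jaguar car', 'jaguar animal']
```

    Note how the slightly less relevant "jaguar animal" beats "jaguar price" once the "cars" topic is already covered — the redundancy problem the paper targets.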

  15. Concepts Of Bioinformatics And Its Application In Veterinary ...

    African Journals Online (AJOL)

    Concepts Of Bioinformatics And Its Application In Veterinary Research And ... Bioinformatics is the science of managing and analyzing biological information. Because of the rapidly growing sequence biological data, bioinformatics tools and ...

  16. Query Intent Disambiguation of Keyword-Based Semantic Entity Search in Dataspaces

    Institute of Scientific and Technical Information of China (English)

    Dan Yang; De-Rong Shen; Ge Yu; Yue Kou; Tie-Zheng Nie

    2013-01-01

    Keyword query has attracted much research attention due to its simplicity and wide applications. The inherent ambiguity of keyword query is prone to unsatisfactory query results. Moreover, some existing techniques for Web query, keyword query in relational databases and XML databases cannot be completely applied to keyword query in dataspaces. So we propose KeymanticES, a novel keyword-based semantic entity search mechanism in dataspaces which combines both keyword query and semantic query features. We focus on the query intent disambiguation problem and propose a novel three-step approach to resolve it. Extensive experimental results show the effectiveness and correctness of our proposed approach.

  17. Computational biology and bioinformatics in Nigeria.

    Science.gov (United States)

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  18. Computational biology and bioinformatics in Nigeria.

    Directory of Open Access Journals (Sweden)

    Segun A Fatumo

    2014-04-01

    Full Text Available Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  19. Evolutionary features of academic articles co-keyword network and keywords co-occurrence network: Based on two-mode affiliation network

    Science.gov (United States)

    Li, Huajiao; An, Haizhong; Wang, Yue; Huang, Jiachen; Gao, Xiangyun

    2016-05-01

    Keeping abreast of trends in the literature and rapidly grasping a body of articles' key points and relationships from a holistic perspective is a new challenge in both literature research and text mining. As an important component, keywords can present the core idea of an academic article. Usually, articles on a single theme or area share one or more keywords, and we can analyze the topological features and evolution of articles co-keyword networks and keywords co-occurrence networks to realize an in-depth analysis of the articles. This paper seeks to integrate statistics, text mining, complex networks and visualization to analyze all of the academic articles on one given theme, complex network(s). All 5944 "complex networks" articles that were published between 1990 and 2013 and are available on the Web of Science are extracted. Based on two-mode affiliation network theory, a new frontier of complex networks, we constructed two different networks: one taking the articles as nodes, the co-keyword relationships as edges and the quantity of co-keywords as the weight, to construct the articles co-keyword network; and another taking the articles' keywords as nodes, the co-occurrence relationships as edges and the quantity of simultaneous co-occurrences as the weight, to construct the keyword co-occurrence network. An integrated method for analyzing the topological features and evolution of the articles co-keyword network and keywords co-occurrence networks is proposed, and we also define a new function to measure the innovation coefficient of the articles at the annual level. This paper provides a useful tool and process for achieving in-depth analysis and rapid understanding of the trends and relationships of articles from a holistic perspective.
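    The two-mode construction the abstract describes can be sketched on toy data: from articles with keyword sets, build (a) a keyword co-occurrence edge list and (b) article co-keyword edge weights. Article IDs and keywords here are invented for illustration.

```python
# Toy sketch of the paper's two networks built from article->keywords data.
from itertools import combinations
from collections import Counter

articles = {
    "A1": {"complex networks", "small world", "topology"},
    "A2": {"complex networks", "topology"},
    "A3": {"complex networks", "evolution"},
}

# (a) keyword co-occurrence network:
# edge weight = number of articles in which the keyword pair co-occurs
kw_edges = Counter()
for kws in articles.values():
    for pair in combinations(sorted(kws), 2):
        kw_edges[pair] += 1

# (b) articles co-keyword network:
# edge weight = number of keywords two articles share
art_edges = {
    (a, b): len(articles[a] & articles[b])
    for a, b in combinations(sorted(articles), 2)
    if articles[a] & articles[b]
}

print(kw_edges[("complex networks", "topology")])  # 2 (co-occurs in A1, A2)
print(art_edges[("A1", "A2")])                     # 2 (two shared keywords)
```

    Topological measures (degree, clustering, components) would then be computed on these weighted edge lists, year by year, to track evolution.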

  20. The Effect of Keyword and Context Methods on Vocabulary Retention of Iranian EFL Learners

    Directory of Open Access Journals (Sweden)

    Hassan Soleimani

    2012-07-01

    Full Text Available This study intended to investigate the comparative effectiveness of the keyword and context methods on immediate and delayed vocabulary retention of EFL learners. It also compared the rate of forgetting in the keyword and context groups. With a quasi-experimental design, 40 learners from two intact classes in a language teaching institute in Khorramabad, Iran, were randomly assigned to the keyword and context groups. The keyword group received keyword strategy training, while the context group focused on learning vocabulary in its real context. The results indicated that learners in the keyword group recalled more vocabulary immediately after training and one week later. The results also indicated that the rate of forgetting was higher in the context group than in the keyword group. Key words: Vocabulary Learning, Keyword Strategy, Context Strategy, Vocabulary Retention

  1. When cloud computing meets bioinformatics: a review.

    Science.gov (United States)

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
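    The MapReduce model the review introduces can be sketched as a minimal in-process pipeline: map emits (key, value) pairs, shuffle groups them by key, and reduce aggregates each group. Real deployments (e.g. Hadoop) distribute these phases across machines; the tiny read-counting example is invented.

```python
# Minimal in-process sketch of the MapReduce dataflow: map, shuffle, reduce.
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for rec in records:                 # map + shuffle: group emitted pairs
        for key, val in mapper(rec):
            groups[key].append(val)
    return {k: reducer(k, vs) for k, vs in groups.items()}  # reduce

# Toy bioinformatics use: count identical sequencing reads.
reads = ["ACGT", "ACGT", "TTGA"]
counts = mapreduce(
    reads,
    mapper=lambda read: [(read, 1)],
    reducer=lambda key, vals: sum(vals),
)
print(counts)  # {'ACGT': 2, 'TTGA': 1}
```

    The appeal for bioinformatics is that the mapper and reducer are independent per key, so the same program scales from one laptop to a cloud cluster.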

  2. ProtRepeatsDB: a database of amino acid repeats in genomes

    Directory of Open Access Journals (Sweden)

    Chauhan Virander S

    2006-07-01

    Full Text Available Abstract Background Genome-wide and cross-species comparisons of amino acid repeats are an intriguing problem in biology, mainly due to the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as evident from the analysis of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB http://bioinfo.icgeb.res.in/repeats/ is a relational database of perfect and mismatch repeats, access to which is designed as a resource and collection of tools for detection and cross-species comparisons of different types of amino acid repeats. Description ProtRepeatsDB (v1.2) consists of perfect as well as mismatch amino acid repeats in the protein sequences of 141 organisms, the genomes of which are now available. The web interface of ProtRepeatsDB consists of different tools to perform repeat searches based on protein IDs, organism name, repeat sequences, and keywords as in FASTA headers, size, frequency, gene ontology (GO) annotation IDs and regular expressions (REGEXP) describing repeats. These tools also allow the formulation of a variety of simple, complex and logical queries to facilitate mining and large-scale cross-species comparisons of amino acid repeats. In addition to this, the database also contains sequence analysis tools to determine repeats in user input sequences. Conclusion ProtRepeatsDB is a multi-organism database of different types of amino acid repeats present in proteins. It integrates useful tools to perform genome-wide queries for rapid screening and identification of amino acid repeats and facilitates comparative and evolutionary studies of the repeats. The database is useful for identification of species or organism specific
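    The abstract mentions REGEXP-based repeat queries. As an illustrative sketch (not the database's implementation), a regular-expression backreference can locate perfect tandem repeats in a protein sequence; the pattern, thresholds and example sequence are assumptions.

```python
# Illustrative perfect-tandem-repeat finder using a regex backreference,
# in the spirit of the REGEXP queries ProtRepeatsDB supports. Not its code.
import re

def find_perfect_repeats(seq, min_unit=2, min_copies=2):
    """Yield (unit, copies, start) for perfect tandem repeats in seq."""
    # (.{N,}) captures a candidate unit; \1{M,} requires adjacent copies.
    pattern = re.compile(r"(.{%d,})\1{%d,}" % (min_unit, min_copies - 1))
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        yield unit, copies, m.start()

protein = "MKVQNQNQNQALP"
print(list(find_perfect_repeats(protein)))  # [('QN', 3, 3)]
```

    Mismatch repeats, which the database also stores, need an edit-distance-tolerant scan rather than a plain backreference.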

  3. Agile parallel bioinformatics workflow management using Pwrake.

    OpenAIRE

    2011-01-01

    Abstract Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environm...

  4. Coronavirus Genomics and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Kwok-Yung Yuen

    2010-08-01

    Full Text Available The drastic increase in the number of coronaviruses discovered and coronavirus genomes being sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D, exist. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci revealed the time of the most recent common ancestor of human/civet SARS-related coronavirus to be 1999-2002, with an estimated substitution rate of 4×10^(-4) to 2×10^(-2) substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias in coronaviruses was observed, with HCoV-HKU1 showing the most extreme bias; cytosine deamination and selection of CpG-suppressed clones are the two major independent biological forces that shape such codon usage bias in coronaviruses.
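    One of the simplest quantities the survey reports, G + C content, is computed as below; the short sequence is a made-up stand-in for a genome.

```python
# G + C content of a nucleotide sequence, as reported (32%-43%) in the survey.
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(round(gc_content("ATGCGCAT") * 100, 1))  # 50.0 (percent G+C)
```

    For real genomes the same one-liner runs over megabase-scale strings, often windowed to expose local compositional shifts.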

  5. Planning bioinformatics workflows using an expert system.

    Science.gov (United States)

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu.
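    The backward-chaining idea the abstract describes can be sketched as rules mapping input data types to output data types, with the planner chaining backwards from the requested result. The rule names and data types below are invented, not BETSY's knowledge base.

```python
# Hedged sketch of backward-chaining workflow planning in the style BETSY
# describes. Rules and data-type names are invented for illustration.
RULES = [
    ("align_reads", ["fastq", "reference"], "bam"),
    ("call_variants", ["bam", "reference"], "vcf"),
]

def plan(goal, available, rules=RULES, seen=None):
    """Return an ordered list of steps producing `goal` from `available`,
    or None if no chain of rules can produce it."""
    if goal in available:
        return []
    seen = seen or set()
    for name, inputs, output in rules:
        if output == goal and goal not in seen:
            steps = []
            for inp in inputs:
                sub = plan(inp, available, rules, seen | {goal})
                if sub is None:
                    break  # this rule's inputs cannot be produced
                steps += [s for s in sub if s not in steps]
            else:
                return steps + [name]
    return None

print(plan("vcf", {"fastq", "reference"}))  # ['align_reads', 'call_variants']
```

    BETSY's inference engine additionally reasons over rich data attributes (e.g. whether reads are trimmed), not just type names as in this toy.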

  6. Community Involvement in Enhancing the Global Change Master Directory (GCMD) Controlled Vocabularies (Keywords)

    Science.gov (United States)

    Stevens, T.; Ritz, S.; Aleman, A.; Genazzio, M.; Morahan, M.; Wharton, S.

    2016-01-01

    NASA's Global Change Master Directory (GCMD) develops and expands a hierarchical set of controlled vocabularies (keywords) covering the Earth sciences and associated information (data centers, projects, platforms, instruments, etc.). The purpose of the keywords is to describe Earth science data and services in a consistent and comprehensive manner, allowing for the precise searching of metadata and subsequent retrieval of data and services. The keywords are accessible in a standardized SKOS/RDF/OWL representation and are used as an authoritative taxonomy, as a source for developing ontologies, and to search and access Earth Science data within online metadata catalogues. The keyword development approach involves: (1) receiving community suggestions, (2) triaging community suggestions, (3) evaluating the keywords against a set of criteria coordinated by the NASA ESDIS Standards Office, and (4) publication/notification of the keyword changes. This approach emphasizes community input, which helps ensure a high quality, normalized, and relevant keyword structure that will evolve with users' changing needs. The Keyword Community Forum, which promotes a responsive, open, and transparent process, is an area where users can discuss keyword topics and make suggestions for new keywords. The formalized approach could potentially be used as a model for keyword development.
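    A hierarchical controlled vocabulary like the GCMD keywords can be sketched as terms linked to broader terms (as in SKOS `broader`), from which the full display path is recovered. The three terms below are real GCMD-style examples, but the dictionary is a toy stand-in for the published vocabulary.

```python
# Toy sketch of navigating a hierarchical keyword scheme: each term points to
# its broader term (SKOS-style), and we recover the root-to-term path.
BROADER = {
    "EARTH SCIENCE": None,
    "ATMOSPHERE": "EARTH SCIENCE",
    "ATMOSPHERIC TEMPERATURE": "ATMOSPHERE",
}

def keyword_path(term, broader=BROADER):
    """Return the path from the scheme root down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = broader[term]
    return " > ".join(reversed(path))

print(keyword_path("ATMOSPHERIC TEMPERATURE"))
# EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE
```

    Metadata catalogues index these full paths so that a search on a broad term also retrieves records tagged with its narrower descendants.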

  7. One-Match and All-Match Categories for Keywords Matching in Chatbot

    Directory of Open Access Journals (Sweden)

    Abbas S. Lokman

    2010-01-01

    Full Text Available Problem statement: The artificial intelligence chatbot is a technology that makes interaction between humans and machines using natural language possible. From the literature on chatbot keyword/pattern-matching techniques, potential issues for improvement were identified. The identified issues concern keyword arrangement for matching precedence and keyword variety for matching flexibility. Approach: Combining previous techniques/mechanisms with some additional adjustments, a new technique for the keyword-matching process is proposed. Using a newly developed chatbot named ViDi (abbreviation for Virtual Diabetes physician), a chatbot for diabetes education, as a testing medium, the proposed technique, named One-Match and All-Match Categories (OMAMC), is used to test the creation of possible keywords surrounding one sample input sentence. The possible keywords created by this technique are then compared to those created by previous chatbot techniques for the same sample sentence, in the context of matching precedence and matching flexibility. Results: The OMAMC technique is found to improve on previous matching techniques in both matching precedence and flexibility. This improvement is seen to be useful for shortening matching time and widening matching flexibility within the chatbot keyword-matching process. Conclusion: OMAMC for keyword matching in chatbots is shown to be an improvement over previous techniques in the context of keyword arrangement for matching precedence and keyword variety for matching flexibility.
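
    One plausible reading of the one-match/all-match distinction can be sketched as follows. The exact OMAMC algorithm is not reproduced here; the matching rules, pattern names, and precedence policy below are assumptions for illustration.

```python
# Hedged sketch of a one-match / all-match keyword classifier
# (hypothetical rules, not the published OMAMC algorithm).

def match_category(sentence, keyword_sets):
    """Classify an input sentence against stored keyword patterns.

    keyword_sets maps a response id to a set of keywords. A pattern is
    an "all-match" if every keyword occurs in the sentence, and a
    "one-match" if at least one does. All-matches take precedence.
    """
    words = set(sentence.lower().split())
    all_matches, one_matches = [], []
    for response, keywords in keyword_sets.items():
        wanted = {k.lower() for k in keywords}
        hits = words & wanted
        if hits == wanted:
            all_matches.append(response)
        elif hits:
            one_matches.append(response)
    return all_matches or one_matches

patterns = {
    "insulin_info": {"insulin", "dose"},
    "diet_info": {"sugar"},
}
print(match_category("what insulin dose should I take", patterns))
# ['insulin_info']
```

    Grouping patterns into these two categories lets the matcher stop at the strongest category first, which is one way the matching-time and flexibility gains described above could arise.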

  8. Deployment Repeatability

    Science.gov (United States)

    2016-04-01

    controlled to great precision, but in a CubeSat, there may be no attitude determination at all. Such a CubeSat might treat sun angle and tumbling rates as...could be sensitive to small differences in motor controller timing. In these cases, the analyst might choose to model the entire deployment path, with...knowledge of the material damage model or motor controller timing precision. On the other hand, if many repeated and environmentally representative

  9. Regulatory bioinformatics for food and drug safety.

    Science.gov (United States)

    Healy, Marion J; Tong, Weida; Ostroff, Stephen; Eichler, Hans-Georg; Patak, Alex; Neuspiel, Margaret; Deluyker, Hubert; Slikker, William

    2016-10-01

    "Regulatory Bioinformatics" strives to develop and implement a standardized and transparent bioinformatic framework to support the implementation of existing and emerging technologies in regulatory decision-making. It has great potential to improve public health through the development and use of clinically important medical products and tools to manage the safety of the food supply. However, the application of regulatory bioinformatics also poses new challenges and requires new knowledge and skill sets. In the latest Global Coalition on Regulatory Science Research (GCRSR) governed conference, Global Summit on Regulatory Science (GSRS2015), regulatory bioinformatics principles were presented with respect to global trends, initiatives and case studies. The discussion revealed that datasets, analytical tools, skills and expertise are rapidly developing, in many cases via large international collaborative consortia. It also revealed that significant research is still required to realize the potential applications of regulatory bioinformatics. While there is significant excitement in the possibilities offered by precision medicine to enhance treatments of serious and/or complex diseases, there is a clear need for further development of mechanisms to securely store, curate and share data, integrate databases, and standardized quality control and data analysis procedures. A greater understanding of the biological significance of the data is also required to fully exploit vast datasets that are becoming available. The application of bioinformatics in the microbiological risk analysis paradigm is delivering clear benefits both for the investigation of food borne pathogens and for decision making on clinically important treatments. It is recognized that regulatory bioinformatics will have many beneficial applications by ensuring high quality data, validated tools and standardized processes, which will help inform the regulatory science community of the requirements

  10. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that the human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA) gene. But the functional characterization of the human KIAA0100 gene has remained unknown to date. Here, firstly, bioinformatic prediction of the human KIAA0100 gene was carried out using online software; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas9) system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that the human KIAA0100 gene was located on 17q11.2, and the human KIAA0100 protein was located in the secretory pathway. Besides, the human KIAA0100 protein contained a signal peptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). The observation on functional characterization of the human KIAA0100 gene revealed that its downregulation inhibited cell proliferation and promoted cell apoptosis in U937 cells. To summarize, these results suggest the human KIAA0100 gene possibly comes within the mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  11. Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

    Science.gov (United States)

    Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

    2012-08-01

    This paper presents novel multiple-keyword annotation for medical images, keyword-based medical image retrieval, and a relevance feedback method for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center-symmetric local binary patterns with random forests. For keyword-based image retrieval, our retrieval system uses a confidence score that is assigned to each annotated keyword by combining the probabilities of the random forests with a predefined body relation graph. To overcome the limitations of keyword-based image retrieval, we combine our image retrieval system with a relevance feedback mechanism based on visual features and a pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
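
    The confidence-score idea, combining a classifier probability with a relation graph over body regions, might be sketched as below. The probabilities, graph, and combination rule are invented for illustration and are not the paper's actual formula.

```python
# Illustrative only: score annotation keywords by combining a
# classifier probability with a simple body-region relation prior
# (hypothetical values and weighting, not the published method).

classifier_prob = {"chest": 0.55, "lung": 0.30, "skull": 0.15}

# Hypothetical relation graph: anatomically related regions
# reinforce each other's scores.
related = {"chest": {"lung"}, "lung": {"chest"}, "skull": set()}

def confidence(keyword):
    prior = 1.0 + sum(classifier_prob[r] for r in related[keyword])
    return classifier_prob[keyword] * prior

scores = {k: round(confidence(k), 3) for k in classifier_prob}
print(scores)  # {'chest': 0.715, 'lung': 0.465, 'skull': 0.15}
```

    The point of such a prior is that mutually consistent keywords (e.g. chest and lung) end up ranked above an isolated but plausible classifier output.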

  12. Predicting financial markets with Google Trends and not so random keywords

    OpenAIRE

    Damien Challet; Ahmed Bel Hadj Ayed

    2013-01-01

    We examine the claim that Google Trends data contain enough information to predict future financial index returns. We first discuss the many subtle (and less subtle) biases that may affect the backtest of a trading strategy, particularly when based on such data. Expectedly, the choice of keywords is crucial: by using an industry-grade backtesting system, we verify that random finance-related keywords do not contain more exploitable predictive information than random keywords related to illnes...

  13. Bioinformatics for cancer immunotherapy target discovery

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Campos, Benito; Barnkob, Mike Stein

    2014-01-01

    cancer immunotherapies has yet to be fulfilled. The insufficient efficacy of existing treatments can be attributed to a number of biological and technical issues. In this review, we detail the current limitations of immunotherapy target selection and design, and review computational methods to streamline...... therapy target discovery in a bioinformatics analysis pipeline. We describe specialized bioinformatics tools and databases for three main bottlenecks in immunotherapy target discovery: the cataloging of potentially antigenic proteins, the identification of potential HLA binders, and the selection of epitopes...... and co-targets for single-epitope and multi-epitope strategies. We provide examples of application to the well-known tumor antigen HER2 and suggest bioinformatics methods to ameliorate therapy resistance and ensure efficient and lasting control of tumors....

  14. Adapting bioinformatics curricula for big data.

    Science.gov (United States)

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs.

  15. HarkMan-A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech

    Institute of Scientific and Technical Information of China (English)

    ZHENG Fang; XU Mingxing; MOU Xiaolong; WU Jian; WU Wenhu; FANG Ditang

    1999-01-01

    In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword-spotter designed to automatically spot the given words of a vocabulary-independent task in unconstrained Chinese telephone speech. The speaking manner and the number of keywords are not limited. This paper focuses on the novel technique which addresses acoustic modeling, the keyword spotting network, search strategies, robustness, and rejection. The underlying technologies used in HarkMan given in this paper are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value over 90%.

  16. The GMOD Drupal Bioinformatic Server Framework

    Science.gov (United States)

    Papanicolaou, Alexie; Heckel, David G.

    2010-01-01

    Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988

  17. The Effects of Keyword Generation and Summary Writing on Teachers’ Judgments of Students’ Comprehension

    NARCIS (Netherlands)

    Engelen, Jan; Camp, Gino

    2016-01-01

    Sixth-graders (N = 282) judged their level of comprehension for six expository texts after writing keywords, summaries, or no additional activity. Afterwards, teachers (N = 14) judged the students’ level of comprehension while seeing the keywords or summaries. Monitoring accuracy was low in all cond

  18. Implementing bioinformatic workflows within the bioextract server

    Science.gov (United States)

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  19. Bioinformatics in Undergraduate Education: Practical Examples

    Science.gov (United States)

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  20. "Extreme Programming" in a Bioinformatics Class

    Science.gov (United States)

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP).…

  1. Bioinformatics: A History of Evolution "In Silico"

    Science.gov (United States)

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  2. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    Science.gov (United States)

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  3. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr

  4. Mass spectrometry and bioinformatics analysis data

    Directory of Open Access Journals (Sweden)

    Mainak Dutta

    2015-03-01

    Full Text Available 2DE and 2D-DIGE based proteomics analysis of serum from women with endometriosis revealed several proteins to be dysregulated. A complete list of these proteins along with their mass spectrometry data and subsequent bioinformatics analysis are presented here. The data is related to “Investigation of serum proteome alterations in human endometriosis” by Dutta et al. [1].

  8. Evolution of web services in bioinformatics

    NARCIS (Netherlands)

    Neerincx, P.B.T.; Leunissen, J.A.M.

    2005-01-01

    Bioinformaticians have developed large collections of tools to make sense of the rapidly growing pool of molecular biological data. Biological systems tend to be complex and in order to understand them, it is often necessary to link many data sets and use more than one tool. Therefore, bioinformatic

  9. SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.

    Science.gov (United States)

    Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...

  10. A Novel Model for Lattice-Based Authorized Searchable Encryption with Special Keyword

    Directory of Open Access Journals (Sweden)

    Fugeng Zeng

    2015-01-01

    Full Text Available For data stored in cloud servers, keyword search and access control are two important capabilities which should be supported. Public-key encryption with keyword search (PEKS) and attribute-based encryption (ABE) are the corresponding solutions. Meanwhile, as we step into the post-quantum era, pairing-related assumptions are fragile. Lattices are an ideal choice for building encryption schemes secure against quantum attack. Based on this, we propose the first mathematical model for lattice-based authorized searchable encryption. Data owners can sort the ciphertext by specific keywords such as time; data users satisfying the access control hand the trapdoor generated with the keyword to the cloud server; the cloud server sends back the corresponding ciphertext. The security of our schemes is based on the worst-case hardness on lattices, called the learning with errors (LWE) assumption. In addition, our scheme achieves attribute-hiding, which could protect the sensitive information of data users.

  11. Keyword Searches in Data-Centric XML Documents Using Tree Partitioning

    Institute of Scientific and Technical Information of China (English)

    LI Guoliang; FENG Jianhua; ZHOU Lizhu

    2009-01-01

    This paper presents an effective keyword search method for data-centric extensible markup language (XML) documents. The method divides an XML document into compact connected integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the XML document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees) are identified, which contain all or some of the input keywords for the keyword search in the XML documents. Indexing is used to accelerate the retrieval of MSI-Trees related to the input keywords. The MSI-Trees are ranked to identify the top-k results with the highest ranks. Extensive tests demonstrate that this method costs 10-100 ms to answer a keyword query, and outperforms existing approaches by 1-2 orders of magnitude.
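
    The core intuition, returning the smallest subtrees that cover the query keywords rather than whole documents, can be shown with a plain stdlib sketch. This is not the SI-Tree algorithm (which relies on a schema guide and indexing); the document and matching rule here are invented.

```python
# Sketch of XML keyword search: find the smallest elements whose text
# content contains all query keywords (illustrative, not SI-Trees).

import xml.etree.ElementTree as ET

DOC = """<bib>
  <book><title>XML keyword search</title><author>Li</author></book>
  <book><title>Databases</title><author>Feng</author></book>
</bib>"""

def text_of(elem):
    return " ".join(elem.itertext()).lower()

def smallest_matches(root, keywords):
    keywords = [k.lower() for k in keywords]
    hits = []
    for elem in root.iter():
        if all(k in text_of(elem) for k in keywords):
            # Keep only elements with no matching child: if a child
            # covers all keywords, this element is not the smallest.
            if not any(all(k in text_of(c) for k in keywords)
                       for c in elem):
                hits.append(elem.tag)
    return hits

root = ET.fromstring(DOC)
print(smallest_matches(root, ["keyword", "Li"]))  # ['book']
```

    Here the root `<bib>` also contains both keywords, but is rejected in favor of the tighter `<book>` subtree, which is the behavior that makes subtree-based results meaningful to a searcher.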

  12. CoPub: a literature-based keyword enrichment tool for microarray data analysis

    Science.gov (United States)

    Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand

    2008-01-01

    Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl. PMID:18442992
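
    A minimal enrichment calculation in the spirit of the tool described above can be written with the hypergeometric tail probability: how likely is it that a keyword co-occurs with at least k of the n input genes by chance? CoPub's actual scoring is not reproduced here, and the counts below are toy numbers.

```python
# Keyword enrichment sketch: hypergeometric P(X >= k) computed with
# stdlib math.comb (toy numbers, not CoPub's actual statistics).

from math import comb

def enrichment_p(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the keyword."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy numbers: 1000 genes mentioned in Medline, 50 co-occur with
# "apoptosis", 20 input genes, 8 of them linked to the keyword.
p = enrichment_p(1000, 50, 20, 8)
print(f"p = {p:.2e}")
```

    A small p-value flags the keyword as over-represented among the input genes, which is the basis for ranking keywords against a differentially expressed gene list.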

  13. Quantitative analysis of the evolution of novelty in cinema through crowdsourced keywords

    CERN Document Server

    Sreenivasan, Sameet

    2013-01-01

    The generation of novelty is central to any creative endeavor. Novelty generation and the relationship between novelty and individual hedonic value have long been subjects of study in social psychology. However, few studies have utilized large-scale datasets to quantitatively investigate these issues. Here we consider the domain of American cinema and explore these questions using a database of films spanning a 70 year period. We use crowdsourced keywords from the Internet Movie Database as a window into the contents of films, and prescribe novelty scores for each film based on occurrence probabilities of individual keywords and keyword-pairs. These scores provide revealing insights into the dynamics of novelty in cinema. We investigate how novelty influences the revenue generated by a film, and find a statistically significant relationship that resembles the Wundt-Berlyne curve. We also study the statistics of keyword occurrence and the aggregate distribution of keywords over a 100 year period.
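
    The occurrence-probability scoring described above can be sketched simply: rare keywords contribute large negative-log probabilities. The corpus counts are invented, and the paper's pairwise keyword scoring is not reproduced here.

```python
# Novelty-score sketch: a film is scored by the improbability of its
# keywords given keyword frequencies in an earlier corpus
# (hypothetical counts; single keywords only, not keyword pairs).

from math import log

# Hypothetical keyword occurrence counts in a prior corpus of 1000 films.
counts = {"heist": 40, "robot": 25, "time-travel": 5}
TOTAL = 1000

def novelty(keywords):
    # Unseen keywords get a count of 1 to avoid log(0).
    return sum(-log(counts.get(k, 1) / TOTAL) for k in keywords)

print(round(novelty(["robot", "time-travel"]), 2))  # 8.99
print(round(novelty(["heist"]), 2))                 # 3.22
```

    Scores of this shape let one ask, as the study does, how novelty relates to revenue: each film becomes a single number comparable across the whole corpus and over time.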

  14. Agile parallel bioinformatics workflow management using Pwrake

    Directory of Open Access Journals (Sweden)

    Tanaka Masahiro

    2011-09-01

    Full Text Available Abstract Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language design built on Ruby gives the flexibility of Rakefiles for writing scientific workflows.

  15. Navigating the changing learning landscape: perspective from bioinformatics.ca.

    Science.gov (United States)

    Brazas, Michelle D; Ouellette, B F Francis

    2013-09-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs.

  16. Bioinformatic tools and guideline for PCR primer design | Abd ...

    African Journals Online (AJOL)

    Bioinformatic tools and guideline for PCR primer design. ... Bioinformatics has become an essential tool not only for basic research but also ...

  17. Component-Based Approach for Educating Students in Bioinformatics

    Science.gov (United States)

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  19. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    Science.gov (United States)

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzhen, China, September 20-22, 2017.

  20. Title, Description, and Subject are the Most Important Metadata Fields for Keyword Discoverability

    Directory of Open Access Journals (Sweden)

    Laura Costello

    2016-09-01

    Full Text Available A Review of: Yang, L. (2016). Metadata effectiveness in internet discovery: An analysis of digital collection metadata elements and internet search engine keywords. College & Research Libraries, 77(1), 7-19. http://doi.org/10.5860/crl.77.1.7 Objective – To determine which metadata elements best facilitate discovery of digital collections. Design – Case study. Setting – A public research university serving over 32,000 graduate and undergraduate students in the Southwestern United States of America. Subjects – A sample of 22,559 keyword searches leading to the institution's digital repository between August 1, 2013, and July 31, 2014. Methods – The author used Google Analytics to analyze 73,341 visits to the institution's digital repository. He determined that 22,559 of these visits were due to keyword searches. Using Random Integer Generator, the author identified a random sample of 378 keyword searches. The author then matched the keywords with the Dublin Core and VRA Core metadata elements on the landing page in the digital repository to determine which metadata field had drawn the keyword searcher to that particular page. Many of these keywords matched to more than one metadata field, so the author also analyzed the metadata elements that generated unique keyword hits and those fields that were frequently matched together. Main Results – Title was the most matched metadata field with 279 matched keywords from searches. Description and Subject were also significant fields with 208 and 79 matches respectively. Slightly more than half of the results, 195 keywords, matched the institutional repository in one field only. Both Title and Description had significant match rates both independently and in conjunction with other elements, but Subject keywords were the sole match in only three of the sampled cases. Conclusion – The Dublin Core elements of Title, Description, and Subject were the most frequently matched fields in keyword
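
    The study's matching step, attributing each incoming search keyword to the metadata fields whose values contain it, can be illustrated with a small sketch. The field names follow Dublin Core, but the record and keywords below are invented.

```python
# Simplified illustration of keyword-to-metadata-field matching
# (Dublin Core field names; the record itself is hypothetical).

record = {
    "Title": "Map of the Llano Estacado region",
    "Description": "Hand-drawn survey map of West Texas, 1887",
    "Subject": "Cartography; Texas history",
}

def matched_fields(keyword):
    """Return the metadata fields whose values contain the keyword."""
    kw = keyword.lower()
    return [field for field, value in record.items() if kw in value.lower()]

print(matched_fields("texas"))  # ['Description', 'Subject']
print(matched_fields("map"))    # ['Title', 'Description']
```

    Tallying these matches over a sample of real searches is what yields the study's finding that Title, Description, and Subject dominate keyword discoverability.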

  1. The biological aspects of physiological anthropology with reference to its five keywords.

    Science.gov (United States)

    Iwanaga, Koichi

    2005-05-01

    The methodology of physiological anthropology has been defined in the capacity of an independent academic field by five keywords: environmental adaptability, technological adaptability, physiological polymorphism, whole-body coordination and functional potentiality, clearly suggesting the direction of approach to human beings in the field of physiological anthropology. Recently, these keywords have attracted a great deal of attention from physiological anthropologists in Japan. Physiological anthropology is based on a biological framework. From the viewpoint of biology, it is essential to discuss the biological function of human behavior. In this brief conceptual manuscript, the biological aspects of physiological anthropology are discussed in relation to the five keywords.

  2. A web services choreography scenario for interoperating bioinformatics applications

    Directory of Open Access Journals (Sweden)

    Cheung David W

    2004-03-01

    Full Text Available Abstract Background Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. Results To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH Keywords that correlates to the input and is grouped by Medical Subject Heading (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate the capability of machine processing, we have created a workflow of three web services that replicates the HAPI functionality. These web services use document-style messages, which means that messages are encoded in an XML-based format. We compared three approaches to the implementation of an XML-based workflow: a hard coded Java application, Collaxa BPEL Server and Taverna Workbench.
The Java program functions as a web services engine and interoperates
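The document-style messaging described above wraps the payload in an XML document rather than an RPC-style parameter list, so any service can parse it with a standard XML library. The sketch below builds and parses such a message with the Python standard library; the element names are illustrative assumptions, not the actual HAPI or BPEL schema.

```python
# Illustrative document-style message: a set of microarray spot IDs wrapped
# in XML. Element names (getMeshKeywordsRequest, spotID) are hypothetical.
import xml.etree.ElementTree as ET

def build_request(spot_ids):
    """Serialize a list of spot IDs into an XML request document."""
    root = ET.Element("getMeshKeywordsRequest")
    for sid in spot_ids:
        ET.SubElement(root, "spotID").text = sid
    return ET.tostring(root, encoding="unicode")

message = build_request(["AA001", "AA002"])
print(message)

# The receiving service parses the document rather than scraping HTML
parsed = ET.fromstring(message)
print([e.text for e in parsed.findall("spotID")])  # ['AA001', 'AA002']
```

This machine-readable round trip is exactly what the HTML interface in the original HAPI implementation could not offer.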

  3. A web services choreography scenario for interoperating bioinformatics applications

    Science.gov (United States)

    de Knikker, Remko; Guo, Youjun; Li, Jin-long; Kwan, Albert KH; Yip, Kevin Y; Cheung, David W; Cheung, Kei-Hoi

    2004-01-01

    Background Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. Results To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH Keywords that correlates to the input and is grouped by Medical Subject Heading (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate the capability of machine processing, we have created a workflow of three web services that replicates the HAPI functionality. These web services use document-style messages, which means that messages are encoded in an XML-based format. We compared three approaches to the implementation of an XML-based workflow: a hard coded Java application, Collaxa BPEL Server and Taverna Workbench. The Java program functions as a web services engine and interoperates with these web

  4. Bioinformatics analyses for signal transduction networks

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Research in signaling networks contributes to a deeper understanding of organism living activities. With the development of experimental methods in the signal transduction field, more and more mechanisms of signaling pathways have been discovered. This paper introduces such popular bioinformatics analysis methods for signaling networks as the common mechanism of signaling pathways and database resources on the Internet, summarizes the methods of analyzing the structural properties of networks, including structural motif finding and automated pathway generation, and discusses the modeling and simulation of signaling networks in detail, as well as the research situation and tendency in this area. Now the investigation of signal transduction is developing from small-scale experiments to large-scale network analysis, and dynamic simulation of networks is closer to the real system. With the investigation going deeper than ever, the bioinformatics analysis of signal transduction would have immense space for development and application.

  5. Bioinformatics in New Generation Flavivirus Vaccines

    Directory of Open Access Journals (Sweden)

    Penelope Koraka

    2010-01-01

    Full Text Available Flavivirus infections are the most prevalent arthropod-borne infections worldwide, often causing severe disease especially among children, the elderly, and the immunocompromised. In the absence of effective antiviral treatment, prevention through vaccination would greatly reduce morbidity and mortality associated with flavivirus infections. Despite the success of the empirically developed vaccines against yellow fever virus, Japanese encephalitis virus and tick-borne encephalitis virus, there is an increasing need for a more rational design and development of safe and effective vaccines. Several bioinformatic tools are available to support such rational vaccine design. In doing so, several parameters have to be taken into account, such as safety for the target population, overall immunogenicity of the candidate vaccine, and efficacy and longevity of the immune responses triggered. Examples of how bioinformatics is applied to assist in the rational design and improvements of vaccines, particularly flavivirus vaccines, are presented and discussed.

  6. Bioinformatics for saffron (Crocus sativus L.) improvement

    Directory of Open Access Journals (Sweden)

    Ghulam A. Parray

    2009-02-01

    Full Text Available Saffron (Crocus sativus L.) is a sterile triploid plant and belongs to the Iridaceae (Liliales, Monocots). Its genome is of relatively large size and is poorly characterized. Bioinformatics can play an enormous technical role in the sequence-level structural characterization of saffron genomic DNA. Bioinformatics tools can also help in appreciating the extent of diversity of various geographic or genetic groups of cultivated saffron to infer relationships between groups and accessions. The characterization of the transcriptome of saffron stigmas is the most vital for throwing light on the molecular basis of flavor, color biogenesis, genomic organization and biology of gynoecium of saffron. The information derived can be utilized for constructing biological pathways involved in the biosynthesis of principal components of saffron i.e., crocin, crocetin, safranal, picrocrocin and safchiA

  7. Discovery and Classification of Bioinformatics Web Services

    Energy Technology Data Exchange (ETDEWEB)

    Rocco, D; Critchlow, T

    2002-09-02

    The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.

  8. Bioinformatics Approaches for Human Gut Microbiome Research

    Directory of Open Access Journals (Sweden)

    Zhijun Zheng

    2016-07-01

    Full Text Available The human microbiome has received much attention because many studies have reported that the human gut microbiome is associated with several diseases. The very large datasets that are produced by these kinds of studies mean that bioinformatics approaches are crucial for their analysis. Here, we systematically reviewed bioinformatics tools that are commonly used in microbiome research, including a typical pipeline and software for sequence alignment, abundance profiling, enterotype determination, taxonomic diversity, identifying differentially abundant species/genes, gene cataloging, and functional analyses. We also summarized the algorithms and methods used to define metagenomic species and co-abundance gene groups to expand our understanding of unclassified and poorly understood gut microbes that are undocumented in the current genome databases. Additionally, we examined the methods used to identify metagenomic biomarkers based on the gut microbiome, which might help to expand the knowledge and approaches for disease detection and monitoring.
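Of the analysis steps listed above, taxonomic diversity is perhaps the simplest to illustrate. The sketch below computes the Shannon index, one common alpha-diversity measure, from a per-taxon abundance profile; the read counts are invented for illustration, not real data.

```python
# Shannon alpha-diversity H' = -sum(p_i * ln p_i) over taxon abundances.
# The taxon names and counts below are illustrative, not real sample data.
import math

def shannon_index(counts):
    """Shannon diversity of a list of per-taxon read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

sample = {"Bacteroides": 500, "Prevotella": 300,
          "Faecalibacterium": 150, "Roseburia": 50}
print(round(shannon_index(sample.values()), 3))  # 1.142
```

Higher values indicate a community that is both richer in taxa and more evenly balanced; comparing such indices across samples is a routine step in the pipelines the review surveys.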

  9. Bioinformatics Training: A Review of Challenges, Actions and Support Requirements

    DEFF Research Database (Denmark)

    Schneider, M.V.; Watson, J.; Attwood, T.;

    2010-01-01

    As bioinformatics becomes increasingly central to research in the molecular life sciences, the need to train non-bioinformaticians to make the most of bioinformatics resources is growing. Here, we review the key challenges and pitfalls to providing effective training for users of bioinformatics...... services, and discuss successful training strategies shared by a diverse set of bioinformatics trainers. We also identify steps that trainers in bioinformatics could take together to advance the state of the art in current training practices. The ideas presented in this article derive from the first...

  10. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    Science.gov (United States)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  11. The growing need for microservices in bioinformatics

    Directory of Open Access Journals (Sweden)

    Christopher L Williams

    2016-01-01

    Full Text Available Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on a nimble IT framework that can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Conclusions: Use of the microservices framework

  12. The growing need for microservices in bioinformatics.

    Science.gov (United States)

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on a nimble IT framework that can adapt to changing requirements. To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and
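The core idea above, one narrowly scoped service per task, can be sketched with nothing but the standard library. The example below exposes a single bioinformatics operation (GC content of a DNA sequence) over HTTP; the endpoint, payload shape, and the GC-content task itself are illustrative assumptions, and a real deployment would add containerization, service discovery, and health checks.

```python
# Minimal single-responsibility microservice sketch using only the stdlib.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

def gc_content(seq):
    """Fraction of G/C bases in a DNA sequence, rounded to 3 places."""
    seq = seq.upper()
    return round(sum(seq.count(b) for b in "GC") / max(len(seq), 1), 3)

class GCContentService(BaseHTTPRequestHandler):
    """A service with exactly one narrow responsibility: GC content."""
    def do_GET(self):
        body = json.dumps({"gc_content": gc_content(self.path.strip("/"))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # suppress per-request logging

server = HTTPServer(("127.0.0.1", 0), GCContentService)
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = json.loads(urlopen(f"http://127.0.0.1:{server.server_port}/ATGCGC").read())
print(reply)  # {'gc_content': 0.667}
server.shutdown()
```

Because the service's functional scope is this small, it can be replaced, scaled, or rewritten independently of the rest of a pipeline, which is the maintenance benefit the authors emphasize.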

  13. [Applied problems of mathematical biology and bioinformatics].

    Science.gov (United States)

    Lakhno, V D

    2011-01-01

    Mathematical biology and bioinformatics represent a new and rapidly progressing line of investigations which emerged in the course of work on the project "Human genome". The main applied problems of these sciences are drug design, patient-specific medicine and nanobioelectronics. It is shown that progress in the technology of mass sequencing of the human genome has set the stage for starting the national program on patient-specific medicine.

  14. Genome bioinformatics of tomato and potato

    OpenAIRE

    E Datema

    2011-01-01

    In the past two decades genome sequencing has developed from a laborious and costly technology employed by large international consortia to a widely used, automated and affordable tool used worldwide by many individual research groups. Genome sequences of many food animals and crop plants have been deciphered and are being exploited for fundamental research and applied to improve their breeding programs. The developments in sequencing technologies have also impacted the associated bioinformat...

  15. ballaxy: web services for structural bioinformatics.

    Science.gov (United States)

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling.  ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved. For

  16. Efficient secure-channel free public key encryption with keyword search for EMRs in cloud storage.

    Science.gov (United States)

    Guo, Lifeng; Yau, Wei-Chuen

    2015-02-01

    Searchable encryption is an important cryptographic primitive that enables privacy-preserving keyword search on encrypted electronic medical records (EMRs) in cloud storage. Efficiency of such searchable encryption in a medical cloud storage system is very crucial as it involves client platforms such as smartphones or tablets that only have constrained computing power and resources. In this paper, we propose an efficient secure-channel free public key encryption with keyword search (SCF-PEKS) scheme that is proven secure in the standard model. We show that our SCF-PEKS scheme is not only secure against chosen keyword and ciphertext attacks (IND-SCF-CKCA), but also secure against keyword guessing attacks (IND-KGA). Furthermore, our proposed scheme is more efficient than other recent SCF-PEKS schemes in the literature.
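The SCF-PEKS construction in this paper is pairing-based public-key cryptography; as a much simpler stand-in, the sketch below illustrates the underlying idea of searchable encryption with a symmetric-key (HMAC) keyword index: the server stores only opaque tags and can match a keyword only when handed the corresponding trapdoor. This is a toy model of the concept, not the paper's scheme, and it omits the security properties (IND-SCF-CKCA, IND-KGA) the authors prove.

```python
# Toy searchable index: the server never sees plaintext keywords, only
# HMAC tags; a trapdoor for a keyword lets it match without learning more.
import hashlib
import hmac
import os

KEY = os.urandom(32)  # held by the data owner, never given to the server

def tag(keyword):
    """Deterministic, unforgeable tag for a keyword under the owner's key."""
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(doc_id, keywords):
    """Owner-side: map each keyword tag to the document that contains it."""
    return {tag(kw): doc_id for kw in keywords}

# Server-side storage: opaque tags only
index = {}
index.update(build_index("emr-001", ["diabetes", "insulin"]))
index.update(build_index("emr-002", ["hypertension"]))

trapdoor = tag("insulin")   # owner issues a trapdoor for the query keyword
print(index.get(trapdoor))  # server finds emr-001 without seeing "insulin"
```

Note that a deterministic symmetric index like this leaks keyword-equality patterns, which is one of the weaknesses that public-key schemes such as SCF-PEKS are designed to address.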

  17. KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS

    Data.gov (United States)

    National Aeronautics and Space Administration — KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS BOLIN DING, YINTAO YU, BO ZHAO, CINDY XIDE LIN, JIAWEI HAN, AND CHENGXIANG ZHAI Abstract. We study the...

  18. Structural Consistency: Enabling XML Keyword Search to Eliminate Spurious Results Consistently

    CERN Document Server

    Lee, Ki-Hoon; Han, Wook-Shin; Kim, Min-Soo

    2009-01-01

    XML keyword search is a user-friendly way to query XML data using only keywords. In XML keyword search, to achieve high precision without sacrificing recall, it is important to remove spurious results not intended by the user. Efforts to eliminate spurious results have enjoyed some success by using the concepts of LCA or its variants, SLCA and MLCA. However, existing methods still could find many spurious results. The fundamental cause for the occurrence of spurious results is that the existing methods try to eliminate spurious results locally without global examination of all the query results and, accordingly, some spurious results are not consistently eliminated. In this paper, we propose a novel keyword search method that removes spurious results consistently by exploiting the new concept of structural consistency.
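The LCA concept the abstract builds on can be shown concretely: for a set of keywords, find the deepest node whose subtree contains all of them. The sketch below computes a plain pairwise LCA over match paths with the standard library; SLCA/MLCA and the paper's structural-consistency filtering add further pruning on top of this basic operation, and the sample document is invented for illustration.

```python
# Plain LCA over root-to-node paths of keyword matches in an XML document.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bib>
  <book><title>XML keyword search</title><author>Lee</author></book>
  <book><title>Databases</title><author>Han</author></book>
</bib>""")

def paths_containing(root, keyword):
    """All root-to-node paths whose node text contains the keyword."""
    out = []
    def walk(node, path):
        if node.text and keyword.lower() in node.text.lower():
            out.append(path + [node])
        for child in node:
            walk(child, path + [node])
    walk(root, [])
    return out

def lca(path_a, path_b):
    """Deepest common ancestor of two root-to-node paths."""
    common = None
    for a, b in zip(path_a, path_b):
        if a is b:
            common = a
    return common

p1 = paths_containing(doc, "keyword")[0]
p2 = paths_containing(doc, "Lee")[0]
print(lca(p1, p2).tag)  # 'book' -- the first book, not the whole <bib>
```

When the two keywords fall in different books, the LCA degrades to the root `<bib>`, which is exactly the kind of loosely connected, potentially spurious result that SLCA-style methods try to filter out.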

  19. Where am I? Location archetype keyword extraction from urban mobility patterns.

    Science.gov (United States)

    Kostakos, Vassilis; Juntunen, Tomi; Goncalves, Jorge; Hosio, Simo; Ojala, Timo

    2013-01-01

    Can online behaviour be used as a proxy for studying urban mobility? The increasing availability of digital mobility traces has provided new insights into collective human behaviour. Mobility datasets have been shown to be an accurate proxy for daily behaviour and social patterns, and behavioural data from Twitter has been used to predict real world phenomena such as cinema ticket sale volumes, stock prices, and disease outbreaks. In this paper we correlate city-scale urban traffic patterns with online search trends to uncover keywords describing the pedestrian traffic location. By analysing a 3-year mobility dataset we show that our approach, called Location Archetype Keyword Extraction (LAKE), is capable of uncovering semantically relevant keywords for describing a location. Our findings demonstrate an overarching relationship between online and offline collective behaviour, and allow for advancing analysis of community-level behaviour by using online search keywords as a practical behaviour proxy.
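The core operation described here, relating a location's pedestrian-traffic time series to search-trend series for candidate keywords, can be sketched with a plain Pearson correlation. The time series and keywords below are toy values, not the paper's 3-year dataset, and LAKE itself involves considerably more than this single step.

```python
# Rank candidate keywords by how well their search-trend series tracks a
# location's pedestrian-traffic series (toy data, plain Pearson r).
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

traffic = [120, 340, 560, 300, 90]           # hourly pedestrian counts
trends = {"market": [10, 30, 52, 28, 8],     # tracks the traffic pattern
          "airport": [50, 48, 51, 49, 52]}   # flat, unrelated

best = max(trends, key=lambda kw: pearson(traffic, trends[kw]))
print(best)  # 'market'
```

Keywords whose online interest rises and falls with the offline footfall are the ones that plausibly describe the location, which is the online/offline relationship the paper demonstrates.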

  20. Where am I? Location archetype keyword extraction from urban mobility patterns.

    Directory of Open Access Journals (Sweden)

    Vassilis Kostakos

    Full Text Available Can online behaviour be used as a proxy for studying urban mobility? The increasing availability of digital mobility traces has provided new insights into collective human behaviour. Mobility datasets have been shown to be an accurate proxy for daily behaviour and social patterns, and behavioural data from Twitter has been used to predict real world phenomena such as cinema ticket sale volumes, stock prices, and disease outbreaks. In this paper we correlate city-scale urban traffic patterns with online search trends to uncover keywords describing the pedestrian traffic location. By analysing a 3-year mobility dataset we show that our approach, called Location Archetype Keyword Extraction (LAKE), is capable of uncovering semantically relevant keywords for describing a location. Our findings demonstrate an overarching relationship between online and offline collective behaviour, and allow for advancing analysis of community-level behaviour by using online search keywords as a practical behaviour proxy.

  1. Efficient Keyword-Based Search for Top-K Cells in Text Cube

    Data.gov (United States)

    National Aeronautics and Space Administration — Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked-structures (e.g., a set of joined tuples) that are relevant to a given...

  2. A library-based bioinformatics services program.

    Science.gov (United States)

    Yarfitz, S; Ketchell, D S

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise.

  3. A library-based bioinformatics services program*

    Science.gov (United States)

    Yarfitz, Stuart; Ketchell, Debra S.

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise. PMID:10658962

  4. Application of bioinformatics in chronobiology research.

    Science.gov (United States)

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through "omics" projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  5. Application of Bioinformatics in Chronobiology Research

    Directory of Open Access Journals (Sweden)

    Robson da Silva Lopes

    2013-01-01

    Full Text Available Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  6. Bringing Web 2.0 to bioinformatics.

    Science.gov (United States)

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  7. Bioinformatics tools for analysing viral genomic data.

    Science.gov (United States)

    Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J

    2016-04-01

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.
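The first analysis step the authors summarise, quality control of raw reads before assembly, can be sketched in a few lines. This is a minimal illustration assuming Phred+33-encoded FASTQ input; `parse_fastq`, `filter_reads`, and the threshold are hypothetical names, not tools from the paper, and real pipelines use dedicated QC and trimming software.

```python
# Minimal sketch of HTS read quality control, assuming Phred+33 FASTQ.
# Real pipelines use dedicated QC/trimming tools; this only illustrates
# the idea of discarding low-quality reads before assembly.

def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) from FASTQ-formatted text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _, qual = lines[i:i + 4]
        yield header[1:], seq, [ord(c) - 33 for c in qual]  # Phred+33 decoding

def filter_reads(text, min_mean_quality=20):
    """Keep reads whose mean Phred quality meets the threshold."""
    return [(rid, seq) for rid, seq, phred in parse_fastq(text)
            if sum(phred) / len(phred) >= min_mean_quality]

fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
print(filter_reads(fastq))  # 'I' = Phred 40 (kept), '!' = Phred 0 (dropped)
```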

  8. Chapter 16: text mining for translational bioinformatics.

    Directory of Open Access Journals (Sweden)

    K Bretonnel Cohen

    2013-04-01

    Full Text Available Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  9. Chapter 16: text mining for translational bioinformatics.

    Science.gov (United States)

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  10. Translational bioinformatics in psychoneuroimmunology: methods and applications.

    Science.gov (United States)

    Yan, Qing

    2012-01-01

    Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes, including behavioral-based profiles, will contribute to the transition from disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors is needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talks among pathways in various brain regions involved in disorders such as Alzheimer's disease.

  11. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    OpenAIRE

    Kuppusamy, K. S.; Aghila, G.

    2012-01-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interest and would expect the web to respond to the specific requirements. The process of making the web react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization being incorporated into it. The keyword extraction problem is approach...

  12. Comparing the Hierarchy of Keywords in On-Line News Portals

    Science.gov (United States)

    Tibély, Gergely; Sousa-Rodrigues, David; Pollner, Péter; Palla, Gergely

    2016-01-01

    Hierarchical organization is prevalent in networks representing a wide range of systems in nature and society. An important example is given by the tag hierarchies extracted from large on-line data repositories such as scientific publication archives, file sharing portals, blogs, on-line news portals, etc. The tagging of the stored objects with informative keywords in such repositories has become very common, and in most cases the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords appearing in an on-line data repository are unknown in general. However, in most cases the topics and concepts described by these keywords are forming a latent hierarchy, with the more general topics and categories at the top, and more specialized ones at the bottom. There are several algorithms available for deducing this hierarchy from the statistical features of the keywords. In the present work we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorized low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals. PMID:27802319
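As a toy illustration of the co-occurrence-based idea (not the authors' exact algorithm), one simple heuristic assigns each tag a parent: the more frequent tag it most often co-occurs with, so that general tags rise to the top of the hierarchy. All names and the example data below are illustrative.

```python
from collections import Counter
from itertools import combinations

def tag_hierarchy(tagged_items):
    """Toy co-occurrence heuristic: the parent of a tag is the more
    frequent tag it most often co-occurs with (None marks a root)."""
    freq = Counter(t for tags in tagged_items for t in tags)
    cooc = Counter()
    for tags in tagged_items:
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    parent = {}
    for tag in freq:
        candidates = [(cooc[(tag, other)], freq[other], other)
                      for other in freq
                      if freq[other] > freq[tag] and cooc[(tag, other)] > 0]
        parent[tag] = max(candidates)[2] if candidates else None
    return parent

items = [["news", "politics"], ["news", "politics", "election"],
         ["news", "sport"], ["news"]]
print(tag_hierarchy(items))
```

On this toy input the most frequent tag, "news", becomes the root and all other tags attach beneath it.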

  14. Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics.

    Science.gov (United States)

    Manning, Timmy; Sleator, Roy D; Walsh, Paul

    2014-01-01

    Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks which are recognized for their noise tolerance and ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs which make them appealing for applications to bioinformatics problems where interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems.
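The learning scheme described above, repeated evaluation of exemplars of a mapping, can be illustrated with a single sigmoid neuron trained by stochastic gradient descent on the logical AND function. This is a minimal sketch, not a production ANN, and all names in it are illustrative.

```python
import math
import random

def train_neuron(exemplars, epochs=2000, lr=0.5, seed=0):
    """Train one sigmoid neuron by repeated evaluation of exemplars
    (stochastic gradient descent on the squared error)."""
    rng = random.Random(seed)
    w1, w2, b = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0
    for _ in range(epochs):
        for (x1, x2), target in exemplars:
            y = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # sigmoid
            delta = (y - target) * y * (1.0 - y)  # squared-error gradient
            w1 -= lr * delta * x1
            w2 -= lr * delta * x2
            b -= lr * delta
    return lambda x1, x2: 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))

# Exemplars of the logical AND mapping:
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = train_neuron(and_data)
print(round(net(1, 1)), round(net(0, 1)))
```

After training, the neuron's output is high only for the (1, 1) input, i.e. it has generalized the AND mapping from the exemplars.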

  15. Optimizing selection of microsatellite loci from 454 pyrosequencing via post-sequencing bioinformatic analyses.

    Science.gov (United States)

    Fernandez-Silva, Iria; Toonen, Robert J

    2013-01-01

    The comparatively low cost of massive parallel sequencing technology, also known as next-generation sequencing (NGS), has transformed the isolation of microsatellite loci. The most common NGS approach consists of obtaining large amounts of sequence data from genomic DNA or enriched microsatellite libraries, which is then mined for the discovery of microsatellite repeats using bioinformatics analyses. Here, we describe a bioinformatics approach to isolate microsatellite loci, starting from the raw sequence data through a subset of microsatellite primer pairs. The primary difference to previously published approaches includes analyses to select the most accurate sequence data and to eliminate repetitive elements prior to the design of primers. These analyses aim to minimize the testing of primer pairs by identifying the most promising microsatellite loci.
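The core mining step, discovering microsatellite repeats in sequence data, can be sketched with a regular expression that finds perfect tandem repeats of short motifs. This is a simplified illustration (function name and parameters are assumptions); the authors' pipeline additionally filters by read accuracy, masks repetitive elements, and designs primers, none of which is shown here.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Report perfect tandem repeats as (start, motif, repeat_count),
    scanning motif lengths from di- to hexanucleotide."""
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of `unit` bases followed by at least min_repeats-1 copies.
        pattern = r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1)
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), m.group(1), len(m.group(0)) // unit))
    return hits

print(find_microsatellites("TTGACACACACGGTAGCAGCAGCAA"))
# → [(3, 'AC', 4), (14, 'AGC', 3)]
```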

  16. Comparing the hierarchy of keywords in on-line news portals

    CERN Document Server

    Tibély, Gergely; Pollner, Péter; Palla, Gergely

    2016-01-01

    The tagging of on-line content with informative keywords is a widespread phenomenon, from scientific article repositories through blogs to on-line news portals. In most of the cases, the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords in a collection of news items are unknown. However, in most cases the topics and concepts described by these keywords are forming a latent hierarchy, with the more general topics and categories at the top, and more specialised ones at the bottom. Here we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news...

  17. Citation searches are more sensitive than keyword searches to identify studies using specific measurement instruments.

    Science.gov (United States)

    Linder, Suzanne K; Kamath, Geetanjali R; Pratt, Gregory F; Saraykar, Smita S; Volk, Robert J

    2015-04-01

    To compare the effectiveness of two search methods in identifying studies that used the Control Preferences Scale (CPS), a health care decision-making instrument commonly used in clinical settings. We searched the literature using two methods: (1) keyword searching using variations of "Control Preferences Scale" and (2) cited reference searching using two seminal CPS publications. We searched three bibliographic databases [PubMed, Scopus, and Web of Science (WOS)] and one full-text database (Google Scholar). We report precision and sensitivity as measures of effectiveness. Keyword searches in bibliographic databases yielded high average precision (90%) but low average sensitivity (16%). PubMed was the most precise, followed closely by Scopus and WOS. The Google Scholar keyword search had low precision (54%) but provided the highest sensitivity (70%). Cited reference searches in all databases yielded moderate sensitivity (45-54%), but precision ranged from 35% to 75% with Scopus being the most precise. Cited reference searches were more sensitive than keyword searches, making it a more comprehensive strategy to identify all studies that use a particular instrument. Keyword searches provide a quick way of finding some but not all relevant articles. Goals, time, and resources should dictate the combination of which methods and databases are used. Copyright © 2015 Elsevier Inc. All rights reserved.
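The two effectiveness measures reported in this study are straightforward to compute from a set of retrieved records and a gold-standard set of studies known to use the instrument. The data below are invented purely for illustration.

```python
def precision_sensitivity(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Sensitivity (recall): fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    return precision, sensitivity

# Invented example: 10 studies truly used the instrument; a keyword
# search returned 5 records, 4 of which were true uses.
keyword_hits = ["s1", "s2", "s3", "s4", "x1"]
all_uses = [f"s{i}" for i in range(1, 11)]
print(precision_sensitivity(keyword_hits, all_uses))  # → (0.8, 0.4)
```

This mirrors the pattern the study reports: a search can be highly precise (most hits are true uses) while still missing most of the relevant studies.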

  18. CLASCN: Candidate Network Selection for Efficient Top-k Keyword Queries over Databases

    Institute of Scientific and Technical Information of China (English)

    Jun Zhang; Zhao-Hui Peng; Shan Wang; Hui-Jing Nie

    2007-01-01

    Keyword Search Over Relational Databases (KSORD) enables casual or Web users to easily access databases through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In this paper, a new approach, CLASCN (Classification, Learning And Selection of Candidate Network), is developed to efficiently perform top-k keyword queries in schema-graph-based online KSORD systems. In this approach, the Candidate Networks (CNs) from trained keyword queries or executed user queries are classified and stored in the databases, and top-k results from the CNs are learned for constructing CN Language Models (CNLMs). The CNLMs are used to compute the similarity scores between a new user query and the CNs from the query. The CNs with relatively large similarity scores, which are the most promising ones to produce top-k results, will be selected and performed. Currently, CLASCN is only applicable to past queries and New All-keyword-Used (NAU) queries, which are frequently submitted queries. Extensive experiments also show the efficiency and effectiveness of our CLASCN approach.
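Although the paper's CNLM construction is more involved, the core idea of scoring a new query against per-CN language models can be sketched with smoothed unigram models. All names, the example data, and the add-alpha smoothing choice below are assumptions, not details from the paper.

```python
import math
from collections import Counter

def build_lm(keyword_queries):
    """Unigram counts over the keywords of past queries answered by one
    candidate network (a highly simplified sketch of a CNLM)."""
    counts = Counter(w for q in keyword_queries for w in q)
    return counts, sum(counts.values())

def score(query, lm, vocab_size, alpha=1.0):
    """Add-alpha smoothed log-likelihood of a new query under a CNLM."""
    counts, total = lm
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in query)

past_author_cn = [["author", "paper"], ["author", "title"]]
past_conf_cn = [["conference", "year"], ["conference", "paper"]]
lm_author, lm_conf = build_lm(past_author_cn), build_lm(past_conf_cn)
vocab = 5  # distinct keywords seen across all CNs
query = ["author", "paper"]
# The author-centred CN is the more promising one for this query:
print(score(query, lm_author, vocab) > score(query, lm_conf, vocab))  # → True
```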

  19. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    Science.gov (United States)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm, using a new sigmoid-logarithmic transfer function based on the error backpropagation (EBP) algorithm, has been invented. Our results show that the trained new ANN can recognize low-fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip has been designed for calculating the logarithm of the relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip have been designed, fabricated, and characterized.

  20. Multiobjective optimization in bioinformatics and computational biology.

    Science.gov (United States)

    Handl, Julia; Kell, Douglas B; Knowles, Joshua

    2007-01-01

    This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts" giving rise to multiple objectives; these are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.
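The concept underlying all of these applications is Pareto dominance: with multiple objectives there is no single best solution, only a front of non-dominated trade-offs. A minimal non-dominated filter, assuming every objective is minimized, looks like this (the example objective pairs are invented).

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated (Pareto-optimal) objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical two-objective trade-off, e.g. (prediction error, model size):
solutions = [(0.1, 9), (0.2, 4), (0.3, 4), (0.5, 2), (0.6, 3)]
print(pareto_front(solutions))  # → [(0.1, 9), (0.2, 4), (0.5, 2)]
```

Here (0.3, 4) is dominated by (0.2, 4) and (0.6, 3) by (0.5, 2); the remaining three points are incomparable trade-offs.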

  1. Translational Bioinformatics: Past, Present, and Future

    Institute of Scientific and Technical Information of China (English)

    Jessica D. Tenenbaum

    2016-01-01

    Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.

  2. Microbial bioinformatics for food safety and production.

    Science.gov (United States)

    Alkema, Wynand; Boekhorst, Jos; Wels, Michiel; van Hijum, Sacha A F T

    2016-03-01

    In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput 'omics' technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety.

  3. Translational Bioinformatics: Past, Present, and Future

    Directory of Open Access Journals (Sweden)

    Jessica D. Tenenbaum

    2016-02-01

    Full Text Available Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.

  4. Bioinformatics in human health and heredity

    CERN Document Server

    Rao, C R; Sen, Pranab K

    2007-01-01

    The field of statistics not only affects all areas of scientific activity, but also many other matters such as public policy. It is branching rapidly into so many different subjects that a series of handbooks is the only way of comprehensively presenting the various aspects of statistical methodology, applications, and recent developments. The Handbook of Statistics is such a series of self-contained reference books. Each volume is devoted to a particular topic in statistics, with Volume 28 dealing with bioinformatics. Every chapter is written by prominent workers in the area to which the volume is devoted.

  5. Implementing bioinformatic workflows within the bioextract server.

    Science.gov (United States)

    Lushbough, Carol M; Bergman, Michael K; Lawrence, Carolyn J; Jennewein, Doug; Brendel, Volker

    2008-01-01

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed service designed to provide researchers with the web-based ability to query multiple data sources, save results as searchable data sets, and execute analytic tools. As researchers work with the system, their tasks are saved in the background. At any time these steps can be saved as a workflow that can then be executed again and/or modified later.

  6. Introducing bioinformatics, the biosciences' genomic revolution

    CERN Document Server

    Zanella, Paolo

    1999-01-01

    The general audience for these lectures is mainly physicists, computer scientists, engineers or the general public wanting to know more about what’s going on in the biosciences. What’s bioinformatics and why is all this fuss being made about it? What’s this revolution triggered by the human genome project? Are there any results yet? What are the problems? What new avenues of research have been opened up? What about the technology? These new developments will be compared with what happened at CERN earlier in its evolution, and it is hoped that the similarities and contrasts will stimulate new curiosity and provoke new thoughts.

  7. Context and Keyword Extraction in Plain Text Using a Graph Representation

    CERN Document Server

    Chahine, Carlo Abi; Kotowicz, Jean-Philippe; Pécuchet, Jean-Pierre

    2009-01-01

    Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve documents relevant to a query, keywords describing each document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role, but indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain-text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resource.
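A graph representation for keyword extraction can be illustrated, in highly simplified form, by ranking words by weighted degree in a sentence-level co-occurrence graph. Unlike the article's method, no ontology or terminological resource is consulted here; the stopword list and example sentences are invented for the sketch.

```python
from collections import defaultdict
from itertools import combinations

STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "for", "this"}

def graph_keywords(sentences, top_k=3):
    """Rank words by weighted degree in a sentence-level co-occurrence
    graph: words that share many sentences with many words rank highest."""
    edge_weight = defaultdict(int)
    for sentence in sentences:
        words = {w for w in sentence.lower().split() if w not in STOPWORDS}
        for a, b in combinations(sorted(words), 2):
            edge_weight[(a, b)] += 1
    degree = defaultdict(int)
    for (a, b), w in edge_weight.items():
        degree[a] += w
        degree[b] += w
    ranked = sorted(degree, key=lambda word: (-degree[word], word))
    return ranked[:top_k]

doc = ["indexing assigns keywords to documents",
       "archivists choose keywords describing documents",
       "keywords support document retrieval"]
print(graph_keywords(doc))
```

On this toy document, the words appearing across the most sentences ("keywords", then "documents") dominate the ranking.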

  8. EasyKSORD: A Platform of Keyword Search Over Relational Databases

    Science.gov (United States)

    Peng, Zhaohui; Li, Jing; Wang, Shan

    Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.

  9. Analysis of Science and Mathematics Education Articles Published In Turkey-I: Keywords

    Directory of Open Access Journals (Sweden)

    Enver TATAR

    2008-01-01

    Full Text Available In this study, a descriptive analysis of science and mathematics education articles published in Turkey was carried out. The study was based on the keywords of a total of 680 articles published in 26 refereed journals during 2000-2006. The analysis of the data obtained from the study yielded the following findings: (a) there were keywords consisting of up to eight words, almost in the form of a sentence; (b) the frequencies of nearly all keywords related to topics in the science and mathematics curriculum were low; (c) science and mathematics topics in the primary school curriculum were studied less than topics at the secondary and university levels; (d) studies related to misconceptions in science education and to attitudes in mathematics education were the topics most frequently studied by researchers.

  10. Method of determining of keywords in English texts based on the DKPro Core

    Science.gov (United States)

    Bisikalo, Oleg V.; Wójcik, Waldemar; Yahimovich, Olexand V.; Smailova, Saule

    2016-09-01

    A new method of keyword determination, based on finding the connections between word forms of an English text with the instrumental capabilities of the DKPro Core package, is suggested in this article. The method, which is illustrated with examples of analysis, is aimed at solving problems of efficient processing of text documents - indexing, abstracting, clustering and classification. Theoretical and experimental studies found that the developed method identified more of the keywords specified by the author of the text than its analogues did, and had better quality characteristics. The proposed method of keyword determination differs from existing ones in that it uses additional information about the complex relationships between the members of an English sentence.

  11. AN EFFICIENT APPROACH FOR KEYWORD SELECTION; IMPROVING ACCESSIBILITY OF WEB CONTENTS BY GENERAL SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    H. H. Kian

    2011-11-01

    Full Text Available General search engines often provide imprecise results even for detailed queries, so there is a vital need to elicit useful information, such as keywords, that helps search engines provide acceptable results for users' search queries. Although many methods have been proposed for extracting keywords automatically, all attempt to achieve better recall, precision, and other criteria that describe how well the method has done its job as an author. This paper presents a new automatic keyword extraction method that improves the accessibility of web content to search engines. The proposed method defines coefficients determining feature efficiency and tries to optimize them by using a genetic algorithm. Furthermore, it evaluates candidate keywords by a function that utilizes the results of search engines. Experiments demonstrate that, compared with other methods, the proposed method achieves a higher score from search engines without losing noticeable recall or precision.
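The optimization step, tuning feature coefficients with a genetic algorithm, can be sketched generically. The paper's fitness function scores extracted keywords against search-engine results; a toy quadratic stands in for it below, and every name and parameter in this sketch is illustrative rather than taken from the paper.

```python
import random

def evolve(fitness, n_coeffs, pop_size=30, generations=60, seed=1):
    """Generic GA sketch: evolve a coefficient vector maximizing `fitness`
    via truncation selection, one-point crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_coeffs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]          # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_coeffs)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_coeffs)               # mutate one coefficient
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy stand-in for a search-engine-based fitness function:
target = [0.2, 0.8, 0.5]
toy_fitness = lambda c: -sum((x - t) ** 2 for x, t in zip(c, target))
best = evolve(toy_fitness, 3)
print([round(x, 2) for x in best])
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases, and the evolved coefficients approach the toy target.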

  12. KEYWORD AND IMAGE CONTENT FEATURES FOR IMAGE INDEXING AND RETRIEVAL WITHIN COMPRESSED DOMAIN

    Directory of Open Access Journals (Sweden)

    Irianto

    2009-01-01

    Full Text Available The central problem of most Content-Based Image Retrieval approaches is poor quality in terms of sensitivity (recall) and specificity (precision). To overcome this problem, the semantic gap between high-level concepts and low-level features has been acknowledged. In this paper we introduce an approach to reduce the impact of the semantic gap by integrating high-level (semantic) and low-level features to improve the quality of Image Retrieval queries. Our experiments have been carried out by applying two hierarchical procedures: the first approach is called keyword-content, and the second content-keyword. Our proposed approaches show better results compared to a single method (keyword-based or content-based) in terms of recall and precision. The average precision has increased by up to 50%.

  13. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    Science.gov (United States)

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  15. Whale song analyses using bioinformatics sequence analysis approaches

    Science.gov (United States)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools, namely DNA/protein sequence alignment and alignment-free methods, are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, the standardized Euclidean distance, and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes and could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
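The alignment-free branch of this comparison can be sketched as follows: represent each theme as the frequencies of overlapping "words" of song units, then compare profiles with a Euclidean metric. This is a simplification of the standardized distance used in the study, with invented unit sequences.

```python
import math
from collections import Counter

def kword_profile(units, k=2):
    """Frequencies of overlapping k-length 'words' of song units."""
    words = [tuple(units[i:i + k]) for i in range(len(units) - k + 1)]
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def euclidean(p, q):
    """Euclidean distance between two frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in keys))

theme_a = ["A", "B", "A", "B", "A", "B"]   # invented unit sequences
theme_b = ["A", "B", "A", "B", "A", "C"]
theme_c = ["C", "C", "D", "C", "C", "D"]
pa, pb, pc = (kword_profile(t) for t in (theme_a, theme_b, theme_c))
print(euclidean(pa, pb) < euclidean(pa, pc))  # → True
```

Profiles of near-identical themes lie close together, so a hierarchical clustering of such pairwise distances would reproduce the dendrogram-style comparison described above.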

  16. Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom

    Science.gov (United States)

    Liu, Xiao; Wolfe, Richard; Welch, Lonnie R.; Domozych, David S.; Popper, Zoë A.; Showalter, Allan M.

    2016-01-01

    Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are thought to play important roles in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on EXTs and to understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) eudicots have the greatest number of classical EXTs, and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in…

  17. Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom.

    Directory of Open Access Journals (Sweden)

    Xiao Liu

    Full Text Available Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are thought to play important roles in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on EXTs and to understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) eudicots have the greatest number of classical EXTs, and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in…

  18. Bioinformatics analyses of Shigella CRISPR structure and spacer classification.

    Science.gov (United States)

    Wang, Pengfei; Zhang, Bing; Duan, Guangcai; Wang, Yingfang; Hong, Lijuan; Wang, Linlin; Guo, Xiangjiao; Xi, Yuanlin; Yang, Haiyan

    2016-03-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are heritable genetic elements of a variety of archaea and bacteria, indicative of bacterial ecological adaptation and conferring acquired immunity against invading foreign nucleic acids. Shigella is an important anthroponotic pathogen. This study aimed to analyze the features of Shigella CRISPR structure and to classify the spacers through a bioinformatics approach. Among 107 Shigella strains, 434 CRISPR structure loci were identified, with two to seven loci per strain. CRISPR-Q1, CRISPR-Q4 and CRISPR-Q5 were widely distributed in Shigella strains. Comparison of the first and last repeats of CRISPR1, CRISPR2 and CRISPR3 revealed several base variants and different stem-loop structures. A total of 259 cas genes were found among the 107 Shigella strains. cas gene deletions were discovered in 88 strains, and one strain contained no cas genes at all. Intact clusters of cas genes were found in 19 strains. From comprehensive analysis of sequence signatures, BLAST results and CRISPRTarget scores, the 708 spacers were classified into three subtypes: Type I, linked with one gene segment; Type II, linked with two or more different gene segments; and Type III, undefined. This study examined the diversity of the CRISPR/cas system in Shigella strains and demonstrated the main features of CRISPR structure and spacer classification, providing critical information for elucidating the mechanisms of spacer formation and exploring the role the spacers play in the function of the CRISPR/cas system.

  19. Extraction of Keywords Related with Stock Price Change from Bloggers' Hot Topics

    Science.gov (United States)

    Hara, Shinji; Nadamoto, Hironori; Horiuchi, Tadashi

    This paper presents a method for extracting keywords related to actual changes of stock price from bloggers' hot topics. We implemented a program to collect bloggers' hot topics about stocks together with the actual stock prices. Important keywords that correlate with changes in the stock prices are then selected based on stochastic complexity. We classify information about stock price changes using text classification methods such as Naive Bayes and decision tree learning, and confirm the effectiveness of our method through classification experiments.
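
    The classification step can be illustrated with a minimal multinomial Naive Bayes classifier over keyword lists. The training keywords and "up"/"down" labels below are invented for illustration; the paper's actual features are selected by stochastic complexity, which is not reproduced here.

    ```python
    from collections import Counter
    from math import log

    class NaiveBayes:
        """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
        def fit(self, docs, labels):
            self.classes = set(labels)
            self.class_counts = Counter(labels)
            self.word_counts = {c: Counter() for c in self.classes}
            for words, label in zip(docs, labels):
                self.word_counts[label].update(words)
            self.vocab = {w for c in self.classes for w in self.word_counts[c]}
            return self

        def predict(self, words):
            best, best_lp = None, float("-inf")
            for c in self.classes:
                total = sum(self.word_counts[c].values())
                # log prior + sum of smoothed log likelihoods
                lp = log(self.class_counts[c] / sum(self.class_counts.values()))
                for w in words:
                    lp += log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
                if lp > best_lp:
                    best, best_lp = c, lp
            return best

    # Hypothetical keyword lists taken from blog posts about a stock.
    train = [["earnings", "surge", "buy"], ["profit", "record", "buy"],
             ["scandal", "loss", "sell"], ["lawsuit", "drop", "sell"]]
    labels = ["up", "up", "down", "down"]
    model = NaiveBayes().fit(train, labels)
    ```

    A real pipeline would train on far more posts and evaluate on held-out price movements, as the paper's experiment does.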

  20. Phosphoproteomics and bioinformatics analyses of spinal cord proteins in rats with morphine tolerance.

    Directory of Open Access Journals (Sweden)

    Wen-Jinn Liaw

    Full Text Available INTRODUCTION: Morphine is the most effective pain-relieving drug, but it can cause unwanted side effects. Direct neuraxial administration of morphine to the spinal cord not only provides effective, reliable pain relief but also prevents the development of supraspinal side effects. However, repeated neuraxial administration of morphine may still lead to morphine tolerance. METHODS: To better understand the mechanism that causes morphine tolerance, we induced tolerance in rats at the spinal cord level by giving them twice-daily injections of morphine (20 µg/10 µL) for 4 days. We confirmed tolerance by measuring paw withdrawal latencies and the maximal possible analgesic effect of morphine on day 5. We then carried out phosphoproteomic analysis to investigate the global phosphorylation of spinal proteins associated with morphine tolerance. Finally, pull-down assays were used to identify the phosphorylated types and sites of 14-3-3 proteins, and bioinformatics was applied to predict biological networks impacted by the morphine-regulated proteins. RESULTS: Our proteomics data showed that repeated morphine treatment altered the phosphorylation of 10 proteins in the spinal cord. Pull-down assays identified 2 serine/threonine phosphorylated sites in 14-3-3 proteins. Bioinformatics further revealed that morphine affected cytoskeletal reorganization, neuroplasticity, protein folding and modulation, signal transduction and biomolecular metabolism. CONCLUSIONS: Repeated morphine administration may affect multiple biological networks by altering protein phosphorylation. These data may provide insight into the mechanism that underlies the development of morphine tolerance.

  1. Bioinformatics for cancer immunology and immunotherapy.

    Science.gov (United States)

    Charoentong, Pornpimol; Angelova, Mihaela; Efremova, Mirjana; Gallasch, Ralf; Hackl, Hubert; Galon, Jerome; Trajanoski, Zlatko

    2012-11-01

    Recent mechanistic insights obtained from preclinical studies and the approval of the first immunotherapies have motivated an increasing number of academic investigators and pharmaceutical/biotech companies to further elucidate the role of immunity in tumor pathogenesis and to reconsider the role of immunotherapy. Additionally, technological advances (e.g., next-generation sequencing) are providing unprecedented opportunities to draw a comprehensive picture of the tumor genomics landscape and ultimately enable individualized treatment. However, the increasing complexity of the generated data and the plethora of bioinformatics methods and tools pose considerable challenges to both tumor immunologists and clinical oncologists. In this review, we describe current concepts and future challenges for the management and analysis of data for cancer immunology and immunotherapy. We first highlight publicly available databases with a specific focus on cancer immunology, including databases for somatic mutations and epitope databases. We then give an overview of the bioinformatics methods for the analysis of next-generation sequencing data (whole-genome and exome sequencing), epitope prediction tools, and methods for integrative data analysis and network modeling. Mathematical models are powerful tools that can predict and explain important patterns in the genetic and clinical progression of cancer, so a survey of mathematical models for tumor evolution and tumor-immune cell interaction is included. Finally, we discuss future challenges for individualized immunotherapy and suggest how combined computational/experimental approaches can lead to new insights into the molecular mechanisms of cancer, improved diagnosis and prognosis of the disease, and the identification of novel therapeutic targets.

  2. Repeat-until-success quantum repeaters

    Science.gov (United States)

    Bruschi, David Edward; Barlow, Thomas M.; Razavi, Mohsen; Beige, Almut

    2014-09-01

    We propose a repeat-until-success protocol to improve the performance of probabilistic quantum repeaters. Conventionally, these rely on passive static linear-optics elements and photodetectors to perform Bell-state measurements (BSMs) with a maximum success rate of 50%. This is a strong impediment for entanglement swapping between distant quantum memories: every time a BSM fails, entanglement needs to be redistributed between the corresponding memories in the repeater link. The key ingredients of our scheme are repeatable BSMs. Under ideal conditions, these turn probabilistic quantum repeaters into deterministic ones. Under realistic conditions, our protocol, too, might fail. However, using additional threshold detectors now allows us to improve the entanglement generation rate by orders of magnitude, at a nominal distance of 1000 km, compared to schemes that rely on conventional BSMs. This improvement is sufficient to make the performance of our scheme comparable to the expected performance of some deterministic quantum repeaters.

  3. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    Science.gov (United States)

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  4. An Algorithm to Self-Extract Secondary Keywords and Their Combinations Based on Abstracts Collected using Primary Keywords from Online Digital Libraries

    CERN Document Server

    Meghanathan, Natarajan; Isokpehi, Raphael; Cohly, Hari; 10.5121/ijcsit.2010.2307

    2010-01-01

    The high-level contribution of this paper is the development and implementation of an algorithm to self-extract secondary keywords and their combinations (combo words) based on abstracts collected using standard primary keywords for research areas from reputed online digital libraries such as IEEE Xplore and PubMed Central. Given a collection of N abstracts, we arbitrarily select M abstracts (M << N; M/N as low as 0.15) and parse each of the M abstracts word by word. Upon the first-time appearance of a word, we query the user to classify the word into an Accept-List or non-Accept-List. The effectiveness of the training approach is evaluated by measuring the percentage of words for which the user is queried for classification as the algorithm parses the words of each of the M abstracts. We observed that as M grows larger, the percentage of words for which the user is queried drops drastically. After the list of acceptable words is built by parsing the M abstracts, we …
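
    The interactive training loop described above can be simulated in a few lines; a programmatic oracle stands in for the human user, and the abstracts are toy strings rather than real library records.

    ```python
    def train_accept_list(abstracts, oracle):
        """Simulate interactive Accept-List construction: the first time a
        word appears, an oracle (standing in for the user) classifies it."""
        seen, accept = set(), set()
        query_rates = []
        for abstract in abstracts:
            words = abstract.lower().split()
            queried = 0
            for w in words:
                if w not in seen:
                    seen.add(w)
                    queried += 1          # the user is asked about this new word
                    if oracle(w):
                        accept.add(w)
            query_rates.append(queried / len(words))
        return accept, query_rates

    # Hypothetical oracle: accept any word longer than four characters.
    abstracts = ["sensor networks route data efficiently",
                 "sensor networks aggregate data in clusters",
                 "clusters route aggregate sensor data"]
    accept, rates = train_accept_list(abstracts, lambda w: len(w) > 4)
    ```

    The falling values in `rates` mirror the paper's observation that the fraction of words needing user classification drops as more abstracts are parsed.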

  5. Fostering Learners' Metacognitive Skills of Keyword Reformulation in Image Seeking by Location-Based Hierarchical Navigation

    Science.gov (United States)

    Liu, Ming-Chi; Huang, Yueh-Min; Kinshuk; Wen, Dunwei

    2013-01-01

    It is critical that students learn how to retrieve useful information in hypermedia environments, a task that is often especially difficult when it comes to image retrieval, as little text feedback is given that allows them to reformulate keywords they need to use. This situation may make students feel disorientated while attempting image…

  6. The Effect of Keyword Method on Vocabulary Retention of Senior High School EFL Learners in Iran

    Science.gov (United States)

    Davoudi, Mohammad; Yousefi, Dina

    2016-01-01

    This study aimed at investigating the effect of keyword method, as one of the mnemonic strategies, on vocabulary retention of Iranian senior high school EFL learners. Following a quasi-experimental design, the study used thirty eight (n = 38) female senior high school students in grade four from two intact classes at a public high school. The…

  7. Consensus-based Approach for Keyword Extraction from Urban Events Collections

    Directory of Open Access Journals (Sweden)

    Ana OLIVEIRA ALVES

    2016-05-01

    Full Text Available Automatic keyword extraction (AKE) from textual sources took a valuable step towards harnessing the problem of efficiently scanning large document collections. Particularly in the context of urban mobility, where the most relevant events in the city are advertised on-line, it becomes difficult to know exactly what is happening in a place. In this paper we tackle this problem by extracting a set of keywords from different kinds of textual sources, focusing on the urban events context. We propose an ensemble of the automatic keyword extraction systems KEA (Key-phrase Extraction Algorithm), KUSCO (Knowledge Unsupervised Search for instantiating Concepts on lightweight Ontologies) and Conditional Random Fields (CRF). Unlike KEA and KUSCO, which are well-known tools for automatic keyword extraction, CRF needs further pre-processing; therefore, a tool for handling AKE from the documents using CRF is developed. The architecture for the AKE ensemble system is designed, and efficient integration of the component applications is presented, in which a consensus between such classifiers is achieved. Finally, we empirically show that our AKE ensemble system significantly succeeds on baseline sources and urban events collections.
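
    One simple way to realize a consensus among extractors is plain voting over their keyword sets. The outputs below are invented stand-ins for KEA, KUSCO and CRF results, and the paper's actual consensus mechanism may be more elaborate than this sketch.

    ```python
    from collections import Counter

    def consensus_keywords(extractor_outputs, min_votes=2):
        """Majority-style consensus: keep a keyword if at least `min_votes`
        of the component extractors proposed it."""
        votes = Counter(kw for output in extractor_outputs for kw in set(output))
        return {kw for kw, v in votes.items() if v >= min_votes}

    # Hypothetical outputs standing in for KEA, KUSCO and a CRF tagger.
    kea   = ["concert", "stadium", "traffic"]
    kusco = ["concert", "music", "stadium"]
    crf   = ["concert", "stadium", "tickets"]
    agreed = consensus_keywords([kea, kusco, crf])
    ```

    Voting filters out keywords proposed by only one extractor, trading recall for precision, which is the usual motivation for an ensemble.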

  8. Universal Keyword Classifier on Public Key Based Encrypted Multikeyword Fuzzy Search in Public Cloud.

    Science.gov (United States)

    Munisamy, Shyamala Devi; Chokkalingam, Arun

    2015-01-01

    Cloud computing has pioneered the emerging world by manifesting itself as a service through the internet and facilitates third-party infrastructure and applications. While customers have no visibility into how their data is stored on the service provider's premises, it offers greater benefits in lowering infrastructure costs and delivering more flexibility and simplicity in managing private data. The opportunity to use cloud services on a pay-per-use basis provides comfort for private data owners in managing costs and data. With the pervasive usage of the internet, the focus has now shifted towards effective data utilization on the cloud without compromising security concerns. In the pursuit of increasing data utilization on public cloud storage, the key is to make data access effective through fuzzy searching techniques. In this paper, we discuss the existing fuzzy searching techniques and focus on reducing the searching time on the cloud storage server for effective data utilization. Our proposed Asymmetric Classifier Multikeyword Fuzzy Search method provides a classifier search server that creates a universal keyword classifier for multiple-keyword requests, which greatly reduces the searching time by learning the search path pattern for all the keywords in the fuzzy keyword set. The objective of using a BTree fuzzy searchable index is to resolve typos and representation inconsistencies and also to facilitate effective data utilization.
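
    A common building block behind fuzzy keyword search schemes (used here only as an illustrative sketch; the paper's own construction may differ) is the wildcard-based fuzzy keyword set for edit distance 1: any two words within one edit of each other share at least one wildcard variant.

    ```python
    def wildcard_fuzzy_set(word):
        """Wildcard-based fuzzy keyword set for edit distance 1: each
        variant replaces one character with '*' or inserts '*' at one
        position, covering substitutions, deletions and insertions."""
        variants = {word}
        for i in range(len(word)):
            variants.add(word[:i] + "*" + word[i + 1:])   # substitution/deletion slot
        for i in range(len(word) + 1):
            variants.add(word[:i] + "*" + word[i:])       # insertion slot
        return variants

    def fuzzy_match(query, keyword):
        """Two words match if their wildcard fuzzy sets intersect."""
        return bool(wildcard_fuzzy_set(query) & wildcard_fuzzy_set(keyword))
    ```

    In a searchable index, the precomputed variants (rather than the plaintext words) would be inserted into the index, so a mistyped query still reaches the right entries.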

  9. Statistics of co-occurring keywords in confined text messages on Twitter

    DEFF Research Database (Denmark)

    Mathiesen, Joachim; Angheluta, L.; Jensen, M. H.

    2014-01-01

    Online social media such as the micro-blogging site Twitter has become a rich source of real-time data on online human behaviors. Here we analyze the occurrence and co-occurrence frequency of keywords in user posts on Twitter. From the occurrence rate of major international brand names, we provide…

  10. Restrictions of physical activity participation in older adults with disability: employing keyword network analysis

    Science.gov (United States)

    Koo, Kyo-Man; Kim, Chun-Jong; Park, Chae-Hee; Byeun, Jung-Kyun; Seo, Geon-Woo

    2016-01-01

    Older adults with disability may be increasing in number due to the rapid aging of society. Many studies have shown that physical activity is essential for improving quality of life in later life. Regular physical activity is an efficient means of both primary and secondary prevention. However, there have been few studies regarding older adults with disability and physical activity participation. The purpose of this study was to investigate the factors that restrict regular participation in physical activity among older adults with disability, employing keyword network analysis. Two hundred twenty-nine older adults with disability over 65, including those aging with disability and those acquiring disability with aging, of the physical disability and brain lesion types defined by the disabled person welfare law, completed an open questionnaire assessing barriers to participation in physical activity. The results showed that the most frequently used keyword was ‘Traffic’ (21 occurrences, 3.47%), the same proportion as ‘personal’ and ‘economical’. Exercise was considered the most central keyword for participating in physical activity, and keywords such as facility, physical activity, disabled, program, transportation, gym, discomfort, opportunity, and leisure activity were associated with exercise. In conclusion, it is necessary to educate older persons with disability about the true meaning of physical activity, and providing more physical activity opportunities and decreasing inconvenience should be systematically structured in Korea. PMID:27656637

  11. Generating Keywords Improves Metacomprehension and Self-Regulation in Elementary and Middle School Children

    Science.gov (United States)

    de Bruin, Anique B. H.; Thiede, Keith W.; Camp, Gino; Redford, Joshua

    2011-01-01

    The ability to monitor understanding of texts, usually referred to as metacomprehension accuracy, is typically quite poor in adult learners; recently, however, interventions have been developed to improve accuracy. In two experiments, we evaluated whether generating delayed keywords prior to judging comprehension improved metacomprehension accuracy…

  12. Universal Keyword Classifier on Public Key Based Encrypted Multikeyword Fuzzy Search in Public Cloud

    Directory of Open Access Journals (Sweden)

    Shyamala Devi Munisamy

    2015-01-01

    Full Text Available Cloud computing has pioneered the emerging world by manifesting itself as a service through the internet and facilitates third-party infrastructure and applications. While customers have no visibility into how their data is stored on the service provider’s premises, it offers greater benefits in lowering infrastructure costs and delivering more flexibility and simplicity in managing private data. The opportunity to use cloud services on a pay-per-use basis provides comfort for private data owners in managing costs and data. With the pervasive usage of the internet, the focus has now shifted towards effective data utilization on the cloud without compromising security concerns. In the pursuit of increasing data utilization on public cloud storage, the key is to make data access effective through fuzzy searching techniques. In this paper, we discuss the existing fuzzy searching techniques and focus on reducing the searching time on the cloud storage server for effective data utilization. Our proposed Asymmetric Classifier Multikeyword Fuzzy Search method provides a classifier search server that creates a universal keyword classifier for multiple-keyword requests, which greatly reduces the searching time by learning the search path pattern for all the keywords in the fuzzy keyword set. The objective of using a BTree fuzzy searchable index is to resolve typos and representation inconsistencies and also to facilitate effective data utilization.

  13. Restrictions of physical activity participation in older adults with disability: employing keyword network analysis.

    Science.gov (United States)

    Koo, Kyo-Man; Kim, Chun-Jong; Park, Chae-Hee; Byeun, Jung-Kyun; Seo, Geon-Woo

    2016-08-01

    Older adults with disability may be increasing in number due to the rapid aging of society. Many studies have shown that physical activity is essential for improving quality of life in later life. Regular physical activity is an efficient means of both primary and secondary prevention. However, there have been few studies regarding older adults with disability and physical activity participation. The purpose of this study was to investigate the factors that restrict regular participation in physical activity among older adults with disability, employing keyword network analysis. Two hundred twenty-nine older adults with disability over 65, including those aging with disability and those acquiring disability with aging, of the physical disability and brain lesion types defined by the disabled person welfare law, completed an open questionnaire assessing barriers to participation in physical activity. The results showed that the most frequently used keyword was 'Traffic' (21 occurrences, 3.47%), the same proportion as 'personal' and 'economical'. Exercise was considered the most central keyword for participating in physical activity, and keywords such as facility, physical activity, disabled, program, transportation, gym, discomfort, opportunity, and leisure activity were associated with exercise. In conclusion, it is necessary to educate older persons with disability about the true meaning of physical activity, and providing more physical activity opportunities and decreasing inconvenience should be systematically structured in Korea.
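
    The keyword network analysis used in this study can be sketched as a co-mention graph: keywords appearing in the same response are linked, and the keyword with the most distinct neighbours is the most central. The responses below are hypothetical keyword lists, not the study's data.

    ```python
    from collections import defaultdict
    from itertools import combinations

    def degree_centrality(responses):
        """Build a keyword co-mention network and count each keyword's
        distinct neighbours (its degree)."""
        neighbours = defaultdict(set)
        for keywords in responses:
            for a, b in combinations(set(keywords), 2):
                neighbours[a].add(b)
                neighbours[b].add(a)
        return {k: len(v) for k, v in neighbours.items()}

    # Hypothetical open-ended answers reduced to keyword lists.
    responses = [["exercise", "facility", "transportation"],
                 ["exercise", "program", "gym"],
                 ["exercise", "opportunity"],
                 ["traffic", "discomfort"]]
    centrality = degree_centrality(responses)
    most_central = max(centrality, key=centrality.get)
    ```

    In the toy data, "exercise" co-occurs with five other keywords, so it comes out most central, mirroring the study's finding that exercise anchored the network.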

  14. Discrete Strategies in Keyword Auctions and Their Inefficiency for Locally Aware Bidders

    NARCIS (Netherlands)

    V. Markakis (Vangelis); O. Telelis (Orestis); A. Saberi

    2010-01-01

    We formally study discrete bidding strategies for the game induced by the Generalized Second Price keyword auction mechanism. Such strategies have seen experimental evaluation in the recent literature as parts of iterative best-response procedures, which have been shown not to converge.

  15. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

    Science.gov (United States)

    Najafi, Elham; Darooneh, Amir H

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words, responsible for conveying the meaning of a text, have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text relative to a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken as the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction, making a comparison between our proposed method and two other well-known methods of automatic keyword extraction.
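
    A simplified proxy for the degree of fractality (not the authors' exact fractal-dimension estimator) compares how irregular a word's inter-occurrence gaps are in the original text versus shuffled versions: content words tend to cluster, so their gap pattern differs markedly from the shuffled baseline, while function words do not.

    ```python
    import random
    from statistics import pstdev, mean

    def gap_cv(positions):
        """Coefficient of variation of gaps between successive occurrences."""
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        return pstdev(gaps) / mean(gaps) if len(gaps) > 1 else 0.0

    def fractality_degree(words, target, shuffles=200, seed=0):
        """Proxy importance score: clustering of the target word's positions
        in the real text, minus the average over shuffled texts."""
        rng = random.Random(seed)
        original = gap_cv([i for i, w in enumerate(words) if w == target])
        acc = 0.0
        for _ in range(shuffles):
            shuffled = words[:]
            rng.shuffle(shuffled)
            acc += gap_cv([i for i, w in enumerate(shuffled) if w == target])
        return original - acc / shuffles

    # Toy text: 'grail' occurs in bursts, 'the' is spread throughout.
    text = ("the quest begins grail grail grail near the old castle "
            "walls where the knights ride the long road until grail grail "
            "appear at last").split()
    s_grail = fractality_degree(text, "grail")
    s_the = fractality_degree(text, "the")
    ```

    In a real application the score would be computed for every word type and thresholded, as the abstract describes.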

  16. Statistics of co-occurring keywords in confined text messages on Twitter

    DEFF Research Database (Denmark)

    Mathiesen, Joachim; Angheluta, L.; Jensen, M. H.

    2014-01-01

    Online social media such as the micro-blogging site Twitter has become a rich source of real-time data on online human behaviors. Here we analyze the occurrence and co-occurrence frequency of keywords in user posts on Twitter. From the occurrence rate of major international brand names, we provide...
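
    The co-occurrence statistic studied here can be illustrated by comparing the observed co-occurrence rate of two keywords against the rate expected if they occurred independently across messages. The tweets below are invented examples, not data from the study.

    ```python
    from collections import Counter

    def cooccurrence_ratio(messages, a, b):
        """Observed co-occurrence rate of keywords a and b, divided by the
        rate expected under independent occurrence across messages."""
        n = len(messages)
        occ = Counter()
        both = 0
        for msg in messages:
            words = set(msg.lower().split())
            occ.update(w for w in (a, b) if w in words)
            both += (a in words) and (b in words)
        expected = (occ[a] / n) * (occ[b] / n)
        return (both / n) / expected if expected else 0.0

    # Hypothetical tweets mentioning brand names.
    tweets = ["pepsi and coke on sale", "coke with pizza", "pepsi ad again",
              "pepsi coke taste test", "nothing to report"]
    ratio = cooccurrence_ratio(tweets, "pepsi", "coke")
    ```

    A ratio above 1 indicates the two keywords co-occur more often than chance, which is the kind of signal the paper quantifies at scale.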

  17. Implementing Keyword and Question Generation Approaches in Teaching EFL Summary Writing

    Science.gov (United States)

    Chou, Mu-hsuan

    2012-01-01

    Summary writing has been considered an important aspect of academic writing. However, writing summaries can be a challenging task for the majority of English as a Foreign Language (EFL) learners. Research into teaching summary writing has focused on different processes to teach EFL learners. The present study adopted two methods--keyword and…

  18. Leveraging Bibliographic RDF Data for Keyword Prediction with Association Rule Mining (ARM)

    Directory of Open Access Journals (Sweden)

    Nidhi Kushwaha

    2014-11-01

    Full Text Available The Semantic Web (Web 3.0) has been proposed as an efficient way to access the increasingly large amounts of data on the internet. The Linked Open Data Cloud project is at present the major effort to implement the concepts of the Semantic Web, addressing the problems of inhomogeneity and large data volumes. RKBExplorer is one of many repositories implementing Open Data and contains considerable bibliographic information. This paper discusses bibliographic data, an important part of cloud data. Effective searching of bibliographic datasets can be a challenge, as many of the papers residing in these databases do not have sufficient or comprehensive keyword information. In these cases, a search engine based on RKBExplorer is only able to retrieve papers based on author names and paper titles, without keywords. In this paper we attempt to address this problem by using the data mining algorithm Association Rule Mining (ARM) to develop keywords based on features retrieved from Resource Description Framework (RDF) data within a bibliographic citation. We demonstrate the applicability of this method for predicting missing keywords for bibliographic entries in several typical databases. −−−−− Paper presented at the 1st International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2014), March 27-28, 2014. Organized by VIT University, Chennai, India. Sponsored by BRNS.
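
    A toy association-rule miner restricted to single-item antecedents illustrates how missing keywords might be predicted from other record features. The records, feature names and thresholds below are invented; the paper mines real RDF features from RKBExplorer.

    ```python
    from itertools import combinations
    from collections import Counter

    def mine_rules(transactions, min_support=2, min_conf=0.6):
        """Tiny association-rule miner: rule x -> y holds if the pair
        {x, y} reaches min_support and support(x, y)/support(x) >= min_conf."""
        item_count = Counter()
        pair_count = Counter()
        for t in transactions:
            items = set(t)
            item_count.update(items)
            pair_count.update(combinations(sorted(items), 2))
        rules = {}
        for (x, y), n in pair_count.items():
            if n >= min_support:
                if n / item_count[x] >= min_conf:
                    rules.setdefault(x, set()).add(y)
                if n / item_count[y] >= min_conf:
                    rules.setdefault(y, set()).add(x)
        return rules

    def predict_keywords(rules, known_features):
        """Predict missing keywords from whatever features a record has."""
        return set().union(*(rules.get(f, set()) for f in known_features))

    # Hypothetical bibliographic records: author/venue features plus keywords.
    records = [["smith", "www", "semantic-web"], ["smith", "www", "linked-data"],
               ["smith", "iswc", "semantic-web"], ["jones", "sigmod", "databases"]]
    rules = mine_rules(records)
    ```

    A full Apriori implementation would also mine multi-item antecedents; single items keep the sketch short while showing the support/confidence mechanics.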

  19. Grapho-Phonemic Enrichment Strengthens Keyword Analogy Instruction for Struggling Young Readers

    Science.gov (United States)

    Ehri, Linnea C.; Satlow, Eric; Gaskins, Irene

    2009-01-01

    First, second, and third graders (N = 102) who had completed from 1 to 3 years of literacy instruction in other schools and had experienced failure entered a private school for struggling readers and received instruction in either of 2 types of systematic phonics programs over a 4-year period. One group received a keyword analogy method (KEY) that…

  20. The Mnemonic Keyword Method: Effects on the Vocabulary Acquisition and Retention

    Science.gov (United States)

    Siriganjanavong, Vanlee

    2013-01-01

    The objectives of the study were to introduce the technique called "Mnemonic Keyword Method" ("MKM") to low proficiency English learners, and to explore the effectiveness of the method in terms of short-term and long-term retention. The sample was purposefully drawn from one intact class consisting of 44 students. They were…

  1. BioZone Exploting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources

    Energy Technology Data Exchange (ETDEWEB)

    Liu, L; Buttler, D; Paques, H; Pu, C; Critchlow

    2002-01-28

    Modern bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly and yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, protein structures, etc.) have led to a growing problem that conventional data management systems do not address, namely finding which information sources out of many candidate choices are the most relevant and most accessible for answering a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notion and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source-capability profiles independently, and to provide algorithms that can dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most search engines and software, our approach offers fine-granularity interest matching, and is thus more powerful and effective for handling queries with complex conditions.
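
    The capability-profile pruning idea can be sketched as subset matching followed by ranking. The source names and capability attributes below are illustrative assumptions, not BioZone's actual profile format or pruning levels.

    ```python
    def route_query(query_attrs, profiles):
        """Pruning sketch: keep only sources whose capability profile covers
        every attribute the query needs, then rank the survivors by how few
        extra capabilities they carry (a crude relevance proxy)."""
        needed = set(query_attrs)
        candidates = [name for name, caps in profiles.items() if needed <= caps]
        return sorted(candidates, key=lambda n: len(profiles[n] - needed))

    # Hypothetical capability profiles for three bioinformatics sources.
    profiles = {
        "GenBank":   {"nucleotide", "keyword", "accession"},
        "PDB":       {"structure", "keyword", "accession"},
        "SwissProt": {"protein", "keyword", "accession", "structure"},
    }
    hits = route_query({"structure", "keyword"}, profiles)
    ```

    A multi-level version would apply successively more expensive filters (e.g., content statistics, access latency) only to the sources that survive this cheap first pass.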

  2. Cloning and Bioinformatics Analysis of ZmERECTA-LIKE1 and Construction of Plant Expression Vector

    Institute of Scientific and Technical Information of China (English)

    Yihong JI; Jinbao PAN; Min LU; Jun HAN; Zhangjie NAN; Qingpeng SUN

    2016-01-01

    [Objective] This study was conducted to clone and analyze the ERECTA-LIKE1 gene in Zea mays by PCR and bioinformatics methods and to construct the plant expression vector pCambia3301-zmERECTA-LIKE1. [Method] The zmERECTA-LIKE1 (zmERL1) gene was obtained using RT-PCR, and its physical-chemical properties were analyzed by bioinformatics methods, including domains, transmembrane regions, potential N-glycosylation sites, phosphorylation sites, etc. [Result] Bioinformatics results showed that the zmERL1 gene was 2 169 bp and encoded a protein consisting of 722 amino acids, with 11 potential N-glycosylation sites and 42 kinase-specific phosphorylation sites. According to CDD2.23 and TMHMM Server v. 2.0, the protein contains leucine-rich repeats, a PKC domain and a transmembrane region. The theoretical pI and molecular weight of the zmERL1-encoded protein were 6.20 and 79 184.8, respectively, as computed with the Compute pI/Mw tool. Furthermore, we constructed the plant expression vector pCambia3301-zmERECTA-LIKE1 by subcloning the zmERL1 gene into pCambia3301 in place of GUS. [Conclusion] The results provide a theoretical basis for the application of the zmERL1 gene in future studies.

  3. Bioinformatics methods for identifying candidate disease genes

    Directory of Open Access Journals (Sweden)

    van Driel Marc A

    2006-06-01

    Full Text Available Abstract With the explosion in genomic and functional genomics information, methods for disease gene identification are rapidly evolving. Databases are now essential to the process of selecting candidate disease genes. Combining positional information with disease characteristics and functional information is the usual strategy by which candidate disease genes are selected. Enrichment for candidate disease genes, however, depends on the skills of the operating researcher. Over the past few years, a number of bioinformatics methods that enrich for the most likely candidate disease genes have been developed. Such in silico prioritisation methods may further improve by completion of datasets, by development of standardised ontologies across databases and species and, ultimately, by the integration of different strategies.

  4. Wrapping and interoperating bioinformatics resources using CORBA.

    Science.gov (United States)

    Stevens, R; Miller, C

    2000-02-01

    Bioinformaticians seeking to provide services to working biologists are faced with the twin problems of distribution and diversity of resources. Bioinformatics databases are distributed around the world and exist in many kinds of storage forms, platforms and access paradigms. To provide adequate services to biologists, these distributed and diverse resources have to interoperate seamlessly within single applications. The Common Object Request Broker Architecture (CORBA) offers one technical solution to these problems. The key component of CORBA is its use of object orientation as an intermediate form to translate between different representations. This paper concentrates on an explanation of object orientation and how it can be used to overcome the problems of distribution and diversity by describing the interfaces between objects.

  5. Using Cluster Computers in Bioinformatics Research

    Institute of Scientific and Technical Information of China (English)

    周澄; 郁松年

    2003-01-01

    In the last ten years, high-performance and massively parallel computing technology has entered a phase of rapid development and is now used in many fields. Cluster computer systems are also widely used for their low cost and high performance. In bioinformatics research, solving a problem by computer often takes hours or even days. To speed up research, high-performance cluster computers are considered a good platform. When moving to a new MPP (massively parallel processing) system, the original algorithm should be parallelized in a suitable way. In this paper, a new method for parallelizing a widely used sequence alignment algorithm (Smith-Waterman) is designed, based on an existing optimized version of the algorithm. The results are encouraging.
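The Smith-Waterman algorithm parallelized in the record above is, at its core, a dynamic-programming recurrence over a scoring matrix. A minimal serial sketch in Python (the scoring parameters here are illustrative defaults, not those used in the paper):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the optimal local alignment score of sequences a and b.

    H[i][j] holds the best score of any local alignment ending at
    a[i-1], b[j-1]; the recurrence clamps at 0 so alignments can
    restart anywhere (this is what makes the alignment *local*).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Parallel versions typically compute the matrix along anti-diagonals, since all cells on one anti-diagonal depend only on the previous two and can be filled concurrently.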

  6. Storage, data management, and retrieval in bioinformatics

    Science.gov (United States)

    Wong, Stephen T. C.; Patwardhan, Anil

    2001-12-01

    The evolution of biology into a large-scale quantitative molecular science has been paralleled by concomitant advances in computer storage systems, processing power, and data-analysis algorithms. The application of computer technologies to molecular biology data has given rise to a new system-based approach to biological research. Bioinformatics addresses problems related to the storage, retrieval and analysis of information about biological structure, sequence and function. Its goals include the development of integrated storage systems and analysis tools to interpret molecular biology data in a biologically meaningful manner in normal and disease processes and in efforts for drug discovery. This paper reviews recent developments in data management, storage, and retrieval that are central to the effective use of structural and functional genomics in fulfilling these goals.

  7. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts; genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  8. The European Bioinformatics Institute's data resources.

    Science.gov (United States)

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2010-01-01

    The wide uptake of next-generation sequencing and other ultra-high throughput technologies by life scientists with a diverse range of interests, spanning fundamental biological research, medicine, agriculture and environmental science, has led to unprecedented growth in the amount of data generated. It has also put the need for unrestricted access to biological data at the centre of biology. The European Bioinformatics Institute (EMBL-EBI) is unique in Europe and is one of only two organisations worldwide providing access to a comprehensive, integrated set of these collections. Here, we describe how the EMBL-EBI's biomolecular databases are evolving to cope with increasing levels of submission, a growing and diversifying user base, and the demand for new types of data. All of the resources described here can be accessed from the EMBL-EBI website: http://www.ebi.ac.uk.

  9. Bioinformatics analysis of estrogen-responsive genes

    Science.gov (United States)

    Handel, Adam E.

    2016-01-01

    Estrogen is a steroid hormone that plays critical roles in a myriad of intracellular pathways. The expression of many genes is regulated through the steroid hormone receptors ESR1 and ESR2. These bind to DNA and modulate the expression of target genes. Identification of estrogen target genes is greatly facilitated by the use of transcriptomic methods, such as RNA-seq and expression microarrays, and chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq). Combining transcriptomic and ChIP-seq data enables a distinction to be drawn between direct and indirect estrogen target genes. This chapter will discuss some methods of identifying estrogen target genes that do not require any expertise in programming languages or complex bioinformatics. PMID:26585125

  10. Academic Training - Bioinformatics: Decoding the Genome

    CERN Multimedia

    Chris Jones

    2006-01-01

    ACADEMIC TRAINING LECTURE SERIES 27, 28 February 1, 2, 3 March 2006 from 11:00 to 12:00 - Auditorium, bldg. 500 Decoding the Genome A special series of 5 lectures on: Recent extraordinary advances in the life sciences arising through new detection technologies and bioinformatics The past five years have seen an extraordinary change in the information and tools available in the life sciences. The sequencing of the human genome, the discovery that we possess far fewer genes than foreseen, the measurement of the tiny changes in the genomes that differentiate us, the sequencing of the genomes of many pathogens that lead to diseases such as malaria are all examples of completely new information that is now available in the quest for improved healthcare. New tools have allowed similar strides in the discovery of the associated protein structures, providing invaluable information for those searching for new drugs. New DNA microarray chips permit simultaneous measurement of the state of expression of tens...

  11. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    Science.gov (United States)

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  12. Assessment of a Bioinformatics across Life Science Curricula Initiative

    Science.gov (United States)

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  13. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    Science.gov (United States)

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval and computer vision and bioinformatics domain. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  14. The bioinformatics of next generation sequencing: a meeting report

    Institute of Scientific and Technical Information of China (English)

    Ravi Shankar

    2011-01-01

    The Studio of Computational Biology & Bioinformatics (SCBB), IHBT, CSIR, Palampur, India organized one of the very first national workshops, funded by DBT, Govt. of India, on the bioinformatics issues associated with next-generation sequencing approaches. The course structure was designed by SCBB, IHBT. The workshop took place on the IHBT premises on 17 and 18 June 2010.

  15. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    Directory of Open Access Journals (Sweden)

    Nomi L Harris

    2016-02-01

    Full Text Available The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

  19. Rust-Bio: a fast and safe bioinformatics library

    NARCIS (Netherlands)

    J. Köster (Johannes)

    2015-01-01

    textabstractWe present Rust-Bio, the first general purpose bioinformatics library for the innovative Rust programming language. Rust-Bio leverages the unique combination of speed, memory safety and high-level syntax offered by Rust to provide a fast and safe set of bioinformatics algorithms and data

  20. Is there room for ethics within bioinformatics education?

    Science.gov (United States)

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians rapidly and effectively shape the biomedical sciences, and hence their implications for society, a redesign of bioinformatics curricula is suggested here in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians, and bioinformatics ethics as a separate field of study, are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  1. 4273π: bioinformatics education on low cost ARM hardware.

    Science.gov (United States)

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  2. Keyword Extraction from a Document using Word Co-occurrence Statistical Information

    Science.gov (United States)

    Matsuo, Yutaka; Ishizuka, Mitsuru

    We present a new keyword extraction algorithm that applies to a single document without using a large corpus. Frequent terms are extracted first; then a set of co-occurrences between each term and the frequent terms (i.e., occurrences in the same sentences) is generated. The distribution of co-occurrence shows the importance of a term in the document as follows. If the probability distribution of co-occurrence between term a and the frequent terms is biased toward a particular subset of the frequent terms, then term a is likely to be a keyword. The degree of bias of the distribution is measured by the χ²-measure. We show that our algorithm performs well for indexing technical papers.
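The idea described above — score a term by how strongly its co-occurrence with the frequent terms deviates from an unbiased distribution — can be sketched as follows. This is a simplified illustration of the χ²-measure, not the authors' implementation (the original method's term-clustering and refinement steps are omitted, and the frequent-term cutoff is arbitrary):

```python
from collections import Counter

def chi2_keywords(sentences, top_frequent=10):
    """Rank terms of a tokenized document by a chi-square-like bias score.

    sentences: list of token lists. A term that co-occurs with frequent
    terms in proportions far from the frequent terms' overall shares
    receives a high score and is treated as a likely keyword.
    """
    tf = Counter(w for s in sentences for w in s)
    frequent = {w for w, _ in tf.most_common(top_frequent)}
    # co[w][g]: sentences in which term w co-occurs with frequent term g
    co = {w: Counter() for w in tf}
    for s in sentences:
        terms = set(s)
        for w in terms:
            for g in terms & frequent:
                if g != w:
                    co[w][g] += 1
    total = sum(tf[g] for g in frequent)
    p = {g: tf[g] / total for g in frequent}  # expected co-occurrence share
    scores = {}
    for w in tf:
        n_w = sum(co[w].values())  # total co-occurrences of w
        if n_w == 0:
            continue
        scores[w] = sum(
            (co[w][g] - n_w * p[g]) ** 2 / (n_w * p[g]) for g in frequent
        )
    return sorted(scores, key=scores.get, reverse=True)
```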

  3. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    Science.gov (United States)

    Kuppusamy, K. S.; Aghila, G.

    2012-03-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interest and would expect the web to respond to the specific requirements. The process of making the web to react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization being incorporated into it. The keyword extraction problem is approached with the help of web page segmentation which facilitates in making the problem simpler and solving it effectively. The proposed model is implemented as a prototype and the experiments conducted on it empirically validate the model's efficiency.

  4. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    CERN Document Server

    Kuppusamy, K S; 10.5120/5682-7720

    2012-01-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interest and would expect the web to respond to the specific requirements. The process of making the web to react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization being incorporated into it. The keyword extraction problem is approached with the help of web page segmentation which facilitates in making the problem simpler and solving it effectively. The proposed model is implemented as a prototype and the experiments conducted on it empirically validate the model's efficiency.

  5. A Corpus-based Study on Keywords in The Glass Menagerie

    Institute of Scientific and Technical Information of China (English)

    ZHANG Qi-mei

    2015-01-01

    Recently, the corpus-based approach has become an important research method. Building on an analysis of papers applying corpora and corpus-based approaches to literary texts, this thesis takes The Glass Menagerie, by the famous American writer Tennessee Williams, as its research object. Guided by the stylistic theoretical framework put forward by Leech and Short, the study conducts text analysis with WordSmith and AntConc 3.3.1 to study the keywords of the play.

  6. Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

    Directory of Open Access Journals (Sweden)

    Tie Hua Zhou

    2015-05-01

    Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on a keyword query architecture that models the probability distribution of image annotations, allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated with semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed across heterogeneous large data sources. Our experiments on the SBU database (collected by Stony Brook University) show that (i) our integrated annotation contains higher-quality representatives and semantic matches; and (ii) annotation integration can indeed improve image search result quality.

  7. Efficient Multi-keyword Ranked Search over Outsourced Cloud Data based on Homomorphic Encryption

    Directory of Open Access Journals (Sweden)

    Nie Mengxi

    2016-01-01

    Full Text Available With the development of cloud computing, more and more data owners are motivated to outsource their data to the cloud server for greater flexibility and lower cost. Because the security of outsourced data must be guaranteed, encryption must be used, which obsoletes traditional data utilization based on plaintext, e.g. keyword search. To enable search over encrypted data, several schemes have been proposed, e.g. top-k single- or multi-keyword retrieval. However, the efficiency of these schemes is not high enough to be practical in cloud computing. In this paper, we propose a new scheme based on homomorphic encryption to solve the challenging problem of privacy-preserving, efficient multi-keyword ranked search over outsourced cloud data. In our scheme, the inner product is adopted to measure relevance scores, and the technique of relevance feedback is used to reflect the search preferences of data users. Security analysis shows that the proposed scheme meets strict privacy requirements for such a secure cloud data utilization system. Performance evaluation demonstrates that the proposed scheme achieves low overhead in both computation and communication.
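The relevance measure named above is an inner product between a document's keyword-weight vector and the query vector. A plaintext sketch of that ranking step (in the actual scheme the vectors would be processed under homomorphic encryption so the server never sees them; the function names here are illustrative, not from the paper):

```python
def relevance_score(doc_vector, query_vector):
    """Inner-product relevance: each dimension is one keyword's weight."""
    return sum(d * q for d, q in zip(doc_vector, query_vector))

def rank_top_k(doc_vectors, query_vector, k):
    """Return the ids of the k documents with the highest relevance scores.

    doc_vectors: mapping of document id -> keyword-weight vector.
    """
    scored = [(relevance_score(vec, query_vector), doc_id)
              for doc_id, vec in doc_vectors.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```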

  8. Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

    Directory of Open Access Journals (Sweden)

    Adam Crymble

    2015-12-01

    Full Text Available If you have a copy of a text in electronic format stored on your computer, it is relatively easy to keyword search for a single term. Often you can do this by using the built-in search features in your favourite text editor. However, scholars are increasingly needing to find instances of many terms within a text or texts. For example, a scholar may want to use a gazetteer to extract all mentions of English placenames within a collection of texts so that those places can later be plotted on a map. Alternatively, they may want to extract all male given names, all pronouns, stop words, or any other set of words. Using those same built-in search features to achieve this more complex goal is time consuming and clunky. This lesson will teach you how to use Python to extract a set of keywords very quickly and systematically from a set of texts. It is expected that once you have completed this lesson, you will be able to generalise the skills to extract custom sets of keywords from any set of locally saved files.
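The gazetteer-based extraction the lesson above teaches boils down to matching each term in a word list against a text. A minimal sketch (the lesson's own code differs; this version uses word-boundary regular expressions so that, e.g., "York" does not match inside "Yorkshire"):

```python
import re

def extract_keywords(text, gazetteer):
    """Return a dict of gazetteer terms found in text, with match counts."""
    found = {}
    for term in gazetteer:
        # \b anchors the match to whole words; re.escape guards
        # against regex metacharacters in gazetteer entries.
        hits = re.findall(r"\b%s\b" % re.escape(term), text)
        if hits:
            found[term] = len(hits)
    return found
```

The same loop generalises to any word list: placenames, given names, stop words, or a custom keyword set.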

  9. Bibliometric investigation on preventive medicine in North Korea: a coauthor and keyword network analysis.

    Science.gov (United States)

    Jung, Minsoo

    2013-01-01

    This study examined the 2 preventive medicine journals in North Korea using coauthor and keyword network analysis, on the basis of medical informatics and bibliometrics. The data were the Journal of Chosun Medicine (JCM) and the Journal of Preventive Medicine (JPM) (from the first volume of 1997 to the fourth volume of 2006). From these, 1734 coauthors were extracted from 1104 articles and 1567 coauthors from 1172 articles, respectively. Huge single components were extracted in the coauthor analysis, which indicated a tendency toward structuralization. However, the 2 journals differed in that JPM showed a relative tendency toward specialization, whereas JCM showed one toward generalization. Seventeen and 33 keywords were extracted from each journal in the keyword analysis; JCM mainly concerned pathological research, whereas JPM mainly concerned virus and basic medicine studies that were based on infection and immunity. In contrast to South Korea, North Korea has developed Juche medicine, which came from self-reliance ideology and gratuitous medical service. According to the present study, their ideology was embodied by the discovery of bacteria, study on the immune system, and emphasis on pathology, on the basis of experimental epidemiology. However, insufficient research has been conducted thus far on population health and its related determinants.

  10. Efficient re-indexing of automatically annotated image collections using keyword combination

    Science.gov (United States)

    Yavlinsky, Alexei; Rüger, Stefan

    2007-01-01

    This paper presents a framework for improving the image index obtained by automated image annotation. Within this framework, the technique of keyword combination is used for fast image re-indexing based on initial automated annotations. It aims to tackle the challenges of limited vocabulary size and low annotation accuracies resulting from differences between training and test collections. It is useful for situations when these two problems are not anticipated at the time of annotation. We show that based on example images from the automatically annotated collection, it is often possible to find multiple keyword queries that can retrieve new image concepts which are not present in the training vocabulary, and improve retrieval results of those that are already present. We demonstrate that this can be done at a very small computational cost and at an acceptable performance tradeoff, compared to traditional annotation models. We present a simple, robust, and computationally efficient approach for finding an appropriate set of keywords for a given target concept. We report results on TRECVID 2005, Getty Image Archive, and Web image datasets, the last two of which were specifically constructed to support realistic retrieval scenarios.

  11. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    Science.gov (United States)

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  12. Bioinformatics approaches for identifying new therapeutic bioactive peptides in food

    Directory of Open Access Journals (Sweden)

    Nora Khaldi

    2012-10-01

    Full Text Available ABSTRACT: The traditional methods for mining foods for bioactive peptides are tedious and long. As in the drug industry, the time needed to identify and deliver a commercial health ingredient that reduces disease symptoms can be anywhere between 5 and 10 years. Reducing this time and effort is crucial in order to create new commercially viable products with clear and important health benefits. In the past few years, bioinformatics, the science that brings together fast computational biology and efficient genome mining, has been emerging as the long-awaited solution to this problem. By quickly mining food genomes for characteristics of certain therapeutic food ingredients, researchers can potentially find new ones in a matter of a few weeks. Yet, surprisingly, very little success has been achieved so far using bioinformatics to mine for food bioactives. The absence of food-specific bioinformatic mining tools, the slow integration of experimental mining and bioinformatics, and the important differences between experimental platforms are some of the reasons for the slow progress of bioinformatics in the field of functional food, and more specifically in bioactive peptide discovery. In this paper I discuss some methods that could easily be translated, using rational peptide bioinformatics design, to food bioactive peptide mining. I highlight the need for an integrated food peptide database. I also discuss how to better integrate experimental work with bioinformatics in order to improve the mining of food for bioactive peptides, thereby achieving a higher success rate.

  13. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    Science.gov (United States)

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

    The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information can be automatically obtained via SPARQL queries, formatted in Excel, and used in INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. INO currently has 575 terms, including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated by running the Stanford Parser to obtain dependency relation types. Of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified using direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which is associated with multiple keywords.
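The "direct dependency relation" criterion described above can be sketched in a few lines. This is a hypothetical illustration: the function name, the toy parse triples, and the example sentence are invented, and a real system would consume actual Stanford Parser output.

```python
# Sketch: flag a two-keyword interaction type when the two interaction
# keywords are linked by a direct dependency relation, in the spirit of
# the INO keyword-dependency patterns. Toy data, not the authors' code.

def has_direct_dependency(dependencies, kw1, kw2):
    """dependencies: iterable of (governor, relation, dependent) triples
    as produced by a dependency parser such as the Stanford Parser."""
    for governor, _relation, dependent in dependencies:
        if {governor.lower(), dependent.lower()} == {kw1.lower(), kw2.lower()}:
            return True
    return False

# Toy parse of "sigmaK controls the transcription activation of spoIVA":
deps = [
    ("controls", "nsubj", "sigmaK"),
    ("controls", "dobj", "activation"),
    ("activation", "compound", "transcription"),
]
print(has_direct_dependency(deps, "transcription", "activation"))  # True
print(has_direct_dependency(deps, "controls", "spoIVA"))           # False
```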

  14. Thriving in multidisciplinary research: advice for new bioinformatics students.

    Science.gov (United States)

    Auerbach, Raymond K

    2012-09-01

    The sciences have seen a large increase in demand for students in bioinformatics and multidisciplinary fields in general. Many new educational programs have been created to satisfy this demand, but navigating these programs requires a non-traditional outlook and emphasizes working in teams of individuals with distinct yet complementary skill sets. Written from the perspective of a current bioinformatics student, this article seeks to offer advice to prospective and current students in bioinformatics regarding what to expect in their educational program, how multidisciplinary fields differ from more traditional paths, and decisions that they will face on the road to becoming successful, productive bioinformaticists.

  15. Survey of MapReduce frame operation in bioinformatics.

    Science.gov (United States)

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. The open-source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce framework-based applications that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics.
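The MapReduce programming model the survey covers can be illustrated with a toy k-mer counting task. The sketch below mimics the map, shuffle, and reduce phases in a single process; a real deployment would run the same logic on Hadoop or a comparable framework.

```python
from collections import defaultdict
from itertools import chain

def map_phase(read, k=3):
    # Map: emit (k-mer, 1) pairs for one sequencing read.
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one k-mer.
    return key, sum(values)

reads = ["GATTACA", "TTACAGA"]
pairs = chain.from_iterable(map_phase(r) for r in reads)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["TTA"])  # 2: the k-mer appears in both reads
```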

  16. Evaluating an Inquiry-based Bioinformatics Course Using Q Methodology

    Science.gov (United States)

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-06-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and backgrounds of a diverse set of students, predominantly computer science and biology undergraduate and graduate students. Although the researchers desired to investigate student views of the course, they were interested in the potentially different perspectives. Q methodology, a measure of subjectivity, allowed the researchers to determine the various student perspectives in the bioinformatics course.

  17. Thriving in Multidisciplinary Research: Advice for New Bioinformatics Students

    Science.gov (United States)

    Auerbach, Raymond K.

    2012-01-01

    The sciences have seen a large increase in demand for students in bioinformatics and multidisciplinary fields in general. Many new educational programs have been created to satisfy this demand, but navigating these programs requires a non-traditional outlook and emphasizes working in teams of individuals with distinct yet complementary skill sets. Written from the perspective of a current bioinformatics student, this article seeks to offer advice to prospective and current students in bioinformatics regarding what to expect in their educational program, how multidisciplinary fields differ from more traditional paths, and decisions that they will face on the road to becoming successful, productive bioinformaticists. PMID:23012580

  18. Website for avian flu information and bioinformatics

    Institute of Scientific and Technical Information of China (English)

    GAO George Fu

    2009-01-01

    Highly pathogenic influenza A virus H5N1 has spread worldwide and raised public concern. This has increased the output of influenza virus sequence data as well as research publications and other reports. In order to fight H5N1 avian flu in a comprehensive way, we designed and began setting up the Website for Avian Flu Information (http://www.avian-flu.info) in 2004. Beyond the available influenza virus databases, the website aims to integrate diversified information for both researchers and the public. From 2004 to 2009, we collected information from all aspects, i.e. reports of outbreaks, scientific publications and editorials, policies for prevention, medicines and vaccines, and clinical diagnosis. Except for publications, all information is in Chinese. By April 15, 2009, cumulative news entries had exceeded 2000 and research papers were approaching 5000. Using the curated data from the Influenza Virus Resource, we have set up an influenza virus sequence database and a bioinformatic platform providing the basic functions for sequence analysis of influenza virus. We will focus on the collection of experimental data and results as well as the integration of data from the geographic information system and avian influenza epidemiology.

  19. Bioinformatics of the TULIP domain superfamily.

    Science.gov (United States)

    Kopec, Klaus O; Alva, Vikram; Lupas, Andrei N

    2011-08-01

    Proteins of the BPI (bactericidal/permeability-increasing protein)-like family contain either one or two tandem copies of a fold that usually provides a tubular cavity for the binding of lipids. Bioinformatic analyses show that, in addition to its known members, which include BPI, LBP [LPS (lipopolysaccharide)-binding protein)], CETP (cholesteryl ester-transfer protein), PLTP (phospholipid-transfer protein) and PLUNC (palate, lung and nasal epithelium clone) protein, this family also includes other, more divergent groups containing hypothetical proteins from fungi, nematodes and deep-branching unicellular eukaryotes. More distantly, BPI-like proteins are related to a family of arthropod proteins that includes hormone-binding proteins (Takeout-like; previously described to adopt a BPI-like fold), allergens and several groups of uncharacterized proteins. At even greater evolutionary distance, BPI-like proteins are homologous with the SMP (synaptotagmin-like, mitochondrial and lipid-binding protein) domains, which are found in proteins associated with eukaryotic membrane processes. In particular, SMP domain-containing proteins of yeast form the ERMES [ER (endoplasmic reticulum)-mitochondria encounter structure], required for efficient phospholipid exchange between these organelles. This suggests that SMP domains themselves bind lipids and mediate their exchange between heterologous membranes. The most distant group of homologues we detected consists of uncharacterized animal proteins annotated as TM (transmembrane) 24. We propose to group these families together into one superfamily that we term as the TULIP (tubular lipid-binding) domain superfamily.

  20. Evolution of web services in bioinformatics.

    Science.gov (United States)

    Neerincx, Pieter B T; Leunissen, Jack A M

    2005-06-01

    Bioinformaticians have developed large collections of tools to make sense of the rapidly growing pool of molecular biological data. Biological systems tend to be complex, and in order to understand them it is often necessary to link many data sets and use more than one tool. Therefore, bioinformaticians have experimented with several strategies to try to integrate data sets and tools. Owing to the lack of standards for data sets and for the interfaces of the tools, this is not a trivial task. Over the past few years, building services with web-based interfaces has become a popular way of sharing the data and tools that have resulted from many bioinformatics projects. This paper discusses the interoperability problem and how web services are being used to try to solve it, resulting in the evolution of tools with web interfaces from HTML/web-form-based tools not suited for automatic workflow generation to a dynamic network of XML-based web services that can easily be used to create pipelines.

  1. Bioinformatics study of the mangrove actin genes

    Science.gov (United States)

    Basyuni, M.; Wasilah, M.; Sumardi

    2017-01-01

    This study describes bioinformatics methods to analyze eight actin genes from mangrove plants deposited in DDBJ/EMBL/GenBank, and to predict their structure, composition, subcellular localization, similarity, and phylogeny. The physical and chemical properties of the eight mangrove actin genes varied among the genes. The percentages of secondary structure of the eight mangrove actin genes followed the order α-helix > random coil > extended chain structure for BgActl, KcActl, RsActl, and A. corniculatum Act. In contrast, the remaining actin genes followed the order random coil > extended chain structure > α-helix. This study therefore shows that secondary structure prediction provides necessary structural information. The values for chloroplast transit peptide, signal peptide, and mitochondrial targeting were too small, indicating that the mangrove actin genes contain no chloroplast or mitochondrial transit peptide and no signal peptide for the secretion pathway. These results suggest the importance of understanding the diversity and functional properties of the different amino acids in mangrove actin genes. To clarify the relationships among the mangrove actin genes, a phylogenetic tree was constructed. Three groups of mangrove actin genes were formed: the first group contains B. gymnorrhiza BgAct and R. stylosa RsActl; the second cluster, which consists of five actin genes, is the largest group; and the last branch consists of one gene, B. sexangula Act. The present study therefore supports the previous finding that plant actin genes form distinct clusters in the tree.
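The secondary-structure rankings reported for the mangrove actin genes can be computed as sketched below. This assumes a per-residue secondary-structure string from some predictor (H = helix, C = coil, E = extended strand); the example string is invented.

```python
from collections import Counter

# Sketch: compute secondary-structure composition percentages and their
# rank order from a predictor's per-residue state string. Toy input.

def ss_composition(states):
    counts = Counter(states)
    total = len(states)
    return {s: 100.0 * counts[s] / total for s in "HCE"}

pred = "HHHHCCHHHECCCHHH"  # hypothetical predictor output
comp = ss_composition(pred)
ranking = sorted(comp, key=comp.get, reverse=True)
print(ranking)  # ['H', 'C', 'E'], i.e. helix > coil > extended here
```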

  2. Website for avian flu information and bioinformatics

    Institute of Scientific and Technical Information of China (English)

    LIU Di; LIU Quan-He; WU Lin-Huan; LIU Bin; WU Jun; LAO Yi-Mei; LI Xiao-Jing; GAO George Fu; MA Jun-Cai

    2009-01-01

    Highly pathogenic influenza A virus H5N1 has spread worldwide and raised public concern. This has increased the output of influenza virus sequence data as well as research publications and other reports. In order to fight H5N1 avian flu in a comprehensive way, we designed and began setting up the Website for Avian Flu Information (http://www.avian-flu.info) in 2004. Beyond the available influenza virus databases, the website aims to integrate diversified information for both researchers and the public. From 2004 to 2009, we collected information from all aspects, i.e. reports of outbreaks, scientific publications and editorials, policies for prevention, medicines and vaccines, and clinical diagnosis. Except for publications, all information is in Chinese. By April 15, 2009, cumulative news entries had exceeded 2000 and research papers were approaching 5000. Using the curated data from the Influenza Virus Resource, we have set up an influenza virus sequence database and a bioinformatic platform providing the basic functions for sequence analysis of influenza virus. We will focus on the collection of experimental data and results as well as the integration of data from the geographic information system and avian influenza epidemiology.

  3. The Recent Trend in a Human Resource Management Journal: A Keyword Analysis

    Directory of Open Access Journals (Sweden)

    Muhammed Kürşad Özlen

    2014-07-01

    Full Text Available Continuous changes in technology and in economic, social and psychological understandings and structures influence both Human Resources and their management. Organizations approach their human capital in a more sensitive way in order to win employees' loyalty and commitment, while increasing profit and maximizing the efficiency/effectiveness of their workforce. Human Resources Management helps achieve these goals by recruiting, training, developing, motivating and rewarding employees. Therefore, the identification of current research interests is essential for defining organizational human resources strategies. The main purpose of this research is to identify the top-rated factors related to Human Resource Management by analyzing the abstracts of all papers published in a Human Resource Management journal from the beginning of 2005 till the end of 2012. By analyzing the keywords of all abstracts, the frequencies of the keyword categories were identified. Apart from keywords related to Human Resources (17.6%), the studies for the period consider the following: employee rights and careers (18.3%), management (14.6%), contextual issues (10%), organizational strategies (9.5%), performance measurement and training (9.5%), behavioral issues and employee motivation (5.7%), organizational culture (5.4%), technical issues (4.1%), etc. It should be noted that the researchers (a) mainly stress practice more than theory and (b) consider the organization less than the individual. Interestingly, employee motivation is found to be less considered by the researchers. This study is believed to be useful for future studies and for industry by identifying the hot and top-rated factors related to Human Resource Management.
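A keyword-frequency tally of the kind this study performs can be sketched as follows. The keyword lists are invented placeholders standing in for keywords harvested from the journal's abstracts.

```python
from collections import Counter

# Sketch: tally keywords extracted from article abstracts and report each
# keyword's share of all keyword occurrences. Toy data only.

abstract_keywords = [
    ["employee rights", "career", "motivation"],
    ["human resources", "training"],
    ["career", "organizational culture"],
]

counts = Counter(kw for kws in abstract_keywords for kw in kws)
total = sum(counts.values())
shares = {kw: round(100 * n / total, 1) for kw, n in counts.items()}
print(shares["career"])  # 28.6: "career" fills 2 of 7 keyword slots
```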

  4. The potential of translational bioinformatics approaches for pharmacology research.

    Science.gov (United States)

    Li, Lang

    2015-10-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' in biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of translational bioinformatics to research in numerous pharmacology subdisciplines. © 2015 The British Pharmacological Society.

  5. Bioinformatics Education in Pathology Training: Current Scope and Future Direction

    Directory of Open Access Journals (Sweden)

    Michael R Clay

    2017-04-01

    Full Text Available Training anatomic and clinical pathology residents in the principles of bioinformatics is a challenging endeavor. Most residents receive little to no formal exposure to bioinformatics during medical education, and most of the pathology training is spent interpreting histopathology slides using light microscopy or focused on laboratory regulation, management, and interpretation of discrete laboratory data. At a minimum, residents should be familiar with data structure, data pipelines, data manipulation, and data regulations within clinical laboratories. Fellowship-level training should incorporate advanced principles unique to each subspecialty. Barriers to bioinformatics education include the clinical apprenticeship training model, ill-defined educational milestones, inadequate faculty expertise, and limited exposure during medical training. Online educational resources, case-based learning, and incorporation into molecular genomics education could serve as effective educational strategies. Overall, pathology bioinformatics training can be incorporated into pathology resident curricula, provided there is motivation to incorporate, institutional support, educational resources, and adequate faculty expertise.

  6. Bioinformatics and phylogenetic analysis of human Tp73 gene

    African Journals Online (AJOL)

    Imtiaz

    2013-06-26

    Jun 26, 2013 ... Key words: Tp73, Bioinformatics, phylogenetics analysis, cancer, Tp53. INTRODUCTION ... splicing at C-terminal end of that protein and give rise to six different p73 terminal variants ..... 33 in human lung cancers. Cancer Res.

  7. Bioconductor: open software development for computational biology and bioinformatics

    DEFF Research Database (Denmark)

    Gentleman, R.C.; Carey, V.J.; Bates, D.M.;

    2004-01-01

    into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples....

  8. Scalable pattern recognition algorithms applications in computational biology and bioinformatics

    CERN Document Server

    Maji, Pradipta

    2014-01-01

    Reviews the development of scalable pattern recognition algorithms for computational biology and bioinformatics Includes numerous examples and experimental results to support the theoretical concepts described Concludes each chapter with directions for future research and a comprehensive bibliography

  9. An Efficient Multi-keyword Symmetric Searchable Encryption Scheme for Secure Data Outsourcing

    Directory of Open Access Journals (Sweden)

    Vasudha Arora

    2016-11-01

    Full Text Available Symmetric searchable encryption (SSE) schemes allow a data owner to encrypt its data in such a way that the data can be searched in encrypted form. When searching over encrypted data, the retrieved data, the search query, and the query outcome must all be protected. A series of SSE schemes have been proposed in the past decade. In this paper, we propose an efficient multi-keyword symmetric searchable encryption scheme for secure data outsourcing and evaluate its performance on a real data set.
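The basic idea behind multi-keyword SSE can be sketched with an HMAC-token index. This is a greatly simplified illustration, not the authors' scheme: the server stores only keyword tokens, a multi-keyword query intersects posting lists, and (like many basic SSE constructions) it leaks search and access patterns.

```python
import hashlib
import hmac

# Sketch: keyword tokens are HMACs under the data owner's secret key, so
# the server never sees plaintext keywords. Toy construction only.

KEY = b"data-owner-secret"

def token(keyword):
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(docs):  # docs: {doc_id: [keywords]}
    index = {}
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index.setdefault(token(kw), set()).add(doc_id)
    return index

def search(index, keywords):
    # Multi-keyword query: intersect the posting list of each token.
    result = None
    for kw in keywords:
        ids = index.get(token(kw), set())
        result = ids if result is None else result & ids
    return result or set()

index = build_index({"d1": ["genome", "cloud"], "d2": ["cloud", "privacy"]})
print(search(index, ["cloud", "privacy"]))  # {'d2'}
```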

  10. An innovative approach for testing bioinformatics programs using metamorphic testing

    Directory of Open Access Journals (Sweden)

    Liu Huai

    2009-01-01

    Full Text Available Abstract Background Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore our ability to perform systematic software testing is greatly hindered. Results We propose to use a novel software testing technique, metamorphic testing (MT), to test a range of bioinformatics programs. Instead of requiring a mechanism to verify whether an individual test output is correct, the MT technique verifies whether a pair of test outputs conform to a set of domain-specific properties, called metamorphic relations (MRs), thus greatly increasing the number and variety of test cases that can be applied. To demonstrate how MT is used in practice, we applied MT to test two open-source bioinformatics programs, namely GNLab and SeqMap. In particular we show that MT is simple to implement, and is effective in detecting faults in a real-life program and some artificially fault-seeded programs. Further, we discuss how MT can be applied to test programs from various domains of bioinformatics. Conclusion This paper describes the application of a simple, effective and automated technique to systematically test a range of bioinformatics programs. We show how MT can be implemented in practice through two real-life case studies. Since many bioinformatics programs, particularly those for large-scale simulation and data analysis, are hard to test systematically, their developers may benefit from using MT as part of the testing strategy.
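Metamorphic testing as described above can be sketched with a small bioinformatics-flavored example. The function under test and the metamorphic relation below are illustrative, not those used for GNLab or SeqMap: the MR states that GC content is invariant under reverse complementation, so a source test case and its derived follow-up case must agree even when neither expected value is known.

```python
# Sketch of metamorphic testing: check that pairs of outputs satisfy a
# metamorphic relation (MR) instead of checking one output against a
# known answer.

def gc_content(seq):
    # Program under test: fraction of G and C bases.
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def mr_holds(seq):
    # Follow-up test case is derived from the source test case.
    return abs(gc_content(seq) - gc_content(reverse_complement(seq))) < 1e-12

print(all(mr_holds(s) for s in ["GATTACA", "GGGCCC", "ATATAT"]))  # True
```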

  11. A high-throughput bioinformatics distributed computing platform

    OpenAIRE

    Keane, Thomas M; Page, Andrew J.; McInerney, James O; Naughton, Thomas J.

    2005-01-01

    In the past number of years the demand for high-performance computing has greatly increased in the area of bioinformatics. The huge increase in the size of many genomic databases has meant that many common tasks in bioinformatics cannot be completed in a reasonable amount of time on a single processor. Recently, distributed computing has emerged as an inexpensive alternative to dedicated parallel computing. We have developed a general-purpose distributed computing platform ...

  12. BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.

    Science.gov (United States)

    Velloso, Henrique; Vialle, Ricardo A; Ortega, J Miguel

    2015-06-02

    Bioinformaticians face a range of difficulties in getting locally installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools, using an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS intermediates the access to registered tools by providing front-end and back-end web services. Programmers can install applications on HPC clusters in any programming language and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running on simple computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requisitions, and automatically creates a web page that lists the registered applications and clients. BOWS-registered applications can be accessed from virtually any programming language through web services, or using the standard Java clients. The back-end can run on HPC clusters, allowing bioinformaticians to remotely run high-processing-demand applications directly from their machines.
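The back-end cycle described above (poll for jobs, run the tool, post results) can be sketched generically. Everything here is a hypothetical stand-in, not the BOWS API: the transport is abstracted as callables so the loop is independent of the actual web-service protocol.

```python
# Sketch of a back-end worker loop in the style described: fetch pending
# jobs from a front-end service, run the registered tool, post results.
# All names are invented; stubs stand in for the web-service calls.

def worker_cycle(fetch_jobs, post_result, run_tool):
    handled = []
    for job in fetch_jobs():
        output = run_tool(job["tool"], job["params"])
        post_result(job["id"], output)
        handled.append(job["id"])
    return handled

# Stub transport simulating one queued job:
queue = [{"id": 1, "tool": "blast", "params": {"query": "ACGT"}}]
results = {}
ids = worker_cycle(lambda: queue,
                   lambda jid, out: results.update({jid: out}),
                   lambda tool, params: f"{tool} ran on {params['query']}")
print(results[1])  # blast ran on ACGT
```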

  13. The Mnemonic Keyword Method: The Effects of Bidirectional Retrieval Training and of Ability to Image on Foreign Language Vocabulary Recall

    Science.gov (United States)

    Wyra, Mirella; Lawson, Michael J.; Hungi, Njora

    2007-01-01

    The mnemonic keyword method is an effective technique for vocabulary acquisition. This study examines the effects on recall of word-meaning pairs of (a) training in use of the keyword procedure at the time of retrieval; and (b) the influence of the self-rated ability to image. The performance of students trained in bidirectional retrieval using…

  14. Biopipe: a flexible framework for protocol-based bioinformatics analysis.

    Science.gov (United States)

    Hoon, Shawn; Ratnapu, Kiran Kumar; Chia, Jer-Ming; Kumarasamy, Balamurugan; Juguang, Xiao; Clamp, Michele; Stabenau, Arne; Potter, Simon; Clarke, Laura; Stupka, Elia

    2003-08-01

    We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.

  15. Quantum repeated games revisited

    CERN Document Server

    Frackiewicz, Piotr

    2011-01-01

    We present a scheme for playing quantum repeated 2x2 games based on Marinatto and Weber's approach to quantum games. As a potential application, we study the twice-repeated Prisoner's Dilemma game. We show that results not available in the classical game can be obtained when the game is played in the quantum way. Before presenting our idea, we comment on the previous scheme for playing quantum repeated games.

  16. Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

    Science.gov (United States)

    Uthayan, K R; Mala, G S Anandha

    2015-01-01

    Ontology is the process of growth and elucidation of concepts of an information domain common to a group of users. Establishing ontology in information retrieval is a natural way to improve the retrieval of relevant information that users require. Matching keywords against a historical or information domain is significant in recent approaches for finding the best match for specific input queries. This research presents a better querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is converted into a first-order predicate logic form, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to study the semantics of the model and query for conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching extracted instances from the queries and the information field. The queries and the information domain are focused on semantic matching, to discover the best match and to improve the retrieval process. In conclusion, the hybrid ontology in the semantic web is sufficient to retrieve documents when compared to standard ontology.

  17. Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

    Directory of Open Access Journals (Sweden)

    K. R. Uthayan

    2015-01-01

    Full Text Available Ontology is the process of growth and elucidation of concepts of an information domain common to a group of users. Establishing ontology in information retrieval is a natural way to improve the retrieval of relevant information that users require. Matching keywords against a historical or information domain is significant in recent approaches for finding the best match for specific input queries. This research presents a better querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is converted into a first-order predicate logic form, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to study the semantics of the model and query for conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching extracted instances from the queries and the information field. The queries and the information domain are focused on semantic matching, to discover the best match and to improve the retrieval process. In conclusion, the hybrid ontology in the semantic web is sufficient to retrieve documents when compared to standard ontology.

  18. EMBANKS: Towards Disk Based Algorithms For Keyword-Search In Structured Databases

    CERN Document Server

    Gupta, Nitin

    2011-01-01

    In recent years, there has been a lot of interest in the field of keyword querying of relational databases. A variety of systems such as DBXplorer [ACD02], Discover [HP02] and ObjectRank [BHP04] have been proposed. Another such system is BANKS, which enables data and schema browsing together with keyword-based search for relational databases. It models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. The size of the database graph that BANKS uses is proportional to the sum of the number of nodes and edges in the graph. Systems such as SPIN, which search over Personal Information Networks and use BANKS as the backend, maintain a lot of information about the users' data. Since these systems run on the user workstation, which has other demands on memory, such heavy use of memory is unreasonable and, if possible, should be avoided. In order to alleviate this problem, we introduce EMBANKS (acronym for External Memory BANKS), a framework for an optimized disk-based BANKS sy...
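The graph model described (tuples as nodes, foreign keys as edges, keyword answers as connecting subtrees) can be illustrated with a toy in-memory version. This is a sketch of the idea only, not the BANKS backward-expanding search algorithm: it scores each candidate answer root by its total BFS distance to the nearest node matching each keyword. The graph and labels are invented.

```python
from collections import deque

graph = {  # hypothetical tuple graph induced by foreign keys
    "paper1": ["author1", "conf1"],
    "author1": ["paper1"],
    "conf1": ["paper1", "paper2"],
    "paper2": ["conf1", "author2"],
    "author2": ["paper2"],
}
labels = {
    "paper1": "query optimization",
    "author1": "widom",
    "conf1": "sigmod 1999",
    "paper2": "semistructured data",
    "author2": "ullman",
}

def bfs_dist(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def keyword_search(graph, labels, keywords):
    # Nodes matching each keyword (substring match on tuple text).
    hit_sets = [{n for n, text in labels.items() if kw in text}
                for kw in keywords]
    dists = {n: bfs_dist(graph, n) for n in set().union(*hit_sets)}
    best, best_cost = None, float("inf")
    for root in graph:  # root of the candidate answer tree
        try:
            cost = sum(min(dists[h][root] for h in hits) for hits in hit_sets)
        except (KeyError, ValueError):
            continue  # some keyword unreachable or unmatched
        if cost < best_cost:
            best, best_cost = root, cost
    return best

print(keyword_search(graph, labels, ["widom", "sigmod"]))  # paper1
```

Here "paper1" wins because it connects the tuple matching "widom" and the tuple matching "sigmod" with minimal total distance.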

  19. Improving statistical keyword detection in short texts: Entropic and clustering approaches

    Science.gov (United States)

    Carretero-Campos, C.; Bernaola-Galván, P.; Coronado, A. V.; Carpena, P.

    2013-03-01

    In the last few years, two successful approaches have been introduced to tackle the problem of statistical keyword detection in a text without the use of external information: (i) the entropic approach, where Shannon's entropy of information is used to quantify the information content of the sequence of occurrences of each word in the text; and (ii) the clustering approach, which links the heterogeneity of the spatial distribution of a word in the text (clustering) with its relevance. In this paper, we first present some modifications to both techniques which improve their results. Then, we propose new metrics to evaluate the performance of keyword detectors, based specifically on the needs of a typical user, and we employ them to find out which approach performs better. Although both approaches work well in long texts, we find that, in general, measures based on word clustering perform at least as well as the entropic measure, which needs a convenient partition of the text (such as the chapters of a book) to be applied. For the entropic approach we also show that the chosen partition of the text strongly affects the results. Finally, we focus on short texts, a case of high practical importance, such as short reports, web pages and scientific articles. We show that the performance of word-clustering measures is also good in generic short texts, since these measures discriminate the degree of relevance of low-frequency words better than the entropic approach.
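The entropic measure described above admits a compact sketch. The equal-size partitioning and the normalization by the logarithm of the number of partitions below are illustrative assumptions, not the authors' exact formulation:

```python
import math

def partition_entropy(positions, n_parts, text_len):
    """Shannon entropy of a word's occurrence counts across equal-size
    partitions of the text, normalized to [0, 1]. Relevant (clustered)
    words concentrate in few partitions and score low; words spread
    uniformly over the text score close to 1."""
    counts = [0] * n_parts
    for p in positions:
        counts[min(p * n_parts // text_len, n_parts - 1)] += 1
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_parts)
```

A word that occurs only inside one chapter scores 0, while one spread evenly across all chapters scores near 1; a detector would rank words by how far they fall below the entropy expected for random placement.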

  20. Repeated short climatic change affects the epidermal differentiation program and leads to matrix remodeling in a human organotypic skin model

    Directory of Open Access Journals (Sweden)

    Boutrand LB

    2017-02-01

    Full Text Available Laetitia-Barbollat Boutrand,1 Amélie Thépot,2 Charlotte Muther,3 Aurélie Boher,2 Julie Robic,4 Christelle Guéré,4 Katell Vié,4 Odile Damour,5 Jérôme Lamartine1,3 1Departement de Biologie, Université Claude Bernard Lyon I, 2LabSkinCreations, 3CNRS UMR5305, Laboratoire de Biologie Tissulaire et d’Ingénierie Thérapeutique (LBTI, Lyon, 4Laboratoires Clarins, Cergy-Pontoise, 5Banque de Tissus et Cellules, Hospices Civiles de Lyon, Lyon, France Abstract: Human skin is subject to frequent changes in ambient temperature and humidity and needs to cope with these environmental modifications. To decipher the molecular response of human skin to repeated climatic change, a versatile model of skin equivalent subject to “hot–wet” (40°C, 80% relative humidity [RH] or “cold–dry” (10°C, 40% RH climatic stress repeated daily was used. To obtain an exhaustive view of the molecular mechanisms elicited by climatic change, large-scale gene expression DNA microarray analysis was performed and modulated function was determined by bioinformatic annotation. This analysis revealed several functions, including epidermal differentiation and extracellular matrix, impacted by repeated variations in climatic conditions. Some of these molecular changes were confirmed by histological examination and protein expression. Both treatments (hot–wet and cold–dry reduced the expression of genes encoding collagens, laminin, and proteoglycans, suggesting a profound remodeling of the extracellular matrix. Strong induction of the entire family of late cornified envelope genes after cold–dry exposure, confirmed at protein level, was also observed. These changes correlated with an increase in epidermal differentiation markers such as corneodesmosin and a thickening of the stratum corneum, indicating possible implementation of defense mechanisms against dehydration. This study for the first time reveals the complex pattern of molecular response allowing

  1. Integrating bioinformatics into senior high school: design principles and implications.

    Science.gov (United States)

    Machluf, Yossy; Yarden, Anat

    2013-09-01

    Bioinformatics is an integral part of modern life sciences. It has revolutionized and redefined how research is carried out and has had an enormous impact on biotechnology, medicine, agriculture and related areas. Yet, it is only rarely integrated into high school teaching and learning programs, playing almost no role in preparing the next generation of information-oriented citizens. Here, we describe the design principles of bioinformatics learning environments, including our own, that are aimed at introducing bioinformatics into senior high school curricula through engaging learners in scientifically authentic inquiry activities. We discuss the bioinformatics-related benefits and challenges that high school teachers and students face in the course of the implementation process, in light of previous studies and our own experience. Based on these lessons, we present a new approach for characterizing the questions embedded in bioinformatics teaching and learning units, based on three criteria: the type of domain-specific knowledge required to answer each question (declarative knowledge, procedural knowledge, strategic knowledge, situational knowledge), the scientific approach from which each question stems (biological, bioinformatics, a combination of the two) and the associated cognitive process dimension (remember, understand, apply, analyze, evaluate, create). We demonstrate the feasibility of this approach using a learning environment, which we developed for the high school level, and suggest some of its implications. This review sheds light on unique and critical characteristics related to broader integration of bioinformatics in secondary education, which are also relevant to the undergraduate level, and especially on curriculum design, development of suitable learning environments and teaching and learning processes.

  2. A Novel Algorithm for Finding Interspersed Repeat Regions

    Institute of Scientific and Technical Information of China (English)

    Dongdong Li; Zhengzhi Wang; Qingshan Ni

    2004-01-01

    The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, and an exhaustive search algorithm that examines each pair of fragments from the candidate set to find potential linkage, and then assembles them together. The complexity of our projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited only by the hardware. We tested our algorithm with both simulated data and real biological data, and the results show that our projection-assemble algorithm is efficient. By means of this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
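The full projection-assemble algorithm is involved, but its seeding stage can be illustrated with a naive exact k-mer index (a hypothetical simplification: the paper's random projections tolerate mismatches, which this exact-match sketch does not):

```python
from collections import defaultdict

def find_repeated_kmers(seq, k=8, min_count=2):
    """Index every k-mer of `seq` by its start position and keep those
    occurring at least `min_count` times -- candidate seeds that a repeat
    finder would then try to link and assemble into longer repeat regions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in index.items() if len(pos) >= min_count}
```

Because every k-mer is touched once, the indexing pass is linear in the sequence length, mirroring the near-linear complexity claimed for the real algorithm.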

  3. Constructing Virtual Documents for Keyword Based Concept Search in Web Ontology

    Directory of Open Access Journals (Sweden)

    Sapna Paliwal

    2013-04-01

    Full Text Available Web ontologies are structural frameworks for organizing information on the semantic web and provide shared concepts. An ontology formally represents knowledge about a particular entity as a set of concepts within a particular domain on the semantic web. A web ontology helps to describe concepts within a domain and also enables semantic interoperability between two different applications by using Falcons concept search, which facilitates concept searching and ontology reuse. Constructing virtual documents is a keyword-based search technique in ontology. The proposed method examines how a search engine helps users find ontologies in less time, so as to satisfy their needs. Together with some supporting technologies, the new technique constructs virtual documents of concepts for keyword-based search, ranks concepts and ontologies based on a popularity scheme, and generates structured snippets according to the query. We also report user feedback and a usability evaluation.

  4. A study of practical proxy reencryption with a keyword search scheme considering cloud storage structure.

    Science.gov (United States)

    Lee, Sun-Ho; Lee, Im-Yeong

    2014-01-01

    Data outsourcing services have emerged with the increasing use of digital information. They can be used to store data from various devices via networks that are easy to access. Unlike existing removable storage systems, storage outsourcing is available to many users because it has no storage limit and does not require a local storage medium. However, the reliability of storage outsourcing has become an important topic because many users employ it to store large volumes of data. To protect against unethical administrators and attackers, a variety of cryptography systems are used, such as searchable encryption and proxy reencryption. However, existing searchable encryption technology is inconvenient for use in storage outsourcing environments where users upload their data to be shared with others as necessary. In addition, some existing schemes are vulnerable to collusion attacks and have computing cost inefficiencies. In this paper, we analyze existing proxy re-encryption with keyword search.

  5. Level statistics of words: finding keywords in literary texts and symbolic sequences.

    Science.gov (United States)

    Carpena, P; Bernaola-Galván, P; Hackenberg, M; Coronado, A V; Oliver, J L

    2009-03-01

    Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
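The clustering idea above — relevant words attract each other while irrelevant words are spread randomly — can be sketched by the normalized fluctuation of the gaps between consecutive occurrences of a word (a common statistic in this line of work; the exact level-statistics measure used in the paper is more refined):

```python
import statistics

def clustering_sigma(positions):
    """Normalized standard deviation of the distances between consecutive
    occurrences of a word. Values well above 1 indicate clustering
    (keyword-like behavior); values near 1 are compatible with random,
    geometrically distributed spacings; regular spacings score near 0."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```

Since the statistic is computed from the word's own positions alone, no reference corpus is needed, matching the single-document setting of the abstract.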

  6. Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback

    Science.gov (United States)

    Telang, Aditya; Chakravarthy, Sharma; Li, Chengkai

    The staples of information retrieval have been querying and search, respectively, for structured and unstructured repositories. Processing queries over known, structured repositories (e.g., databases) is well understood, and search has become ubiquitous for unstructured repositories (e.g., the Web). Searching structured repositories has also been explored to a limited extent. However, there is not much work on querying unstructured sources, and we argue that querying unstructured sources is the next step in performing focused retrievals. This paper proposes a new approach to generate queries from search-like inputs for unstructured repositories. Instead of burdening the user with schema details, we believe that pre-discovered semantic information in the form of taxonomies, relationships of keywords based on context, and attribute and operator compatibility can be used to generate query skeletons. Furthermore, progressive feedback from users can be used to improve the accuracy of the generated query skeletons.

  7. Forecasting U.S. Home Foreclosures with an Index of Internet Keyword Searches

    Science.gov (United States)

    Webb, G. Kent

    Finding data to feed into financial and risk management models can be challenging. Many analysts attribute a lack of data or quality information as a contributing factor to the worldwide financial crisis that seems to have begun in the U.S. subprime mortgage market. In this paper, a new source of data, keyword search statistics recently made available by Google, is applied in an experiment to develop a short-term forecasting model for the number of foreclosures in the U.S. housing market. The keyword search data significantly improves forecasts of foreclosures, suggesting that these data can be useful for financial risk management. More generally, the new data source shows promise for a variety of financial and market analyses.

  8. Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.

    Science.gov (United States)

    Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn

    2011-09-01

    Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a "Sensitive Artificial Listener" cognitive system.
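Histogram equalization of a feature stream can be illustrated by rank-matching each value onto a reference distribution (a bare-bones sketch; real ASR front-ends equalize each cepstral component against a smoothed reference cumulative distribution rather than raw samples):

```python
def histogram_equalize(feature, reference):
    """Map each value of `feature` onto the empirical distribution of
    `reference` by matching ranks to quantiles, so the equalized stream
    follows the reference distribution while preserving the ordering of
    the original values."""
    ref = sorted(reference)
    order = sorted(range(len(feature)), key=feature.__getitem__)
    out = [0.0] * len(feature)
    n, m = len(feature), len(ref)
    for rank, idx in enumerate(order):
        # mid-rank quantile (rank + 0.5) / n, mapped to a reference index
        out[idx] = ref[min((2 * rank + 1) * m // (2 * n), m - 1)]
    return out
```

Applying the same mapping to clean and noisy renditions of an utterance pulls both toward the reference distribution, which is what reduces the clean/noisy mismatch.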

  9. Testing keywords internationally to define and apply undergraduate assessment standards in art and design

    Directory of Open Access Journals (Sweden)

    Robert Harland

    2015-07-01

    Full Text Available What language should be featured in assessment standards for international students? Have universities adjusted their assessment methods sufficiently to match the increased demand for studying abroad? How might art and design benefit from a more stable definition of standards? These are some questions this paper seeks to address by reporting the results of recent pedagogic research at the School of the Arts, Loughborough University, in the United Kingdom. Language use is at the heart of this issue, yet it is generally overlooked as an essential tool that links assessment, feedback and action planning for international students. The paper reveals existing and new data that builds on research since 2009, aimed at improving students’ assessment literacy. Recommendations are offered to stimulate local and global discussion about keyword use for defining undergraduate assessment standards in art and design.

  10. Secure Multi-Keyword Search with User/Owner-side Efficiency in the Cloud

    Directory of Open Access Journals (Sweden)

    LEE, Y.

    2016-05-01

    Full Text Available As the amount of data in the cloud grows, ranked search systems, in which the similarity of a query to the data is ranked, are of significant importance. On the other hand, to protect privacy, searchable encryption systems are being actively studied. In this paper, we present a new similarity-based multi-keyword search scheme for encrypted data. This scheme provides high flexibility in the pre- and post-processing of encrypted data, including splitting stems/suffixes and computing from the encrypted index-term matrix, and is demonstrated to support Latent Semantic Indexing (LSI). On the client side, the computation and communication costs are one to two orders of magnitude lower than those of previous methods, as demonstrated in the experimental results. We also provide a security analysis of the proposed scheme.

  11. Reconfigurable multiport EPON repeater

    Science.gov (United States)

    Oishi, Masayuki; Inohara, Ryo; Agata, Akira; Horiuchi, Yukio

    2009-11-01

    An extended-reach EPON repeater is one of the solutions to effectively expand FTTH service areas. In this paper, we propose a reconfigurable multi-port EPON repeater for the effective accommodation of multiple ODNs with a single OLT line card. The proposed repeater, which has multiple ports on both the OLT and ODN sides and consists of TRs, BTRs with the CDR function, and a reconfigurable electrical matrix switch, can accommodate multiple ODNs on a single OLT line card by controlling the connections of the matrix switch. Although conventional EPON repeaters require full OLT line cards to accommodate subscribers from the initial installation stage, the proposed repeater can dramatically reduce the number of required line cards, especially when the number of subscribers is less than half of the maximum registerable users per OLT. Numerical calculation results show that the extended-reach EPON system with the proposed repeater can save 17.5% of the initial installation cost compared with a conventional repeater, and can remain less expensive than conventional systems up to the maximum number of subscribers, especially when the percentage of ODNs in lightly populated areas is high.

  12. Revisiting the TALE repeat.

    Science.gov (United States)

    Deng, Dong; Yan, Chuangye; Wu, Jianping; Pan, Xiaojing; Yan, Nieng

    2014-04-01

    Transcription activator-like (TAL) effectors specifically bind to double stranded (ds) DNA through a central domain of tandem repeats. Each TAL effector (TALE) repeat comprises 33-35 amino acids and recognizes one specific DNA base through a highly variable residue at a fixed position in the repeat. Structural studies have revealed the molecular basis of DNA recognition by TALE repeats. Examination of the overall structure reveals that the basic building block of TALE protein, namely a helical hairpin, is one-helix shifted from the previously defined TALE motif. Here we wish to suggest a structure-based re-demarcation of the TALE repeat which starts with the residues that bind to the DNA backbone phosphate and concludes with the base-recognition hyper-variable residue. This new numbering system is consistent with the α-solenoid superfamily to which TALE belongs, and reflects the structural integrity of TAL effectors. In addition, it confers integral number of TALE repeats that matches the number of bound DNA bases. We then present fifteen crystal structures of engineered dHax3 variants in complex with target DNA molecules, which elucidate the structural basis for the recognition of bases adenine (A) and guanine (G) by reported or uncharacterized TALE codes. Finally, we analyzed the sequence-structure correlation of the amino acid residues within a TALE repeat. The structural analyses reported here may advance the mechanistic understanding of TALE proteins and facilitate the design of TALEN with improved affinity and specificity.
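The base-recognition code mentioned above maps each repeat's hypervariable residues (the repeat-variable diresidue, RVD) to a preferred base. A minimal sketch using the canonical assignments (NI→A, HD→C, NG→T, NN→G with NN degenerate, NH a more G-specific alternative):

```python
# Canonical TALE RVD-to-base code; NN is degenerate and also binds A.
# Unknown or uncharacterized RVDs are reported as "N".
RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NH": "G"}

def predict_target(rvds):
    """Predict the DNA sequence bound by a TALE array, one base per repeat."""
    return "".join(RVD_CODE.get(rvd, "N") for rvd in rvds)
```

Because the re-demarcation proposed in the paper gives an integral number of repeats matching the number of bound bases, one dictionary lookup per repeat suffices in this sketch.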

  13. Recursive quantum repeater networks

    CERN Document Server

    Van Meter, Rodney; Horsman, Clare

    2011-01-01

    Internet-scale quantum repeater networks will be heterogeneous in physical technology, repeater functionality, and management. The classical control necessary to use the network will therefore face similar issues as Internet data transmission. Many scalability and management problems that arose during the development of the Internet might have been solved in a more uniform fashion, improving flexibility and reducing redundant engineering effort. Quantum repeater network development is currently at the stage where we risk similar duplication when separate systems are combined. We propose a unifying framework that can be used with all existing repeater designs. We introduce the notion of a Quantum Recursive Network Architecture, developed from the emerging classical concept of 'recursive networks', extending recursive mechanisms from a focus on data forwarding to a more general distributed computing request framework. Recursion abstracts independent transit networks as single relay nodes, unifies software layer...

  14. Application of Bioinformatics and Systems Biology in Medicinal Plant Studies

    Institute of Scientific and Technical Information of China (English)

    DENG You-ping; AI Jun-mei; XIAO Pei-gen

    2010-01-01

    One important purpose of investigating medicinal plants is to understand the genes and enzymes that govern the biological metabolic processes that produce bioactive compounds. Genome-wide high-throughput technologies such as genomics, transcriptomics, proteomics and metabolomics can help reach that goal. Such technologies produce vast amounts of data that desperately need bioinformatics and systems biology to process, manage, distribute and understand. By dealing with the "omics" data, bioinformatics and systems biology can also help improve the quality of traditional medicinal materials, develop new approaches for the classification and authentication of medicinal plants, identify new active compounds, and cultivate medicinal plant species that tolerate harsh environmental conditions. In this review, the application of bioinformatics and systems biology to medicinal plants is briefly introduced.

  15. Bioinformatics projects supporting life-sciences learning in high schools.

    Science.gov (United States)

    Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B

    2014-01-01

    The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.

  16. Bioinformatics projects supporting life-sciences learning in high schools.

    Directory of Open Access Journals (Sweden)

    Isabel Marques

    2014-01-01

    Full Text Available The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.

  17. Bioinformatic approaches to identifying and classifying Rab proteins.

    Science.gov (United States)

    Diekmann, Yoan; Pereira-Leal, José B

    2015-01-01

    The bioinformatic annotation of Rab GTPases is important, for example, to understand the evolution of the endomembrane system. However, Rabs are particularly challenging for standard annotation pipelines because they are similar to other small GTPases and form a large family with many paralogous subfamilies. Here, we describe a bioinformatic annotation pipeline specifically tailored to Rab GTPases. It proceeds in two steps: first, Rabs are distinguished from other proteins based on GTPase-specific motifs, overall sequence similarity to other Rabs, and the occurrence of Rab-specific motifs. Second, Rabs are classified taking either a more accurate but slower phylogenetic approach or a slightly less accurate but much faster bioinformatic approach. All necessary steps can either be performed locally or using the referenced online tools. An implementation of a slightly more involved version of the pipeline presented here is available at RabDB.org.

  18. PineappleDB: An online pineapple bioinformatics resource

    Directory of Open Access Journals (Sweden)

    Fairbairn David J

    2005-10-01

    Full Text Available Abstract Background A world first pineapple EST sequencing program has been undertaken to investigate genes expressed during non-climacteric fruit ripening and the nematode-plant interaction during root infection. Very little is known of how non-climacteric fruit ripening is controlled or of the molecular basis of the nematode-plant interaction. PineappleDB was developed to provide the research community with access to a curated bioinformatics resource housing the fruit, root and nematode infected gall expressed sequences. Description PineappleDB is an online, curated database providing integrated access to annotated expressed sequence tag (EST data for cDNA clones isolated from pineapple fruit, root, and nematode infected root gall vascular cylinder tissues. The database currently houses over 5600 EST sequences, 3383 contig consensus sequences, and associated bioinformatic data including splice variants, Arabidopsis homologues, both MIPS based and Gene Ontology functional classifications, and clone distributions. The online resource can be searched by text or by BLAST sequence homology. The data outputs provide comprehensive sequence, bioinformatic and functional classification information. Conclusion The online pineapple bioinformatic resource provides the research community with access to pineapple fruit and root/gall sequence and bioinformatic data in a user-friendly format. The search tools enable efficient data mining and present a wide spectrum of bioinformatic and functional classification information. PineappleDB will be of broad appeal to researchers investigating pineapple genetics, non-climacteric fruit ripening, root-knot nematode infection, crassulacean acid metabolism and alternative RNA splicing in plants.

  19. Clustered regularly interspaced short palindromic repeats (CRISPRs) for the genotyping of bacterial pathogens.

    Science.gov (United States)

    Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

    2009-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA sequences composed of a succession of repeats (23- to 47-bp long) separated by unique sequences called spacers. Polymorphism can be observed in different strains of a species and may be used for genotyping. We describe protocols and bioinformatics tools that allow the identification of CRISPRs from sequenced genomes, their comparison, and their component determination (the direct repeats and the spacers). A schematic representation of the spacer organization can be produced, allowing an easy comparison between strains.
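Once a direct-repeat sequence is known, recovering the spacers of a CRISPR array reduces to string processing. The sketch below assumes exact repeat copies; the bioinformatics tools referenced in the protocol tolerate the degenerate repeat ends seen in real genomes:

```python
def extract_spacers(genome, repeat):
    """Locate consecutive exact copies of a CRISPR direct repeat and return
    the unique spacer sequences lying between them
    (repeat-spacer-repeat-spacer-...-repeat)."""
    positions, start = [], 0
    while (i := genome.find(repeat, start)) != -1:
        positions.append(i)
        start = i + len(repeat)
    return [genome[positions[j] + len(repeat):positions[j + 1]]
            for j in range(len(positions) - 1)]
```

Comparing the ordered spacer lists of two strains is then exactly the kind of polymorphism comparison that makes CRISPR loci useful for genotyping.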

  20. Approaches in integrative bioinformatics towards the virtual cell

    CERN Document Server

    Chen, Ming

    2014-01-01

    Approaches in Integrative Bioinformatics provides a basic introduction to biological information systems, as well as guidance for the computational analysis of systems biology. The book also covers a range of issues and methods that reveal the multitude of omics data integration types and the relevance that integrative bioinformatics has today. Topics include biological data integration and manipulation, modeling and simulation of metabolic networks, transcriptomics and phenomics, and virtual cell approaches, as well as a number of applications of network biology. It helps to illustrate...

  1. Naturally selecting solutions: the use of genetic algorithms in bioinformatics.

    Science.gov (United States)

    Manning, Timmy; Sleator, Roy D; Walsh, Paul

    2013-01-01

    For decades, computer scientists have looked to nature for biologically inspired solutions to computational problems, ranging from robotic control to scheduling optimization. Paradoxically, as we move deeper into the post-genomics era, the reverse is occurring, as biologists and bioinformaticians look to computational techniques to solve a variety of biological problems. Among the most common biologically inspired techniques are genetic algorithms (GAs), which take the Darwinian concept of natural selection as the driving force behind systems for solving real-world problems, including those in the bioinformatics domain. Herein, we provide an overview of genetic algorithms and survey some of the most recent applications of this approach to bioinformatics-based problems.

  2. Bioinformatic scaling of allosteric interactions in biomedical isozymes

    Science.gov (United States)

    Phillips, J. C.

    2016-09-01

    Allosteric (long-range) interactions can be surprisingly strong in proteins of biomedical interest. Here we use bioinformatic scaling to connect prior results on nonsteroidal anti-inflammatory drugs to promising new drugs that inhibit cancer cell metabolism. Many parallel features are apparent, which explain how even one amino acid mutation, remote from active sites, can alter medical results. The enzyme twins involved are cyclooxygenase (aspirin) and isocitrate dehydrogenase (IDH). The IDH results are accurate to 1% and are overdetermined by adjusting a single bioinformatic scaling parameter. It appears that the final stage in optimizing protein functionality may involve leveling of the hydrophobic limits of the arms of conformational hydrophilic hinges.

  3. High-performance computational solutions in protein bioinformatics

    CERN Document Server

    Mrozek, Dariusz

    2014-01-01

    Recent developments in computer science enable algorithms previously perceived as too time-consuming to now be efficiently used for applications in bioinformatics and life sciences. This work focuses on proteins and their structures, protein structure similarity searching at main representation levels and various techniques that can be used to accelerate similarity searches. Divided into four parts, the first part provides a formal model of 3D protein structures for functional genomics, comparative bioinformatics and molecular modeling. The second part focuses on the use of multithreading for

  4. Stock prediction: an event-driven approach based on bursty keywords

    Institute of Scientific and Technical Information of China (English)

    Di WU; Gabriel Pui Cheong FUNG; Jeffrey Xu YU; Qi PAN

    2009-01-01

    There are many real applications in which the decision-making process depends on a model built by collecting information from different data sources. Take the stock market as an example: the decision-making process depends on a model influenced by factors such as stock prices, exchange volumes, market indices (e.g., the Dow Jones Index), news articles, and government announcements (e.g., an increase in stamp duty). Nevertheless, modeling the stock market is a challenging task because (1) the process underlying market states (rise state/drop state) is stochastic and hard to capture with a deterministic approach, and (2) the market state is invisible but is influenced by visible market information, such as stock prices and news articles. In this paper, we propose an approach to model the stock market process using a non-homogeneous hidden Markov model (NHMM), which takes both stock prices and news articles into consideration. A unique feature of our approach is that it is event-driven: we identify events associated with a specific stock using a set of bursty features (keywords) that have a significant impact on stock price changes when building the NHMM. We apply the model to predict the trend of future stock prices, and the encouraging results indicate that our proposed approach is practically sound and highly effective.
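The bursty-feature step can be illustrated with a simple frequency-ratio test (a hypothetical simplification; the paper's burst detection for keywords is more elaborate than comparing a window's rate against a long-run background rate):

```python
def bursty_keywords(window_counts, background_counts,
                    window_total, background_total,
                    ratio=3.0, min_count=5):
    """Flag words whose relative frequency in the current time window
    exceeds their long-run background rate by at least `ratio` --
    a crude burstiness test for event detection in news streams."""
    bursts = {}
    for word, count in window_counts.items():
        if count < min_count:
            continue
        p_window = count / window_total
        # +0.5 smoothing so words unseen in the background do not divide by zero
        p_background = (background_counts.get(word, 0) + 0.5) / background_total
        if p_window / p_background >= ratio:
            bursts[word] = p_window / p_background
    return bursts
```

The flagged keywords would then serve as observable events conditioning the transition probabilities of the NHMM, while common words ("the", ticker names) are filtered out by their stable background rate.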

  5. Chemical evolution and the origin of life: cumulative keyword subject index 1970-1986

    Science.gov (United States)

    Roy, A. C.; Powers, J. V.; Rummel, J. D. (Principal Investigator)

    1990-01-01

    This cumulative subject index encompasses the subject indexes of the bibliographies on Chemical Evolution and the Origin of Life that were first published in 1970 and have continued through publication of the 1986 bibliography supplement. Early bibliographies focused on experimental and theoretical material dealing directly with the concepts of chemical evolution and the origin of life, excluding the broader areas of exobiology, biological evolution, and geochemistry. In recent years, these broader subject areas have also been incorporated as they appear in literature searches relating to chemical evolution and the origin of life, although direct attempts have not been made to compile all of the citations in these broad areas. The keyword subject indexes have also undergone an analogous change in scope. Compilers of earlier bibliographies used the most specific term available in producing the subject index. Compilers of recent bibliographies have used a number of broad terms relating to the overall subject content of each citation and specific terms where appropriate. The subject indexes of these 17 bibliographies have, in general, been cumulatively compiled exactly as they originally appeared. However, some changes have been made in an attempt to correct errors, combine terms, and provide more meaningful terms.

  6. Incorporating bioinformatics into biological science education in Nigeria: prospects and challenges.

    Science.gov (United States)

    Ojo, O O; Omabe, M

    2011-06-01

    The urgency to process and analyze the deluge of data created by proteomics and genomics studies worldwide has caused bioinformatics to gain prominence and importance. However, its multidisciplinary nature has created a unique demand for specialists trained in both biology and computing. Several countries have responded to this challenge by developing a number of manpower training programmes. This review presents a description of the meaning, scope, history and development of bioinformatics, with a focus on the prospects and challenges facing bioinformatics education worldwide. The paper also provides an overview of attempts to introduce bioinformatics in Nigeria, describes the existing bioinformatics scenario in Nigeria, and suggests strategies for effective bioinformatics education in Nigeria.

  7. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    Energy Technology Data Exchange (ETDEWEB)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.

    2011-08-01

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics

  8. The Pentapeptide Repeat Proteins

    Energy Technology Data Exchange (ETDEWEB)

    Vetting, M.; Hegde, S.; Fajardo, J.; Fiser, A.; Roderick, S.; Takiff, H.; Blanchard, J.

    2006-01-01

    The Pentapeptide Repeat Protein (PRP) family has over 500 members in the prokaryotic and eukaryotic kingdoms. These proteins are composed of, or contain domains composed of, tandemly repeated amino acid sequences with a consensus sequence of [S,T,A,V][D,N][L,F]-[S,T,R][G]. The biochemical function of the vast majority of PRP family members is unknown. The three-dimensional structure of the first member of the PRP family was determined for the fluoroquinolone resistance protein (MfpA) from Mycobacterium tuberculosis. The structure revealed that the pentapeptide repeats encode the folding of a novel right-handed quadrilateral β-helix. MfpA binds to DNA gyrase and inhibits its activity. The rod-shaped, dimeric protein exhibits remarkable size, shape and electrostatic similarity to DNA.
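The consensus can be checked mechanically with a pattern match. The sketch below treats each bracketed class as one required residue (reading the hyphen as a position separator); real pentapeptide repeats tolerate deviations from the consensus, so this is illustrative only.

```python
import re

# One pentapeptide repeat unit per the abstract's consensus:
# [S,T,A,V][D,N][L,F][S,T,R][G], with the hyphen read as a separator.
PENTAPEPTIDE = re.compile(r"[STAV][DN][LF][STR]G")

def count_pentapeptide_repeats(seq):
    """Count non-overlapping consensus matches in a protein sequence."""
    return len(PENTAPEPTIDE.findall(seq))
```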

  9. Bioinformatics: Tools to accelerate population science and disease control research.

    Science.gov (United States)

    Forman, Michele R; Greene, Sarah M; Avis, Nancy E; Taplin, Stephen H; Courtney, Paul; Schad, Peter A; Hesse, Bradford W; Winn, Deborah M

    2010-06-01

    Population science and disease control researchers can benefit from a more proactive approach to applying bioinformatics tools for clinical and public health research. Bioinformatics utilizes principles of information sciences and technologies to transform vast, diverse, and complex life sciences data into a more coherent format for wider application. Bioinformatics provides the means to collect and process data, enhance data standardization and harmonization for scientific discovery, and merge disparate data sources. Achieving interoperability (i.e., the development of an informatics system that provides access to and use of data from different systems) will facilitate scientific exploration, careers, and opportunities for interventions in population health. The National Cancer Institute's (NCI's) interoperable Cancer Biomedical Informatics Grid (caBIG) is one of a number of illustrative tools in this report that are being mined by population scientists. Tools are not all that is needed for progress. Challenges persist, including a lack of common data standards, proprietary barriers to data access, and difficulties pooling data from studies. Population scientists and informaticists are developing promising and innovative solutions to these barriers. The purpose of this paper is to describe how the application of bioinformatics systems can accelerate population health research across the continuum from prevention to detection, diagnosis, treatment, and outcome.

  10. CROSSWORK for Glycans: Glycan Identification Through Mass Spectrometry and Bioinformatics

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Thaysen-Andersen, Morten; Højrup, Peter

    We have developed "GLYCANthrope" - CROSSWORKS for glycans: a bioinformatics tool that assists in identifying N-linked glycosylated peptides as well as their glycan moieties from MS2 data of enzymatically digested glycoproteins. The program runs either as a stand-alone application or as a plug...

  11. Learning Genetics through an Authentic Research Simulation in Bioinformatics

    Science.gov (United States)

    Gelbart, Hadas; Yarden, Anat

    2006-01-01

    Following the rationale that learning is an active process of knowledge construction as well as enculturation into a community of experts, we developed a novel web-based learning environment in bioinformatics for high-school biology majors in Israel. The learning environment enables the learners to actively participate in a guided inquiry process…

  12. Hidden in the Middle: Culture, Value and Reward in Bioinformatics

    Science.gov (United States)

    Lewis, Jamie; Bartlett, Andrew; Atkinson, Paul

    2016-01-01

    Bioinformatics--the so-called shotgun marriage between biology and computer science--is an interdiscipline. Despite interdisciplinarity being seen as a virtue, for having the capacity to solve complex problems and foster innovation, it has the potential to place projects and people in anomalous categories. For example, valorised…

  13. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    Science.gov (United States)

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  14. An International Bioinformatics Infrastructure to Underpin the Arabidopsis Community

    Science.gov (United States)

    The future bioinformatics needs of the Arabidopsis community as well as those of other scientific communities that depend on Arabidopsis resources were discussed at a pair of recent meetings held by the Multinational Arabidopsis Steering Committee (MASC) and the North American Arabidopsis Steering C...

  15. Bioinformatic approaches to interrogating vitamin D receptor signaling.

    Science.gov (United States)

    Campbell, Moray J

    2017-03-10

    Bioinformatics applies unbiased approaches to develop statistically robust insight into health and disease. At the global, or "20,000 foot", view, bioinformatic analyses of vitamin D receptor (NR1I1/VDR) signaling can measure where the VDR gene or protein exerts a genome-wide significant impact on biology; VDR is significantly implicated in bone biology and immune systems, but not in cancer. With a more VDR-centric, or "2000 foot", view, bioinformatic approaches can interrogate events downstream of VDR activity. Integrative approaches can combine VDR ChIP-Seq in cell systems where significant volumes of publicly available data are available. For example, VDR ChIP-Seq studies can be combined with genome-wide association studies to reveal significant associations to immune phenotypes. Similarly, VDR ChIP-Seq can be combined with data from The Cancer Genome Atlas (TCGA) to infer the impact of VDR target genes in cancer progression. Therefore, bioinformatic approaches can reveal which aspects of VDR downstream networks are significantly related to disease or phenotype.

  16. WIWS: a protein structure bioinformatics Web service collection.

    NARCIS (Netherlands)

    Hekkelman, M.L.; Beek, T.A.H. te; Pettifer, S.R.; Thorne, D.; Attwood, T.K.; Vriend, G.

    2010-01-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of p

  17. A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences

    Science.gov (United States)

    2009-02-23

    Geoffrey Webb, Prof. James Whisstock, Dr Jianging Song, Mr Khalid Mahmood, Mr Cyril Reboul, Ms Wan Ting Kan. Publications (peer-reviewed): Khalid Mahmood, Jianging Song, Cyril Reboul, Wan Ting Kan, Geoffrey I. Webb and James C. Whisstock. To be submitted to BMC Bioinformatics. Outline

  18. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of human genome project is near the end, the work has begun for discovering novel genes from genome sequences and annotating their biological functions. Here are reviewed current major bioinformatics tools and technologies available for large scale gene discovery and annotation from human genome sequences. Some ideas about possible future development are also provided.

  19. Pladipus Enables Universal Distributed Computing in Proteomics Bioinformatics.

    Science.gov (United States)

    Verheggen, Kenneth; Maddelein, Davy; Hulstaert, Niels; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

    2016-03-04

    The use of proteomics bioinformatics substantially contributes to an improved understanding of proteomes, but this novel and in-depth knowledge comes at the cost of increased computational complexity. Parallelization across multiple computers, a strategy termed distributed computing, can be used to handle this increased complexity; however, setting up and maintaining a distributed computing infrastructure requires resources and skills that are not readily available to most research groups. Here we propose a free and open-source framework named Pladipus that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools. Pladipus is straightforward to install and operate thanks to its user-friendly graphical interface, allowing complex bioinformatics tasks to be run easily on a network instead of a single computer. As a result, any researcher can benefit from the increased computational efficiency provided by distributed computing, hence empowering them to tackle more complex bioinformatics challenges. Notably, it enables any research group to perform large-scale reprocessing of publicly available proteomics data, thus supporting the scientific community in mining these data for novel discoveries.

  20. BioRuby: Bioinformatics software for the Ruby programming language

    NARCIS (Netherlands)

    Goto, N.; Prins, J.C.P.; Nakao, M.; Bonnal, R.; Aerts, J.; Katayama, A.

    2010-01-01

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it suppor

  1. BioRuby : bioinformatics software for the Ruby programming language

    NARCIS (Netherlands)

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-01-01

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it suppor

  2. A Tool for Creating and Parallelizing Bioinformatics Pipelines

    Science.gov (United States)

    2007-06-01

    ...as well as those incorporated into InterPro (Mulder et al., 2005) and other users' work. PUMA2 (Maltsev et al., 2006) incorporates more than 20 ..."pipeline for protocol-based bioinformatics analysis." Genome Res., 13(8), pp. 1904-1915, 2003. Maltsev, N., E. Glass, et al., "PUMA2--grid-based...

  3. Robust enzyme design: bioinformatic tools for improved protein stability.

    Science.gov (United States)

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.

  4. A BIOINFORMATIC STRATEGY TO RAPIDLY CHARACTERIZE CDNA LIBRARIES

    Science.gov (United States)

    A Bioinformatic Strategy to Rapidly Characterize cDNA LibrariesG. Charles Ostermeier1, David J. Dix2 and Stephen A. Krawetz1.1Departments of Obstetrics and Gynecology, Center for Molecular Medicine and Genetics, & Institute for Scientific Computing, Wayne State Univer...

  5. Mathematics and evolutionary biology make bioinformatics education comprehensible.

    Science.gov (United States)

    Jungck, John R; Weisstein, Anton E

    2013-09-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes, the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software, the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, with the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
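The combinatorics behind tree enumeration is a standard starting point: the number of distinct unrooted binary trees on n labelled taxa is (2n-5)!! = 1 x 3 x 5 x ... x (2n-5), which grows explosively and motivates heuristic search. A short sketch (the function name is ours):

```python
def num_unrooted_trees(n):
    """Number of distinct unrooted binary trees on n labelled taxa,
    (2n-5)!! = 1 * 3 * 5 * ... * (2n-5), valid for n >= 3."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count
```

Already at n = 10 there are over two million candidate trees, which is why exhaustive evaluation is rarely feasible for realistic data sets.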

  6. Hidden in the Middle: Culture, Value and Reward in Bioinformatics

    Science.gov (United States)

    Lewis, Jamie; Bartlett, Andrew; Atkinson, Paul

    2016-01-01

    Bioinformatics--the so-called shotgun marriage between biology and computer science--is an interdiscipline. Despite interdisciplinarity being seen as a virtue, for having the capacity to solve complex problems and foster innovation, it has the potential to place projects and people in anomalous categories. For example, valorised…

  7. Learning Genetics through an Authentic Research Simulation in Bioinformatics

    Science.gov (United States)

    Gelbart, Hadas; Yarden, Anat

    2006-01-01

    Following the rationale that learning is an active process of knowledge construction as well as enculturation into a community of experts, we developed a novel web-based learning environment in bioinformatics for high-school biology majors in Israel. The learning environment enables the learners to actively participate in a guided inquiry process…

  8. An evaluation of ontology exchange languages for bioinformatics.

    Science.gov (United States)

    McEntire, R; Karp, P; Abernethy, N; Benton, D; Helt, G; DeJongh, M; Kent, R; Kosky, A; Lewis, S; Hodnett, D; Neumann, E; Olken, F; Pathak, D; Tarczy-Hornoch, P; Toldo, L; Topaloglou, T

    2000-01-01

    Ontologies are specifications of the concepts in a given field, and of the relationships among those concepts. The development of ontologies for molecular-biology information and the sharing of those ontologies within the bioinformatics community are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, ontologies must be exchanged in a form that uses standardized syntax and semantics. This paper reports on an effort among the authors to evaluate alternative ontology-exchange languages, and to recommend one or more languages for use within the larger bioinformatics community. The study selected a set of candidate languages, and defined a set of capabilities that the ideal ontology-exchange language should satisfy. The study scored the languages according to the degree to which they satisfied each capability. In addition, the authors performed several ontology-exchange experiments with the two languages that received the highest scores: OML and Ontolingua. The result of those experiments, and the main conclusion of this study, was that the frame-based semantic model of Ontolingua is preferable to the conceptual graph model of OML, but that the XML-based syntax of OML is preferable to the Lisp-based syntax of Ontolingua.

  9. Mathematics and evolutionary biology make bioinformatics education comprehensible

    Science.gov (United States)

    Weisstein, Anton E.

    2013-01-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621

  10. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    Science.gov (United States)

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  11. Repeating the Past

    Science.gov (United States)

    Moore, John W.

    1998-05-01

    As part of the celebration of the Journal's 75th year, we are scanning each Journal issue from 25, 50, and 74 years ago. Many of the ideas and practices described are so similar to present-day "innovations" that George Santayana's adage (1) "Those who cannot remember the past are condemned to repeat it" comes to mind. But perhaps "condemned" is too strong - sometimes it may be valuable to repeat something that was done long ago. One example comes from the earliest days of the Division of Chemical Education and of the Journal.

  12. 2016 Year-in-Review of Clinical and Consumer Informatics: Analysis and Visualization of Keywords and Topics.

    Science.gov (United States)

    Park, Hyeoun-Ae; Lee, Joo Yun; On, Jeongah; Lee, Ji Hyun; Jung, Hyesil; Park, Seul Ki

    2017-04-01

    The objective of this study was to review and visualize the medical informatics field over the previous 12 months according to the frequencies of keywords and topics in papers published in the top four journals in the field and in Healthcare Informatics Research (HIR), an official journal of the Korean Society of Medical Informatics. A six-person team conducted an extensive review of the literature on clinical and consumer informatics. The literature was searched using keywords employed in the American Medical Informatics Association year-in-review process and organized into 14 topics used in that process. Data were analyzed using word clouds, social network analysis, and association rules. The literature search yielded 370 references and 1,123 unique keywords. 'Electronic Health Record' (EHR) (78.6%) was the most frequently appearing keyword in the articles published in the five studied journals, followed by 'telemedicine' (2.1%). EHR (37.6%) was also the most frequently studied topic area, followed by clinical informatics (12.0%). However, 'telemedicine' (17.0%) was the most frequently appearing keyword in articles published in HIR, followed by 'telecommunications' (4.5%). Telemedicine (47.1%) was the most frequently studied topic area, followed by EHR (14.7%). The study findings reflect the Korean government's efforts to introduce telemedicine into the Korean healthcare system and reactions to this from the stakeholders associated with telemedicine.
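The keyword frequencies reported above (e.g. EHR appearing in 78.6% of articles) are share-of-papers counts, which can be sketched as follows; the function and sample data are illustrative, not the study's actual pipeline.

```python
from collections import Counter

def keyword_shares(papers):
    """papers: one keyword list per paper. Returns each keyword's share of
    papers as a percentage, in descending order of frequency (sketch)."""
    n = len(papers)
    counts = Counter(kw for kws in papers for kw in set(kws))  # once per paper
    return {kw: round(100 * c / n, 1) for kw, c in counts.most_common()}
```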

  13. Bioclipse: an open source workbench for chemo- and bioinformatics

    Directory of Open Access Journals (Sweden)

    Wagener Johannes

    2007-02-01

    Full Text Available Abstract Background There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. Conclusion Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license that sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

  14. BioWarehouse: a bioinformatics database warehouse toolkit

    Directory of Open Access Journals (Sweden)

    Stringer-Calvert David WJ

    2006-03-01

    Full Text Available Abstract Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL), but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the
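The enzyme-gap question in the abstract (which EC-numbered activities have no sequence in the public databases?) is a typical warehouse-style query. The sketch below runs the same shape of query against a toy in-memory database; the table and column names are invented for illustration and are not BioWarehouse's actual schema.

```python
import sqlite3

# Toy schema (invented for illustration; not BioWarehouse's real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enzyme_activity (ec TEXT PRIMARY KEY);
    CREATE TABLE sequence_record (id INTEGER PRIMARY KEY, ec TEXT);
    INSERT INTO enzyme_activity VALUES ('1.1.1.1'), ('2.7.7.7'), ('4.2.1.20');
    INSERT INTO sequence_record (ec) VALUES ('1.1.1.1'), ('1.1.1.1'), ('2.7.7.7');
""")

# EC numbers with no sequence record at all -- the 'gap' being measured.
missing = conn.execute("""
    SELECT COUNT(*) FROM enzyme_activity e
    WHERE NOT EXISTS (SELECT 1 FROM sequence_record s WHERE s.ec = e.ec)
""").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM enzyme_activity").fetchone()[0]
print(f"{missing}/{total} enzyme activities lack a sequence")
```

Here '4.2.1.20' has no sequence record, so the query reports a gap of 1 out of 3 activities; the paper's real analysis found such gaps for 36% of EC-numbered activities.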

  15. Bioclipse: an open source workbench for chemo- and bioinformatics

    Science.gov (United States)

    Spjuth, Ola; Helmus, Tobias; Willighagen, Egon L; Kuhn, Stefan; Eklund, Martin; Wagener, Johannes; Murray-Rust, Peter; Steinbeck, Christoph; Wikberg, Jarl ES

    2007-01-01

    Background There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. Conclusion Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at . PMID:17316423

  16. Virginia Bioinformatics Institute offers fellowships for graduate work in transdisciplinary science

    OpenAIRE

    Bland, Susan

    2008-01-01

    The Virginia Bioinformatics Institute at Virginia Tech, in collaboration with Virginia Tech's Ph.D. program in genetics, bioinformatics, and computational biology, is providing substantial fellowships in support of graduate work in transdisciplinary team science.

  17. Efficient feature selection and classification of protein sequence data in bioinformatics

    National Research Council Canada - National Science Library

    Iqbal, Muhammad Javed; Faye, Ibrahima; Samir, Brahim Belhaouari; Said, Abas Md

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data, and to develop and analyze computational tools that enhance their understanding...

  18. Missing "Links" in Bioinformatics Education: Expanding Students' Conceptions of Bioinformatics Using a Biodiversity Database of Living and Fossil Reef Corals

    Science.gov (United States)

    Nehm, Ross H.; Budd, Ann F.

    2006-01-01

    NMITA is a reef coral biodiversity database that we use to introduce students to the expansive realm of bioinformatics beyond genetics. We introduce a series of lessons that have students use this database, thereby accessing real data that can be used to test hypotheses about biodiversity and evolution while targeting the "National Science …

  19. Bioinformatics training: selecting an appropriate learning content management system--an example from the European Bioinformatics Institute.

    Science.gov (United States)

    Wright, Victoria Ann; Vaughan, Brendan W; Laurent, Thomas; Lopez, Rodrigo; Brooksbank, Cath; Schneider, Maria Victoria

    2010-11-01

    Today's molecular life scientists are well educated in the emerging experimental tools of their trade, but when it comes to training on the myriad of resources and tools for dealing with biological data, a less ideal situation emerges. Often bioinformatics users receive no formal training on how to make the most of the bioinformatics resources and tools available in the public domain. The European Bioinformatics Institute, which is part of the European Molecular Biology Laboratory (EMBL-EBI), holds the world's most comprehensive collection of molecular data, and training the research community to exploit this information is embedded in the EBI's mission. We have evaluated eLearning, in parallel with face-to-face courses, as a means of training users of our data resources and tools. We anticipate that eLearning will become an increasingly important vehicle for delivering training to our growing user base, so we have undertaken an extensive review of Learning Content Management Systems (LCMSs). Here, we describe the process that we used, which considered the requirements of trainees, trainers and systems administrators, as well as taking into account our organizational values and needs. This review describes the literature survey, user discussions and scripted platform testing that we performed to narrow down our choice of platform from 36 to a single platform. We hope that it will serve as guidance for others who are seeking to incorporate eLearning into their bioinformatics training programmes.

  1. All-optical repeater.

    Science.gov (United States)

    Silberberg, Y

    1986-06-01

    An all-optical device containing saturable gain, saturable loss, and unsaturable loss is shown to transform weak, distorted optical pulses into uniform standard-shape pulses. The proposed device performs the thresholding, amplification, and pulse shaping required of an optical repeater. It is shown that such a device could be realized with existing semiconductor technology.

  2. Bidirectional Manchester repeater

    Science.gov (United States)

    Ferguson, J.

    1980-01-01

    The bidirectional Manchester repeater is inserted at periodic intervals along a single bidirectional twisted-pair transmission line to detect, amplify, and retransmit bidirectional Manchester II code signals. Requiring only 18 TTL 7400-series ICs, some line receivers and drivers, and a handful of passive components, the circuit is simple and relatively inexpensive to build.

  3. Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource

    Science.gov (United States)

    Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc

    2005-01-01

    EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…

  4. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    Science.gov (United States)

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  5. Vertical and Horizontal Integration of Bioinformatics Education: A Modular, Interdisciplinary Approach

    Science.gov (United States)

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…

  6. Implementing a web-based introductory bioinformatics course for non-bioinformaticians that incorporates practical exercises.

    Science.gov (United States)

    Vincent, Antony T; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J; Lagüe, Patrick

    2017-09-13

    A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become specialized. However, biology students who are not involved in a bioinformatics program also need a solid toolbox of bioinformatics software and skills. Therefore, we have developed a completely online bioinformatics course for non-bioinformaticians, entitled "BIF-1901 Introduction à la bio-informatique et à ses outils (Introduction to bioinformatics and bioinformatics tools)," given by the Department of Biochemistry, Microbiology, and Bioinformatics of Université Laval (Quebec City, Canada). This course requires neither a bioinformatics background nor specific skills in informatics. The underlying main goal was to produce a completely online up-to-date bioinformatics course, including practical exercises, with an intuitive pedagogical framework. The course, BIF-1901, was conceived to cover the three fundamental aspects of bioinformatics: (1) informatics, (2) biological sequence analysis, and (3) structural bioinformatics. This article discusses the content of the modules, the evaluations, the pedagogical framework, and the challenges inherent to a multidisciplinary, fully online course. © 2017 The International Union of Biochemistry and Molecular Biology.

  9. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    Science.gov (United States)

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  10. Novel SINEs families in Medicago truncatula and Lotus japonicus: bioinformatic analysis.

    Science.gov (United States)

    Gadzalski, Marek; Sakowicz, Tomasz

    2011-07-01

    Although short interspersed elements (SINEs) were discovered nearly 30 years ago, studies of these genomic repeats have been mostly limited to animal genomes. Very little is known about SINEs in legumes--one of the most important plant families. Here we report the identification, genomic distribution and molecular features of six novel SINE elements in Lotus japonicus (named LJ_SINE-1, -2, -3) and Medicago truncatula (MT_SINE-1, -2, -3), model legume species. They possess all the structural features commonly found in short interspersed elements, including an RNA polymerase III promoter, a polyA tail and flanking repeats. The SINEs described here are present in low to moderate copy numbers, from 150 to 3000. Using bioinformatic analyses to search public databases, we have shown that three of the new SINE elements from M. truncatula appear to be characteristic of the Medicago and Trifolium genera. Two SINE families have been found in L. japonicus and one is present in both M. truncatula and L. japonicus. We also discuss the potential activities of the described elements. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. Quantum Bio-Informatics II From Quantum Information to Bio-Informatics

    Science.gov (United States)

    Accardi, L.; Freudenberg, Wolfgang; Ohya, Masanori

    2009-02-01

    / H. Kamimura -- Massive collection of full-length complementary DNA clones and microarray analyses: keys to rice transcriptome analysis / S. Kikuchi -- Changes of influenza A(H5) viruses by means of entropic chaos degree / K. Sato and M. Ohya -- Basics of genome sequence analysis in bioinformatics - its fundamental ideas and problems / T. Suzuki and S. Miyazaki -- A basic introduction to gene expression studies using microarray expression data analysis / D. Wanke and J. Kilian -- Integrating biological perspectives: a quantum leap for microarray expression analysis / D. Wanke ... [et al.].

  12. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.

    Science.gov (United States)

    Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J

    2017-09-20

    There are a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result-set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities for use in a Vector Space Model with Cosine Similarity measures. To our knowledge, PIBAS FedSPARQL is unique among the systems that we found in that it allows the detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users can choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of the data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly
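
    The "similar data items" detection described above rests on standard vector-space retrieval: each item's extracted entity names become a term-frequency vector, and two items are compared by the cosine of the angle between their vectors. A minimal sketch of that idea (not the PIBAS implementation; the whitespace tokenization and raw term counts are simplifying assumptions, where a real system would use named-entity extraction and TF-IDF weighting):

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Dot product only needs terms present in both vectors.
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two data items labelled with different URIs but overlapping entity names:
print(cosine_similarity("epidermal growth factor receptor",
                        "growth factor receptor binding"))  # → 0.75
```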

  13. Duct Leakage Repeatability Testing

    Energy Technology Data Exchange (ETDEWEB)

    Walker, Iain [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Sherman, Max [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2014-01-01

    Duct leakage often needs to be measured to demonstrate compliance with requirements or to determine energy or Indoor Air Quality (IAQ) impacts. Testing is often done using standards such as ASTM E1554 (ASTM 2013) or California Title 24 (California Energy Commission 2013 & 2013b), but there are several choices of methods available within the accepted standards. Determining which method to use or not use requires an evaluation of those methods in the context of the particular needs. Three important considerations are the cost, the accuracy and the repeatability of the measurement. The purpose of this report is to evaluate the repeatability of the three most significant measurement techniques using data from the literature and recently obtained field data; the first two factors are also briefly discussed. The main question to be answered by this study is whether differences in the repeatability of these test methods are large enough to indicate that any of them is so poor that it should be excluded from consideration as an allowed procedure in codes and standards.
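
    Repeatability of a test method is commonly summarized by the spread of repeated measurements on the same system relative to their mean; one conventional summary is the coefficient of variation. A minimal sketch of that summary (illustrative only; the readings are invented and the report's own statistical treatment may differ):

```python
from statistics import mean, stdev

def repeatability_cv(measurements):
    """Coefficient of variation (%) across repeated measurements of one system."""
    return 100.0 * stdev(measurements) / mean(measurements)

# Repeated leakage readings (CFM) from the same test method on one duct system:
readings = [102.0, 98.0, 101.0, 99.0]
print(round(repeatability_cv(readings), 2))  # → 1.83
```

A lower coefficient of variation across repeated tests indicates a more repeatable method; comparing this figure across methods is one way to frame the exclusion question posed above.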

  14. Use or abuse of bioinformatic tools: a response to Samach.

    Science.gov (United States)

    Muñoz-Fambuena, Natalia; Mesejo, Carlos; González-Mas, María C; Primo-Millo, Eduardo; Agustí, Manuel; Iglesias, Domingo J

    2013-03-01

    In a recent paper, we described for the first time the effects of fruit on the expression of putative homologues of genes involved in flowering pathways. It was our aim to provide insight into the molecular mechanisms underlying alternate bearing in citrus. However, a bioinformatics-based critique of our and other related papers has been given by Samach in the preceding Viewpoint article in this issue of Annals of Botany. The use of certain bioinformatic tools in a context of structural rather than functional genomics can cast doubts about the veracity of a large amount of data published in recent years. In this response, the contentions raised by Samach are analysed, and rebuttals of his criticisms are presented.

  15. WIWS: a protein structure bioinformatics Web service collection.

    Science.gov (United States)

    Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G

    2010-07-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability is monitored daily.

  16. Statistical modelling in biostatistics and bioinformatics selected papers

    CERN Document Server

    Peng, Defen

    2014-01-01

    This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...

  17. Architecture exploration of FPGA based accelerators for bioinformatics applications

    CERN Document Server

    Varma, B Sharat Chandra; Balakrishnan, M

    2016-01-01

    This book presents an evaluation methodology to design future FPGA fabrics incorporating hard embedded blocks (HEBs) to accelerate applications. This methodology will be useful for selection of blocks to be embedded into the fabric and for evaluating the performance gain that can be achieved by such an embedding. The authors illustrate the use of their methodology by studying the impact of HEBs on two important bioinformatics applications: protein docking and genome assembly. The book also explains how the respective HEBs are designed and how hardware implementation of the application is done using these HEBs. It shows that significant speedups can be achieved over pure software implementations by using such FPGA-based accelerators. The methodology presented in this book may also be used for designing HEBs for accelerating software implementations in other domains besides bioinformatics. This book will prove useful to students, researchers, and practicing engineers alike.

  18. 2nd Colombian Congress on Computational Biology and Bioinformatics

    CERN Document Server

    Cristancho, Marco; Isaza, Gustavo; Pinzón, Andrés; Rodríguez, Juan

    2014-01-01

    This volume compiles the accepted contributions for the 2nd Edition of the Colombian Computational Biology and Bioinformatics Congress CCBCOL, after a rigorous review process in which 54 papers were accepted for publication from 119 submitted contributions. Bioinformatics and Computational Biology are areas of knowledge that have emerged due to advances that have taken place in the Biological Sciences and their integration with the Information Sciences. The expansion of projects involving the study of genomes has led the way in the production of vast amounts of sequence data which need to be organized, analyzed and stored to understand phenomena associated with living organisms related to their evolution, behavior in different ecosystems, and the development of applications that can be derived from this analysis.

  19. State of the nation in data integration for bioinformatics.

    Science.gov (United States)

    Goble, Carole; Stevens, Robert

    2008-10-01

    Data integration is a perennial issue in bioinformatics, with many systems being developed and many technologies offered as a panacea for its resolution. The fact that it is still a problem indicates a persistence of underlying issues. Progress has been made, but we should ask "what lessons have been learnt?", and "what still needs to be done?" Semantic Web and Web 2.0 technologies are the latest to find traction within bioinformatics data integration. Now we can ask whether the Semantic Web, mashups, or their combination, have the potential to help. This paper is based on the opening invited talk by Carole Goble given at the Health Care and Life Sciences Data Integration for the Semantic Web Workshop collocated with WWW2007. The paper expands on that talk. We attempt to place some perspective on past efforts, highlight the reasons for success and failure, and indicate some pointers to the future.

  20. Some statistics in bioinformatics: the fifth Armitage Lecture.

    Science.gov (United States)

    Solomon, Patricia J

    2009-10-15

    The spirit and content of the 2007 Armitage Lecture are presented in this paper. To begin, two areas of Peter Armitage's early work are distinguished: his pioneering research on sequential methods intended for use in medical trials and the comparison of survival curves. Their influence on much later work is highlighted and motivates the proposal of several statistical 'truths' presented in the paper. The illustration of these truths demonstrates biology's new morphology and its dominance over statistics in this century. An overview of a recent proteomics ovarian cancer study is given as a warning of what can happen when bioinformatics meets epidemiology badly, in particular, when the study design is poor. A statistical bioinformatics success story is outlined, in which gene profiling is helping to identify novel genes and networks involved in mouse embryonic stem cell development. Some concluding thoughts are given.

  1. Bioinformatics Analysis of Zinc Transporter from Baoding Alfalfa

    Institute of Scientific and Technical Information of China (English)

    Haibo WANG; Junyun GUO

    2012-01-01

    [Objective] This study aimed to perform a bioinformatics analysis of the zinc transporter (ZnT) from Baoding alfalfa. [Method] Based on the amino acid sequence, the physical and chemical properties, hydrophilicity/hydrophobicity and secondary structure of ZnT from Baoding alfalfa were predicted with a series of bioinformatics software packages, and the transmembrane domains were predicted using different online tools. [Result] ZnT is a hydrophobic protein containing 408 amino acids with a theoretical pI of 5.94, and it has 7 potential transmembrane hydrophobic regions. In the secondary structure, α-helix (Hh) accounted for 48.04%, extended strand (Ee) for 9.56% and random coil (Cc) for 42.40%, which accords with the characteristics of a transmembrane protein. [Conclusion] ZnT is a member of the CDF family, responsible for transporting Zn^2+ out of the cell membrane to reduce the concentration and toxicity of Zn^2+.
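
    Hydrophobicity analyses like the one used here to flag transmembrane regions are typically computed as a sliding-window average of per-residue hydropathy values, e.g. on the Kyte-Doolittle scale. A minimal sketch of that calculation (the window size, the ~1.6 threshold heuristic, and the toy sequence are illustrative assumptions, not taken from the study):

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {'I': 4.5, 'V': 4.2, 'L': 3.8, 'F': 2.8, 'C': 2.5, 'M': 1.9, 'A': 1.8,
      'G': -0.4, 'T': -0.7, 'S': -0.8, 'W': -0.9, 'Y': -1.3, 'P': -1.6,
      'H': -3.2, 'E': -3.5, 'Q': -3.5, 'D': -3.5, 'N': -3.5, 'K': -3.9,
      'R': -4.5}

def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle hydropathy over a sliding window."""
    return [sum(KD[a] for a in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

# Windows averaging above ~1.6 are candidate transmembrane segments.
seq = "MKT" + "LIVLIVLIVLIVLIVLIV" + "DDEEKKRR"  # toy sequence
print(max(hydropathy_profile(seq)) > 1.6)  # → True
```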

  2. Bioinformatics Data Distribution and Integration via Web Services and XML

    Institute of Scientific and Technical Information of China (English)

    Xiao Li; Yizheng Zhang

    2003-01-01

    It is widely recognized that the exchange, distribution, and integration of biological data are key to improving bioinformatics and genome biology in the post-genomic era. However, the problem of exchanging and integrating biological data has not been solved satisfactorily. The eXtensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents to exchange and integrate data on the World Wide Web (WWW). Web services are the next generation of the WWW and are founded upon the open standards of the W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force). This paper presents XML and Web Services technologies and their use in an appropriate solution to the problem of bioinformatics data exchange and integration.
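
    The exchange pattern advocated above can be illustrated in a few lines: one side serializes a sequence record to XML, and a receiving service parses the same document back into structured data. The element and attribute names here are invented for illustration, not a published bioinformatics schema:

```python
import xml.etree.ElementTree as ET

# Serialize a minimal sequence record to XML (names are illustrative).
record = ET.Element("sequence", id="AB000001", organism="Homo sapiens")
ET.SubElement(record, "description").text = "example cDNA clone"
ET.SubElement(record, "residues").text = "ATGGCGTTCAAG"
xml_text = ET.tostring(record, encoding="unicode")

# A receiving service can parse the document back into structured data.
parsed = ET.fromstring(xml_text)
print(parsed.get("id"), parsed.findtext("residues"))  # → AB000001 ATGGCGTTCAAG
```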

  3. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  4. Rise and demise of bioinformatics? Promise and progress.

    Directory of Open Access Journals (Sweden)

    Christos A Ouzounis

    Full Text Available The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.

  5. Best practices in bioinformatics training for life scientists

    DEFF Research Database (Denmark)

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik

    2013-01-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians...... to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse....... In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course...

  6. Bioinformatics Tools for Small Genomes, Such as Hepatitis B Virus

    Directory of Open Access Journals (Sweden)

    Trevor G. Bell

    2015-02-01

    Full Text Available DNA sequence analysis is undertaken in many biological research laboratories. The workflow consists of several steps involving the bioinformatic processing of biological data. We have developed a suite of web-based online bioinformatic tools to assist with processing, analysis and curation of DNA sequence data. Most of these tools are genome-agnostic, with two tools specifically designed for hepatitis B virus sequence data. Tools in the suite are able to process sequence data from Sanger sequencing, ultra-deep amplicon resequencing (pyrosequencing and chromatograph (trace files, as appropriate. The tools are available online at no cost and are aimed at researchers without specialist technical computer knowledge. The tools can be accessed at http://hvdr.bioinf.wits.ac.za/SmallGenomeTools, and the source code is available online at https://github.com/DrTrevorBell/SmallGenomeTools.

  7. Bioinformatics: an overview and its applications.

    Science.gov (United States)

    Diniz, W J S; Canduri, F

    2017-03-15

    Technological advancements in recent years have promoted marked progress in understanding the genetic basis of phenotypes. In line with these advances, genomics has shifted the paradigm of biological questions to the full genome-wide scale, revealing an explosion of data and opening up many possibilities. On the other hand, the vast amount of information that has been generated points to the challenges that must be overcome for the storage (Moore's law) and processing of biological information. In this context, bioinformatics and computational biology have sought to overcome such challenges. This review presents an overview of bioinformatics and its use in the analysis of biological data, exploring approaches, emerging methodologies, and tools that can give biological meaning to the data generated.

  8. Reading the World's Classics Critically: A Keyword-Based Approach to Literary Analysis in Foreign Language Studies

    Science.gov (United States)

    García, Nuria Alonso; Caplan, Alison

    2014-01-01

    While there are a number of important critical pedagogies being proposed in the field of foreign language study, more attention should be given to providing concrete examples of how to apply these ideas in the classroom. This article offers a new approach to the textual analysis of literary classics through the keyword-based methodology originally…

  9. The Effects of Keyword Cues and 3R Strategy on Children's e-Book Reading

    Science.gov (United States)

    Liang, T.-H.

    2015-01-01

    Various studies have found that electronic books (e-books) promote learning, but few works have examined the use of e-books along with an adaptive reading strategy for children. The current study implemented a method to extract keyword cues from e-books to support e-book reading with the read, recite and review (3R) strategy, and then examined the…

  11. lobSTR: A short tandem repeat profiler for personal genomes

    OpenAIRE

    Gymrek, Melissa; Golan, David; Rosset, Saharon; Erlich, Yaniv

    2012-01-01

    Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in p...
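
The abstract contrasts STR profiling with gapped alignment; the object being profiled is easy to illustrate. Below is a toy regex-based perfect-repeat scanner (my own sketch, not lobSTR's algorithm, which works very differently) that reports (start, motif, copies) tuples for tandem repeats of 2-6 bp motifs:

```python
import re

def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Toy scanner for perfect short tandem repeats.

    Returns (start, motif, copies) for every position where a motif of
    length min_unit..max_unit repeats at least min_copies times.
    """
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # Lookahead so overlapping repeat stretches are all reported.
        pattern = re.compile(r"(?=((\w{%d})\2{%d,}))" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            copies = len(m.group(1)) // unit_len
            hits.append((m.start(), motif, copies))
    return hits

hits = find_strs("TTACACACACACGG")
print(hits)  # includes (2, 'AC', 5): the AC motif repeated 5 times
```

Real STR callers must additionally handle sequencing errors, imperfect repeats, and flanking-sequence mapping, which is exactly where naive approaches break down.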

  12. [Clustered regularly interspaced short palindromic repeats: structure, function and application--a review].

    Science.gov (United States)

    Cui, Yujun; Li, Yanjun; Yan, Yanfeng; Yang, Ruifu

    2008-11-01

CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), the basis of spoligotyping technology, provide prokaryotes with heritable adaptive immunity against phage invasion. Studies on CRISPR loci and their associated elements, including the various CAS (CRISPR-associated) proteins and leader sequences, are still in their infancy. We introduce the brief history, structure, function, bioinformatics research, and applications of this remarkable immune system in prokaryotic organisms, to inspire more scientists to take an interest in this developing topic.

  13. BIRCH: A user-oriented, locally-customizable, bioinformatics system

    Directory of Open Access Journals (Sweden)

    Fristensky Brian

    2007-02-01

Full Text Available Abstract Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step-by-step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.

  14. Bioinformatics meets user-centred design: a perspective.

    Directory of Open Access Journals (Sweden)

    Katrina Pavelin

Full Text Available Designers have a saying that "the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years." It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product was designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and they would have greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.

  15. Bioinformatics meets user-centred design: a perspective.

    Science.gov (United States)

    Pavelin, Katrina; Cham, Jennifer A; de Matos, Paula; Brooksbank, Cath; Cameron, Graham; Steinbeck, Christoph

    2012-01-01

    Designers have a saying that "the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years." It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product was designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and they would have greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.

  16. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis

    OpenAIRE

    Noar, Roslyn D.; Daub, Margaret E.

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the numb...

  17. A Quick Guide for Building a Successful Bioinformatics Community

    Science.gov (United States)

    Budd, Aidan; Corpas, Manuel; Brazas, Michelle D.; Fuller, Jonathan C.; Goecks, Jeremy; Mulder, Nicola J.; Michaut, Magali; Ouellette, B. F. Francis; Pawlik, Aleksandra; Blomberg, Niklas

    2015-01-01

    “Scientific community” refers to a group of people collaborating together on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training is typically an important component of these communities, providing a valuable context in which to develop skills and expertise, while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop “The ‘How To Guide’ for Establishing a Successful Bioinformatics Network” at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB). PMID:25654371

  18. The web server of IBM's Bioinformatics and Pattern Discovery group

    OpenAIRE

    Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel,; Shibuya, Tetsuo

    2003-01-01

    We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic ...

  19. KBWS: an EMBOSS associated package for accessing bioinformatics web services.

    Science.gov (United States)

    Oshita, Kazuki; Arakawa, Kazuharu; Tomita, Masaru

    2011-04-29

    The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).

  20. Best practices in bioinformatics training for life scientists.

    KAUST Repository

    Via, Allegra

    2013-06-25

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  1. A comparison of common programming languages used in bioinformatics.

    Science.gov (United States)

    Fourment, Mathieu; Gillings, Michael R

    2008-02-05

    The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
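
One of the paper's three benchmarks, the Sellers algorithm, is a dynamic-programming edit-distance computation; a minimal Python sketch (my illustration of the general DP, not the authors' benchmark code) looks like this:

```python
def sellers_distance(query, text):
    """Dynamic-programming edit distance with unit costs, the core of a
    Sellers-style comparison; O(m*n) time, O(n) memory."""
    prev = list(range(len(text) + 1))
    for i, q in enumerate(query, start=1):
        curr = [i]
        for j, t in enumerate(text, start=1):
            cost = 0 if q == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1]

print(sellers_distance("GATTACA", "GACTATA"))  # → 2 (two substitutions)
```

An inner loop this tight is precisely the kind of workload where the paper found compiled languages (C, C++) far ahead of interpreted ones (Perl, Python).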

  2. p3d – Python module for structural bioinformatics

    Directory of Open Access Journals (Sweden)

    Fufezan Christian

    2009-08-01

Full Text Available Abstract Background High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write understandable source code. Results p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of (a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, (b) set theory, and (c) functions that allow one to combine (a) and (b) and that use human-readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. Conclusion p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.
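
As an illustration of the spatial-indexing idea behind point (a), here is a toy uniform-grid index (a simpler cousin of the BSP tree p3d actually uses; the class name and cell size are my own assumptions, not p3d's API) for "points within radius r" queries:

```python
from collections import defaultdict
import math

class SpatialGrid:
    """Bucket 3D points into uniform cells so a radius query inspects
    only nearby cells instead of every point."""

    def __init__(self, points, cell=5.0):
        self.cell = cell
        self.buckets = defaultdict(list)
        for p in points:
            self.buckets[self._key(p)].append(p)

    def _key(self, p):
        # Integer cell coordinates of a point.
        return tuple(int(math.floor(c / self.cell)) for c in p)

    def within(self, center, radius):
        kx, ky, kz = self._key(center)
        reach = int(math.ceil(radius / self.cell))
        hits = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for p in self.buckets.get((kx + dx, ky + dy, kz + dz), []):
                        if math.dist(p, center) <= radius:
                            hits.append(p)
        return hits

grid = SpatialGrid([(0, 0, 0), (1, 1, 1), (10, 10, 10)])
print(grid.within((0, 0, 0), 2.0))  # → [(0, 0, 0), (1, 1, 1)]
```

For typical interatomic query radii (a few Angstroms) this turns an O(N) scan into a near-constant-time lookup, which is the same payoff the BSP tree delivers.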

  3. High-throughput protein analysis integrating bioinformatics and experimental assays.

    Science.gov (United States)

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.

  4. An Integrative Study on Bioinformatics Computing Concepts, Issues and Problems

    Directory of Open Access Journals (Sweden)

    Muhammad Zakarya

    2011-11-01

Full Text Available Bioinformatics is the fusion of biological science and IT. The discipline covers all computational tools and techniques used to administer, examine and manipulate huge sets of biological data. The discipline also helps in the creation of databases to store and supervise biological data, the improvement of computer algorithms to find relations in these databases, and the use of computer tools for the study and understanding of biological information, including DNA, RNA, protein sequences, gene expression profiles, protein structures, and biochemical pathways. The study in this paper implements an integrative solution. As we know, the solution to a problem in a specific discipline may be a solution to another problem in a different discipline. For example, entropy, borrowed from the physical sciences, is a solution to many problems and issues in computer science. Another example is bioinformatics, where computing methods and applications are applied to biological information. This paper takes an initial step in that direction and discusses the need for integration of multiple disciplines and sciences. Similarly, green chemistry gives birth to a new kind of computing, i.e. green computing. In subsequent versions of this paper we will study biological fuel cells and discuss developing a mobile battery that is lifetime-charged using the concepts of the biological fuel cell. Another issue that we are going to discuss in this series is brain tumor detection. This paper is a review of BI, i.e. bioinformatics, to start with.

  5. GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training

    Science.gov (United States)

    Atwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.

    2015-01-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  6. KBWS: an EMBOSS associated package for accessing bioinformatics web services

    Directory of Open Access Journals (Sweden)

    Tomita Masaru

    2011-04-01

Full Text Available Abstract The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).

  7. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillussphaericus

    Science.gov (United States)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify genes encoding gelatinase. L. sphaericus was isolated from soil and produces gelatinase that is species-specific to porcine and bovine gelatin. This bacterium thus offers the possibility of producing enzymes specific to each species of meat. The main focus of this research was to identify the gelatinase-encoding gene within L. sphaericus using bioinformatics analysis of a partially sequenced genome. Three candidate genes were identified: gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, 1563 base pairs (bp) in size with a 520 amino acid sequence; gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, 1776 bp in size with a 591 amino acid sequence; and gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, 1701 bp in size with a 566 amino acid sequence. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short sequences of cDNA by PCR. The amplicons reliably resulted in 1563 bp for candidate gene P1 and 1701 bp for candidate gene P3. Therefore, the bioinformatics analysis of L. sphaericus identified genes encoding gelatinase.
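
The reported ORF and protein lengths are mutually consistent, since an ORF of n bp encodes n/3 - 1 amino acids once the stop codon is discounted; a quick check:

```python
def expected_protein_length(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # one codon is the stop and encodes nothing

# The abstract's three candidates all check out:
print(expected_protein_length(1563))  # → 520 (candidate P1)
print(expected_protein_length(1776))  # → 591 (candidate P2)
print(expected_protein_length(1701))  # → 566 (candidate P3)
```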

  8. Best practices in bioinformatics training for life scientists.

    Science.gov (United States)

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  9. Repeatability of Cryogenic Multilayer Insulation

    Science.gov (United States)

    Johnson, W. L.; Vanderlaan, M.; Wood, J. J.; Rhys, N. O.; Guo, W.; Van Sciver, S.; Chato, D. J.

    2017-01-01

Due to the variety of requirements across aerospace platforms and one-off projects, the repeatability of cryogenic multilayer insulation (MLI) has never been fully established. The objective of this test program is to provide a more basic understanding of the thermal performance repeatability of MLI systems that are applicable to large-scale tanks. There are several different types of repeatability that can be accounted for: repeatability between multiple identical blankets, repeatability of installation of the same blanket, and repeatability of a test apparatus. The focus of the work in this report is on the first two types. Statistically, repeatability can mean many different things. In its simplest form, it refers to the range of performance that a population exhibits and the average of the population. However, as more and more identical components are made (i.e. as the population of concern grows), the simple range morphs into a standard deviation about an average performance. Initial repeatability testing on MLI blankets has been completed at Florida State University. Repeatability of five GRC-provided coupons with 25 layers was shown to be +/- 8.4, whereas repeatability of repeatedly installing a single coupon was shown to be +/- 8.0. A second group of 10 coupons was fabricated by Yetispace and tested by Florida State University; through the first 4 tests, the repeatability has been shown to be +/- 16. Based on detailed statistical analysis, the data has been shown to be statistically significant.
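
Once the population of identical blankets grows, the report's notion of repeatability becomes a mean plus a sample standard deviation; a minimal sketch (the heat-load numbers below are hypothetical, not from the test program):

```python
import statistics

def repeatability(measurements):
    """Mean +/- sample standard deviation of repeated measurements."""
    return statistics.mean(measurements), statistics.stdev(measurements)

# Hypothetical normalized heat-load readings from five identical coupons.
mean, spread = repeatability([1.02, 0.95, 1.10, 0.98, 1.05])
print(f"{mean:.3f} +/- {spread:.3f}")  # → 1.020 +/- 0.059
```

`statistics.stdev` uses the n-1 (sample) denominator, which is appropriate when the tested coupons are a sample of a larger production population.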

  10. Information Forensics Method for Tacit Keywords

    Institute of Scientific and Technical Information of China (English)

    孙艳; 周学广; 陈涛

    2011-01-01

As a new form of media, Internet public opinion influences society profoundly, and information forensics is an important aspect of Internet public opinion monitoring. To resolve the problem of unhealthy-information forensics, an information forensics technique for tacit keywords is presented. It defines the term tacit keyword, then classifies and quantifies tacit keywords. Six algorithms for extracting tacit keywords for information forensics are proposed, and based on these algorithms the integrity of the evidence information is protected with a hash function. Experimental results show that the extraction time of all six algorithms is at the millisecond level, and that the precision and recall rates reach 92% and 95%, respectively, ensuring efficient forensics of illegal information under web public opinion monitoring.
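
The reported 92% precision and 95% recall, and the hash-based integrity protection, follow standard definitions; a small sketch of both (SHA-256 is my assumption, since the abstract does not name the hash function used):

```python
import hashlib

def precision_recall(retrieved, relevant):
    """Precision and recall over sets of extracted keywords."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    return tp / len(retrieved), tp / len(relevant)

def seal_evidence(text):
    """Fix the evidence with a digest so later tampering is detectable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "c", "e"])
print(p, r)  # → 0.75 0.75
digest = seal_evidence("extracted keyword evidence")
```

Re-hashing the stored evidence at verification time and comparing digests is what guarantees the extracted material has not been altered.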

  11. Computer-aided vaccine designing approach against fish pathogens Edwardsiella tarda and Flavobacterium columnare using bioinformatics softwares

    Directory of Open Access Journals (Sweden)

    Mahendran R

    2016-05-01

Full Text Available Radha Mahendran,1 Suganya Jeyabaskar,1 Gayathri Sitharaman,1 Rajamani Dinakaran Michael,2 Agnal Vincent Paul1 1Department of Bioinformatics, 2Centre for Fish Immunology, School of Life Sciences, Vels University, Pallavaram, Chennai, Tamil Nadu, India Abstract: Edwardsiella tarda and Flavobacterium columnare are two important intracellular pathogenic bacteria that cause the infectious diseases edwardsiellosis and columnaris in wild and cultured fish. Prediction of major histocompatibility complex (MHC) binding is an important issue in T-cell epitope prediction. In a healthy immune system, the T-cells must recognize epitopes and induce the immune response. In this study, T-cell epitopes were predicted using an in silico immunoinformatics approach with the help of bioinformatics tools that are less expensive and not time consuming. Such identification of binding interactions between peptides and MHC alleles aids in the discovery of new peptide vaccines. We report potential peptides, chosen from the outer membrane proteins (OMPs) of E. tarda and F. columnare, which interact well with MHC class I alleles. OMPs from E. tarda and F. columnare were selected and analyzed based on their antigenic and immunogenic properties. The OMPs of the genes TolC and FCOL_04620, respectively, from E. tarda and F. columnare were taken for study. Finally, two epitopes from the OMP of E. tarda exhibited excellent protein–peptide interaction when docked with MHC class I alleles. Five epitopes from the OMP of F. columnare had good protein–peptide interaction when docked with MHC class I alleles. Further in vitro studies can aid in the development of potential peptide vaccines using the predicted peptides. Keywords: E. tarda, F. columnare, edwardsiellosis, columnaris, T-cell epitopes, MHC class I, peptide vaccine, outer membrane proteins

  12. Bioinformatics in microbial biotechnology – a mini review

    Directory of Open Access Journals (Sweden)

    Bansal Arvind K

    2005-06-01

    Full Text Available Abstract The revolutionary growth in the computation speed and memory storage capability has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes including a cleaner draft of human genome have been sequenced raising the expectation of better control of microorganisms. The goals are as lofty as the development of rational drugs and antimicrobial agents, development of new enhanced bacterial strains for bioremediation and pollution control, development of better and easy to administer vaccines, the development of protein biomarkers for various bacterial diseases, and better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics – sequencing and comparative study of genomes to identify gene and genome functionality, (ii) proteomics – identification and characterization of protein related properties and reconstruction of metabolic and regulatory pathways, (iii) cell visualization and simulation to study and model cell behavior, and (iv) application to the development of drugs and anti-microbial agents. In this article, we will focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based upon the available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that integrates search techniques with mathematical modeling. The major impact of bioinformatics research has been to automate the genome sequencing, automated development of integrated genomics and proteomics databases, automated genome comparisons to identify the genome function, automated derivation of metabolic pathways, gene

  13. Promoting synergistic research and education in genomics and bioinformatics.

    Science.gov (United States)

    Yang, Jack Y; Yang, Mary Qu; Zhu, Mengxia Michelle; Arabnia, Hamid R; Deng, Youping

    2008-01-01

    Bioinformatics and Genomics are closely related disciplines that hold great promise for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting science and technology. High throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demands for developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will obviously have a profound effect on how biomedical research will be conducted toward the improvement of human health and the prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd), in collaboration with the International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine through today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 international conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of America, on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in genomics, biomedicine and bioinformatics.
Biocomp 2007 provided a common platform for the cross-fertilization of ideas, helping to shape knowledge and

  14. Bioinformatics pipeline for functional identification and characterization of proteins

    Science.gov (United States)

    Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Krzywkowski, Tomasz; Świerkula, Katarzyna; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    The new sequencing methods, collectively called next generation sequencing, give an opportunity to obtain a vast amount of data in a short time. These data require structural and functional annotation. Functional identification and characterization of predicted proteins can be done by in silico approaches, thanks to the numerous computational tools available nowadays. However, there is a need to confirm the results of protein function prediction by using different programs and comparing their results, or to confirm them experimentally. Here we present a bioinformatics pipeline for the structural and functional annotation of proteins.

  15. Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages

    Science.gov (United States)

    Djordjevic, Marko

    2005-11-01

    Due to rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding of the regulatory circuits that control expression of genes. We propose a novel, biophysics based algorithm, for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information theory based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance---and determination---of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects rice pathogen Xanthomonas oryzae. Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely
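
The saturation effect described above can be captured by a Fermi-Dirac occupancy, where a site's binding probability depends on its sequence-specific energy relative to the TF's chemical potential. A schematic sketch with invented energy values (not fitted E. coli parameters):

```python
import math

def binding_probability(energy: float, mu: float, beta: float = 1.0) -> float:
    """Fermi-Dirac occupancy: p = 1 / (1 + exp(beta * (E - mu))).

    Unlike a linear weight-matrix score, this probability saturates
    near 1 for sites whose energy E lies well below the chemical
    potential mu, which naturally defines a binding threshold and
    reduces expected false positives among strong sites.
    """
    return 1.0 / (1.0 + math.exp(beta * (energy - mu)))

# A strong site saturates; a weak site is exponentially suppressed.
print(round(binding_probability(-10.0, mu=-2.0), 4))  # 0.9997
print(round(binding_probability(5.0, mu=-2.0), 4))    # 0.0009
```

A site exactly at the chemical potential has occupancy 1/2, which is the natural place to draw the binding threshold.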

  16. The bioinformatics of microarrays to study cancer: Advantages and disadvantages

    Science.gov (United States)

    Rodríguez-Segura, M. A.; Godina-Nava, J. J.; Villa-Treviño, S.

    2012-10-01

    Microarrays are devices designed to analyze the simultaneous expression of thousands of genes. However, each stage of the study adds noise to the information. Analyzing these thousands of data points requires bioinformatics tools. The traditional analysis begins by normalizing the data, but the results obtained depend strongly on how the study is conducted. This shows the need to develop new strategies for analyzing microarrays. Liver tissue taken from an animal model in which cancer is chemically induced is used as an example.
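
The abstract stresses that results hinge on how normalization is performed. One standard microarray normalization, chosen here purely for illustration (the article does not commit to a specific method), is quantile normalization, which forces every array onto a common distribution:

```python
def quantile_normalize(columns: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize expression arrays (one list per chip).

    Each array's values are replaced by the mean of the values that
    share the same rank across all arrays, so all arrays end up with
    an identical value distribution.
    """
    n = len(columns[0])
    # Indices of each array sorted by value (rank order).
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Mean across arrays at each rank.
    rank_means = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n)
    ]
    # Write the rank means back into each array's original positions.
    result = [[0.0] * n for _ in columns]
    for col_idx, order in enumerate(orders):
        for rank, orig_idx in enumerate(order):
            result[col_idx][orig_idx] = rank_means[rank]
    return result

# Two toy 3-probe arrays (invented values).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization both toy arrays contain the same set of values {1.5, 3.5, 5.5}, each in its array's original rank positions.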

  17. Bioinformatics Tools for the Discovery of New Nonribosomal Peptides

    DEFF Research Database (Denmark)

    Leclère, Valérie; Weber, Tilmann; Jacques, Philippe

    2016-01-01

    This chapter helps in the use of bioinformatics tools relevant to the discovery of new nonribosomal peptides (NRPs) produced by microorganisms. The strategy described can be applied to draft or fully assembled genome sequences. It relies on the identification of the synthetase genes... The three-dimensional structure of the peptides can be compared with the structural patterns of all known NRPs. The presented workflow leads to an efficient and rapid screening of genomic data generated by high throughput technologies. The exploration of such sequenced genomes may lead to the discovery of new drugs (i...

  18. SPOT--towards temporal data mining in medicine and bioinformatics.

    Science.gov (United States)

    Tusch, Guenter; Bretl, Chris; O'Connor, Martin; Das, Amar

    2008-11-06

    Mining large clinical and bioinformatics databases often includes the exploration of temporal data. For example, in liver transplantation, researchers might look for patients with an unusual time pattern of potential complications of the liver. In Knowledge-based Temporal Abstraction, time-stamped data points are transformed into an interval-based representation. We extended this framework by creating an open-source platform, SPOT. It supports the R statistical package and knowledge representation standards (OWL, SWRL) using the open source Semantic Web tool Protégé-OWL.

  19. myGrid: personalised bioinformatics on the information grid.

    Science.gov (United States)

    Stevens, Robert D; Robinson, Alan J; Goble, Carole A

    2003-01-01

    The myGrid project aims to exploit Grid technology, with an emphasis on the Information Grid, and provide middleware layers that make it appropriate for the needs of bioinformatics. myGrid is building high level services for data and application integration such as resource discovery, workflow enactment and distributed query processing. Additional services are provided to support the scientific method and best practice found at the bench but often neglected at the workstation, notably provenance management, change notification and personalisation. We give an overview of these services and their metadata, in particular the semantically rich metadata, expressed using ontologies, that is necessary to discover, select and compose services into dynamic workflows.

  20. An improved algorithm for weighting keywords in web documents

    Institute of Scientific and Technical Information of China (English)

    孙双; 贺樑; 杨静; 顾君忠

    2008-01-01

    In this paper, an improved algorithm, the web-based keyword weight algorithm (WKWA), is presented for weighting keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied in an improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of both keyword weighting and semantic similarity calculation is improved.
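
WKWA builds on the TF*IDF family of weighting schemes. The abstract does not reproduce WKWA's web-specific weighting factors (title, tag and position features), so the sketch below shows only the plain TF*IDF baseline it extends:

```python
import math

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Plain TF*IDF: weight(t, d) = tf(t, d) * log(N / df(t)).

    tf is the term frequency within the document, df the number of
    documents containing the term. WKWA's web-specific factors are
    not reproduced here.
    """
    n_docs = len(docs)
    df: dict[str, int] = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {term: (doc.count(term) / len(doc)) * math.log(n_docs / df[term])
             for term in set(doc)}
        weights.append(w)
    return weights

# Toy corpus: "web" is specific to doc 0, "keyword" appears everywhere.
w = tf_idf([["keyword", "web", "web"], ["keyword", "search"]])
print(round(w[0]["web"], 4))      # high weight: frequent and rare
print(w[0]["keyword"])            # 0.0: appears in every document
```

Terms occurring in every document get zero weight, which is exactly the discrimination property the IDF factor is meant to provide.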

  1. Structural, Bioinformatic, and In Vivo Analyses of Two Treponema pallidum Lipoproteins Reveal a Unique TRAP Transporter

    Energy Technology Data Exchange (ETDEWEB)

    Deka, Ranjit K.; Brautigam, Chad A.; Goldberg, Martin; Schuck, Peter; Tomchick, Diana R.; Norgard, Michael V. (NIH); (UTSMC)

    2012-05-25

    Treponema pallidum, the bacterial agent of syphilis, is predicted to encode one tripartite ATP-independent periplasmic transporter (TRAP-T). TRAP-Ts typically employ a periplasmic substrate-binding protein (SBP) to deliver the cognate ligand to the transmembrane symporter. Herein, we demonstrate that the genes encoding the putative TRAP-T components from T. pallidum, tp0957 (the SBP), and tp0958 (the symporter), are in an operon with an uncharacterized third gene, tp0956. We determined the crystal structure of recombinant Tp0956; the protein is trimeric and perforated by a pore. Part of Tp0956 forms an assembly similar to those of 'tetratricopeptide repeat' (TPR) motifs. The crystal structure of recombinant Tp0957 was also determined; like the SBPs of other TRAP-Ts, there are two lobes separated by a cleft. In these other SBPs, the cleft binds a negatively charged ligand. However, the cleft of Tp0957 has a strikingly hydrophobic chemical composition, indicating that its ligand may be substantially different and likely hydrophobic. Analytical ultracentrifugation of the recombinant versions of Tp0956 and Tp0957 established that these proteins associate avidly. This unprecedented interaction was confirmed for the native molecules using in vivo cross-linking experiments. Finally, bioinformatic analyses suggested that this transporter exemplifies a new subfamily of TPATs (TPR-protein-associated TRAP-Ts) that require the action of a TPR-containing accessory protein for the periplasmic transport of a potentially hydrophobic ligand(s).

  2. Prokaryotic Expression of Rice Ospgip1 Gene and Bioinformatic Analysis of Encoded Product

    Institute of Scientific and Technical Information of China (English)

    CHEN Xi-jun; LIU Xiao-wei; Zuo Si-min; MA Yu-yin; TONG Yun-hui; PAN Xue-biao; XU Jing-you

    2011-01-01

    Using the reference sequences of pgip genes in GenBank, a fragment of 930 bp covering the open reading frame (ORF) of rice Ospgip1 (Oryza sativa polygalacturonase-inhibiting protein 1) was amplified. The prokaryotic expression product of the gene inhibited the growth of Rhizoctonia solani, the causal agent of rice sheath blight, and reduced its polygalacturonase activity. Bioinformatic analysis showed that OsPGIP1 is a hydrophobic protein with a molecular weight of 32.8 kDa and an isoelectric point (pI) of 7.26. The protein is mainly located in the cell wall of rice, and its signal peptide cleavage site is located between the 17th and 18th amino acids. There are four cysteines in both the N- and C-termini of the deduced protein, which can form three disulfide bonds (between the 56th and 63rd, the 278th and 298th, and the 300th and 308th amino acids). The protein has a typical leucine-rich repeat (LRR) domain, and its secondary structure comprises α-helices, β-sheets and irregular coils. Compared with polygalacturonase-inhibiting proteins (PGIPs) from other plants, the 7th LRR is absent in OsPGIP1. The nine LRRs could form a cleft that might associate with proteins from pathogenic fungi, such as polygalacturonase.

  3. Bioinformatics Methods and Tools to Advance Clinical Care. Findings from the Yearbook 2015 Section on Bioinformatics and Translational Informatics.

    Science.gov (United States)

    Soualmia, L F; Lecroq, T

    2015-08-13

    To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and clinical care. We provide a synopsis of the articles selected for the IMIA Yearbook 2015, from which we attempt to derive a synthetic overview of current and future activities in the field. As last year, a first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor separately evaluated the set of 1,594 articles, and the evaluation results were merged to retain 15 articles for peer review. The selection and evaluation process of this Yearbook's section on Bioinformatics and Translational Informatics yielded four excellent articles regarding data management and genome medicine that are mainly tool-based papers. In the first article, the authors present PPISURV, a tool for uncovering the role of specific genes in cancer survival outcome. The second article describes the classifier PredictSNP, which combines six performing tools for predicting disease-related mutations. In the third article, by presenting a high-coverage map of the human proteome using high resolution mass spectrometry, the authors highlight the need for using mass spectrometry to complement genome annotation. The fourth article is also related to patient survival and decision support. The authors present data mining methods for large-scale datasets of past transplants; the objective is to identify chances of survival. The current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care. Indeed, there is a need for powerful tools for managing and interpreting complex, large-scale genomic and biological datasets, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and

  4. Exact Tandem Repeats Analyzer (E-TRA): A new program for DNA sequence mining

    Indian Academy of Sciences (India)

    Mehmet Karaca; Mehmet Bilgen; A. Naci Onus; Ayse Gul Ince; Safinaz Y. Elmasulu

    2005-04-01

    Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs as it not only accepts GenBank, FASTA and expressed sequence tags (EST) sequence files, but also does analysis of multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user defined parameters/options let the researchers use different minimum motif repeats search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs are not random, thus differing from the un-transcribed repeats found in genomes.
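
The kind of exact tandem repeat E-TRA detects can be illustrated with a regular-expression backreference: a captured motif followed by further exact copies of itself. This toy sketch is not E-TRA's algorithm (whose search options, keyword filters and batch EST handling are far richer):

```python
import re

def exact_tandem_repeats(seq: str, min_motif: int = 1, max_motif: int = 6,
                         min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Find exact tandem repeats: a motif of length min_motif..max_motif
    repeated at least min_repeats times contiguously.

    Returns (start_index, motif, repeat_count) tuples. Illustrative
    only; E-TRA supports motif lengths up to 1000 and per-length
    minimum-repeat criteria.
    """
    hits = []
    for m in range(min_motif, max_motif + 1):
        # (.{m}) captures a candidate motif; \1{k,} demands k more exact copies.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (m, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            hits.append((match.start(), motif, len(match.group(0)) // m))
    return hits

print(exact_tandem_repeats("AACAGCAGCAGTT", min_motif=3, max_motif=3))
# [(2, 'CAG', 3)] -- the CAG trinucleotide repeated three times
```

Scanning each motif length separately mirrors how per-length search criteria can be applied independently, as E-TRA allows.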

  5. Agonist Binding to Chemosensory Receptors: A Systematic Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Fabrizio Fierro

    2017-09-01

    Full Text Available Human G-protein coupled receptors (hGPCRs constitute a large and highly pharmaceutically relevant membrane receptor superfamily. About half of the hGPCRs' family members are chemosensory receptors, involved in bitter taste and olfaction, along with a variety of other physiological processes. Hence these receptors constitute promising targets for pharmaceutical intervention. Molecular modeling has been so far the most important tool to get insights on agonist binding and receptor activation. Here we investigate both aspects by bioinformatics-based predictions across all bitter taste and odorant receptors for which site-directed mutagenesis data are available. First, we observe that state-of-the-art homology modeling combined with previously used docking procedures turned out to reproduce only a limited fraction of ligand/receptor interactions inferred by experiments. This is most probably caused by the low sequence identity with available structural templates, which limits the accuracy of the protein model and in particular of the side-chains' orientations. Methods which transcend the limited sampling of the conformational space of docking may improve the predictions. As an example corroborating this, we review here multi-scale simulations from our lab and show that, for the three complexes studied so far, they significantly enhance the predictive power of the computational approach. Second, our bioinformatics analysis provides support to previous claims that several residues, including those at positions 1.50, 2.50, and 7.52, are involved in receptor activation.

  6. MAPI: towards the integrated exploitation of bioinformatics Web Services

    Directory of Open Access Journals (Sweden)

    Karlsson Johan

    2011-10-01

    Full Text Available Abstract Background Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. Results To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. Conclusions The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).

  7. Computational Lipidomics and Lipid Bioinformatics: Filling In the Blanks.

    Science.gov (United States)

    Pauling, Josch; Klipp, Edda

    2016-12-22

    Lipids are highly diverse metabolites of pronounced importance in health and disease. While metabolomics is a broad field under the omics umbrella that may also relate to lipids, lipidomics is an emerging field which specializes in the identification, quantification and functional interpretation of complex lipidomes. Today, it is possible to identify and distinguish lipids in a high-resolution, high-throughput manner and simultaneously with a lot of structural detail. However, doing so may produce thousands of mass spectra in a single experiment which has created a high demand for specialized computational support to analyze these spectral libraries. The computational biology and bioinformatics community has so far established methodology in genomics, transcriptomics and proteomics but there are many (combinatorial) challenges when it comes to structural diversity of lipids and their identification, quantification and interpretation. This review gives an overview and outlook on lipidomics research and illustrates ongoing computational and bioinformatics efforts. These efforts are important and necessary steps to advance the lipidomics field alongside analytic, biochemistry, biomedical and biology communities and to close the gap in available computational methodology between lipidomics and other omics sub-branches.

  8. mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

    Science.gov (United States)

    Bokulich, Nicholas A; Rideout, Jai Ram; Mercurio, William G; Shiffer, Arron; Wolfe, Benjamin; Maurice, Corinne F; Dutton, Rachel J; Turnbaugh, Peter J; Knight, Rob; Caporaso, J Gregory

    2016-01-01

    Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.
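
Evaluating a method against a mock community comes down to comparing observed taxon abundances with the known expected composition. A minimal sketch using the L1 distance between compositions; the taxon names and numbers below are invented, and mockrobiota itself does not prescribe a single metric:

```python
def l1_distance(expected: dict[str, float], observed: dict[str, float]) -> float:
    """Sum of absolute differences in relative abundance over all taxa.

    For normalized compositions, 0.0 is a perfect match and 2.0 is
    maximal disagreement (completely disjoint taxon sets).
    """
    taxa = set(expected) | set(observed)
    return sum(abs(expected.get(t, 0.0) - observed.get(t, 0.0)) for t in taxa)

# Invented example: a two-member even mock community vs. a pipeline's output.
expected = {"Bacillus": 0.5, "Escherichia": 0.5}
observed = {"Bacillus": 0.4, "Escherichia": 0.55, "Unassigned": 0.05}
print(round(l1_distance(expected, observed), 2))  # 0.2
```

Taxa missing from one side (here the spurious "Unassigned" fraction) are treated as zero abundance on the other, so misassignments and dropouts both count against the method.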

  9. Making sense of genomes of parasitic worms: Tackling bioinformatic challenges.

    Science.gov (United States)

    Korhonen, Pasi K; Young, Neil D; Gasser, Robin B

    2016-01-01

    Billions of people and animals are infected with parasitic worms (helminths). Many of these worms cause diseases that have a major socioeconomic impact worldwide, and are challenging to control because existing treatment methods are often inadequate. There is, therefore, a need to work toward developing new intervention methods, built on a sound understanding of parasitic worms at molecular level, the relationships that they have with their animal hosts and/or the diseases that they cause. Decoding the genomes and transcriptomes of these parasites brings us a step closer to this goal. The key focus of this article is to critically review and discuss bioinformatic tools used for the assembly and annotation of these genomes and transcriptomes, as well as various post-genomic analyses of transcription profiles, biological pathways, synteny, phylogeny, biogeography and the prediction and prioritisation of drug target candidates. Bioinformatic pipelines implemented and established recently provide practical and efficient tools for the assembly and annotation of genomes of parasitic worms, and will be applicable to a wide range of other parasites and eukaryotic organisms. Future research will need to assess the utility of long-read sequence data sets for enhanced genomic assemblies, and develop improved algorithms for gene prediction and post-genomic analyses, to enable comprehensive systems biology explorations of parasitic organisms.

  10. Web services at the European Bioinformatics Institute-2009.

    Science.gov (United States)

    McWilliam, Hamish; Valentin, Franck; Goujon, Mickael; Li, Weizhong; Narayanasamy, Menaka; Martin, Jenny; Miyar, Teresa; Lopez, Rodrigo

    2009-07-01

    The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, using the APIs frees the user from web page constraints and are ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use; require no prior knowledge of the technology and no more than basic experience in programming. In the following we wish to inform of new and updated services as well as briefly describe planned developments to be made available during the course of 2009-2010.
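
Programmatic access of the sort described typically means assembling an HTTP query against a service endpoint. The sketch below only builds a dbfetch-style request URL; the base URL and parameter names (db, id, format, style) are assumptions based on EBI's dbfetch service of that era and should be checked against current EBI documentation. No network call is made:

```python
from urllib.parse import urlencode

def dbfetch_url(db: str, entry_id: str, fmt: str = "fasta") -> str:
    """Build a dbfetch-style REST query URL.

    Endpoint path and parameter names are assumptions modeled on the
    EBI dbfetch service; verify before use.
    """
    base = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"
    query = urlencode({"db": db, "id": entry_id, "format": fmt, "style": "raw"})
    return f"{base}?{query}"

# Fetch-URL for a UniProtKB entry in FASTA format.
print(dbfetch_url("uniprotkb", "P12345"))
```

The same URL could be retrieved with any HTTP client, which is what makes such REST interfaces suitable for scripting large batch analyses.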

  11. The MPI Bioinformatics Toolkit for protein sequence analysis.

    Science.gov (United States)

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  12. Bioinformatics analysis of metastasis-related proteins in hepatocellular carcinoma

    Institute of Scientific and Technical Information of China (English)

    Pei-Ming Song; Yang Zhang; Yu-Fei He; Hui-Min Bao; Jian-Hua Luo; Yin-Kun Liu; Peng-Yuan Yang; Xian Chen

    2008-01-01

    AIM: To analyze the metastasis-related proteins in hepatocellular carcinoma (HCC) and discover biomarker candidates for diagnosis and therapeutic intervention of HCC metastasis with bioinformatics tools. METHODS: Metastasis-related proteins were determined by stable isotope labeling and MS analysis and analyzed with bioinformatics resources, including Phobius, Kyoto encyclopedia of genes and genomes (KEGG), online mendelian inheritance in man (OMIM) and human protein reference database (HPRD). RESULTS: All the metastasis-related proteins were linked to 83 pathways in KEGG, including the MAPK and p53 signal pathways. A protein-protein interaction network showed that all the metastasis-related proteins were categorized into 19 function groups, including cell cycle, apoptosis and signal transduction. OMIM analysis linked these proteins to 186 OMIM entries. CONCLUSION: Metastasis-related proteins provide HCC cells with biological advantages in cell proliferation, migration and angiogenesis, and facilitate metastasis of HCC cells. The bird's eye view can reveal a global characteristic of metastasis-related proteins, and many differentially expressed proteins can be identified as candidates for diagnosis and treatment of HCC.

  13. Rabifier2: an improved bioinformatic classifier of Rab GTPases.

    Science.gov (United States)

    Surkont, Jaroslaw; Diekmann, Yoan; Pereira-Leal, José B

    2017-02-15

    The Rab family of small GTPases regulates and provides specificity to the endomembrane trafficking system; each Rab subfamily is associated with specific pathways. Thus, characterization of Rab repertoires provides functional information about organisms and evolution of the eukaryotic cell. Yet, the complex structure of the Rab family limits the application of existing methods for protein classification. Here, we present a major redesign of the Rabifier, a bioinformatic pipeline for detection and classification of Rab GTPases. It is more accurate, significantly faster than the original version and is now open source, both the code and the data, allowing for community participation. Rabifier and RabDB are freely available through the web at http://rabdb.org . The Rabifier package can be downloaded from the Python Package Index at https://pypi.python.org/pypi/rabifier , the source code is available at Github https://github.com/evocell/rabifier . jsurkont@igc.gulbenkian.pt or jleal@igc.gulbenkian.pt. Supplementary data are available at Bioinformatics online.

  14. Protecting innovation in bioinformatics and in-silico biology.

    Science.gov (United States)

    Harrison, Robert

    2003-01-01

    Commercial success or failure of innovation in bioinformatics and in-silico biology requires the appropriate use of legal tools for protecting and exploiting intellectual property. These tools include patents, copyrights, trademarks, design rights, and limiting information in the form of 'trade secrets'. Potentially patentable components of bioinformatics programmes include lines of code, algorithms, data content, data structure and user interfaces. In both the US and the European Union, copyright protection is granted for software as a literary work, and most other major industrial countries have adopted similar rules. Nonetheless, the grant of software patents remains controversial and is being challenged in some countries. Current debate extends to aspects such as whether patents can claim not only the apparatus and methods but also the data signals and/or products, such as a CD-ROM, on which the programme is stored. The patentability of substances discovered using in-silico methods is a separate debate that is unlikely to be resolved in the near future.

  15. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    Science.gov (United States)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees, which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatic objectives rather than the enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty with regard to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, five case studies of undergraduate bioinformatics degrees, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  16. Model-driven user interfaces for bioinformatics data resources: regenerating the wheel as an alternative to reinventing it

    Directory of Open Access Journals (Sweden)

    Swainston Neil

    2006-12-01

    Full Text Available Abstract Background The proliferation of data repositories in bioinformatics has resulted in the development of numerous interfaces that allow scientists to browse, search and analyse the data that they contain. Interfaces typically support repository access by means of web pages, but other means are also used, such as desktop applications and command line tools. Interfaces often duplicate functionality amongst each other, and this implies that associated development activities are repeated in different laboratories. Interfaces developed by public laboratories are often created with limited developer resources. In such environments, reducing the time spent on creating user interfaces allows for a better deployment of resources for specialised tasks, such as data integration or analysis. Laboratories maintaining data resources are challenged to reconcile requirements for software that is reliable, functional and flexible with limitations on software development resources. Results This paper proposes a model-driven approach for the partial generation of user interfaces for searching and browsing bioinformatics data repositories. Inspired by the Model Driven Architecture (MDA) of the Object Management Group (OMG), we have developed a system that generates interfaces designed for use with bioinformatics resources. This approach helps laboratory domain experts decrease the amount of time they have to spend dealing with the repetitive aspects of user interface development. As a result, the amount of time they can spend on gathering requirements and helping develop specialised features increases. The resulting system is known as Pierre, and has been validated through its application to use cases in the life sciences, including the PEDRoDB proteomics database and the e-Fungi data warehouse.
Conclusion MDAs focus on generating software from models that describe aspects of service capabilities, and can be applied to support rapid development of repository
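    The generation step can be pictured with a small sketch. Everything below (the model layout, attribute names, and widget mapping) is invented for illustration and is not Pierre's actual metamodel or API; it only shows the idea of deriving a search interface from a declarative model rather than hand-coding it:

```python
# Hypothetical model-driven interface generation in the spirit of Pierre:
# a declarative model of a repository's searchable attributes is turned
# into UI field descriptors, so the interface can be regenerated whenever
# the model changes. All names here are illustrative.

MODEL = {
    "entity": "Protein",
    "attributes": [
        {"name": "accession", "type": "string", "searchable": True},
        {"name": "mass_da",   "type": "float",  "searchable": True},
        {"name": "sequence",  "type": "text",   "searchable": False},
    ],
}

WIDGET_FOR_TYPE = {"string": "text_box", "float": "numeric_box", "text": "text_area"}

def generate_search_form(model):
    """Derive search-form field descriptors from the searchable attributes."""
    return [
        {"label": attr["name"].replace("_", " ").title(),
         "widget": WIDGET_FOR_TYPE[attr["type"]]}
        for attr in model["attributes"] if attr["searchable"]
    ]

form = generate_search_form(MODEL)
```

The non-searchable `sequence` attribute is skipped, so the generated form contains only the two searchable fields.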

  17. Variation of serine-aspartate repeats in membrane proteins possibly contributes to staphylococcal microevolution.

    Directory of Open Access Journals (Sweden)

    Jing Cheng

    Full Text Available Tandem repeats (either as microsatellites or minisatellites) in eukaryotic and prokaryotic organisms are mutation-prone DNA. While minisatellites in prokaryotic genomes are underrepresented, the cell surface adhesins of bacteria often contain the minisatellite SD repeats, encoding the amino acid pair serine-aspartate, especially in staphylococcal strains. However, their relationship to biological functions is still elusive. In this study, effort was made to uncover the copy number variations of SD repeats by bioinformatic analysis and to detect changes in SD repeats during a plasmid-based assay, as a first step to understanding their biological functions. The SD repeats were found to be mainly present in cell surface proteins. The SD repeats were genetically unstable and polymorphic in terms of copy numbers and sequence compositions. Unlike SNPs, the change in copy number was reversible, without frame shifting. More significantly, a rearrangement hot spot, the ATTC/AGRT site, was found to be mainly responsible for the instability and reversibility of SD repeats. These characteristics of SD repeats may help bacteria respond to environmental changes at low cost, low risk and high efficiency.
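    The kind of copy-number survey described above can be illustrated with a toy script. This is not the authors' pipeline; the sequences and the minimum-copy threshold below are made up for demonstration:

```python
# Illustrative sketch: locate serine-aspartate (SD) dipeptide repeat
# regions in a protein sequence and report their copy numbers.
import re

def sd_repeat_regions(protein_seq, min_copies=3):
    """Return (start, copies) for each run of >= min_copies SD dipeptides."""
    regions = []
    for m in re.finditer(r"(?:SD){%d,}" % min_copies, protein_seq):
        regions.append((m.start(), len(m.group()) // 2))
    return regions

# Two surface-protein-like toy sequences differing only in SD copy number:
print(sd_repeat_regions("MKKSDSDSDSDSDAAQ"))  # → [(3, 5)]
print(sd_repeat_regions("MKKSDSDSDAAQ"))      # → [(3, 3)]
```

Applied to two alleles of the same gene, a comparison of the reported copy numbers would flag the reversible, frame-preserving expansions and contractions the abstract describes.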

  18. Design and Implementation of a Keyword Frequency Statistics Program Based on Keywords Fields Extracted from Web of Science Bibliographies

    Institute of Scientific and Technical Information of China (English)

    朱玉强

    2015-01-01

    The paper takes Visual Basic as the programming tool and uses regular expressions to extract Keywords field data in batches from BibTeX bibliographies exported from Web of Science, merges synonyms, near-synonyms and morphological variants of words as needed, then writes the keyword frequency statistics into an Excel sheet and provides an Excel macro that automatically draws a line chart to visualize the keyword distribution. Information workers can then use Excel to carry out more complex combined data analyses on the generated files to improve their working efficiency.
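    The program's core logic translates naturally into a short sketch. The snippet below re-creates the idea in Python rather than Visual Basic, with an invented synonym map and toy records, and without the Excel output step:

```python
# Sketch of the paper's approach: pull the Keywords field out of BibTeX
# records with a regular expression, fold synonyms and word forms
# together, and tally frequencies. Synonym map is illustrative.
import re
from collections import Counter

SYNONYMS = {"micro array": "microarray", "microarrays": "microarray"}

def keyword_frequencies(bibtex_text):
    counts = Counter()
    for match in re.finditer(r"Keywords\s*=\s*\{([^}]*)\}", bibtex_text, re.I):
        for kw in match.group(1).split(";"):
            kw = kw.strip().lower()
            if kw:
                counts[SYNONYMS.get(kw, kw)] += 1
    return counts

records = """
@article{a1, Keywords = {Microarrays; Bioinformatics}}
@article{a2, Keywords = {micro array; bioinformatics}}
"""
freq = keyword_frequencies(records)
print(freq.most_common())  # both keywords counted twice after merging
```

Writing `freq` out as rows would reproduce the paper's Excel table, ready for charting.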

  19. Expansion of protein domain repeats.

    Directory of Open Access Journals (Sweden)

    Asa K Björklund

    2006-08-01

    Full Text Available Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein-protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
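    One ingredient of such an analysis, inferring the size of the duplicated unit from the order of domain assignments, can be sketched as follows. The domain names are invented, and the real study of course works with noisy Pfam-A assignments rather than exact repeats:

```python
# Toy sketch: given a protein's ordered domain assignments, find the
# smallest unit that tandem duplication could have copied to produce
# the arrangement (e.g. nebulin-like 7-domain units).
def smallest_duplication_unit(domains):
    n = len(domains)
    for unit in range(1, n + 1):
        if n % unit == 0 and domains == domains[:unit] * (n // unit):
            return unit
    return n

# A nebulin-like arrangement: one 7-domain block repeated twice.
block = ["neb1", "neb2", "neb3", "neb4", "neb5", "neb6", "neb7"]
print(smallest_duplication_unit(block * 2))  # → 7
```

A unit size equal to the full arrangement length means no internal tandem structure was detected.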

  20. The Bioinformatics of Integrative Medical Insights: Proposals for an International PsychoSocial and Cultural Bioinformatics Project

    Directory of Open Access Journals (Sweden)

    Ernest Rossi

    2006-01-01

    Full Text Available We propose the formation of an International PsychoSocial and Cultural Bioinformatics Project (IPCBP) to explore the research foundations of Integrative Medical Insights (IMI) on all levels, from the molecular-genomic to the psychological, cultural, social, and spiritual. Just as the Human Genome Project identified the molecular foundations of modern medicine with the new technology of DNA sequencing during the past decade, the IPCBP would extend and integrate this neuroscience knowledge base with the technology of gene expression via DNA/proteomic microarray research and brain imaging in development, stress, healing, rehabilitation, and the psychotherapeutic facilitation of existential wellness. We anticipate that the IPCBP will require a unique international collaboration of academic institutions, researchers, and clinical practitioners for the creation of a new neuroscience of mind-body communication, brain plasticity, memory, learning, and creative processing during optimal experiential states of art, beauty, and truth. We illustrate this emerging integration of bioinformatics with medicine with a videotape of the classical 4-stage creative process in a neuroscience approach to psychotherapy.

  1. DWI Repeaters and Non-Repeaters: A Comparison.

    Science.gov (United States)

    Weeber, Stan

    1981-01-01

    Discussed how driving-while-intoxicated (DWI) repeaters differed significantly from nonrepeaters on 4 of 23 variables tested. Repeaters were more likely to have zero or two dependent children, attend church frequently, drink occasionally and have one or more arrests for public intoxication. (Author)

  2. To Repeat or Not to Repeat a Course

    Science.gov (United States)

    Armstrong, Michael J.; Biktimirov, Ernest N.

    2013-01-01

    The difficult transition from high school to university means that many students need to repeat (retake) 1 or more of their university courses. The authors examine the performance of students repeating first-year core courses in an undergraduate business program. They used data from university records for 116 students who took a total of 232…

  3. Gold nanoparticle-based beacon to detect STAT5b mRNA expression in living cells: a case optimized by bioinformatics screen

    Directory of Open Access Journals (Sweden)

    Deng D

    2015-04-01

    Full Text Available Dawei Deng,* Yang Li,* Jianpeng Xue, Jie Wang, Guanhua Ai, Xin Li, Yueqing Gu Department of Biomedical Engineering, China Pharmaceutical University, Nanjing, People’s Republic of China *These authors contributed equally to this work Abstract: Messenger RNA (mRNA), a single-stranded ribonucleic acid carrying functional gene information, is usually abnormally expressed in cancer cells and has become a promising biomarker for the study of tumor progression. The hairpin DNA-coated gold nanoparticle (hDAuNP) beacon, containing a bare gold nanoparticle (AuNP) as fluorescence quencher and thiol-terminated, fluorescently labeled stem–loop–stem oligonucleotide sequences attached via Au–S bonds, is currently a new nanoscale biodiagnostic platform capable of mRNA detection, in which the design of the loop region sequence is crucial for hybridizing with the target mRNA. Hence, in this study, to improve the sensitivity and selectivity of the hDAuNP beacon simultaneously, the loop region of the hairpin DNA was screened by a bioinformatics strategy. Here, signal transducer and activator of transcription 5b (STAT5b) mRNA was selected and used as a practical example. The results from the combined characterizations using optical techniques, flow cytometry assay, and cell microscopic imaging showed that after optimization, the as-prepared hDAuNP beacon had higher selectivity and sensitivity for the detection of STAT5b mRNA in living cells, as compared with our previous beacon. Thus, the bioinformatics method may be a promising new strategy for assisting in the design of the hDAuNP beacon, extending its application in the detection of mRNA expression and the resultant mRNA-based biological processes and disease pathogenesis. Keywords: molecular beacon, bioinformatics, gold nanoparticle, STAT5b mRNA, visual detection
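    The central design constraint, that the loop must hybridize with the target mRNA, amounts to taking a reverse complement. A minimal sketch follows, with a made-up target stretch (not the real STAT5b sequence) and RNA letters throughout for simplicity:

```python
# Sketch of the hybridization constraint behind loop design: the loop
# must be the reverse complement of the targeted mRNA stretch.
# (A real DNA loop would use T rather than U.)
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # RNA base pairing

def loop_for_target(mrna_region):
    """Return the reverse complement of the targeted mRNA region."""
    return "".join(COMPLEMENT[b] for b in reversed(mrna_region))

target = "AUGGCUUCA"  # hypothetical mRNA stretch, for illustration only
loop = loop_for_target(target)
print(loop)  # → UGAAGCCAU
```

The bioinformatics screen described in the abstract then ranks candidate target stretches, e.g. by accessibility and specificity, before such a loop is synthesized.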

  4. Telecommunication Support System Using Keywords and Their Relevant Information in Videoconferencing — Presentation Method for Keeping Audience's Concentration at Distance Lectures

    Science.gov (United States)

    Asai, Kikuo; Kondo, Kimio; Kobayashi, Hideaki; Saito, Fumihiko

    We developed a prototype system to support telecommunication by using keywords selected by the speaker in a videoconference. In the traditional presentation style, a speaker talks and uses audiovisual materials, and the audience at remote sites looks at these materials. Unfortunately, the audience often loses concentration and attention during the talk. To overcome this problem, we investigate a keyword presentation style, in which the speaker holds keyword cards that enable the audience to see additional information. Although keyword captions were originally intended for use in video materials for learning foreign languages, they can also be used to improve the quality of distance lectures in videoconferences. Our prototype system recognizes printed keywords in a video image at a server, and transfers the data to clients as multimedia functions such as language translation, three-dimensional (3D) model visualization, and audio reproduction. The additional information is collocated to the keyword cards in the display window, thus forming a spatial relationship between them. We conducted an experiment to investigate the properties of the keyword presentation style for an audience. The results suggest the potential of the keyword presentation style for improving the audience's concentration and attention in distance lectures by providing an environment that facilitates eye contact during videoconferencing.

  5. Atlas – a data warehouse for integrative bioinformatics

    Directory of Open Access Journals (Sweden)

    Yuen Macaire MS

    2005-02-01

    Full Text Available Abstract Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure, for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, the Human Protein Reference Database (HPRD), the Biomolecular Interaction Network Database (BIND), the Database of Interacting Proteins (DIP), the Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels.
First
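    The loader/toolbox pattern described above can be sketched with an in-memory SQLite store. The table layout and function names here are invented for illustration and are far simpler than Atlas's actual relational models:

```python
# Sketch of the Atlas pattern: source data loaded into a local relational
# store by a "loader", then retrieved through a small "toolbox" API rather
# than ad-hoc SQL scattered through analysis code. Schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interaction (protein_a TEXT, protein_b TEXT, source TEXT)")

def load_interactions(rows):
    """Loader application: parse/insert source records."""
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", rows)

def partners_of(protein):
    """Toolbox retrieval API: interaction partners, sorted for stable output."""
    cur = conn.execute(
        "SELECT protein_b FROM interaction WHERE protein_a = ?", (protein,))
    return sorted(r[0] for r in cur)

load_interactions([("TP53", "MDM2", "BIND"), ("TP53", "EP300", "DIP")])
print(partners_of("TP53"))  # → ['EP300', 'MDM2']
```

Because both records end up in one relational store, a single query integrates interactions that originated in different source databases.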

  6. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    Science.gov (United States)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students were generally positive about their interest level, the usefulness of the lessons, the difficulty level of the lessons, and the likelihood of engaging in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability.
The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were

  7. Public sphere and the sustainability of the bioinformatics promise.

    Science.gov (United States)

    Leite, Marcelo

    2004-12-30

    The literature about genomics and bioinformatics achievements in high-impact journals such as Nature and Science has raised disproportionate expectations amongst the general public about fast and revolutionary drugs and breakthroughs in biomedicine. However, the yield obtained by database mining activities has been modest, as reported in the February 2001 issues of these journals featuring the completion of human genome draft sequences by the Human Genome Project Consortium and the company Celera. I have compared changes in rhetoric employed by molecular biologists in 2001 and in April 2003, when the final sequence was announced. The comparison suggests that researchers are concerned about the sustainability of society's investment in this field, though not explicitly.

  8. Current challenges in genome annotation through structural biology and bioinformatics.

    Science.gov (United States)

    Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M

    2012-10-01

    With the huge volume of genomic sequences being generated by high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from the sequence alone and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.

  9. Systems biology and bioinformatics in aging research: a workshop report.

    Science.gov (United States)

    Fuellen, Georg; Dengjel, Jörn; Hoeflich, Andreas; Hoeijemakers, Jan; Kestler, Hans A; Kowald, Axel; Priebe, Steffen; Rebholz-Schuhmann, Dietrich; Schmeck, Bernd; Schmitz, Ulf; Stolzing, Alexandra; Sühnel, Jürgen; Wuttke, Daniel; Vera, Julio

    2012-12-01

    In an "aging society," health span extension is most important. As in 2010, talks in this series of meetings in Rostock-Warnemünde demonstrated that aging is an apparently very complex process, where computational work is most useful for gaining insights and to find interventions that counter aging and prevent or counteract aging-related diseases. The specific topics of this year's meeting entitled, "RoSyBA: Rostock Symposium on Systems Biology and Bioinformatics in Ageing Research," were primarily related to "Cancer and Aging" and also had a focus on work funded by the German Federal Ministry of Education and Research (BMBF). The next meeting in the series, scheduled for September 20-21, 2013, will focus on the use of ontologies for computational research into aging, stem cells, and cancer. Promoting knowledge formalization is also at the core of the set of proposed action items concluding this report.

  10. BioinfoGRID: Bioinformatics Simulation and Modeling Based on Grid

    Science.gov (United States)

    Milanesi, Luciano

    2007-12-01

    Genomics sequencing projects and new technologies applied to molecular genetics analysis are producing huge amounts of raw data. In the future, the trend of biomedical scientific research will be based on computing Grids for data-crunching applications and data Grids for the distributed storage of large amounts of accessible data, together with the provision of tools to all users. Biomedical research laboratories are moving towards an environment, created through the sharing of resources, in which heterogeneous and dispersed health data are shared, such as molecular data (e.g. genomics, proteomics), cellular data (e.g. pathways), tissue data, population data (e.g. genotyping, SNPs, epidemiology), as well as data generated by large-scale analysis (e.g. simulation data, modelling). In this paper some applications developed in the framework of the European project "Bioinformatics Grid Application for life science - BioinfoGRID" are described in order to show the potential of the Grid to carry out large-scale analysis and research worldwide.

  11. Integrative content-driven concepts for bioinformatics "beyond the cell"

    Indian Academy of Sciences (India)

    Edgar Wingender; Torsten Crass; Jennifer D Hogan; Alexander E Kel; Olga V Kel-Margoulis; Anatolij P Potapov

    2007-01-01

    Bioinformatics has delivered great contributions to genome and genomics research, without which the world-wide success of this and other global ('omics') approaches would not have been possible. More recently, it has developed further towards the analysis of different kinds of networks, thus laying the foundation for the comprehensive description, analysis and manipulation of whole living systems in modern "systems biology". The next step necessary for developing a systems biology that deals with systemic phenomena is to expand existing methodologies and develop new ones appropriate to characterize intercellular processes and interactions without omitting the causal underlying molecular mechanisms. Modelling the processes on the different levels of complexity involved requires a comprehensive integration of information on gene regulatory events, signal transduction pathways, protein interaction and metabolic networks as well as cellular functions in the respective tissues/organs.

  12. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
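    For reference, the dynamic-programming recurrence that such accelerators implement can be written in a few lines of plain Python. The scoring values below are illustrative (real searches use substitution matrices such as BLOSUM62), and this CPU version is of course orders of magnitude slower than the GCUPS figures above:

```python
# Plain reference implementation of the Smith-Waterman local-alignment
# score: each cell takes the best of a diagonal match/mismatch step, a
# gap from above or left, or zero (restarting the local alignment).
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = rows[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            rows[i][j] = max(0, diag, rows[i-1][j] + gap, rows[i][j-1] + gap)
            best = max(best, rows[i][j])
    return best

print(smith_waterman_score("TACGGGCC", "TACGG"))  # → 10 (5 matches x 2)
```

Every cell depends only on its three neighbors, which is exactly the structure that GPU implementations exploit by computing anti-diagonals in parallel.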

  13. Meta-learning framework applied in bioinformatics inference system design.

    Science.gov (United States)

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.

  14. Research Techniques Made Simple: Bioinformatics for Genome-Scale Biology.

    Science.gov (United States)

    Foulkes, Amy C; Watson, David S; Griffiths, Christopher E M; Warren, Richard B; Huber, Wolfgang; Barnes, Michael R

    2017-09-01

    High-throughput biology presents unique opportunities and challenges for dermatological research. Drawing on a small handful of exemplary studies, we review some of the major lessons of these new technologies. We caution against several common errors and introduce helpful statistical concepts that may be unfamiliar to researchers without experience in bioinformatics. We recommend specific software tools that can aid dermatologists at varying levels of computational literacy, including platforms with command line and graphical user interfaces. The future of dermatology lies in integrative research, in which clinicians, laboratory scientists, and data analysts come together to plan, execute, and publish their work in open forums that promote critical discussion and reproducibility. In this article, we offer guidelines that we hope will steer researchers toward best practices for this new and dynamic era of data intensive dermatology. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Why Do Polyphenols Have Promiscuous Actions? An Investigation by Chemical Bioinformatics.

    Science.gov (United States)

    Tang, Guang-Yan

    2016-05-01

    Despite their diverse pharmacological effects, polyphenols are poor candidates for use as drugs, a limitation traditionally ascribed to their low bioavailability. However, Baell and co-workers recently proposed that the redox potential of polyphenols also plays an important role, because redox reactions produce promiscuous actions on various protein targets and thus non-specific pharmacological effects. To investigate whether redox reactivity is a critical factor in polyphenol promiscuity, we performed a chemical bioinformatics analysis of the structure-activity relationships of twenty polyphenols. It was found that the gene expression profiles of human cell lines induced by polyphenols were not correlated with the presence or absence of redox moieties in the polyphenols, but were significantly correlated with their molecular structures. Therefore, it is concluded that the promiscuous actions of polyphenols are likely to result from their inherent structural features rather than their redox potential.

  16. BioRuby: bioinformatics software for the Ruby programming language.

    Science.gov (United States)

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-10-15

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. With JRuby, BioRuby also runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. katayama@bioruby.org
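    BioRuby itself is a Ruby library (its sequence class handles tasks such as translation and complementation), so as a language-neutral illustration of the kind of sequence-analysis task such toolkits automate, here is a minimal, self-contained Python sketch: translating a DNA coding sequence with the standard genetic code and computing a reverse complement.

    ```python
    from itertools import product

    # Standard genetic code, bases in TCAG order (the conventional
    # compact encoding of the 64-codon translation table).
    BASES = "TCAG"
    AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODON_TABLE = {
        "".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
    }

    COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def translate(dna: str) -> str:
        """Translate a DNA coding sequence into a one-letter protein string."""
        dna = dna.upper()
        usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
        return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

    def reverse_complement(dna: str) -> str:
        """Reverse complement of a DNA sequence."""
        return dna.upper().translate(COMPLEMENT)[::-1]

    print(translate("ATGTTTGGCTAA"))   # -> MFG*
    print(reverse_complement("ATGC"))  # -> GCAT
    ```

    Toolkits such as BioRuby wrap exactly this kind of logic, plus format parsing and web-service access, behind a tested, reusable API.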

  17. Bioinformatics analysis of the gene expression profile in Bladder carcinoma

    Directory of Open Access Journals (Sweden)

    Jing Xiao

    2013-01-01

    Full Text Available Bladder carcinoma, which has the ninth highest incidence among malignant tumors in the world, is a complex, multifactorial disease. The malignant transformation of bladder cells results from DNA mutations and alterations in gene expression levels. In this work, we used a bioinformatics approach to investigate the molecular mechanisms of bladder carcinoma. Biochips downloaded from the Gene Expression Omnibus (GEO) were used to analyze the gene expression profile in urinary bladder cells from individuals with carcinoma. The gene expression profile of normal samples was used as a control. The analysis of gene expression revealed important alterations in genes involved in biological processes and metabolic pathways. We also identified some small molecules capable of reversing the altered gene expression in bladder carcinoma; these molecules could provide a basis for future therapies for the treatment of this disease.
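    The core of such a tumor-versus-control comparison is a per-gene differential-expression test. A minimal Python sketch, using toy log2-scale values (gene names and numbers are illustrative, not results from the study; a real analysis would start from GEO series matrices):

    ```python
    from math import sqrt
    from statistics import mean, stdev

    # Toy log2-scale expression values per gene (illustrative only).
    tumor = {
        "FGFR3": [8.1, 7.9, 8.4, 8.0],
        "TP53":  [3.2, 3.5, 3.1, 3.4],
        "GAPDH": [9.0, 9.1, 8.9, 9.2],
    }
    normal = {
        "FGFR3": [5.2, 5.4, 5.1, 5.3],
        "TP53":  [5.9, 6.1, 6.0, 5.8],
        "GAPDH": [9.1, 9.0, 9.2, 8.9],
    }

    def welch_t(a, b):
        """Welch's t statistic for two independent samples."""
        va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
        return (mean(a) - mean(b)) / sqrt(va + vb)

    for gene in tumor:
        # On log2 data, the difference of means is the log2 fold change.
        fc = mean(tumor[gene]) - mean(normal[gene])
        t = welch_t(tumor[gene], normal[gene])
        status = "up" if fc > 1 else "down" if fc < -1 else "unchanged"
        print(f"{gene}: log2FC={fc:+.2f}, t={t:+.2f} ({status})")
    ```

    Production pipelines add moderated variance estimates and multiple-testing correction on top of this basic per-gene statistic.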

  18. Databases and Bioinformatics Tools for the Study of DNA Repair

    Directory of Open Access Journals (Sweden)

    Kaja Milanowska

    2011-01-01

    Full Text Available DNA is continuously exposed to many different damaging agents such as environmental chemicals, UV light, ionizing radiation, and reactive cellular metabolites. DNA lesions can result in different phenotypical consequences ranging from a number of diseases, including cancer, to cellular malfunction, cell death, or aging. To counteract the deleterious effects of DNA damage, cells have developed various repair systems, including biochemical pathways responsible for the removal of single-strand lesions such as base excision repair (BER) and nucleotide excision repair (NER), or specialized polymerases temporarily taking over lesion-arrested DNA polymerases during the S phase in translesion synthesis (TLS). There are also other mechanisms of DNA repair such as homologous recombination repair (HRR), nonhomologous end-joining repair (NHEJ), or the DNA damage response (DDR) system. This paper reviews bioinformatics resources specialized in disseminating information about DNA repair pathways, proteins involved in repair mechanisms, damaging agents, and DNA lesions.

  19. ISEV position paper: extracellular vesicle RNA analysis and bioinformatics

    Directory of Open Access Journals (Sweden)

    Andrew F. Hill

    2013-12-01

    Full Text Available Extracellular vesicles (EVs) are the collective term for the various vesicles that are released by cells into the extracellular space. Such vesicles include exosomes and microvesicles, which vary by their size and/or protein and genetic cargo. With the discovery that EVs contain genetic material in the form of RNA (evRNA) has come the increased interest in these vesicles for their potential use as sources of disease biomarkers and potential therapeutic agents. Rapid developments in the availability of deep sequencing technologies have enabled the study of EV-related RNA in detail. In October 2012, the International Society for Extracellular Vesicles (ISEV) held a workshop on “evRNA analysis and bioinformatics.” Here, we report the conclusions of one of the roundtable discussions where we discussed evRNA analysis technologies and provide some guidelines to researchers in the field to consider when performing such analysis.

  20. Mining Cancer Transcriptomes: Bioinformatic Tools and the Remaining Challenges.

    Science.gov (United States)

    Milan, Thomas; Wilhelm, Brian T

    2017-02-22

    The development of next-generation sequencing technologies has had a profound impact on the field of cancer genomics. With the enormous quantities of data being generated from tumor samples, researchers have had to rapidly adapt tools or develop new ones to analyze the raw data to maximize its value. While much of this effort has been focused on improving specific algorithms to get faster and more precise results, the accessibility of the final data for the research community remains a significant problem. Large amounts of data exist but are not easily available to researchers who lack the resources and experience to download and reanalyze them. In this article, we focus on RNA-seq analysis in the context of cancer genomics and discuss the bioinformatic tools available to explore these data. We also highlight the importance of developing new and more intuitive tools to provide easier access to public data and discuss the related issues of data sharing and patient privacy.