multilingual text mining: Topics by WorldWideScience.org

Sample records for multilingual text mining

Multilingual access to full text databases; Acces multilingue aux bases de donnees en texte integral

Energy Technology Data Exchange (ETDEWEB)

Fluhr, C; Radwan, K [Institut National des Sciences et Techniques Nucleaires (INSTN), Centre d'Etudes de Saclay, 91 - Gif-sur-Yvette (France)]

1990-05-01

Many full-text databases are available in only one language, while others contain documents in several languages. Even if users can understand the language of the documents in the database, it may be easier for them to express their needs in their own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and retrieve documents in different languages. This paper presents the development and first experiments of multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of searching full-text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs.
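The query reformulation idea in the abstract above can be illustrated with a minimal Python sketch. This is not the SPIRIT system: the tiny French-English lexicon, the function names `expand_query` and `search`, and the sample documents are all hypothetical, and the sketch assumes plain whitespace tokenization and exact term matching.

```python
# Hypothetical French-to-English lexicon, invented for illustration only.
FR_EN = {
    "réacteur": ["reactor"],
    "sûreté": ["safety"],
    "combustible": ["fuel"],
}


def expand_query(query, lexicon):
    """Return the original query terms followed by their dictionary translations."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(lexicon.get(word, []))
    return terms


def search(query, documents, lexicon):
    """Rank documents by the number of expanded query terms they contain."""
    terms = set(expand_query(query, lexicon))
    scored = []
    for doc_id, text in documents.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id for determinism.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]


# Hypothetical mixed-language collection.
DOCS = {
    "en1": "reactor safety analysis report",
    "fr1": "analyse de sûreté du réacteur",
    "en2": "fuel cycle economics",
}

if __name__ == "__main__":
    print(search("sûreté réacteur", DOCS, FR_EN))
```

A single French query here retrieves both the French and the English document, because each query term is matched in its original form and in translation. A real system would also need morphological normalization and handling of ambiguous translations, which this sketch leaves out.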
Multilingual text induced spelling correction

NARCIS (Netherlands)

Reynaert, M.W.C.

2004-01-01

We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams
Multilingual access to full text databases

International Nuclear Information System (INIS)

Fluhr, C.; Radwan, K.

1990-05-01

Many full-text databases are available in only one language, while others contain documents in several languages. Even if users can understand the language of the documents in the database, it may be easier for them to express their needs in their own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and retrieve documents in different languages. This paper presents the development and first experiments of multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of searching full-text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs.
Speect: a multilingual text-to-speech system

CSIR Research Space (South Africa)

Louw, JA

2008-11-01

This paper introduces a new multilingual text-to-speech system, which we call Speect (Speech synthesis with extensible architecture), aiming to address the shortcomings of using Festival as a research system and Flite as a deployment system in a...
The Concordance of Multilingual Legal Texts at the WTO

Science.gov (United States)

Condon, Bradly J.

2012-01-01

Multilingualism is a sensitive and complex subject in a global organisation such as the World Trade Organization (WTO). In the WTO legal texts, there is a need for full concordance, not simply translation. This article begins with an overview of the issues raised by multilingual processes at the WTO in the negotiation, drafting, translation,…
Text Mining.

Science.gov (United States)

Trybula, Walter J.

1999-01-01

Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
Text Mining in Organizational Research.

Science.gov (United States)

Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

2018-07-01

Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
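As an illustration of technique (b), distance and similarity computing, the following Python sketch computes cosine similarity between bag-of-words vectors. It is not taken from the article: the helper names `bag_of_words` and `cosine_similarity` and the job-title strings are hypothetical, and tokenization is assumed to be simple lowercased whitespace splitting.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical job-vacancy snippets.
    first = bag_of_words("data analyst")
    second = bag_of_words("data engineer")
    print(cosine_similarity(first, second))
```

Cosine similarity ignores document length and depends only on the relative term frequencies, which is one reason it is a common choice for comparing texts of different sizes.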
A Customizable Text Classifier for Text Mining

Directory of Open Access Journals (Sweden)

Yun-liang Zhang

2007-12-01

Text mining deals with complex and unstructured texts. Usually, a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier that lets users mine the collection automatically. It is derived from the sentence category of HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard by which texts are chosen, based on demand and the abundance of materials. The performance of the classifier varies with the user's choice.
SparkText: Biomedical Text Mining on Big Data Framework.

Science.gov (United States)

Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
Estimation of Cross-Lingual News Similarities Using Text-Mining Methods

Directory of Open Access Journals (Sweden)

Zhouhao Wang

2018-01-01

In this research, we propose two estimation algorithms, based on machine learning, for extracting cross-lingual news pairs from financial news articles. Every second, vast amounts of text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, written not only in English but also in other languages such as Chinese, Japanese, and French. By taking advantage of multilingual text resources provided by Thomson Reuters News, we developed two estimation algorithms for extracting cross-lingual news pairs from multilingual text resources. In our first method, we propose a novel structure that uses word information and machine learning effectively in this task. We also developed a bidirectional Long Short-Term Memory (LSTM)-based method to calculate cross-lingual semantic text similarity for long text and short text, respectively. Thus, when an important news article is published, users can read similar news articles written in their native language using our method.
SparkText: Biomedical Text Mining on Big Data Framework.

Directory of Open Access Journals (Sweden)

Zhan Ye

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
SIAM 2007 Text Mining Competition dataset

Data.gov (United States)

National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...
Language Contact in Nigerian Multilingual Society

Directory of Open Access Journals (Sweden)

C.A. Adetuyi

2017-12-01

A multilingual society, that is, a society with more than one significant language group, is a sociolinguistic phenomenon that arises from language contact. The fundamental problem in this type of society is deciding which of the languages can conveniently be accepted as the national language. Any attempt to enthrone one of the languages at the expense of the others has proven a failure, because doing so appears inherently and regrettably discriminating and domineering toward the other languages, and founders on ethnic bickering. In Nigeria, as in many other African nations, multilingualism is the rule rather than the exception, and the problem of 'forging ahead' is of crucial importance. Among the competing languages that scramble for national recognition or official status, whether indigenous or foreign, one must emerge from the competition as the official language (the language of administration and of education at some levels), the language of relevance, for the purpose of uniting the nation. Fortunately, English has emerged as that privileged language. Nigerian society is irretrievably heterogeneous. Students from diverse ethno-linguistic, cultural and economic groups are exposed quite early to several languages, including their mother tongues and English. Nigerian scholars, like others, have variously examined the connection between multilingualism and interference; we draw on such studies in situating our reflections. This paper thus examines the importance of language, especially the English language, in the multilingual society.
SparkText: Biomedical Text Mining on Big Data Framework

Science.gov (United States)

He, Karen Y.; Wang, Kai

2016-01-01

Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
Text mining: from ontology learning to automated text processing applications

CERN Document Server

Biemann, Chris

2014-01-01

This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects
Text Mining: Applications and Theory

CERN Document Server

Berry, Michael W

2010-01-01

Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning
Working with text: tools, techniques and approaches for text mining

CERN Document Server

Tourte, Gregory J L

2016-01-01

Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...
Contextual Text Mining

Science.gov (United States)

Mei, Qiaozhu

2009-01-01

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Text mining for the biocuration workflow.

Science.gov (United States)

Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Text mining for the biocuration workflow

Science.gov (United States)

Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

A STUDY OF TEXT MINING METHODS, APPLICATIONS, AND TECHNIQUES

OpenAIRE

R. Rajamani & S. Saranya

2017-01-01

Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. Research areas related to data mining include text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining, also referred to as text data mining, is also called knowledge discovery in text (KDT) or intelligent text analysis. T...
Stylistic Performance through Affective Marking: A Case of Multilingual Literary Discourse

Directory of Open Access Journals (Sweden)

Urjani Chakravarty

2016-12-01

This paper provides an overall analysis of how a multilingual writer such as Amitav Ghosh writes about emotion in his literary texts, and emphasizes how multilingual authors display emotion/affect through the use of literary multilingualism (affective markers combined with writer style). Through the use of multiple strategies, they reduce the limitations on the interpretation of their texts. Furthermore, this paper highlights the centrally sociolinguistic and cognitive dimensions of the relationships between multilingualism and emotion, and how these are influenced by assumptions of Relevance Theory, i.e. optimal relevance in a literary text. One should expect to find relationships between sociolinguistic diversity and affective expression for most authors in locally specific ways, whether multilingual or not. Such scholarship can then illuminate how authors, by using literary multilingualism through writer style and affective markers, can shape emotions across various contexts in a literary text. Future research into multilingualism and emotion should continue to distinguish between how multilingual authors use linguistic forms to show feeling and how they write about feeling in their created texts. Keywords: Language, Culture, Literary Multilingualism, Style, Affect and Relevance Theory
Biomarker Identification Using Text Mining

Directory of Open Access Journals (Sweden)

Hui Li

2012-01-01

Identifying molecular biomarkers has become an important task for scientists assessing the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our text mining method provides a highly reliable approach to discovering biomarkers in the PubMed database.
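The combination of a dictionary and a finite state machine mentioned in the abstract above can be sketched as follows. This is only one plausible reading, not the authors' implementation: the dictionary is compiled into a token-level trie, which acts as a deterministic finite automaton, and text is scanned for the longest matching term at each position. The term list, the function names `build_trie` and `find_terms`, and the sample sentence are hypothetical.

```python
import re

END = "$end"  # key marking an accepting state; cannot collide with a token


def build_trie(terms):
    """Compile dictionary terms into a nested-dict trie keyed by lowercased tokens."""
    root = {}
    for term in terms:
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node[END] = term  # remember the original spelling of the term
    return root


def find_terms(text, trie):
    """Scan the text and return dictionary terms, preferring the longest match."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        node, j, last = trie, i, None
        # Follow transitions for as long as the next token is accepted.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                last = (node[END], j)
        if last is not None:
            found.append(last[0])
            i = last[1]  # resume scanning after the matched term
        else:
            i += 1
    return found


# Hypothetical biomarker dictionary.
BIOMARKERS = ["PSA", "prostate specific antigen", "HER2"]

if __name__ == "__main__":
    trie = build_trie(BIOMARKERS)
    sentence = "Elevated prostate specific antigen (PSA) and HER2 levels were reported."
    print(find_terms(sentence, trie))
```

Matching on whole tokens rather than raw substrings avoids hits inside longer words, at the cost of missing spelling variants that a production system would need to normalize first.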
The multilingual brain

OpenAIRE

Engel de Abreu, Pascale

2013-01-01

The multilingual brain. Is a multilingual education beneficial for children? What are the optimal conditions under which a child can become perfectly multilingual? The given lecture will focus on the "cognitive advantages" of multilingualism and illustrate the impact that being multilingual has on the cognitive organisation of the brain. Practical questions regarding multilingual education will also be discussed.
Multilingual Policies and Multilingual Education in the Nordic Countries

Science.gov (United States)

Björklund, Mikaela; Björklund, Siv; Sjöholm, Kaj

2013-01-01

This article presents some aspects of multilingualism and multilingual education in the Nordic countries, drawing upon experiences from the project "Network for Researchers of Multilingualism and Multilingual Education, RoMME" (2011-2013), where Denmark, Finland, Norway and Sweden are represented. The aim is to briefly present and…
Chapter 16: text mining for translational bioinformatics.

Science.gov (United States)

Cohen, K Bretonnel; Hunter, Lawrence E

2013-04-01

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Frontiers of biomedical text mining: current progress

Science.gov (United States)

Zweigenbaum, Pierre; Demner-Fushman, Dina; Yu, Hong; Cohen, Kevin B.

2008-01-01

It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. PMID:17977867
Text mining resources for the life sciences.

Science.gov (United States)

Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

2016-01-01

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable, meaning those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.
Text mining resources for the life sciences

Science.gov (United States)

Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

2016-01-01

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231
Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories

NARCIS (Netherlands)

Pieters, Toine; Verheul, Jaap

2014-01-01

This paper discusses the research project Translantis, which uses innovative technologies for cultural text mining to analyze large repositories of digitized public media, such as newspapers and journals. The Translantis research team uses and develops the text mining tool Texcavator, which is
The multilingual brain

OpenAIRE

Engel de Abreu, Pascale

2014-01-01

The multilingual brain. Is a multilingual education beneficial for children? What are the optimal conditions under which a child can become perfectly multilingual? The given lecture will focus on the "cognitive advantages" of multilingualism and illustrate the impact that being multilingual has on the cognitive organisation of the brain. Practical questions regarding multilingual education will also be discussed. Ass et gutt e Kand méisproocheg ze erzéien? Wat sinn déi optimal Konditio...
Multilingual Europe

DEFF Research Database (Denmark)

Phillipson, Robert

2013-01-01

Review of: Multilingual Europe: Multilingual Europeans. (European Studies: An Interdisciplinary Series in European Culture, History and Politics, Vol. 29). Eds. László Marácz & Mireille Rosello. Rodopi, 2012. 323 pp.
Text mining patents for biomedical knowledge.

Science.gov (United States)

Rodriguez-Esteban, Raul; Bundschus, Markus

2016-06-01

Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Text mining for biology--the way forward

DEFF Research Database (Denmark)

Altman, Russ B; Bergman, Casey M; Blake, Judith

2008-01-01

This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger...
Understanding Editing Behaviors in Multilingual Wikipedia.

Directory of Open Access Journals (Sweden)

Suin Kim

Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.
Benchmarking infrastructure for mutation text mining.

Science.gov (United States)

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
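The record above says performance metrics are computed with SPARQL queries over RDF annotations; those queries are not reproduced in the abstract. As a rough illustration only, the same kind of metric can be sketched in plain Python, swapping SPARQL for set operations. The function name `precision_recall_f1` and the sample annotations are hypothetical and are not taken from the described infrastructure.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted annotations against a gold standard.

    Both arguments are iterables of hashable annotations; an annotation
    counts as correct only if it matches a gold annotation exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (document id, mutation mention) pairs, for illustration only.
gold = {("doc1", "V600E"), ("doc1", "G12D"), ("doc2", "R175H")}
predicted = {("doc1", "V600E"), ("doc2", "R175H"), ("doc2", "A123T")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the simplest choice; a real benchmark might also credit partial or normalized matches, which this sketch does not attempt.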
Benchmarking infrastructure for mutation text mining

Science.gov (United States)

2014-01-01

Background: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion: We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
GPU-Accelerated Text Mining

International Nuclear Information System (INIS)

Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

2009-01-01

Accelerating hardware devices represent a novel promise for improving performance in many problem domains, but it is not clear which accelerators are suitable for which domains. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips that duplicate conventional computing capabilities on a single die. Yet accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. The present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the usage of atomic instructions that have recently become available in NVIDIA devices.
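The abstract above names document similarity metrics as its subject but does not say which metric is used. As a minimal CPU-side sketch, assuming cosine similarity over raw term-frequency vectors (a common choice, not confirmed by the record), the computation that a GPU implementation would parallelize looks like this. The helper names `term_frequencies` and `cosine_similarity` are illustrative.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_frequencies("text mining on GPU devices")
doc2 = term_frequencies("GPU devices accelerate text search")
print(round(cosine_similarity(doc1, doc2), 3))  # three of five terms shared
```

Each pairwise comparison is independent of the others, which is what makes this kind of workload a natural candidate for massively parallel hardware.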
Biomedical text mining and its applications in cancer research.

Science.gov (United States)

Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

2013-04-01

Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
Text Mining in Biomedical Domain with Emphasis on Document Clustering.

Science.gov (United States)

Renganathan, Vinaitheerthan

2017-07-01

With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

Text mining meets workflow: linking U-Compare with Taverna

Science.gov (United States)

Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

2010-01-01

Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
Mining knowledge from text repositories using information extraction ...

Indian Academy of Sciences (India)

Information extraction (IE); text mining; text repositories; knowledge discovery from .... general purpose English words. However ... of precision and recall, as extensive experimentation is required due to lack of public tagged corpora. 4. Mining ...
Translanguaging in the Writing of Emergent Multilinguals

Science.gov (United States)

Kiramba, Lydiah Kananu

2017-01-01

This article discusses the findings of an empirical study that investigated the writing practices in a multilingual, rural, fourth-grade classroom in Kenya. The study was undergirded by Bakhtin's heteroglossia. Analysis of texts indicated that these emergent multilinguals used multiple semiotic resources to maximize the chances of meeting the…
Introduction: Multilingual Behavior in Youth Groups

Directory of Open Access Journals (Sweden)

Jens Normann Jørgensen

2004-01-01

This introduction reviews some of the major work on bilingual and multilingual children and adolescents in Scandinavia, from Kotsinas (1985) and Boyd (1985) to the present. The introduction was originally published in J. N. Jørgensen (ed.) 2001: Multilingual behavior in Youth Groups, Copenhagen Studies in Bilingualism, The Køge Series, Volume K11, Danish University of Education.
Multilingual Awareness and Heritage Language Education: Children's Multimodal Representations of Their Multilingualism

Science.gov (United States)

Melo-Pfeifer, Sílvia

2015-01-01

In this article, we analyse visual narratives of multilingual children, in order to acknowledge their self-perception as multilingual selves. Through the analysis of drawings produced by children enrolled in Portuguese as heritage language (PHL) classes in Germany, we analyse how bi-/multilingual children perceive their multilingual repertoires…
Text Mining of Supreme Administrative Court Jurisdictions

OpenAIRE

Feinerer, Ingo; Hornik, Kurt

2007-01-01

Within the last decade text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the raw amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors. In this paper we use text mining methods to investigate Au...
Text mining in cancer gene and pathway prioritization.

Science.gov (United States)

Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

2014-01-01

Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Text mining and visualization case studies using open-source tools

CERN Document Server

Chisholm, Andrew

2016-01-01

Text Mining and Visualization: Case Studies Using Open-Source Tools provides an introduction to text mining using some of the most popular and powerful open-source tools: KNIME, RapidMiner, Weka, R, and Python. The contributors-all highly experienced with text mining and open-source software-explain how text data are gathered and processed from a wide variety of sources, including books, server access logs, websites, social media sites, and message boards. Each chapter presents a case study that you can follow as part of a step-by-step, reproducible example. You can also easily apply and extend the techniques to other problems. All the examples are available on a supplementary website. The book shows you how to exploit your text data, offering successful application examples and blueprints for you to tackle your text mining tasks and benefit from open and freely available tools. It gets you up to date on the latest and most powerful tools, the data mining process, and specific text mining activities.
Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

Science.gov (United States)

Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

2001-01-01

Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…
Affordances theory in multilingualism studies

Directory of Open Access Journals (Sweden)

Larissa Aronin

2012-10-01

The concept of affordances originating in Gibson’s work (Gibson, 1977) is gaining ground in multilingualism studies (cf. Aronin and Singleton, 2010; Singleton and Aronin, 2007; Dewaele, 2010). Nevertheless, studies investigating affordances in respect of teaching, learning or using languages are still somewhat rare and tend to treat isolated aspects of multilingualism. This is despite the fact that the theory of affordances can actually provide a valuable, supplementary, up-to-date framework within which a clearer, sharper description and explication of the intriguing range of attributes of multilingual communities, educational institutions and individuals, as well as teaching practices, become feasible. It is important that not only researchers and practitioners (teachers, educators, parents, community and political actors) but also language users and learners themselves should be aware of how to identify or, if necessary, design new affordances for language acquisition and learning. The aim of this article is to adapt the concept of affordances to multilingualism studies and additional language teaching, and in so doing advance theoretical understanding in this context. To this end the article contains a brief summary of the findings so far available. The article also goes further into defining the ways in which affordances work in relation to multilingualism and second language teaching and puts forward an integrated model of affordances.
Stylistic Performance through Affective Marking: A Case of Multilingual Literary Discourse

Science.gov (United States)

Chakravarty, Urjani

2016-01-01

This paper provides an overall analysis of how a multilingual writer like Amitav Ghosh writes about emotion in his literary texts, and emphasizes how multilingual authors display emotion/affect through the use of literary multilingualism (affective markers) combined with writer style. Through the use of multiple strategies, they reduce the limitations of…
Financial Statement Fraud Detection using Text Mining

OpenAIRE

Rajan Gupta; Nasib Singh Gill

2013-01-01

Data mining techniques have been used enormously by the researchers’ community in detecting financial statement fraud. Most of the research in this direction has used the numbers (quantitative information) i.e. financial ratios present in the financial statements for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement frau...
Application of text mining in the biomedical domain.

Science.gov (United States)

Fleuren, Wilco W M; Alkema, Wynand

2015-03-01

In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the types of problems they are typically applied to. Copyright © 2015 Elsevier Inc. All rights reserved.
Olowalu Review: Developing identity through translanguaging in a multilingual literary magazine

Directory of Open Access Journals (Sweden)

Alex Josef Kasula

2016-07-01

With current trends in our globalized society, the number of multilinguals is clearly rising; however, the understanding of multilingual identity and policy towards education stays relatively the same. Recent investigation of multilingualism in the US has shed light on the positive impacts of alternating policy in language education with regard to a greater understanding of how translanguaging and identity impact the language learner and language learning policies (Garcia & Wei, 2013). The following article describes the development of an online multilingual literary magazine, Olowalu Review, that aimed to provide English language learners under an English-only language policy a space to translanguage, thus giving them the opportunity to develop and express their multilingual identities. Goals and the development of the magazine are described in terms relating to current multilingual theory. The outcomes and findings, drawn from an analysis of written submissions and interviews with authors, reveal how Olowalu Review enabled multilinguals to foster and exercise multilingual identities and skills, raised multilingual awareness, and acted as an important multilingual artifact. Pedagogical implications are discussed to empower language teachers, learners, or artists to develop the same or a similar project for their own local, national, or global community.
Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

Science.gov (United States)

Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

2012-10-01

In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from
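The record above defines named entity recognition (NER) as identifying names of real-world objects such as genes, proteins and drugs in text, but it does not describe the recognizer used on the platform. A minimal sketch of one simple approach, dictionary lookup, is given below. The lexicon entries, their type labels and the function name `find_entities` are hypothetical examples chosen for illustration.

```python
import re

# Hypothetical mini-lexicon mapping surface forms to entity types.
LEXICON = {
    "IGF1": "gene",
    "leptin": "protein",
    "oxytocin": "drug",
}


def find_entities(text, lexicon=LEXICON):
    """Return (start, end, surface form, entity type) for each lexicon hit."""
    if not lexicon:
        return []
    # Longest terms first so that longer names win over their prefixes.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): label for term, label in lexicon.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "Expression of IGF1 and leptin was measured in bovine tissue."
for start, end, surface, label in find_entities(sentence):
    print(start, end, surface, label)
```

Dictionary lookup is easy to audit but misses spelling variants and unseen names, which is one reason statistical recognizers are often used instead.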
Multilingual speaker age recognition: regression analyses on the Lwazi corpus

CSIR Research Space (South Africa)

Feld, M

2009-12-01

Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step...
OntoGene web services for biomedical text mining.

Science.gov (United States)

Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul

2014-01-01

Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top-ranked results in several of them.
Mining biological networks from full-text articles.

Science.gov (United States)

Czarnecki, Jan; Shepherd, Adrian J

2014-01-01

The study of biological networks is playing an increasingly important role in the life sciences. Many different kinds of biological system can be modelled as networks; perhaps the most important examples are protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks, and signalling networks. Although much useful information is easily accessible in public databases, a lot of extra relevant data lies scattered in numerous published papers. Hence there is a pressing need for automated text-mining methods capable of extracting such information from full-text articles. Here we present practical guidelines for constructing a text-mining pipeline from existing code and software components capable of extracting PPI networks from full-text articles. This approach can be adapted to tackle other types of biological network.
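The record above does not list the components of its pipeline. As a simple baseline for illustration only, and not the authors' method, candidate PPI edges can be proposed whenever two known protein names occur in the same sentence. The protein list, the sample text and the function name `cooccurrence_edges` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical list of protein names to look for.
PROTEINS = {"BRCA1", "BARD1", "TP53", "MDM2"}


def cooccurrence_edges(text, proteins=PROTEINS):
    """Count sentences in which each pair of known proteins co-occurs."""
    edges = Counter()
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        found = sorted(tokens & proteins)
        for pair in combinations(found, 2):
            edges[pair] += 1
    return edges


text = (
    "BRCA1 binds BARD1 in the nucleus. "
    "MDM2 ubiquitinates TP53. "
    "BRCA1 and BARD1 form a heterodimer."
)
for (a, b), weight in sorted(cooccurrence_edges(text).items()):
    print(a, b, weight)
```

Co-occurrence says nothing about the type or direction of an interaction, so a fuller pipeline would typically add entity normalization and relation classification on top of this step.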
Text mining of web-based medical content

CERN Document Server

Neustein, Amy

2014-01-01

Text Mining of Web-Based Medical Content examines web mining for extracting useful information that can be used for treating and monitoring the healthcare of patients. This work provides methodological approaches to designing mapping tools that exploit data found in social media postings. Specific linguistic features of medical postings are analyzed vis-a-vis available data extraction tools for culling useful information.
MeSHmap: a text mining tool for MEDLINE.

OpenAIRE

Srinivasan, P.

2001-01-01

Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap, our prototype text mining system that exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user-driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures et...

Text mining with R a tidy approach

CERN Document Server

Silge, Julia

2017-01-01

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP. Use sentiment analysis to mine the emotional content of text. Identify a document's most important terms with frequency measurements. E...
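The book described above works in R with tidytext; the sketch below is a Python analogue written for illustration and is not code from the book. It assumes the tidy text format means one row per (document, token) pair and that the frequency measurement is tf-idf, with term frequency taken as count divided by document length and inverse document frequency as the natural log of the number of documents over the number containing the term. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tidy_tokens(docs):
    """Flatten {name: text} into one (document, word) row per token."""
    return [
        (name, word)
        for name, text in docs.items()
        for word in re.findall(r"[a-z']+", text.lower())
    ]


def tf_idf(docs):
    """Return {(document, word): tf-idf score} for every observed pair."""
    rows = tidy_tokens(docs)
    counts = Counter(rows)                    # (document, word) -> n
    totals = Counter(doc for doc, _ in rows)  # document -> token count
    doc_freq = Counter(word for _, word in set(rows))
    n_docs = len(docs)
    return {
        (doc, word): (n / totals[doc]) * math.log(n_docs / doc_freq[word])
        for (doc, word), n in counts.items()
    }


docs = {"a": "the whale and the sea", "b": "the garden and the house"}
scores = tf_idf(docs)
for key in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(key, round(scores[key], 4))
```

Words that appear in every document, such as "the" here, score zero, so the ranking surfaces the terms that distinguish one document from the others.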
Text mining in the classification of digital documents

Directory of Open Access Journals (Sweden)

Marcial Contreras Barrera

2016-11-01

Objective: Develop an automated classifier for bibliographic material by means of text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method consisting of two phases, learning and recognition. In the learning phase, the classifier learns patterns through the analysis of bibliographic records from classification Z (library science, information sciences and information resources) retrieved from the LIBRUNAM database; this phase yields a classifier capable of recognizing different subclasses (LC). In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records from classification Z are selected at random, classified by a cataloguer and processed by the automated classifier, in order to obtain the precision of the automated classifier. Results: The application of text mining achieved the development of the automated classifier through a supervised document classification method. The precision of the classifier was calculated by comparing the manually assigned topics with the automatically assigned ones, obtaining a precision of 75.70%. Conclusions: The application of text mining facilitated the creation of the automated classifier, yielding useful technology for the classification of bibliographic material with the aim of improving and speeding up the process of organizing digital documents.
Multilingual School Population: Ensuring School Belonging by Tolerating Multilingualism

Science.gov (United States)

Van Der Wildt, Anouk; Van Avermaet, Piet; Van Houtte, Mieke

2017-01-01

Societies have become super-diverse due to migration and globalization. Many mainstream classroom teachers feel managing the linguistic variety children bring to school is challenging. This often leads to restrictive language policies. Research on multilingualism has given us insight into the multilingual realities of pupils, which allows us to…
Teachers' Beliefs about Multilingualism and a Multilingual Pedagogical Approach

Science.gov (United States)

Haukås, Åsta

2016-01-01

Knowledge of teachers' beliefs is central to understanding teachers' decision-making in the classroom. The present study explores Norwegian language teachers' beliefs about multilingualism and the use of a multilingual pedagogical approach in the third-language (L3) classroom. This study analysed data collected via focus group discussions with 12…
Text mining for traditional Chinese medical knowledge discovery: a survey.

Science.gov (United States)

Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan

2010-08-01

Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
PathText: a text mining integrator for biological pathway visualizations

Science.gov (United States)

Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi

2010-01-01

Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
Aspects of Text Mining From Computational Semiotics to Systemic Functional Hypertexts

Directory of Open Access Journals (Sweden)

Alexander Mehler

2001-05-01

Full Text Available The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content-based processing of texts, which makes it possible to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining - natural language lexis - the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content-based relations of texts (and their annotation as typed links in corpora possibly organised as hypertexts). In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.
UNITY IN DIVERSITY. THE EUROPEAN UNION’S MULTILINGUALISM

Directory of Open Access Journals (Sweden)

Laura-Cristiana SPĂTARU-NEGURĂ

2016-06-01

Full Text Available It is undeniable that the European Union represents the most ambitious legal and linguistic project, integrating 28 Member States and 24 official languages. What we undertook with this study was to explore the importance of multilingualism in the European Union and the problems that unity in diversity involves. This study tried to touch upon both theoretical aspects (i.e., what the multilingualism of EU law implies) and practical issues (i.e., the interaction between legal languages at national and at EU level, problems emerging from multilingualism), illustrated by the relevant case law of the European Court of Justice. In many ECJ cases, it was underlined that multilingualism is essential to the EU legal order. The meaning of EU law cannot be derived from one version of the official languages and the ECJ regularly aims for a uniform interpretation of the contradictory versions. The present study is part of a broader research project on this theme and is meant to address certain important points of my PhD thesis. A first part of this research on multilingualism has already been published.
Multilingualism and Specific Language Impairment

OpenAIRE

Engel de Abreu, Pascale

2014-01-01

Is a multilingual education beneficial for children? What are the optimal conditions under which a child can become perfectly multilingual? When should we be concerned about a multilingual child's language skills? What are the signs of Specific Language Impairment in a child who speaks more than one language? Developmental psychologist and Associate Professor in multilingual cognitive development at the University of Luxembourg Pascale Engel de Abreu will address these questions based on what...
PubRunner: A light-weight framework for updating text mining results.

Science.gov (United States)

Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben

2017-01-01

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
Book Review: Multilingualism online | Roux | Southern African ...

African Journals Online (AJOL)

Book Title: Multilingualism online. Book Author: Carmen Lee. 2017. London and New York: Routledge. ISBN 9781138900493. 170 pages. http://dx.doi.org/10.2989/16073614.2017.1373369
Text mining a self-report back-translation.

Science.gov (United States)

Blanch, Angel; Aluja, Anton

2016-06-01

There are several recommendations about the routine to undertake when back translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes a text mining innovative application useful to adapt a personality questionnaire to 12 different languages. The method is divided in 3 different stages, a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an item assessment of item meaning equivalence. The suggested method contributes to improve the back-translation process of self-report instruments for cross-cultural research in 2 significant intertwined ways. First, it defines a systematic approach to the back translation issue, allowing for a more orderly and informed evaluation concerning the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research works could refine the suggested methodology and use additional available text mining tools. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
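The dissimilarity assessment between the source instrument and its back-translations could be sketched along the following lines. The abstract does not say which dissimilarity measure was used, so the bag-of-words cosine dissimilarity below is an assumption, and the questionnaire item and version names are invented for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for one instrument item or version."""
    return Counter(text.lower().split())


def cosine_dissimilarity(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty text shares nothing with any other text
    return 1.0 - dot / (norm_a * norm_b)


# Made-up source item and two hypothetical back-translations.
source_item = "i often feel nervous before exams"
back_translations = {
    "version_a": "i often feel nervous before exams",
    "version_b": "i am frequently anxious ahead of tests",
}

scores = {
    name: cosine_dissimilarity(term_vector(source_item), term_vector(text))
    for name, text in back_translations.items()
}
```

A purely lexical measure like this one flags `version_b` as highly dissimilar even though it is a reasonable paraphrase, which is one reason an item-level assessment of meaning equivalence, as described in the abstract, is still needed.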
Text Mining of Journal Articles for Sleep Disorder Terminologies.

Directory of Open Access Journals (Sweden)

Calvin Lam

Full Text Available Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
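For readers unfamiliar with the three evaluation figures reported for MetaMap, the sketch below shows how precision, recall, and false positive rate are conventionally derived from confusion-matrix counts. The counts in the example are arbitrary and are not taken from the study.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Return (precision, recall, false positive rate) from confusion counts.

    tp: terms correctly extracted      fp: terms extracted in error
    fn: relevant terms that were missed  tn: irrelevant terms correctly ignored
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return precision, recall, false_positive_rate


# Arbitrary illustrative counts (not from the study).
precision, recall, fpr = extraction_metrics(tp=8, fp=2, fn=2, tn=88)
```

Precision and the false positive rate both penalise wrongly extracted terms, but they use different denominators: everything that was extracted versus everything that should not have been.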
Science and Technology Text Mining Basic Concepts

National Research Council Canada - National Science Library

Losiewicz, Paul

2003-01-01

...). It then presents some of the most widely used data and text mining techniques, including clustering and classification methods, such as nearest neighbor, relational learning models, and genetic...
Inequalities of Multilingualism: Challenges to Mother Tongue-Based Multilingual Education

Science.gov (United States)

Tupas, Ruanni

2015-01-01

This paper discusses structural and ideological challenges to mother tongue-based multilingual education (MTB-MLE) which has in recent years been gaining ground in many educational contexts around the world. The paper argues, however, that MTB-MLE is set against these challenges - referred to here as inequalities of multilingualism - which prevent…
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

Science.gov (United States)

Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren

2018-02-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
Building multilingual learning environments in early years education

Directory of Open Access Journals (Sweden)

Martin Dodman

2016-07-01

Full Text Available This paper examines the early language development of children with particular reference to the importance of personal multilingualism and the reasons why this should be promoted in early years education. It is argued that such an objective is best achieved by building multilingual learning environments at the level of nursery and infant schools. The characteristics of such environments are described and ways of evaluating projects designed to build them are presented.
Multilingual Information Discovery and AccesS (MIDAS): A Joint ACM DL'99/ ACM SIGIR'99 Workshop.

Science.gov (United States)

Oard, Douglas; Peters, Carol; Ruiz, Miguel; Frederking, Robert; Klavans, Judith; Sheridan, Paraic

1999-01-01

Discusses a multidisciplinary workshop that addressed issues concerning internationally distributed information networks. Highlights include multilingual information access in media other than character-coded text; cross-language information retrieval and multilingual metadata; and evaluation of multilingual systems. (LRW)
Benefits of Multilingualism in Education

Science.gov (United States)

Okal, Benard Odoyo

2014-01-01

The article gives a brief analytical survey of multilingualism practices, its consequences, its benefits in education and discussions on the appropriate ways towards its achievement in education. Multilingualism refers to speaking more than one language competently. Generally there are both the official and unofficial multilingualism practices. A…
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.

Science.gov (United States)

Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong

2016-01-01

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

Preparing FCS Professionals for a Multilingual Society: Building Community through the Experiences of Multilingual Families

Science.gov (United States)

Duncan, Janine; Duncan, Daniel

2014-01-01

As demographics in the United States shift, family and consumer sciences (FCS) professionals must be prepared to foster healthy communities that embrace multilingual families. Because hegemonic language ideologies challenge multilingual families, FCS professionals need to know how to inclusively reframe communities to honor multilingual families.…
Multi-lingual Opinion Mining on YouTube

NARCIS (Netherlands)

Severyn, Aliaksei; Moschitti, Alessandro; Uryupina, Olga; Plank, Barbara; Filippova, Katja

In order to successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that can handle the noisy input well yet can easily be adapted to a new domain or language. We here focus on opinion mining for YouTube by (i) modeling
pubmed.mineR: An R package with text-mining algorithms to ...

Indian Academy of Sciences (India)

2015-09-29

Sep 29, 2015 ... using text-mining algorithms for biomedical research purposes. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

Science.gov (United States)

Westergaard, David; Stærfeldt, Hans-Henrik

2018-01-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159
EnvMine: A text-mining system for the automatic extraction of contextual information

Directory of Open Access Journals (Sweden)

de Lorenzo Victor

2010-06-01

Full Text Available Abstract. Background: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical
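Two of the ideas in this abstract, finding physicochemical values through their units of measurement and computing distances between located samples, can be illustrated with a short Python sketch. This is not EnvMine's implementation: the tiny unit list, the regular expression, and the function names are assumptions, and the haversine formula on a spherical Earth is one common choice for distance that the abstract does not specify.

```python
import math
import re

# Hypothetical, deliberately small unit list; a real system would need many more.
UNIT_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*(°C|mM|g/L|m)(?![A-Za-z/])")


def extract_measurements(text):
    """Return (value, unit) pairs for numbers followed by a known unit."""
    return [(float(value), unit) for value, unit in UNIT_PATTERN.findall(text)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


sentence = "Samples were collected at 25 m depth, at 4.5 °C and 35 g/L salinity."
measurements = extract_measurements(sentence)

# Approximate city-centre coordinates for Madrid and Barcelona.
distance = haversine_km(40.4168, -3.7038, 41.3874, 2.1686)
```

Anchoring on units rather than on variable names means a value is picked up even when the text never names the variable explicitly, at the cost of needing a further step to decide which variable each value describes.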
Text Mining of Journal Articles for Sleep Disorder Terminologies.

Science.gov (United States)

Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey

2016-01-01

Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
Text mining improves prediction of protein functional sites.

Directory of Open Access Journals (Sweden)

Karin M Verspoor

Full Text Available We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.
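The text-mining half of this approach, extracting residue mentions from the literature, can be illustrated with a simple pattern matcher. This is only a sketch under stated assumptions: the regular expression covers a single notation (three-letter amino acid code followed by a position) and is not the extraction method used in LEAP-FS; the example sentence is invented.

```python
import re

# Standard three-letter codes for the 20 common amino acids.
THREE_LETTER_CODES = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Matches forms such as "Ser195", "His-57" and "Asp 102" (assumed notations).
RESIDUE_PATTERN = re.compile(rf"\b({THREE_LETTER_CODES})[- ]?(\d+)\b")


def extract_residue_mentions(text):
    """Return (amino acid code, position) pairs mentioned in the text."""
    return [(name, int(pos)) for name, pos in RESIDUE_PATTERN.findall(text)]


sentence = "The catalytic triad consists of Ser195, His-57 and Asp 102."
mentions = extract_residue_mentions(sentence)
```

Each extracted pair would still have to be mapped onto a specific protein structure and chain before it could be compared with structure-based predictions, a step this sketch leaves out.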
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

DEFF Research Database (Denmark)

Westergaard, David; Stærfeldt, Hans Henrik; Tønsberg, Christian

2018-01-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full...
Embracing African languages as indispensable resources through the promotion of multilingualism

Directory of Open Access Journals (Sweden)

Ndimande-Hlongwa, Nobuhle

2017-06-01

Full Text Available This paper seeks to explore the potential significance of additive multilingualism in South Africa’s multilingual society. Additive multilingualism treasures the principle of equality among all 11 official languages. Therefore, our point of departure is the South African Constitution and various policy provisions that advocate for a multilingual mode of operation. The paper is premised upon the potential value of multilingualism that encompasses indigenous African languages and the view of language as a resource. This concurs with the language policy of the University of KwaZulu-Natal (UKZN, which seeks to promote a multilingual society. Perceptions and experiences of a group of part-time LLB students regarding the learning of isiZulu as an additional language at UKZN were solicited in this study. The ‘language as a resource’ framework was employed as the theoretical approach of the study. The study established an acknowledgement of the resourcefulness of isiZulu as instrumental in fostering social cohesion, breaking communication barriers, and dispelling misconceptions about the value of these languages.
Imitating manual curation of text-mined facts in biomedicine.

Directory of Open Access Journals (Sweden)

Raul Rodriguez-Esteban

2006-09-01

Full Text Available Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts, in order to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
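The "ROC score" quoted in this abstract is assumed here to mean the area under the ROC curve. A minimal sketch of that quantity follows: it is the probability that a randomly chosen correct fact receives a higher classifier score than a randomly chosen incorrect one. The scores and labels are invented for illustration and are unrelated to the study's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier confidence that each fact is correct
    labels: 1 if human evaluators judged the fact correct, else 0
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Invented scores and labels (illustrative only).
auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
```

The pairwise formulation is quadratic in the number of examples; it is used here for clarity, and a rank-based computation would be preferable for a set of roughly 100,000 evaluations.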
English and Arabic Inscriptions in the Linguistic Landscape of Yemen: A Multilingual Writing Approach

Directory of Open Access Journals (Sweden)

Anwar A. H. Al-Athwary

2017-05-01

Full Text Available The present paper investigates the multilingual written texts of the signboards in the public space of Yemen. It attempts to apply Reh's (2004) typology of multilingual writing. Reh introduces four strategies of multilingualism: duplicating, fragmentary, overlapping, and complementary. They refer to the arrangement of information in the inscriptions of multilingual signs in a given linguistic landscape (LL). To achieve this purpose, a data corpus of 755 multilingual signs in the LL of Yemen has been used, the majority of which are bilingual in Arabic and English. The analysis showed that all four strategies of duplicating, fragmentary, overlapping, and complementary multilingual writings were generally employed in Sana'a's LL. While overlapping and complementary multilingualism were totally absent in the top-down signs, duplicating and fragmentary multilingualism had much higher frequency than overlapping and complementary ones in bottom-up signs. Keeping in mind that the speech community in Yemen is monolingual in Arabic, the absence or low frequency of overlapping and complementary signs in both top-down and bottom-up levels can be explained by the fact that these two types of texts presuppose multilingual readers, since knowledge of all the languages involved is necessary to understand the whole message. The model of writing mimicry system proposed by Sutherland (2015) is also examined. Writing mimicry system was found to be a salient feature of the public space of Yemen performing some specific functions; it is only used for advertising and promotional purposes rather than expressing the identity of ethnolinguistic minorities. The study also revealed that Sana'a's multilingual LL is characterized by the use of Arabicised English, glocalisation and multifunctional signs, all of which are employed to serve a general purpose of promoting and advertising commodities and showing modernity and success. Standard Arabic appears on almost all of both top
A Survey of Text Mining in Social Media: Facebook and Twitter Perspectives

Directory of Open Access Journals (Sweden)

Said A. Salloum

2017-01-01

Full Text Available Text mining has become one of the trendy fields that has been incorporated in several research fields such as computational linguistics, Information Retrieval (IR) and data mining. Natural Language Processing (NLP) techniques are used to extract knowledge from text written by human beings. Text mining reads unstructured data to provide meaningful information patterns in the shortest possible time. Social networking sites are a great source of communication, as most people in today’s world use these sites in their daily lives to stay connected to each other. It has become common practice not to write sentences with correct grammar and spelling. This practice may lead to different kinds of ambiguity, such as lexical, syntactic, and semantic ambiguity, and with this type of unclear data it is hard to find out the actual data order. Accordingly, we are conducting an investigation with the aim of looking for different text mining methods to get various textual orders on social media websites. This survey aims to describe how studies in social media have used text analytics and text mining techniques for the purpose of identifying the key themes in the data. This survey focused on analyzing the text mining studies related to Facebook and Twitter, the two dominant social media in the world. Results of this survey can serve as the baselines for future text mining research.
Understanding Editing Behaviors in Multilingual Wikipedia.

Science.gov (United States)

Kim, Suin; Park, Sungjoon; Hale, Scott A; Kim, Sooyoung; Byun, Jeongmin; Oh, Alice H

2016-01-01

Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.
Text Mining Improves Prediction of Protein Functional Sites

Science.gov (United States)

Cohn, Judith D.; Ravikumar, Komandur E.

2012-01-01

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
DISEASES: text mining and data integration of disease-gene associations.

Science.gov (United States)

Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

2015-03-01

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
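The combination described in this abstract, a dictionary-based tagger followed by a score that counts co-occurrences both within and between sentences, can be illustrated with a minimal Python sketch. It is not the DISEASES scoring scheme itself: the two tiny dictionaries, the weights W_SENTENCE and W_ABSTRACT, and the function names are hypothetical choices made only for illustration.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries; a real tagger would use large curated lexicons.
GENES = {"brca1", "tp53"}
DISEASES = {"breast cancer", "li-fraumeni syndrome"}

# Assumed weights: a pair seen in the same sentence counts more than a pair
# that only shares an abstract. These values are illustrative, not published.
W_SENTENCE = 1.0
W_ABSTRACT = 0.25


def tag(text, dictionary):
    """Return the dictionary terms that occur as whole words in the text."""
    lowered = text.lower()
    return {
        term
        for term in dictionary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def score_abstracts(abstracts):
    """Accumulate a co-occurrence score for each (gene, disease) pair."""
    scores = defaultdict(float)
    for abstract in abstracts:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
        same_sentence = set()
        for sentence in sentences:
            same_sentence |= set(product(tag(sentence, GENES), tag(sentence, DISEASES)))
        all_pairs = set(product(tag(abstract, GENES), tag(abstract, DISEASES)))
        for pair in all_pairs:
            scores[pair] += W_SENTENCE if pair in same_sentence else W_ABSTRACT
    return dict(scores)
```

Calling score_abstracts on a list of abstract strings returns a dictionary keyed by (gene, disease) pairs; pairs that only ever share an abstract, never a sentence, accumulate the smaller weight.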
Concept of Multilingualism as Strategy of Language Policy and Foreign-Language Education in Europe

Directory of Open Access Journals (Sweden)

I A Korotova

2015-12-01

Full Text Available This article considers the language policy of the European Union (EU) in promoting the concept of a multilingual Europe. The research emphasizes the didactic aspects of the policy of multilingualism and analyzes the results of testing that policy in the educational theory and practice of the EU.
Automated detection of follow-up appointments using text mining of discharge records.

Science.gov (United States)

Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M

2010-06-01

To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with gold standard (manual abstraction) including sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. Analysis of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
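The reported agreement figures can be reproduced from the counts given in the abstract. The short Python sketch below does the arithmetic; the four counts (6481, 3576, 113, 107) come from the abstract, while the derived true-positive and true-negative counts and the function name are worked out here for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard agreement measures, as percentages rounded to 0.1."""
    def pct(numerator, denominator):
        return round(100.0 * numerator / denominator, 1)

    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
        "accuracy": pct(tp + tn, tp + fp + fn + tn),
    }


# Counts stated in the abstract.
TOTAL_RECORDS = 6481      # discharge records reviewed
MANUAL_POSITIVES = 3576   # records with all appointment criteria (manual review)
MISSED = 113              # false negatives of text mining
SPURIOUS = 107            # false positives of text mining

# Counts derived from the above (not stated directly in the abstract).
TRUE_POSITIVES = MANUAL_POSITIVES - MISSED                      # 3463
TRUE_NEGATIVES = TOTAL_RECORDS - MANUAL_POSITIVES - SPURIOUS    # 2798

metrics = diagnostic_metrics(TRUE_POSITIVES, SPURIOUS, MISSED, TRUE_NEGATIVES)
# metrics == {"sensitivity": 96.8, "specificity": 96.3,
#             "ppv": 97.0, "npv": 96.1, "accuracy": 96.6}
```

The computed values match the sensitivity, specificity, PPV, NPV and overall agreement quoted in the record.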
Monitoring interaction and collective text production through text mining

Directory of Open Access Journals (Sweden)

Macedo, Alexandra Lorandi

2014-04-01

Full Text Available This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE), which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.
DrugQuest - a text mining workflow for drug association discovery.

Science.gov (United States)

Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

2016-06-06

Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views, such as clustered chemicals based on their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
Design Considerations for Multilingual Web Sites

Directory of Open Access Journals (Sweden)

Joan Starr

2005-09-01

Full Text Available The most powerful marketing, service, and information-distribution tool a library has today is its Web site, but providing Web content in many languages is complex. Before allocating scarce technical and financial resources, it is valuable to learn about writing systems, types of writing, how computers render and represent writing systems, and to study potential problem areas and their possible solutions. The accepted Web standard for presenting languages is Unicode and a full understanding of its history and the coding tools it provides is essential to making appropriate decisions for specific multilingual and internationalization projects. Actual coding examples, as well as a sampling of existing multilingual library services, also serve to illuminate the path of implementation.

Multilingualism and social inclusion

NARCIS (Netherlands)

Marácz, L.; Adamo, S.

2017-01-01

This is a thematic issue on the relation between multilingualism and social inclusion. Due to globalization, Europeanization, supranational and transnational regulations linguistic diversity and multilingualism are on the rise. Migration and old and new forms of mobility play an important role in
CONAN : Text Mining in the Biomedical Domain

NARCIS (Netherlands)

Malik, R.

2006-01-01

This thesis is about text mining: extracting important information from literature. In recent years, the number of biomedical articles and journals has been growing exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was
Multilingual Practices of University Students and Changing Forms of Multilingualism in Luxembourg

Science.gov (United States)

de Bres, Julia; Franziskus, Anne

2014-01-01

With its own national language, Luxembourgish, and three languages of administration, French, German and Luxembourgish, Luxembourg has long been a very multilingual country. The nature of this multilingualism is now changing, due to the rising proportion of migrants in the country, who now make up 43% of the resident population. The changing…
Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

Directory of Open Access Journals (Sweden)

Gatis Špats

2016-07-01

Full Text Available In this paper we demonstrate approaches for opinion mining in Latvian text. The authors have applied, combined and extended results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches, namely, semantic polarity analysis and machine learning. One of the most significant constraints that makes opinion mining for written content classification in Latvian text challenging is the limited amount of publicly available text corpora for classifier training. We have joined several sources and created a publicly available extended lexicon. Our results are comparable to or outperform current achievements in opinion mining in Latvian. Experiments show that lexicon-based methods provide more accurate opinion mining than the application of a Naive Bayes machine learning classifier on Latvian tweets. Methods used during this study could be further extended using human annotators, unsupervised machine learning and bootstrapping to create larger corpora of classified text.
Multilingualism As A Contemporary Phenomenon; Its Potential for Teachers And Learners

Directory of Open Access Journals (Sweden)

Luminita DIACONU

2018-10-01

Full Text Available Knowledge of teachers’ beliefs is central to understanding teachers’ decision-making in the classroom. The present study explores international language teachers’ beliefs about multilingualism and the use of a multilingual pedagogical approach in the third-language (L3) classroom. This study analyzed data collected from 12 teachers of French (N = 4), German (N = 2) and Spanish (N = 6) using qualitative content analysis. Three main themes emerged from the analysis. (1) The teachers view multilingualism as a potentially positive asset. Although they think that multilingualism has benefited their own language learning, they do not conclude that multilingualism is automatically an asset to students. (2) The teachers claim to make frequent use of their students’ linguistic knowledge of English when teaching the L3. However, the teachers rarely focus on the transfer of learning strategies because they believe that learning an L3 is completely different from learning the second language (L2), English. (3) The teachers think that collaboration across languages could enhance students’ language learning; however, no such collaboration currently exists.
Beyond accuracy: creating interoperable and scalable text-mining web services.

Science.gov (United States)

Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong

2016-06-15

The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Embodying multilingual interaction

DEFF Research Database (Denmark)

Hazel, Spencer; Mortensen, Janus

this linguistic diversity is managed in situ by participants engaged in dialogue with one another, and what it is used for in these transient multilingual communities. This paper presents CA-based micro-ethnographic analyses of language choice in an informal social setting – a kitchen – of an international study...... literature on language choice in interaction, our findings emphasize that analyses of language choice in multilingual settings need to take into account social actions beyond the words that are spoken. We show that facial, spatial and postural configurations, gaze orientation and gestures as well as prosodic...... in the particular community of practice that we are investigating. Reference Hazel, Spencer, and Janus Mortensen. forthcoming. Kitchen talk: Exploring linguistic practices in liminal institutional interactions in a multilingual university setting. in Language Alternation, Language Choice, and Language Encounter...
Heterogeneity: multilingualism and democracy

Directory of Open Access Journals (Sweden)

Hans-Jürgen Krumm

2004-01-01

Full Text Available Linguistic diversity and multilingualism on the part of individuals are a prerequisite and a constitutive condition of enabling people to live together in a world of growing heterogeneity. Foreign language teaching plays an important part in democratic education because it can be seen as a training in respecting otherness and developing an intercultural, non-ethnocentric perception and attitude. This is all the more important because of the necessity of integrating children from migrant families into school life. My article argues that language education policy has to take this perspective into account, i.e., of establishing a planned diversification so that pupils (and their parents) will not feel satisfied with learning English only, but also become motivated to learn languages of their own neighbourhood, such as migrant and minority languages. However, in order to make use of the linguistic resources in the classroom, relating it to the democratic impetus of foreign language education, it is necessary to revise existing language policies and to develop a multilingual perspective for all educational institutions.
Using ontology network structure in text mining.

Science.gov (United States)

Berndt, Donald J; McCart, James A; Luther, Stephen L

2010-11-13

Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
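The hybrid idea in this abstract, computing a graph measure of term importance over an ontology and feeding it into a bag-of-words model, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy "is-a" graph, the plain power-iteration PageRank, and the multiplicative weighting in weight_term_counts are hypothetical choices, since the abstract does not say how the measures are injected.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {node: [linked nodes]}."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def weight_term_counts(term_counts, rank):
    """Scale raw term frequencies by ontology importance (assumed scheme).

    Terms absent from the ontology keep a neutral default equal to the
    smallest rank observed in the graph.
    """
    default = min(rank.values())
    return {term: count * rank.get(term, default) for term, count in term_counts.items()}


# Hypothetical toy ontology: edges point from a term to its parent concept.
ONTOLOGY = {
    "cigarette": ["tobacco product"],
    "cigar": ["tobacco product"],
    "tobacco product": ["substance"],
    "nicotine": ["substance"],
}
```

With this toy graph, the most general concept collects the most rank, so a document mentioning it would have that term's frequency boosted relative to leaf terms. HITS could be substituted for PageRank in the same position.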
A tm Plug-In for Distributed Text Mining in R

Directory of Open Access Journals (Sweden)

Stefan Theussl

2012-11-01

Full Text Available R has gained explicit text mining support with the tm package enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed in a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data to be analyzed the higher the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing data sets beyond what would fit into memory by using a distributed file system possibly spanning over several machines, e.g., in a cluster of workstations. In this paper we present a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which can take advantage of the Hadoop MapReduce library for large scale text mining tasks. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
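The MapReduce programming model mentioned in this abstract can be illustrated with a minimal, single-process Python sketch of term counting. It does not use tm, tm.plugin.dc or Hadoop; the function names and the chunking of the corpus are hypothetical, and the point is only that each chunk is processed independently (map) before the partial results are merged (reduce).

```python
import re
from collections import Counter
from functools import reduce


def map_chunk(documents):
    """Map step: count terms in one chunk of documents, independently of others."""
    counts = Counter()
    for document in documents:
        counts.update(re.findall(r"[a-z]+", document.lower()))
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial term-count tables."""
    return left + right


def distributed_term_counts(chunks):
    """Run map over every chunk, then fold the partial counts together.

    Here the map calls run sequentially; in a real deployment each chunk
    could live on a different machine, which is what lets the corpus exceed
    the memory of any single node.
    """
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())
```

Because map_chunk never needs to see more than one chunk at a time, only the merged count table has to fit in memory, not the corpus itself.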
Multilingualism, Empathy and Multicompetence

Science.gov (United States)

Dewaele, Jean-Marc; Wei, Li

2012-01-01

The present study investigates the link between multilingualism and the personality trait of cognitive empathy among 2158 mono- and multilinguals. Data were collected through an online questionnaire. Statistical analyses revealed that the knowledge of more languages was not linked to cognitive empathy. Bilingual upbringing and the experience of…
A text-mining system for extracting metabolic reactions from full-text articles.

Science.gov (United States)

Czarnecki, Jan; Nobeli, Irene; Smith, Adrian M; Shepherd, Adrian J

2012-07-23

Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway - metabolic pathways - has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein-protein interactions. When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
Text Mining for Protein Docking.

Directory of Open Access Journals (Sweden)

Varsha D Badal

2015-12-01

Full Text Available The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound
Number pronunciation in a multilingual environment and implications for an ASR system

CSIR Research Space (South Africa)

Molapo, R

2014-11-01

Full Text Available . Mbogho, “Web-based corpus acquisition for Swahili language modelling,” in 3rd workshop on Spoken Languages Technologies for Under-resourced languages, 2012, pp. 42–47. [8] T. Schlippe, C. Zhu, J. Gebhardt, and T. Schultz, “Text normalization based... multilingual environment and implications for an ASR system Raymond Molapo Human Language Technologies Research Group Meraka Institute CSIR, South Africa Multilingual Speech Technologies Group North-West University Vanderbijlpark South Africa Email: rmolapo...
Multi-lingual Opinion Mining on YouTube

DEFF Research Database (Denmark)

Severyn, Aliaksei; Moschitti, Alessandro; Uryupina, Olga

2015-01-01

In order to successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that can handle the noisy input well yet can easily be adapted to a new domain or language. We here focus on opinion mining for YouTube by (i) modeling...... domain (up to 2.6% and 3% of absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (up to more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement) and (iii...
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

Science.gov (United States)

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Empirical advances with text mining of electronic health records.

Science.gov (United States)

Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L

2017-08-22

Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small and well-defined physiotherapy CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Structured Query Language (SQL) with the LIKE operator and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, useable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and
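The first extraction step named in this record, SQL pattern matching with LIKE and wildcards, can be sketched with Python's built-in sqlite3 module. The table name, column names, sample notes and the search pattern below are all hypothetical and stand in for the study's data warehouse, whose schema is not described in the abstract.

```python
import sqlite3


def find_narratives(rows, pattern):
    """Return (resident_id, note) rows whose note matches a SQL LIKE pattern.

    In a LIKE pattern, % matches any run of characters and _ matches exactly
    one character. SQLite's LIKE is case-insensitive for ASCII letters by
    default.
    """
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE narratives (resident_id INTEGER, note TEXT)")
        connection.executemany("INSERT INTO narratives VALUES (?, ?)", rows)
        cursor = connection.execute(
            "SELECT resident_id, note FROM narratives "
            "WHERE note LIKE ? ORDER BY resident_id",
            (pattern,),
        )
        return cursor.fetchall()
    finally:
        connection.close()


# Hypothetical toy narratives, invented for illustration only.
SAMPLE_ROWS = [
    (1, "Walking rehabilitation with frame"),
    (2, "Meal refused at noon"),
    (3, "Balance exercises, rehab session"),
]

matches = find_narratives(SAMPLE_ROWS, "%rehab%")
```

The rows selected this way would then be passed on to tokenization and word-frequency analysis, which the study carried out with R packages.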
Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

Science.gov (United States)

Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

2017-08-01

Text mining is one of the most promising ways of automatically extracting information from the huge biological literature. To exploit its potential, the knowledge encoded in the text should be converted to some semantic representation, such as entities and relations, which can be analyzed by machines. Large-scale practical systems for this purpose are rare, but text mining could be helpful for generating or validating predictions. Cellulases have abundant applications in various industries. Cellulases are cellulose-degrading enzymes, and the cellulase producers, bacteria (Bacillus subtilis) and fungus (Pseudomonas putida), were isolated from top soil of Guntur Dt., A.P., India. Pure cultures were maintained on potato dextrose agar medium for molecular studies. In this paper, we present how text mining concepts can be used to analyze cellulase-producing bacteria and fungi; their comparative structures are also studied with the aid of well-established, high-quality standard bioinformatics tools such as BioEdit, Swiss-Prot, ProtParam and EMBOSSwin, with which complete data on cellulases, such as the structure and constituents of the enzyme, have been obtained.
Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features

Directory of Open Access Journals (Sweden)

M.C. Padma

2008-06-01

Full Text Available In a multilingual country like India, a document may contain text words in more than one language. For a multilingual environment, a multilingual Optical Character Recognition (OCR) system is needed to read multilingual documents. It is therefore necessary to identify the different language regions of the document before feeding the document to the OCRs of the individual languages. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.
BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

Energy Technology Data Exchange (ETDEWEB)

Wu, Cathy H. [Univ. of Delaware, Newark, DE (United States). Center for Bioinformatics and Computational Biology; Hirschman, Lynette [The MITRE Corporation, Bedford, MA (United States)

2016-10-29

The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.

Text mining for adverse drug events: the promise, challenges, and state of the art.

Science.gov (United States)

Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

2014-10-01

Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources, such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs, that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
A Bright Future for Interdisciplinary Multilingualism Research

Science.gov (United States)

Comanaru, Ruxandra-S.; Dewaele, Jean-Marc

2015-01-01

Multilingualism is a prevalent reality in today's world. From an individual level to a societal one, multilingualism incorporates many aspects that have been studied extensively by diverse social research disciplines. The present article will explore the potential directions which multilingualism research can take, concentrating mainly on the…
The Role of Text Mining in Export Control

Energy Technology Data Exchange (ETDEWEB)

Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon [Korea Institute of Nuclear Nonproliferation and Control, Daejeon (Korea, Republic of)

2015-10-15

The Korean government provides classification services to exporters. Technology such as documents and drawings is simple to copy, and it is also easy to derive new technology from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases. However, the growing number of classification cases prevents consistent classification, which makes new, innovative and effective approaches necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to utilize the system. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, it is highly probable that similarity data will contribute to specifying classification criteria. However, an analysis of the system showed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.
The Role of Text Mining in Export Control

International Nuclear Information System (INIS)

Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon

2015-01-01

The Korean government provides classification services to exporters. Technology such as documents and drawings is simple to copy, and it is also easy to derive new technology from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases. However, the growing number of classification cases prevents consistent classification, which makes new, innovative and effective approaches necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to utilize the system. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, it is highly probable that similarity data will contribute to specifying classification criteria. However, an analysis of the system showed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control
A Text-Mining Framework for Supporting Systematic Reviews.

Science.gov (United States)

Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang

2016-11-01

Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
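The keyword relevance metric described in the record above can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so the definition used here (the fraction of an abstract's tokens that match a single-word keyword) is an assumption, as are the function names `keyword_relevance` and `rank_abstracts`; it is not the authors' implementation.

```python
import re


def keyword_relevance(abstract, keywords):
    """Fraction of tokens in the abstract that match a user-defined keyword.

    Assumed scoring rule for illustration only; handles single-word keywords.
    """
    tokens = re.findall(r"[a-z0-9]+", abstract.lower())
    if not tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for t in tokens if t in wanted)
    return hits / len(tokens)


def rank_abstracts(abstracts, keywords):
    """Return abstract indices ordered from most to least keyword-relevant."""
    scored = [(keyword_relevance(a, keywords), i) for i, a in enumerate(abstracts)]
    # Sort by descending score; ties keep the original retrieval order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A screener would then read abstracts in the returned order and stop once the remaining ones are judged unlikely to be relevant, which is where the labor saving would come from.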
Voice user interface design for emerging multilingual markets

CSIR Research Space (South Africa)

Van Huyssteen, G

2012-10-01

Full Text Available Multilingual emerging markets hold many opportunities for the application of spoken language technologies, such as automatic speech recognition (ASR) or text-to-speech (TTS) technologies in interactive voice response (IVR) systems. However...
pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

Science.gov (United States)

Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

2015-10-01

The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times on regular workstations, even on large corpus sizes and with compute-intensive functions. The pubmed.mineR package is available at http://cran.r-project.org/web/packages/pubmed.mineR.
German Schools Abroad: Hotspots of Elite Multilingualism?

Science.gov (United States)

Sander, Anne E; Admiraal, Wilfried

2016-01-01

While multilingualism itself is a widely analyzed topic, a study about multilingualism at German schools abroad is so far unique. This quantitative study investigates the differences in the size of German expressive and receptive vocabulary between monolingual and multilingual students, aged between 5 and 11 years. A cohort of 65 multilingual…
Piecing Together the "Workplace Multilingualism" Jigsaw Puzzle

Science.gov (United States)

Hua, Zhu

2014-01-01

Multilingualism in the workplace is different from multilingualism at home or in other domains of social life. It has more direct, yet entangled, economic and social implications and serves interactional purposes which can be at any point on the continuum of goal-orientation and relationship-building. Multilingualism in the workplace is both a…
Text mining by Tsallis entropy

Science.gov (United States)

Jamaati, Maryam; Mehri, Ali

2018-01-01

Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word ranking metric in order to extract the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word ranking performance, at the same level as the best previous ranking methods.
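For reference, the Tsallis entropy of a discrete distribution p is S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. The Python sketch below computes it for the distribution of a word's occurrences over equal-size segments of a text. The abstract does not say how the authors build their distribution or choose q, so the segmenting scheme and the helper name `occurrence_distribution` are assumptions made for illustration only.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i**q)) / (q - 1); Shannon entropy at q = 1."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must be non-negative and sum to 1")
    if math.isclose(q, 1.0):
        # Limit q -> 1 gives the Shannon entropy (natural logarithm).
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def occurrence_distribution(tokens, word, n_segments):
    """Share of a word's occurrences falling in each of n equal-size text segments.

    Hypothetical helper: one simple way to capture where a word appears in a text.
    Returns an empty list if the word never occurs.
    """
    counts = [0] * n_segments
    for pos, tok in enumerate(tokens):
        if tok == word:
            counts[pos * n_segments // len(tokens)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

Under this construction, a word clustered in one segment has entropy 0 and a word spread evenly over all segments has the largest entropy, so the score separates localized terms from uniformly scattered ones.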
Text Mining to Support Gene Ontology Curation and Vice Versa.

Science.gov (United States)

Ruch, Patrick

2017-01-01

In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances in text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
Language Choice in Multilingual Communities: The Case of Larteh ...

African Journals Online (AJOL)

In a multilingual community, the multilingual speaker needs to make the right language choice which principally depends on the domain of usage and the linguistic repertoire of speech participants. This paper investigates factors that govern language choices that multilingual speakers make in Larteh, a multilingual ...
Word level language identification in online multilingual communication

NARCIS (Netherlands)

Nguyen, Dong-Phuong; Dogruoz, A. Seza

2013-01-01

Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using
Advances in Text Mining and Visualization for Precision Medicine.

Science.gov (United States)

Gonzalez-Hernandez, Graciela; Sarker, Abeed; O'Connor, Karen; Greene, Casey; Liu, Hongfang

2018-01-01

According to the National Institutes of Health (NIH), precision medicine is "an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person." Although the text mining community has explored this realm for some years, the official endorsement and funding launched in 2015 with the Precision Medicine Initiative are beginning to bear fruit. This session sought to elicit participation of researchers with strong background in text mining and/or visualization who are actively collaborating with bench scientists and clinicians for the deployment of integrative approaches in precision medicine that could impact scientific discovery and advance the vision of precision medicine as a universal, accessible approach at the point of care.
Application of text mining for customer evaluations in commercial banking

Science.gov (United States)

Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

2015-07-01

Customer attrition is an increasingly serious problem for commercial banks. To combat this problem thoroughly, mining customer evaluation texts is as important as mining structured customer data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques by using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments was run on a collection of real textual data comprising 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions and some advice for the commercial bank are given in this paper.
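As an illustration of the association rule mining step named in the record above, the following Python sketch implements the frequent-itemset stage of Apriori in its textbook form. It is not the authors' code: the abstract gives no parameters or data, so the `min_support` threshold and the representation of each evaluation as a set of extracted terms are assumptions, and word segmentation and C5.0 classification are not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Each transaction is an iterable of items, e.g. the terms extracted from
    one customer evaluation (an assumed representation).
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Share of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    current = {frozenset([i]) for t in sets for i in t}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

Association rules such as "fee implies slow" can then be derived from the returned supports by dividing the support of the combined itemset by the support of the antecedent to obtain the rule's confidence.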
Using Text Mining to Characterize Online Discussion Facilitation

Science.gov (United States)

Ming, Norma; Baumer, Eric

2011-01-01

Facilitating class discussions effectively is a critical yet challenging component of instruction, particularly in online environments where student and faculty interaction is limited. Our goals in this research were to identify facilitation strategies that encourage productive discussion, and to explore text mining techniques that can help…
Multilingual Researchers Internationalizing Monolingual English-Only Education through Post-Monolingual Research Methodologies

Directory of Open Access Journals (Sweden)

Michael Singh

2017-02-01

Full Text Available The argument advanced in this Special Issue of Education Sciences favors democratizing knowledge production and dissemination across the humanities and social sciences through the mainstreaming of multilingual researchers' capabilities for theorizing using their full linguistic repertoire. An important contribution of the papers in this Special Issue is the promise that post-monolingual research methodology holds for collaborative projects among multilingual and monolingual researchers that tap into intercultural divergences across languages. Together these papers give warrant to multilingual researchers, including Higher Degree Researchers, to develop their capabilities for theorizing using their full linguistic repertoire, an educational innovation that could be of immense benefit to scholars working in predominantly monolingual universities. Through their thought-provoking papers presented in this Special Issue, these researchers invite those working in the education sciences to seriously consider the potential benefits of multiplying the intellectual resources used for theorizing that is possible through activating, mobilizing and deploying researchers’ multilingual resources in knowledge production and dissemination.
Practical text mining and statistical analysis for non-structured text data applications

CERN Document Server

Miner, Gary; Hill, Thomas; Nisbet, Robert; Delen, Dursun

2012-01-01

The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase d
Using text-mining techniques in electronic patient records to identify ADRs from medicine use

DEFF Research Database (Denmark)

Warrer, Pernille; Hansen, Ebba Holme; Jensen, Lars Juhl

2012-01-01

This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We...... included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs......, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text...
Multilingualism and Social Inclusion

DEFF Research Database (Denmark)

2017-01-01

This is a thematic issue on the relation between multilingualism and social inclusion. Due to globalization, Europeanization, supranational and transnational regulations linguistic diversity and multilingualism are on the rise. Migration and old and new forms of mobility play an important role...... in these processes. As a consequence, English as the only global language is spreading around the world, including Europe and the European Union. Social and linguistic inclusion was accounted for in the pre-globalization age by the nation-state ideology implementing the ‘one nation-one people-one language’ doctrine...... in governance and daily life protected by a legal framework. This does not mean that there is full equality of languages. This carries over to the fair and just social inclusion of the speakers of these weaker, dominated languages as well. There is always a power question related to multilingualism. The ten...

Underlying Paradox in the European Union's Multilingualism Policies

Science.gov (United States)

Johnson, Fern L.

2013-01-01

The European Union (EU) has developed comprehensive policies in recent years to promote multilingualism. In this article, major EU policy statements on multilingualism are analyzed to demonstrate how their underlying language ideology produces paradox by both encouraging multilingualism and regulating its definition within the EU. The first…
Biomedical hypothesis generation by text mining and gene prioritization.

Science.gov (United States)

Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor

2014-01-01

Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
Developing Multilingual Pedagogies and Research through Language Study and Reflection

Science.gov (United States)

Catalano, Theresa; Shende, Madhur; Suh, Emily K.

2018-01-01

Globalisation and increased transnational migration underscore the need for educational responses to multilingualism and multilingual discourses. One way to heighten awareness of multilingual pedagogies (while simultaneously providing data for multilingual research) is the use of reflective language study and journaling by language…
Multilingualism as a Principle of the EU Court of Justice

Directory of Open Access Journals (Sweden)

Karina Kh. Rekosh

2014-01-01

Full Text Available Since jurisprudence reflects relations between the institutions, bodies and organizations of the EU and native speakers, the EU Court of Justice plays a huge role in shaping legal discourse. Relations between the EU and its citizens show the effectiveness of the principle of multilingualism, which is apparent before the Court. The enlargement of the Union to 28 member States and, accordingly, the increase in the number of official languages to 24 complicate the implementation of the principle of multilingualism and create many problems for the EU Court of Justice: legal, linguistic, budgetary and translation-related. Not all documents of the Court are translated completely into the 24 EU official languages; translation is often limited to summaries. All documents are translated only into French and the languages of the proceedings, because the scale of the translation work has a direct impact on the timing of legal proceedings. To support written translation, the Court carries out much work on compiling dictionaries and thesauri, in which multilingualism is fully manifested. There is extensive legal practice on the use of languages and the language regime; however, the term «multilingualism» is not used by the Court, despite the recognition of the principle of equality of all official languages, perhaps because the Court itself does not always follow it. The article shows that multilingualism as a legal concept and principle opens up new areas of legal research, sometimes adjacent to already distinguished objects of regulation. Comparison of legal solutions to the problems of multilingualism in different states with a variety of languages, law and order, or in international organizations, lays the basis of "comparative linguistic law". At present, neither linguistic law nor comparative linguistic law exists in the doctrine of the law of the European Union, but to provide cooperation in the field of justice and mutual recognition of judicial decisions on the
Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

DEFF Research Database (Denmark)

Debortoli, Stefan; Müller, Oliver; Junglas, Iris

2016-01-01

, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic...... researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others...... It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches...
Negation scope and spelling variation for text-mining of Danish electronic patient records

DEFF Research Database (Denmark)

Thomas, Cecilia Engel; Jensen, Peter Bjødstrup; Werge, Thomas

2014-01-01

Electronic patient records are a potentially rich data source for knowledge extraction in biomedical research. Here we present a method based on the ICD10 system for text-mining of Danish health records. We have evaluated how adding functionalities to a baseline text-mining tool affected...
Knowledge based word-concept model estimation and refinement for biomedical text mining.

Science.gov (United States)

Jimeno Yepes, Antonio; Berlanga, Rafael

2015-02-01

Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus the performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large-scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
Mining protein function from text using term-based support vector machines

Science.gov (United States)

Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

2005-01-01

Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results: The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion: A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
Complex systems, multilingualism and academic success in South ...

African Journals Online (AJOL)

In South Africa, students are multilingual and this is often ignored or perceived as a hindrance to academic success. Conversely, there are studies that have found a positive relationship between bi- and multilingualism and cognitive development during the past 40 years. The aim of this article is to view multilingualism and ...
Hot complaint intelligent classification based on text mining

Directory of Open Access Journals (Sweden)

XIA Haifeng

2013-10-01

The complaint recognition system plays an important role in ensuring that hot complaints are classified correctly and in improving the service quality of the telecommunications industry. Customer complaints in the telecommunications industry have the particular requirement that they must be handled within a limited time, which causes errors in the classification of hot complaints. The paper presents a model for the intelligent classification of hot complaints based on text mining, which can classify a hot complaint into the correct level of the complaint navigation. The examples show that the model can classify complaint text efficiently.
Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

Science.gov (United States)

Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

2013-01-16

The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy in terms of both word and semantic concept repetition. We observe redundancy levels of about 30% and a non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
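The fingerprinting-based mitigation strategy named in this abstract can be sketched roughly as follows. The abstract does not specify the algorithm, so everything here is an assumption made for illustration: hashed word trigrams as fingerprints, an overlap ratio as the redundancy measure, and a hypothetical `threshold` of 0.5.

```python
import hashlib

def fingerprints(text, n=3):
    """Return the set of hashed word n-grams (shingles) of a note."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}

def redundancy(note, seen):
    """Fraction of the note's fingerprints already present in `seen`."""
    fp = fingerprints(note)
    return len(fp & seen) / len(fp) if fp else 0.0

def deduplicate(notes, threshold=0.5):
    """Keep a note only if its redundancy against already kept notes is below threshold."""
    seen, kept = set(), []
    for note in notes:
        if redundancy(note, seen) < threshold:
            kept.append(note)
            seen |= fingerprints(note)
    return kept

# Hypothetical notes: the second is a copy-and-paste of the first with two words added.
notes = [
    "patient reports chest pain radiating to left arm",
    "patient reports chest pain radiating to left arm today stable",
    "follow up visit for diabetes management and diet counseling",
]
kept = deduplicate(notes)
print(len(kept), "of", len(notes), "notes kept")
```

Processing notes in chronological order keeps the earliest copy; the baseline strategy in the abstract (keeping only the last note per patient) would instead need patient identifiers, which this sketch does not model.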
Multilingual phonological analysis and speech synthesis

NARCIS (Netherlands)

Coleman, J.S.; Dirksen, A.; Hussain, S.; Waals, J.

1996-01-01

We give an overview of multilingual speech synthesis using the IPOX system. The first part discusses work in progress for various languages: Tashlhit Berber, Urdu and Dutch. The second part discusses a multilingual phonological grammar, which can be adapted to a particular language by setting
New Perspectives on Multilingualism and L2 Acquisition: An Introduction

Science.gov (United States)

de Zarobe, Leyre Ruiz; de Zarobe, Yolanda Ruiz

2015-01-01

This article focuses on the description of one of the main features of current multilingualism, complexity, through a selection of issues related to its role in L2 acquisition, as the proper notion of multilingualism, multilingualism as a social phenomenon and multilingualism as a multidimensional phenomenon. We also present several aspects of…
Text-mining analysis of mHealth research.

Science.gov (United States)

Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
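Two steps of this pipeline, building a document term matrix and tracking terminology over time, can be sketched in a few lines. The study itself used JMP Pro 13; the plain-Python version below is only an assumed stand-in, and the four-document `docs` corpus and the period boundaries are hypothetical.

```python
from collections import Counter

# Hypothetical mini-corpus of (year, abstract text) pairs.
docs = [
    (2008, "mobile phone text messaging for health reminders"),
    (2009, "mobile phone applications in clinics"),
    (2016, "smartphone apps for diabetes self management"),
    (2017, "smartphone apps and wearable sensors"),
]

def document_term_matrix(texts):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def term_share(docs, term, start, end):
    """Share of documents published in [start, end] that mention `term`."""
    period = [text.lower().split() for year, text in docs if start <= year <= end]
    return sum(term in words for words in period) / len(period) if period else 0.0

vocab, dtm = document_term_matrix([text for _, text in docs])
print(len(dtm), "documents x", len(vocab), "terms")
for term in ("phone", "smartphone"):
    print(term, term_share(docs, term, 2008, 2012), term_share(docs, term, 2013, 2017))
```

The resulting matrix is what a decomposition such as SVD or a clustering step would take as input; those steps are left out to keep the sketch dependency-free.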
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.

Science.gov (United States)

Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

2012-05-01

This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
Resourcing speech-language pathologists to work with multilingual children.

Science.gov (United States)

McLeod, Sharynne

2014-06-01

Speech-language pathologists play important roles in supporting people to be competent communicators in the languages of their communities. However, with over 7000 languages spoken throughout the world and the majority of the global population being multilingual, there is often a mismatch between the languages spoken by children and families and their speech-language pathologists. This paper provides insights into service provision for multilingual children within an English-dominant country by viewing Australia's multilingual population as a microcosm of ethnolinguistic minorities. Recent population studies of Australian pre-school children show that their most common languages other than English are: Arabic, Cantonese, Vietnamese, Italian, Mandarin, Spanish, and Greek. Although 20.2% of services by Speech Pathology Australia members are offered in languages other than English, there is a mismatch between the language of the services and the languages of children within similar geographical communities. Australian speech-language pathologists typically use informal or English-based assessments and intervention tools with multilingual children. Thus, there is a need for accessible culturally and linguistically appropriate resources for working with multilingual children. Recent international collaborations have resulted in practical strategies to support speech-language pathologists during assessment, intervention, and collaboration with families, communities, and other professionals. The International Expert Panel on Multilingual Children's Speech was assembled to prepare a position paper to address issues faced by speech-language pathologists when working with multilingual populations. The Multilingual Children's Speech website ( http://www.csu.edu.au/research/multilingual-speech ) addresses one of the aims of the position paper by providing free resources and information for speech-language pathologists about more than 45 languages. These international
Multilingualism and Nation Building. Multilingual Matters 91.

Science.gov (United States)

Mansour, Gerda

This book examines the phenomenon of multilingualism in West Africa from a historical, social, and environmental perspective. Chapter 1 explains why the catalogue of African languages established by linguists is not reliable for assessing the linguistic diversity of the region. It also discusses studies that show that the linguistic behavior in…
Multilingualism and Creativity: A Multivariate Approach

Science.gov (United States)

Fürst, Guillaume; Grin, François

2018-01-01

This paper proposes a contribution to the investigation of the relation between multilingualism and creativity. Past evidence of a correlation between multilingualism and creativity is reviewed in a generalist perspective, that is, without focusing on a specific population such as migrants or highly proficient bilinguals. This review is also…
From university research to innovation Detecting knowledge transfer via text mining

DEFF Research Database (Denmark)

Woltmann, Sabrina; Clemmensen, Line Katrine Harder; Alkærsig, Lars

2016-01-01

and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern...... associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods are extrapolating the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using...... recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent...

Vaccine adverse event text mining system for extracting features from vaccine safety reports.

Science.gov (United States)

Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

2012-01-01

To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (e.g., symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal to that of the text classifier (83.1%) and better than that of the online tool (40.7%). Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
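The metrics named in this abstract follow standard definitions, which the sketch below computes from confusion-matrix counts. The counts passed in the example are hypothetical round numbers and are not taken from the study.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and balanced F-measure from extraction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Hypothetical counts for one extracted feature type.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")

# Hypothetical counts for a case classification task.
sens, spec = sensitivity_specificity(tp=8, fp=5, tn=85, fn=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Recall and sensitivity are the same quantity; the two names reflect the text mining and clinical classification vocabularies used side by side in the abstract.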
Methods for Mining and Summarizing Text Conversations

CERN Document Server

Carenini, Giuseppe; Murray, Gabriel

2011-01-01

Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

Science.gov (United States)

Lu, Zhiyong; Hirschman, Lynette

2012-01-01

Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
Multilingualism and Multicompetence: A Conceptual View

Science.gov (United States)

Franceschini, Rita

2011-01-01

The overall aim of this article is to argue that the functioning of every language system is based on a potential multilingual competence. The empirical basis for this is now broad enough to gain a comprehensive view on the overall competence of a multilingual individual. Moreover, increasing theoretical reflection has conferred an increasingly…
Multilingualism: Its Open and Hidden Agendas

Science.gov (United States)

Komorowska, Hanna

2013-01-01

The author analyses tendencies presented in recently launched EU reports claiming that newly published data reveal a need to rethink approaches to individual and social multilingualism. In the first part of the article approaches to individual as well as to societal multilingualism are discussed from a historical perspective. In the second part…
The gradience of multilingualism in typical and impaired language development: Positioning bilectalism within comparative bilingualism

Directory of Open Access Journals (Sweden)

Kleanthes K. Grohmann

2016-02-01

A multitude of factors characterizes bi- and multilingual compared to monolingual language acquisition. Two of the most prominent viewpoints have recently been put in perspective and enriched by a third (Tsimpli 2014): age of onset of children’s exposure to their native languages, the role of the input they receive, and the timing in monolingual first language development of the phenomena examined in bi- and multilingual children’s performance. This article picks up a fourth potential factor (Grohmann 2014b): language proximity, that is, the closeness between the two or more grammars a multilingual child acquires. It is a first attempt to flesh out the proposed gradient scale of multilingualism within the approach dubbed ‘comparative bilingualism’. The empirical part of this project comes from three types of research: (i) the acquisition and subsequent development of pronominal object clitic placement in two closely related varieties of Greek by bilectal, binational, bilingual, and multilingual children; (ii) the performance on executive control tasks by monolingual, bilectal, and bi- or multilingual children; and (iii) the role of comparative bilingualism in children with a developmental language impairment for both the diagnosis and subsequent treatment as well as the possible avoidance or weakening of how language impairment presents.
Mining Sequential Update Summarization with Hierarchical Text Analysis

Directory of Open Access Journals (Sweden)

Chunyun Zhang

2016-01-01

The outbreak of unexpected news events, such as a large human accident or a natural disaster, brings about a new information access problem where traditional approaches fail. Typically, news about these events is sparse early on and redundant later. Hence, it is very important to get updates and provide individuals with timely and important information about these incidents as they develop, especially when applied in the wireless and mobile Internet of Things (IoT). In this paper, we define the problem of sequential update summarization extraction and present a new hierarchical update mining system which can broadcast useful, new, and timely sentence-length updates about a developing event. The new system proposes a novel method, which incorporates techniques from topic-level and sentence-level summarization. To evaluate the performance of the proposed system, we apply it to the sequential update summarization task of the temporal summarization (TS) track at the Text Retrieval Conference (TREC) 2013 and compute four measurements of the update mining system: the expected gain, expected latency gain, comprehensiveness, and latency comprehensiveness. Experimental results show that our proposed method has good performance.
Is Multilingualism Linked to a Higher Tolerance of Ambiguity?

Science.gov (United States)

DeWaele, Jean-Marc; Wei, Li

2013-01-01

The present study investigates the link between multilingualism and the personality trait Tolerance of Ambiguity (TA) among 2158 mono-, bi- and multilinguals. Monolinguals and bilinguals scored significantly lower on TA compared to multilinguals. A high level of global proficiency of various languages was linked to higher TA scores. A stay abroad…
Sounds Affecting the Moments of Stuttering in Multilingualism: A Case Study

Science.gov (United States)

Morrish, Taryn; Nesbitt, Amy; le Roux, Mia; Zsilavecz, Ursula; van der Linde, Jeannie

2017-01-01

Research involving stuttering in multilingual individuals is limited. Speech-language therapists face the challenge of treating a diverse client base, which includes multilingual individuals. The aim of this study was to examine the stuttering moments across English, Afrikaans, and German in a multilingual speaker. A single multilingual adult with…
Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text.

Science.gov (United States)

Garten, Yael; Altman, Russ B

2009-02-05

Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities--particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.
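The general approach of finding concept mentions in text and presenting them as marked-up fragments can be sketched with a dictionary lookup. This is not Pharmspresso's implementation: the two-category `LEXICON`, the bracket markup, and the treatment of same-sentence co-mentions as candidate relationships are assumptions made purely for illustration.

```python
import re

# Hypothetical mini-lexicons; a real system would load curated vocabularies.
LEXICON = {
    "gene": ["CYP2D6", "VKORC1"],
    "drug": ["warfarin", "codeine"],
}

def build_pattern(lexicon):
    """Compile one regex with a named group per concept category."""
    parts = []
    for category, terms in lexicon.items():
        longest_first = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(t) for t in longest_first)
        parts.append(rf"(?P<{category}>\b(?:{alternatives})\b)")
    return re.compile("|".join(parts), re.IGNORECASE)

def tag(sentence, pattern):
    """Return the marked-up sentence and a list of (category, matched text)."""
    found = []

    def mark(match):
        found.append((match.lastgroup, match.group(0)))
        return f"[{match.lastgroup}:{match.group(0)}]"

    return pattern.sub(mark, sentence), found

pattern = build_pattern(LEXICON)
marked, found = tag("VKORC1 variants alter warfarin dose requirements.", pattern)
# Treat every gene-drug co-mention in the sentence as a candidate relationship.
pairs = [(g, d) for c1, g in found if c1 == "gene" for c2, d in found if c2 == "drug"]
print(marked)
print(pairs)
```

Co-mention is a deliberately crude relationship signal; it over-generates pairs, which is one reason such output is usually reviewed by curators.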
Text Mining for Sentiment Analysis of Film Reviews Using the K-Means Algorithm

Directory of Open Access Journals (Sweden)

Setyo Budi

2017-02-01

The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over a long period, these text documents grow in volume. Text mining is one technique used to dig through a collection of text documents so that their essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. The algorithm used in this study is K-Means. The results show that the accuracy of K-Means is 57.83% with a dataset of 300 positive and 300 negative documents, 56.71% with 700 positive and 700 negative documents, and 50.40% with 1000 positive and 1000 negative documents. From the test results, it is concluded that the larger the dataset used, the lower the accuracy of K-Means. Keywords: Text Mining, Sentiment Analysis, K-Means, Film Review
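How K-Means can be applied to review text and then scored against sentiment labels can be sketched as follows. The abstract gives no implementation details, so the bag-of-words vectors, the fixed starting centroids, and the four toy `reviews` are assumptions; the toy data is separable by construction and does not reproduce the accuracies reported above.

```python
from collections import Counter

def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return [[Counter(t.lower().split())[w] for w in vocab] for t in texts]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init, iterations=10):
    """Plain Lloyd's algorithm; `init` lists the indices used as starting centroids."""
    centroids = [list(vectors[i]) for i in init]
    assign = []
    for _ in range(iterations):
        assign = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def two_cluster_accuracy(assign, labels):
    """Accuracy under the better of the two possible cluster-to-label mappings."""
    agree = sum(a == y for a, y in zip(assign, labels))
    return max(agree, len(labels) - agree) / len(labels)

reviews = [
    "great film great acting",
    "great story wonderful acting",
    "boring film terrible plot",
    "terrible acting boring story",
]
labels = [0, 0, 1, 1]  # hypothetical labels: 0 = positive, 1 = negative
assign = kmeans(vectorize(reviews), init=[0, 2])
print(assign, two_cluster_accuracy(assign, labels))
```

Because K-Means is unsupervised, cluster IDs carry no sentiment meaning on their own, which is why accuracy is taken over both possible mappings of clusters to labels.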
The Application of Text Mining in Business Research

DEFF Research Database (Denmark)

Preuss, Bjørn

2017-01-01

The aim of this paper is to present a methodological concept in business research that has the potential to become one of the most powerful methods in the upcoming years when it comes to researching qualitative phenomena in business and society. It presents a selection of algorithms as well elaborat... on potential use cases for a text mining based approach to qualitative data analysis....
Linked Heritage: a collaborative terminology management platform for a network of multilingual thesauri and controlled vocabularies

Directory of Open Access Journals (Sweden)

Marie-Veronique Leroi

2013-01-01

Terminology and multilingualism were among the main focuses of the Athena Project. Linked Heritage, as a legacy of this project, also deals with terminology and brings theory to practice by applying the recommendations given in the Athena Project. As a direct follow-up of these recommendations on terminology and multilingualism, Linked Heritage is currently working on the development of a Terminology Management Platform (TMP). This platform will allow any cultural institution to register, SKOSify and manage its terminology in a collaborative way. The TMP will provide a network of multilingual and cross-domain terminologies.
Compatibility between Text Mining and Qualitative Research in the Perspectives of Grounded Theory, Content Analysis, and Reliability

Science.gov (United States)

Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel

2011-01-01

The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…
Identifying child abuse through text mining and machine learning

NARCIS (Netherlands)

Amrit, Chintan; Paauw, Tim; Aly, Robin; Lavric, Miha

2017-01-01

In this paper, we describe how we used text mining and analysis to identify and predict cases of child abuse in a public health institution. Such institutions in the Netherlands try to identify and prevent different kinds of abuse. A significant part of the medical data that the institutions have on
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

Directory of Open Access Journals (Sweden)

Wencheng Sun

2018-01-01

Currently, medical institutes generally use electronic medical records (EMR) to record a patient’s condition, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and analyzes the key techniques in depth. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work.
Text-mining-assisted biocuration workflows in Argo

Science.gov (United States)

Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

2014-01-01

Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. Database URL: http://argo.nactem.ac.uk PMID
LANGUAGE POLICIES AND MULTILINGUAL EDUCATION IN MINORITY SCHOOLS IN OTTOMAN EMPIRE: OUTCOMES AND FUTURE INSIGHTS

Directory of Open Access Journals (Sweden)

Emrah DOLGUNSOZ

2014-05-01

Language is the spirit of nations; the cement of the culture mosaic. Its education has a critical role, especially for multi-national societies and states. According to human rights, every individual has the right to develop, teach and learn his native language in any setting. But this democratic right needs to be regulated by a healthy, efficient and long-term multilingual education policy. As one of the most powerful multi-ethnic empires in history, the Ottoman Empire embraced numerous cultures and several unique languages. As a policy, the Empire followed a relatively flexible and irregular language policy which fostered national homogeneity and unity in time. On the other hand, the Empire always kept its distance from the Anatolian Turkish language by employing the Ottoman language as the official language. The imbalanced policies of multilingual education and the Porte’s distance from Anatolian Turkish contributed greatly to the disintegration of the Empire. This study focuses on why Ottoman language policies adversely affected the unity of the multilingual Empire, scrutinizes the insufficient multilingual education models among Muslim society and their outcomes, and discusses how multilingual education in minority schools contributed to the disintegration process.
Post-Monolingual Research Methodology: Multilingual Researchers Democratizing Theorizing and Doctoral Education

Directory of Open Access Journals (Sweden)

Michael Singh

2017-02-01

This paper reports on ground-breaking research in the study of languages in doctoral education. It argues for democratizing the production and dissemination of original contributions to knowledge through activating and mobilizing multilingual Higher Degree Researchers’ (HDRs) capabilities for theorizing through their use of their full linguistic repertoire. This paper contributes to this study’s development of post-monolingual research methodology which provides a theoretic-pedagogical framework for multilingual HDRs (a) to use their full linguistic repertoire in their research; (b) to develop their capabilities for theorizing and (c) to construct potentially valuable theoretical tools using metaphors, images, concepts and modes of critique. This paper is based on a longitudinal program of collaborative research whereby monolingual Anglophone and multilingual HDRs jointly developed their capabilities for theorizing through producing Anglo-Chinese analytical tools, and the associated pedagogies for using their languages in doctoral research. This longitudinal research program has been undertaken in the field of doctoral education to further a defining feature of democracy, namely linguistic diversity. This research has been conducted with the aims of promoting the multilingualism of Australian universities and activating linguistic communities of scholars to use their full linguistic repertoire in their research. The main finding arising from this program of research has been the development of post-monolingual research methodology which (a) uses the divergences within and between languages to undertake theorizing and (b) does so in co-existence with the tensions posed by monolingualism, especially the insistence on using extant theories available in only one language. Doctoral pedagogies of intellectual/racial equality provide multilingual HDRs with insights into the debates about the geopolitics governing the use of languages in the production and
New challenges for text mining: mapping between text and manually curated pathways

Science.gov (United States)

Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi

2008-01-01

Background Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550

Spectral signature verification using statistical analysis and text mining

Science.gov (United States)

DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.

2016-05-01

In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database (SigDB) are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to provide local learning patterns and trends within the spectral data, indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security (text encryption/decryption), biomedical, and marketing applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is
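
The spectral angle mapper (SAM) comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes spectra are plain lists of band values sampled on the same wavelength grid; the function names (`spectral_angle`, `mean_spectrum`, `rank_by_angle`) are hypothetical and are not taken from the paper or from SigDB.

```python
import math


def spectral_angle(test, reference):
    """Return the angle in radians between two equal-length spectra."""
    if len(test) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_angle = max(-1.0, min(1.0, dot / (norm_t * norm_r)))
    return math.acos(cos_angle)


def mean_spectrum(population):
    """Band-wise mean of a list of equal-length spectra."""
    n = len(population)
    return [sum(band) / n for band in zip(*population)]


def rank_by_angle(candidates, population):
    """Sort (name, spectrum) pairs by angle to the population mean, smallest first."""
    reference = mean_spectrum(population)
    return sorted(candidates, key=lambda item: spectral_angle(item[1], reference))
```

A smaller angle means the test spectrum has a shape closer to the population mean. Because the angle ignores overall scale, two spectra that differ only by a constant gain factor score as identical, which may or may not be desirable for a given quality check.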
Multilingual classrooms, language and literacy learners: Global childhoods

DEFF Research Database (Denmark)

Christensen, Mette Vedsgaard; Daugaard, Line Møller; Cox, Robyn

2015-01-01

childhoods of young multilingual and multiliterate learners, but explore globalised classrooms from various perspectives: the perspectives of learners, teachers and policymakers. In combination, the papers in the symposium offer a nuanced description of the tensions and dilemmas in contemporary multilingual...... classrooms across the globe and a multifaceted analysis of the multilingual nature of global childhoods. The first paper reports on a research study conducted in primary schools in Sydney, Australia, which investigated how multilingual children understand their own linguistic practices and how they report...... this practice. The children were asked to consider the role of their multilingualism in their daily classroom experiences. The second paper, based on a linguistic ethnographic case study in Denmark, explores language ideological aspects of global childhoods as they are negotiated in and around ’mother tongue...
Literacy and linguistic diversity in the multilingual classroom

DEFF Research Database (Denmark)

Laursen, Helle Pia

and educational failure. Our study takes place in classrooms where teachers are engaged in developing a literacy pedagogy which allows space for multilingualism and multimodality. Through intervention studies in these linguistically diverse classrooms, we are also investigating how teachers and students navigate....... The longitudinal study ‘Signs of language’ involves five multilingual classrooms. We are exploring how multilingual children interpret and create signs in order to communicate and perform their social identity in different multilingual and multimodal classroom settings. We are aiming at getting a better...... understanding of the children’s complex uses of the linguistic and semiotic resources available to them by paying close attention to the perspective of the children - as users and interpreters of literacy (Blackledge & Creese 2010). In classrooms some identity options are more available to the students than...
Equitable multilingualism? The case of Stellenbosch University ...

African Journals Online (AJOL)

This article reflects on Stellenbosch University Writing Lab's pedagogical approach to multilingualism and inclusivity within the complex and political nature of multilingual language policies at a South African university. The Writing Lab has always been promoted as a facility for all students, not just those in need of ...
Multilingual and Multimodal Composition at School: "ScribJab" in Action

Science.gov (United States)

Dagenais, Diane; Toohey, Kelleen; Bennett Fox, Alexa; Singh, Angelpreet

2017-01-01

In this article, we explain how recent research on multilingualism, multilingual education, and multimodality informs our thinking about the use of "ScribJab," a multilingual iPad application and website ("ScribJab.com"), which enables users to compose, illustrate, and narrate stories in two languages. Drawing on excerpts from…
Unsupervised text mining for assessing and augmenting GWAS results.

Science.gov (United States)

Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

2016-04-01

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has contributed to the identification of many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
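
The comparison of candidate genes against random genes described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each gene is assumed to already have a text-derived vector (for example term weights from its literature), cohesion is taken to be the mean pairwise cosine similarity, and significance is estimated by resampling random gene sets of the same size. The function names and the resampling design are hypothetical and are not the authors' published procedure.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def empirical_p_value(candidate_ids, all_vectors, n_draws=1000, seed=0):
    """One-sided empirical p-value for the cohesion of the candidate gene set.

    all_vectors maps gene id -> vector; random sets are drawn from all of its keys.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_similarity([all_vectors[g] for g in candidate_ids])
    gene_ids = sorted(all_vectors)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(gene_ids, len(candidate_ids))
        if mean_pairwise_similarity([all_vectors[g] for g in sample]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)
```

A small p-value here means that few random gene sets are as mutually similar in the literature as the candidate set. The number of draws bounds the smallest p-value that can be reported, so `n_draws` should be chosen with the intended significance threshold in mind.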
The Paradoxical Visions of Multilingualism in Education: The Ideological Dimension of Discourses on Multilingualism in Belgium and Canada

Science.gov (United States)

Hambye, Philippe; Richards, Mary

2012-01-01

In this article, we will examine some contrasted discourses on multilingualism that circulate nowadays in the field of education. Focusing on the cases of French-speaking Belgium and of the Franco-Ontarian community in Canada, we will show the existence of two discourses on multilingualism: one that insists on the positive value of multilingualism…
Evolution of bayesian-related research over time: a temporal text mining task

CSIR Research Space (South Africa)

de Waal, A

2006-06-01

Ronald Reagan’s Radio Addresses? Bayesian Analysis 2006, Volume 1, Number 2, pp. 189-383. 2. Mei Q and Zhai C, 2005. Discovering Evolutionary Theme Patterns from Text – An Exploration of Temporal Text Mining. KDD’05, August 21-24, 2005. Chicago...
Building a glaucoma interaction network using a text mining approach.

Science.gov (United States)

Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F

2016-01-01

The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of
English Medium Instruction in Multilingual and Multicultural Universities:

DEFF Research Database (Denmark)

Henriksen, Birgit; Holmen, Anne; Kling, Joyce

’ experiences in the midst of curricular change and presents reflections on ways to professionally navigate in English to meet the demands of the multilingual and multicultural classroom. English Medium Instruction in Multilingual and Multicultural Universities is key reading for university management......English Medium Instruction in Multilingual and Multicultural Universities analyses the issues related to EMI at both a local and international level and provides a broad perspective on this topic. Drawing on field studies from a Northern European context and based primarily on research carried out...
The Languages of the Multilingual: Some Conceptual and Terminological Issues

Science.gov (United States)

Hammarberg, Bjorn

2010-01-01

Research on individual multilingualism and third language acquisition has expanded greatly in recent years. A theoretical correlate of this is the recognition of the fact that humans are potentially multilingual by nature, that multilingualism is the default state of language competence, and that this in turn has implications for an adequate…
[Monolingualism, an overlooked multilingual?]

Science.gov (United States)

Vincent, E

There has been some emphasis on the practice of multilingualism. It is seen as encouraging children's creativity, linguistic sensitivity and openness. In this article, we seek to find out whether the different qualities demonstrated in multilingualism can also be developed in a monolingual context. Despite the fact that it is a single language system - where grammar, accents and the rhythm of the sentence remain unchanged - it will be interesting to draw some parallels with multilingualism. This will lead us to study the processes of oral and written language acquisition in children. The associations with stuttering will also be mentioned.
DDMGD: the database of text-mined associations between genes methylated in diseases from different species

KAUST Repository

Raies, A. B.; Mansour, H.; Incitti, R.; Bajic, Vladimir B.

2014-01-01

://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we
The Distribution of the Informative Intensity of the Text in Terms of its Structure (On Materials of the English Texts in the Mining Sphere)

Directory of Open Access Journals (Sweden)

Znikina Ludmila

2017-01-01

The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
Multilingual Aeronautical Dictionary (Dictionnaire Aeronautique Multilingue)

Science.gov (United States)

1980-01-01

From Word Alignment to Word Senses, via Multilingual Wordnets

Directory of Open Access Journals (Sweden)

Dan Tufis

2006-05-01

Most of the successful commercial applications in language processing (text and/or speech) dispense with any explicit concern for semantics, with the usual motivations stemming from the high computational costs required for dealing with semantics in the case of large volumes of data. With recent advances in corpus linguistics and statistical-based methods in NLP, revealing useful semantic features of linguistic data is becoming cheaper and cheaper and the accuracy of this process is steadily improving. Lately, there seems to be a growing acceptance of the idea that multilingual lexical ontologies might be the key towards aligning different views on the semantic atomic units to be used in characterizing the general meaning of various and multilingual documents. Depending on the granularity at which semantic distinctions are necessary, the accuracy of the basic semantic processing (such as word sense disambiguation) can be very high with relatively low complexity computing. The paper substantiates this statement by presenting a statistical-based system for word alignment and word sense disambiguation in parallel corpora. We describe a word alignment platform which ensures text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence and word alignment) as required by an accurate word sense disambiguation.
The Distribution of the Informative Intensity of the Text in Terms of its Structure (On Materials of the English Texts in the Mining Sphere)

Science.gov (United States)

Znikina, Ludmila; Rozhneva, Elena

2017-11-01

The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

Science.gov (United States)

Abdous, M'hammed; He, Wu

2011-01-01

Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology-related problems and to…
Text mining approach to predict hospital admissions using early medical records from the emergency department.

Science.gov (United States)

Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

2017-04-01

Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
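
The three term weightings and the n-gram features named in this abstract, together with the F1-score used for evaluation, can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: tokenization is a plain whitespace split, the inverse document frequency is the unsmoothed textbook form log(N/df) (libraries often use smoothed variants), and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """All contiguous n-grams of length 1..n_max, joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def vectorize(documents, weighting="tfidf", n_max=3):
    """Return one {ngram: weight} dict per document.

    weighting is 'binary', 'tf' (raw counts) or 'tfidf' (count * log(N / df)).
    """
    counts = [Counter(ngrams(doc.lower().split(), n_max)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        if weighting == "binary":
            vec = {term: 1.0 for term in c}
        elif weighting == "tf":
            vec = {term: float(freq) for term, freq in c.items()}
        elif weighting == "tfidf":
            vec = {term: freq * math.log(n_docs / doc_freq[term]) for term, freq in c.items()}
        else:
            raise ValueError("weighting must be 'binary', 'tf' or 'tfidf'")
        vectors.append(vec)
    return vectors


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With the unsmoothed form, an n-gram that occurs in every record receives a TF-IDF weight of zero, which is one reason the three representations can rank features differently before feature selection.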
Proposals and strategies for the valorization of the multilingualism in the mother tongue

Directory of Open Access Journals (Sweden)

Álvaro Antônio Caretta

2016-06-01

In every society, language is manifested through various oral, written and multimodal genres, with a huge diversity related to the various conditions of production and circulation of statements. Language is shaped by the various uses citizens make of it in society. Thus, one cannot perpetuate the myth of a uniform language; on the contrary, the study of the varieties that constitute multilingualism should be emphasized, because these varieties carry multiculturalism and the identity of the different communities. In this context, it is essential to reflect on the issue of linguistic discrimination, a major obstacle to building a more egalitarian, diverse and democratic society, and also on the true role of cultural norms in the teaching of written texts in schools and, especially, of orality, where multilingualism appears more clearly. From the observation of these language modes, we understand the importance of various forms of linguistic expression in the constitution of social multilingualism, a prerequisite for a society that seeks to enhance the multiple facets of its multiculturalism. --- http://dx.doi.org/10.12957/matraga.2016.20771

An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.

Science.gov (United States)

Trybula, Walter J.; Wyllys, Ronald E.

2000-01-01

Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
CrossRef text and data mining services

Directory of Open Access Journals (Sweden)

Rachael Lammey

2015-02-01

CrossRef is an association of scholarly publishers that develops shared infrastructure to support more effective scholarly communications. It is a registration agency for the digital object identifier (DOI), and has built additional services for CrossRef members around the DOI and the bibliographic metadata that publishers deposit in order to register DOIs for their publications. Among these services are CrossCheck, powered by iThenticate, which helps publishers screen for plagiarism in submitted manuscripts, and FundRef, which gives publishers a standard way to report funding sources for published scholarly research. To add to these services, CrossRef launched CrossRef text and data mining services in May 2014. This article will explain the thinking behind CrossRef launching this new service, what it offers to publishers and researchers alike, how publishers can participate in it, and the uptake of the service so far.
Mining consumer health vocabulary from community-generated text.

Science.gov (United States)

Vydiswaran, V G Vinod; Mei, Qiaozhu; Hanauer, David A; Zheng, Kai

2014-01-01

Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link it to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional terms with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text.
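
The idea of separating consumer terms from professional terms by a ratio of frequencies can be sketched as below. The abstract does not give the exact measure, so this is an assumed form: the smoothed relative frequency of a term in consumer-authored text (such as forum posts) divided by its smoothed relative frequency in professional text (such as abstracts). The function names and the add-one smoothing are hypothetical choices.

```python
from collections import Counter


def consumer_ratio(term, consumer_counts, professional_counts, smoothing=1.0):
    """Smoothed ratio of a term's relative frequency in consumer vs professional text."""
    consumer_total = sum(consumer_counts.values())
    professional_total = sum(professional_counts.values())
    consumer_rate = (consumer_counts.get(term, 0) + smoothing) / (consumer_total + smoothing)
    professional_rate = (professional_counts.get(term, 0) + smoothing) / (professional_total + smoothing)
    return consumer_rate / professional_rate


def label_pair(term_a, term_b, consumer_counts, professional_counts):
    """Label the term with the higher ratio 'consumer' and the other 'professional'."""
    ratio_a = consumer_ratio(term_a, consumer_counts, professional_counts)
    ratio_b = consumer_ratio(term_b, consumer_counts, professional_counts)
    if ratio_a >= ratio_b:
        return {term_a: "consumer", term_b: "professional"}
    return {term_a: "professional", term_b: "consumer"}


# Toy counts, made up purely to exercise the functions.
forum_counts = Counter({"heart attack": 8, "myocardial infarction": 1, "other": 91})
abstract_counts = Counter({"heart attack": 2, "myocardial infarction": 9, "other": 89})
labels = label_pair("heart attack", "myocardial infarction", forum_counts, abstract_counts)
```

A ratio above 1 indicates a term that is relatively more common in consumer text. Comparing the two members of a synonymous pair, rather than applying a fixed threshold, avoids having to calibrate the ratio across corpora of very different sizes.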
Constructing Glocal Identities through Multilingual Writing Practices on Flickr.com[R]

Science.gov (United States)

Lee, Carmen K. M.; Barton, David

2011-01-01

This article reports on a study of user-generated multilingual writing activities on the photo sharing site, Flickr.com[R]. It discusses how Flickr users deploy their multilingual resources when interacting with international audiences, the factors affecting their language choice, and how new multilingual identities are constructed. An exploratory…
OSCAR4: a flexible architecture for chemical text-mining

Directory of Open Access Journals (Sweden)

Jessop David M

2011-10-01

The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.
Multilingual Communication and Language Acquisition: New Research Directions

Science.gov (United States)

Canagarajah, A. Suresh; Wurr, Adrian J.

2011-01-01

In this article, we outline the differences between a monolingual and multilingual orientation to language and language acquisition. The increasing contact between languages in the context of globalization motivates such a shift of paradigms. Multilingual communicative practices have remained vibrant in non-western communities for a long time. We…
Multilingualism in the Workplace: Language Practices in Multilingual Contexts

Science.gov (United States)

Angouri, Jo

2014-01-01

The modern workplace is international and multilingual. Both white and blue collar employees are expected to be mobile, work increasingly in (virtual) teams (Gee et al. 1996) and to address complex organisational issues in a language that, often, is not their first language (L1). This results in a number of languages forming the ecosystem of…
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Science.gov (United States)

Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy

2013-01-01

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and
Nigerian Theatre and the Multilingual Challenge | Umukoro | Ibadan ...

African Journals Online (AJOL)

The paper examines the prospects and problems of theatrical communication in a multilingual and multicultural society like Nigeria. It attempts to identify the historical basis for multilingualism from the global perspective and explores the relative potentials of the literary arts of poetry, prose and drama in responding to the ...
Lifelong exposure to multilingualism: new evidence to support cognitive reserve hypothesis.

Directory of Open Access Journals (Sweden)

Magali Perquin

OBJECTIVE: Investigate the protective effect of multilingualism on cognition in seniors. METHODS: As part of the MemoVie study conducted on 232 non-demented volunteers aged 65 and more, neurogeriatric and neuropsychological evaluations were performed. Participants were classified as presenting either cognitive impairment without dementia (CIND) or being free of any cognitive impairment (CIND-free). Language practices, socio-demographic data and lifestyle habits were recorded. In this retrospective nested case-control design, we used as proxies of multilingualism: number of languages practiced, age of acquisition and duration of practice, emphasizing the temporal pattern of acquisition, and the resulting practice of several languages sequentially or concomitantly during various periods of life. This perspective gave our work a particularly original and innovative dimension. RESULTS: 44 subjects (19%) had CIND; the others were cognitively normal. All practiced from 2 to 7 languages. When compared with bilinguals, participants who practiced more than 2 languages presented a lower risk of CIND, after adjustment for education and age (odds ratio (OR) = 0.30, 95% confidence limits (95%CL) = [0.10-0.92]). Progressing from 2 to 3 languages, instead of staying bilingual, was associated with a 7-fold protection against CIND (OR = 0.14, 95%CL = [0.04-0.45], p = 0.0010). A one-year delay in reaching multilingualism (3 languages practiced being the threshold) multiplied the risk of CIND by 1.022 (OR = 1.022, 95%CL = [1.01-1.04], p = 0.0044). Also noteworthy, just as for multilingualism, an impact of cognitively stimulating activities on the occurrence of CIND was found as well (OR = 0.979, 95%CL = [0.961-0.998], p = 0.033). CONCLUSION: The study did not show independence of multilingualism and CIND. Rather it seems to show a strong association toward a protection against CIND. Practicing multilingualism
Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.

Science.gov (United States)

Vazquez, Miguel; Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

2011-06-01

Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the most salient is the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a good portion of this review is devoted to this issue, and presents the basic concepts and principles underlying the main strategies. The technical details are introduced and accompanied by relevant bibliographic references. Other tasks discussed are retrieving relevant articles, identifying relationships between chemicals and other entities, or determining the chemical structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein-disease-compound network analysis. We conclude the review with an outlook on how we expect the field to evolve, discussing its possibilities and its current limitations. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mining free-text medical records for companion animal enteric syndrome surveillance.

Science.gov (United States)

Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C

2014-03-01

Large amounts of animal health care data are present in veterinary electronic medical records (EMRs) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely in free text without clinical coding or fixed vocabulary. Text mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
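The sensitivity and specificity reported above compare the text-miner's labels with the reviewers' labels. As a minimal sketch of how such figures are derived from a confusion matrix, the following Python snippet uses hypothetical counts (they are not the study's data, which the abstract does not give) and a hypothetical helper name, sensitivity_specificity:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of reviewer-positive cases the text-miner found
    specificity = tn / (tn + fp)  # share of reviewer-negative cases it correctly rejected
    return sensitivity, specificity


# Hypothetical counts for a 2500-record sample; NOT the study's actual data.
tp, fn, tn, fp = 90, 10, 2380, 20
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Here the reviewers' categorization plays the role of the reference standard, so a record counts as a true positive only when both the reviewers and the text-miner mark it as enteric syndrome positive.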
[Text mining, a method for computer-assisted analysis of scientific texts, demonstrated by an analysis of author networks].

Science.gov (United States)

Hahn, P; Dullweber, F; Unglaub, F; Spies, C K

2014-06-01

Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining as a specific form of computer-based data analysis may be helpful in this context. Highlighting relations between authors and finding relevant publications concerning a specific subject using text analysis programs are illustrated graphically with two worked examples. © Georg Thieme Verlag KG Stuttgart · New York.
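The abstract does not say how the author relations were computed. One common way to build an author network, shown here only as an assumed illustration with hypothetical author lists and a hypothetical function name coauthor_edges, is to count how often each pair of authors shares a paper:

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how often each pair of authors appears on the same paper."""
    edges = Counter()
    for authors in papers:
        # Sort so that (A, B) and (B, A) map to the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Hypothetical author lists, one per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]
edges = coauthor_edges(papers)
strongest = edges.most_common(1)[0]  # the pair with the most shared papers
print(strongest)
```

The resulting pair counts can serve as edge weights when the network is drawn graphically.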
U-Compare: share and compare text mining tools with UIMA

Science.gov (United States)

Kano, Yoshinobu; Baumgartner, William A.; McCrohon, Luke; Ananiadou, Sophia; Cohen, K. Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi

2009-01-01

Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/ Contact: kano@is.s.u-tokyo.ac.jp PMID:19414535
Vladimir Nabokov: A Case Study of Multilingualism and Translation

Directory of Open Access Journals (Sweden)

Paulina Rothermel

2014-11-01

This article explores the relationship between translation and multilingualism through an examination of Vladimir Nabokov’s works and views on the topic. The main idea of the article is that translation is one of the implications of multi-competence, as defined by Vivian Cook in 1991, and as such is reliant on the translator’s cultural grounding. In Nabokov’s case, multilingualism and multiculturalism resulted in some very specific approaches in his own translation, as well as in his setting of canons for other translators to follow. Advocacy of the literal style in transliteration which remains faithful to the original author constitutes evidence of the utmost appreciation for the broadening of mental horizons that such foreignization may bring. Some renderings of Nabokov’s works into Polish, and the following of his directives in those renditions, were also analyzed by the author of the article.
Text Mining of UU-ITE Implementation in Indonesia

Science.gov (United States)

Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman

2018-04-01

At present, social media and networks act as one of the main platforms for sharing information, ideas, thoughts and opinions. Many people share their knowledge and express their views on specific topics or current hot issues that interest them. Social media texts are rich in information about complaints, comments, recommendations and suggestions, posted as spontaneous reactions or responses to government initiatives or policies intended to address certain issues. This study examines the sentiment of netizens, as vocal citizens, about the implementation of UU ITE, the first cyberlaw in Indonesia, as a means to identify the current tendency of citizen perception. To perform text mining, this study used the Twitter REST API, while R programming was used for classification analysis based on hierarchical clustering.
Intensity of Multilingual Language Use Predicts Cognitive Performance in Some Multilingual Older Adults

Science.gov (United States)

Keijzer, Merel; de Bot, Kees

2018-01-01

Cognitive advantages for bilinguals have inconsistently been observed in different populations, with different operationalisations of bilingualism, cognitive performance, and the process by which language control transfers to cognitive control. This calls for studies investigating which aspects of multilingualism drive a cognitive advantage, in which populations and under which conditions. This study reports on two cognitive tasks coupled with an extensive background questionnaire on health, wellbeing, personality, language knowledge and language use, administered to 387 older adults in the northern Netherlands, a small but highly multilingual area. Using linear mixed effects regression modeling, we find that when different languages are used frequently in different contexts, enhanced attentional control is observed. Subsequently, a PLS regression model targeting also other influential factors yielded a two-component solution whereby only more sensitive measures of language proficiency and language usage in different social contexts were predictive of cognitive performance above and beyond the contribution of age, gender, income and education. We discuss these findings in light of previous studies that try to uncover more about the nature of bilingualism and the cognitive processes that may drive an advantage. With an unusually large sample size our study advocates for a move away from dichotomous, knowledge-based operationalisations of multilingualism and offers new insights for future studies at the individual level. PMID:29783764
Multilingualism and dyslexia: challenges for research and practice.

Science.gov (United States)

Cline, T

2000-01-01

Over the last two decades there has been an expansion of activity and substantial progress in research on dyslexia and research on bilingualism and multilingualism. But the study of dyslexia has generally focused on monolingual learners and the study of bilingualism has tended to focus on speakers who do not have special educational needs. This paper will review the strands of research to date that have a bearing on multilingualism and dyslexia and attempt to identify the major challenges that face researchers and teachers. A satisfactory response cannot be developed without a full understanding of the impact that dyslexia has on language learning and the impact that multilingualism has on literacy learning.
The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

Science.gov (United States)

Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves

2013-03-01

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.
Evaluating a Bilingual Text-Mining System with a Taxonomy of Key Words and Hierarchical Visualization for Understanding Learner-Generated Text

Science.gov (United States)

Kong, Siu Cheung; Li, Ping; Song, Yanjie

2018-01-01

This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…

Academic outcomes of multilingual children in Australia.

Science.gov (United States)

O'Connor, Meredith; O'Connor, Elodie; Tarasuik, Joanne; Gray, Sarah; Kvalsvig, Amanda; Goldfeld, Sharon

2017-02-24

The Australian educational system is increasingly challenged to meet the needs of multilingual students, who comprise a fifth of the student population. Within the context of a monolingual English curriculum, multilingual children who enter school not yet English proficient may be at risk of experiencing inequitable educational outcomes. We examined the relationship between the timing of multilingual children's acquisition of receptive English vocabulary skills and subsequent reading and numeracy outcomes, as well as factors associated with earlier versus later timing of acquisition. Data were drawn from the Kindergarten-cohort (n = 4983) of the Longitudinal Study of Australian Children - a nationally representative, community sample of Australian children. Linear regression analyses revealed that multilingual children who begin school with proficient receptive English vocabulary skills, or who acquire proficiency early in schooling, are indistinguishable from their monolingual peers in literacy and numeracy outcomes by 10-11 years. However, later acquisition of receptive English vocabulary skills (i.e. after 6-7 years) was associated with poorer literacy outcomes. In turn, socioeconomic disadvantage and broader language or learning problems predicted this later acquisition of receptive English vocabulary skills. All children need to be supported during the early years of school to reach their full educational potential.
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

Science.gov (United States)

Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

2011-01-01

Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to manually process their contents effectively to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experience exploring such methods in three main areas at JPL (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies) has shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates on some of these benefits and on the important lessons learned from preparing and applying text mining to large unstructured system artifacts at JPL, with the aim of benefiting future TM applications in similar problem domains and in the hope that they can be extended to broader areas of application.
Text Mining the History of Medicine.

Science.gov (United States)

Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

2016-01-01

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while
Multilingualism and transnational communication strategies in Europe: from Hapsburg to the European Union

NARCIS (Netherlands)

Korshunova, G.; Marácz, L.; Marácz, L.; Rosello, M.

2012-01-01

The chapter discusses multilingualism in the European context and transnational communication strategies in order to accommodate the challenges of multilingualism. In the introduction, concepts defining multilingualism, transnationalism and communication strategies will be discussed and clarified.
[Text Mining for Sentiment Analysis of Film Reviews Using the K-Means Algorithm]

OpenAIRE

Setyo Budi

2017-01-01

The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over time, this body of text documents grows ever larger. Text mining is one technique used to mine collections of text documents so that their essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. In this study, the algorithm used is K-Means. The results...
MLED_BI: a new BI Design Approach to Support Multilingualism in Business Intelligence

Directory of Open Access Journals (Sweden)

Nedim Dedić

2017-11-01

Existing approaches to support Multilingualism (ML) in Business Intelligence (BI) create problems for business users, present a number of challenges from the technical perspective, and lead to issues with logical dependence in the star schema. In this paper, we propose MLED_BI (Multilingual Enabled Design for Business Intelligence), a novel BI design approach to support the application of ML in a BI environment, which overcomes the issues and problems found with existing approaches. The approach is based on a revision of the data warehouse dimensional modelling approach and treats the star schema as a higher-level entity. This paper describes MLED_BI and the validation and evaluation approach used.
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.

Science.gov (United States)

Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S

2016-01-01

Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.
Rights and Multilingualism

Science.gov (United States)

Torpsten, Ann-Christin

2012-01-01

In this paper, the author focuses on educational values and second language learners' experiences of education, using a life story approach. The overarching aim of the presentation is to discuss second language teacher students' encounters with Swedish school, mother tongue tuition, second language and multilingualism. The goal was achieved…
Theorizing Translanguaging and Multilingual Literacies through Human Capital Theory

Science.gov (United States)

Smith, Patrick H.; Murillo, Luz A.

2015-01-01

In this conceptual article we invite multilingual researchers to consider the concept of translanguaging through the lens of human capital theory. Our thinking about the interconnections among human capital, multilingualism, and translanguaging is motivated by our research in border "colonias" and other minoritized communities in South…
Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

Science.gov (United States)

Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

2015-01-01

Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
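The micro-averaged F-score with relaxed matching mentioned above pools matches over all documents before computing precision and recall. The abstract does not define its matching criteria, so the sketch below assumes one common convention: a predicted span counts as matched if it shares a label and at least one character with a gold span. The spans, the labels "Problem" and "Treatment", and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (start, end, label) spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def micro_f_score(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 under relaxed (overlap) matching."""
    matched_pred = total_pred = matched_gold = total_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        total_pred += len(pred)
        total_gold += len(gold)
        # Counts are pooled across documents, which is what makes this a micro-average.
        matched_pred += sum(any(overlaps(p, g) for g in gold) for p in pred)
        matched_gold += sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / total_pred if total_pred else 0.0
    recall = matched_gold / total_gold if total_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (start offset, end offset, label), one list per document.
gold_docs = [
    [(0, 10, "Problem"), (20, 30, "Treatment")],
    [(5, 15, "Problem")],
]
pred_docs = [
    [(2, 8, "Problem"), (40, 50, "Treatment")],
    [(5, 15, "Problem"), (30, 35, "Problem")],
]
precision, recall, f1 = micro_f_score(gold_docs, pred_docs)
print(precision, recall, f1)
```

Under strict matching the span boundaries would have to agree exactly, so relaxed scores of this kind are generally at least as high as strict ones.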
Flexible Multilingual Education: Putting Children's Needs First

Science.gov (United States)

Weber, Jean-Jacques

2014-01-01

This book examines the benefits of multilingual education that puts children's needs and interests above the individual languages involved. It advocates flexible multilingual education, which builds upon children's actual home resources and provides access to both the local and global languages that students need for their educational and…
Multilingual Cultural Resources in Child-Headed Families in Uganda

Science.gov (United States)

Namazzi, Elizabeth; Kendrick, Maureen E.

2014-01-01

This article reports on a study focusing on the use of multilingual cultural resources in child-headed households (CHHs) in Uganda's Rakai District. Using funds of knowledge and sociocultural perspectives on children's learning, we documented through ethnographic observations and interviews how children in four CHHs used multilingual cultural…
Identity Practices of Multilingual Writers in Social Networking Spaces

Science.gov (United States)

Chen, Hsin-I

2013-01-01

This study examines the literacy practices of two multilingual writers in social networking communities. The findings show that the multilingual writers explored and reappropriated symbolic resources afforded by the social networking site as they aligned themselves with particular collective and personal identities at local and global levels.…
Using text mining for study identification in systematic reviews: a systematic review of current approaches.

Science.gov (United States)

O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

2015-01-14

The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. The use of text mining as a 'second screener' may also be used cautiously
A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications

Science.gov (United States)

Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.

2018-04-01

Data on Indonesian research publications in the domain of biomedicine have been collected and text mined for the purpose of a scientometric study. The goal is to build a predictive model that classifies research publications by their potential for downstreaming. The model is based on drug development processes adapted from the literature. We describe the effort to build the conceptual model and to develop a corpus of research publications in Indonesian biomedicine, and then investigate the problems associated with building the corpus and validating the model. Based on our experience, a framework is proposed to manage scientometric studies based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to obtain a valid classification model. The validity of this model rests mainly on iterative and close interaction with the domain experts, from identifying the issues and building a conceptual model to labelling, validation and interpretation of results.
tmBioC: improving interoperability of text-mining tools with BioC.

Science.gov (United States)

Khare, Ritu; Wei, Chih-Hsuan; Mao, Yuqing; Leaman, Robert; Lu, Zhiyong

2014-01-01

The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, and also provides easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modify these tools and enable them to read/write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC-wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: Our experimental results show that using BioC reduces the lines of code needed for text-mining tool integration by more than 60%. The tmBioC toolkit
Rate of multilingual phonological acquisition: Evidence from a cross-sectional study of English-Mandarin-Malay.

Science.gov (United States)

Lim, Hui W; Wells, Bill; Howard, Sara

2015-01-01

Early child multilingual acquisition is under-explored. Using a cross-sectional study approach, the present research investigates the rate of multilingual phonological acquisition of English-Mandarin-Malay by 64 ethnic Chinese children aged 2;06-4;05 in Malaysia, a multiracial, multilingual country in Asia. The aims of the study are to provide clinical norms for speech development in multilingual children and to compare multilingual acquisition with monolingual and bilingual acquisition. An innovative multilingual phonological test which adopts well-defined scoring criteria drawing upon local accents of English, Mandarin and Malay is proposed and described in this article. This procedure has been neglected in the few existing Chinese bilingual phonological acquisition studies, resulting in peculiar findings. The multilingual children show phonological acquisition milestones comparable to those of monolingual and bilingual peers acquiring the same languages. The implications of the present results are discussed. The present findings contribute to the development of models and theories of child multilingual acquisition.
Business drivers and design choices for multilingual IVRs : A government service delivery case study

CSIR Research Space (South Africa)

Calteaux, K

2012-05-01

Multilingual emerging markets hold many opportunities for the application of spoken language technologies, such as interactive voice response (IVR) systems. Designing such systems requires an in-depth understanding of the business drivers...
Using text mining for study identification in systematic reviews: a systematic review of current approaches

OpenAIRE

O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

2015-01-01

Background The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic...
Language to Language: Nurturing Writing Development in Multilingual Classrooms

Science.gov (United States)

Shagoury, Ruth

2009-01-01

The author spent four years embedded in a multilingual kindergarten classroom in which children spoke six different languages and several more years observing multilingual Head Start classrooms. She shares numerous examples of young dual language learners actively figuring out the way written language works in their first and second languages.…

The Literacy Practices of "Transfronterizos" in a Multilingual World

Science.gov (United States)

de la Piedra, Maria Teresa; Guerra, Juan C.

2012-01-01

This introduction provides the background for this special issue by first describing the US-Mexico border, a fascinating context in which to research issues related to Spanish-English biliteracy and multilingualism. We present main points in the prevailing discussion within the field of literacy studies about issues of multilingualism and…
Multilingual Education: The Role of Language Ideologies and Attitudes

Science.gov (United States)

Liddicoat, Anthony J.; Taylor-Leech, Kerry

2015-01-01

This paper overviews issues relating to the role of ideologies and attitudes in multilingual education (MLE). It argues that ideologies and attitudes are constituent parts of the language planning process and shape the possibilities for multilingualism in educational programmes in complex ways, but most frequently work to constrain the ways that…
SOME ASPECTS REGARDING TRANSLATION DIVERGENCES BETWEEN THE AUTHENTIC TEXTS OF THE EUROPEAN UNION

Directory of Open Access Journals (Sweden)

Laura-Cristiana SPĂTARU-NEGURĂ

2014-05-01

When multiple legal orders and languages co-exist within a single legal regime, there is potential for divergences between the legal texts. On the international legal stage, the European Union represents the most ambitious linguistic project, integrating 28 Member States and 24 official languages. What we undertook with this study was to discover how the multilingual and multicultural environment of the European Union affects its legislative and judicial processes. We tried to argue the problem of translation divergences between the authentic texts of the European Union. Many questions arise. Is ‘controlled multilingualism’ the key to our problem? Is weak multilingualism the solution, especially given that it is not new to the European construction? Should one language be chosen as the original? Of course, we have to see that multilingualism is an advantage, a blessing of the European Union and not an obstacle, a curse. We consider that, despite the various problems with European multilingualism described in this study, it is unlikely that anything will change in the foreseeable future. However, we consider that lawyers should do more research on languages and legal interpretation. Interdisciplinary efforts could solve the multilingualism problems of the European Union. The present study is part of more complex research on this theme and is meant to address certain important points of the master thesis prepared in Switzerland for an LL.M. program.
Learning to Read and Write in the Multilingual Family

Science.gov (United States)

Wang, Xiao-lei

2011-01-01

This book is a guide for parents who wish to raise children with more than one language and literacy. Drawing on interdisciplinary research, as well as the experiences of parents of multilingual children, this book walks parents through the multilingual reading and writing process from infancy to adolescence. It identifies essential literacy…
Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals

Science.gov (United States)

Michalski, Greg V.

2011-01-01

Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…
TIME SERIES ANALYSIS ON STOCK MARKET FOR TEXT MINING CORRELATION OF ECONOMY NEWS

Directory of Open Access Journals (Sweden)

Sadi Evren SEKER

2014-01-01

This paper proposes an information retrieval method for economy news. The effect of economy news is researched at the word level, and stock market values are considered as the ground proof. The correlation between stock market prices and economy news is an already addressed problem for most countries. The most well-known approach is to apply text mining approaches to the news and some time series analysis techniques to stock market closing values, in order to apply classification or clustering algorithms over the features extracted. This study goes further and asks which time series analysis techniques are available for stock market closing values and which one is the most suitable. In this study, the news and their dates are collected into a database and text mining is applied over the news; the text mining part has been kept simple, with only the term frequency – inverse document frequency method. For the time series analysis part, we have studied 10 different methods such as random walk, moving average, acceleration, Bollinger band, price rate of change, periodic average, difference, momentum or relative strength index and their variation. In this study we have also explained these techniques in a comparative way and we have applied the methods over Turkish Stock Market closing values for more than a 2 year period. On the other hand, we have applied the term frequency – inverse document frequency method on the economy news of one of the high-circulating newspapers in Turkey.
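The two building blocks named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed logarithmic IDF, and the function names tf_idf, moving_average and rate_of_change are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative TF-IDF variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def moving_average(values, window):
    """Simple moving average over a sliding window of closing values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def rate_of_change(values, lag):
    """Price rate of change: relative difference to the value `lag` steps earlier."""
    return [
        (values[i] - values[i - lag]) / values[i - lag]
        for i in range(lag, len(values))
    ]
```

A term that occurs in every document receives weight zero under this IDF, which is why frequent function words drop out of the news features without a stop-word list.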
Crowd-Sourcing (Semantically) Structured Multilingual Educational Content (CoSMEC)

Science.gov (United States)

Tarasowa, Darya; Auer, Sören; Khalili, Ali; Unbehauen, Jörg

2014-01-01

The support of multilingual content becomes crucial for educational platforms due to the benefits it offers. In this paper we propose a concept that allows content authors to use the power of the crowd to create (semantically) structured multilingual educational content out of their material. To enable the collaboration of the crowd, we expand our…
A Circle of Learning: The impact of a narrative multilingualism approach on in-service teachers’ literacy pedagogies

Directory of Open Access Journals (Sweden)

Belinda Mendelowitz

2011-05-01

This paper explores the impact of a narrative multilingualism approach on in-service primary school teachers who attended the Advanced Certificate of Education (ACE) Languages course at the University of the Witwatersrand in 2009. The teachers wrote their own language narratives and were required to implement language narrative work in their classrooms. The paper is a case study of three teachers’ implementation of multilingual narrative pedagogy, and explores the ways in which each teacher translates this pedagogy into their specific contexts. Theoretically, the paper attempts to deepen and extend narrative multilingualism as an approach to language teaching. The notions of uptake and pedagogical translation are explored at various levels, namely, the teachers’ uptake of a multilingual narrative approach and the learners’ uptake. The most striking aspect of the data, across all teachers, is the process and dynamics unleashed in the classroom space. The process of sharing language narratives reconfigured dynamics in the classroom and opened up the classroom space for teachers and learners. The interventions that the pedagogy of narrative multilingualism afforded enabled the validation of linguistic diversity. In a society where xenophobia and linguicism are prevalent, such interventions can play a valuable role in changing attitudes and teaching learners to value difference. Furthermore, previously silenced learners found their voices and participated more in class activities.
Sentiment analysis of Arabic tweets using text mining techniques

Science.gov (United States)

Al-Horaibi, Lamia; Khan, Muhammad Badruddin

2016-07-01

Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether the text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment by using a machine learning approach, with Naïve Bayes and Decision Tree as classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that available facilities for Arabic text processing need to be made from scratch or improved to develop accurate classifiers. The small functionalities developed by us in a Python language environment helped improve the results and proved that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.
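As a rough illustration of one of the two classifiers mentioned, the following is a minimal multinomial Naïve Bayes sketch with add-one smoothing. It is an assumption-laden toy, not the study's code: the function names, the whitespace tokenizer and the tiny English placeholder examples in the usage note are hypothetical, and real Arabic tweets would need the normalization and lexicon work the abstract calls for.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count documents per class and tokens per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        tokens = text.split()  # assumption: whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.split():
            if token not in vocabulary:
                continue  # ignore tokens never seen in training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on a handful of labelled placeholder strings such as ("happy good day", "positive") and ("awful bad day", "negative"), classify returns the label whose vocabulary best matches the input; the preprocessing applied before split() is where most language-specific effort would go.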
Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

Science.gov (United States)

Kilicoglu, Halil

2017-06-13

An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.
Cluo: Web-Scale Text Mining System For Open Source Intelligence Purposes

Directory of Open Access Journals (Sweden)

Przemyslaw Maciolek

2013-01-01

The amount of textual information published on the Internet is considered to be in billions of web pages, blog posts, comments, social media updates and others. Analyzing such quantities of data requires a high level of distribution, of both data and computing. This is especially true in the case of complex algorithms, often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system, which extracts and analyzes significant quantities of openly available information.
New CALL-SLA Research Interfaces for the 21st Century: Towards Equitable Multilingualism

Science.gov (United States)

Ortega, Lourdes

2017-01-01

The majority of the world is multilingual, but inequitably multilingual, and much of the world is also technologized, but inequitably so. Thus, researchers in the fields of computer-assisted language learning (CALL) and second language acquisition (SLA) would profit from considering multilingualism and social justice when envisioning new CALL-SLA…
A Multilingual Approach to Analysing Standardized Test Results: Immigrant Primary School Children and the Role of Languages Spoken in a Bi-/Multilingual Community

Science.gov (United States)

De Angelis, Gessica

2014-01-01

The present study adopts a multilingual approach to analysing the standardized test results of primary school immigrant children living in the bi-/multilingual context of South Tyrol, Italy. The standardized test results are from the Invalsi test administered across Italy in 2009/2010. In South Tyrol, several languages are spoken on a daily basis…
Multilingual Development in Children with Autism: Perspectives of South Asian Muslim Immigrant Parents on Raising a Child with a Communicative Disorder in Multilingual Contexts

Science.gov (United States)

Jegatheesan, Brinda

2011-01-01

This study examined the perceptions of three Muslim families on multilingual development in their children with autism. Findings indicate that the families' goal of maintaining normalcy in their children's life could not be attained without immersion in multiple languages. They believe that immersion in multilingual contexts helped their children…
Beyond Identity: The Desirability and Possibility of Policies of Multilingualism

Science.gov (United States)

Rubin, Aviad

2017-01-01

Many contributors to the normative literature on language policy argue that inclusive multilingual regimes are beneficial on several grounds. However, despite the professed advantages of multilingualism, most nation-states have been reluctant to equally recognise minority languages alongside the majority language. This reality raises three…
Text Mining Metal-Organic Framework Papers.

Science.gov (United States)

Park, Sanghoon; Kim, Baekjun; Choi, Sihoon; Boyd, Peter G; Smit, Berend; Kim, Jihan

2018-02-26

We have developed a simple text mining algorithm that allows us to identify surface areas and pore volumes of metal-organic frameworks (MOFs) using manuscript html files as inputs. The algorithm searches for common units (e.g., m²/g, cm³/g) associated with these two quantities to facilitate the search. From the sample set data of over 200 MOFs, the algorithm managed to identify 90% and 88.8% of the correct surface area and pore volume values. Further application to a test set of randomly chosen MOF html files yielded 73.2% and 85.1% accuracies for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as bolded notations of MOFs (e.g., 1a) that made it difficult to identify their real names. These types of tools will become useful when it comes to discovering structure-property relationships among MOFs as well as collecting a large set of data for references.
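The unit-driven search described here can be sketched with regular expressions. This is a hedged illustration only, not the published algorithm: the patterns, the tag-stripping step and the function name extract_mof_properties are assumptions, and the sketch makes no attempt to link a value to the MOF it belongs to.

```python
import re

# A number such as 1250, 1,250 or 0.52 (assumed formats).
NUMBER = r"(\d+(?:,\d{3})*(?:\.\d+)?)"
# Units as they appear once HTML tags such as <sup> are removed.
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s*(?:2|²)\s*/\s*g")
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s*(?:3|³)\s*/\s*g")


def extract_mof_properties(html):
    """Return candidate surface areas (m2/g) and pore volumes (cm3/g) found in html."""
    text = re.sub(r"<[^>]+>", "", html)  # drop tags so m<sup>2</sup>/g becomes m2/g

    def to_floats(pattern):
        return [float(value.replace(",", "")) for value in pattern.findall(text)]

    return {
        "surface_area_m2_g": to_floats(SURFACE_AREA),
        "pore_volume_cm3_g": to_floats(PORE_VOLUME),
    }
```

Anchoring on the unit rather than on keywords such as "BET" keeps the rules short, at the cost of the failure mode the abstract reports: sentences with unusual structure can attach a value to the wrong material.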
Application of Ferulic Acid for Alzheimer's Disease: Combination of Text Mining and Experimental Validation.

Science.gov (United States)

Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan

2018-01-01

Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literature that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most probable mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in a decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA-induced BACE1 and MMP2 pathways may be novel potential mechanisms involved in AD. The text mining of literature and the TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.
Text Mining for Precision Medicine: Bringing structure to EHRs and biomedical literature to understand genes and health

Science.gov (United States)

Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

2018-01-01

The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Science.gov (United States)

Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

2016-01-01

The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
Automatic detection of adverse events to predict drug label changes using text and data mining techniques.

Science.gov (United States)

Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki

2013-11-01

The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.

Developments in the Multilingual and Multicultural Learning Space

DEFF Research Database (Denmark)

Lauridsen, Karen M.; Cozart, Stacey Marie; Kling, Joyce

Uni project (2012-15) recommends that higher education institutions (HEI) provide ‘the necessary professional development and teacher training programmes that will allow HE teachers to appropriately develop (…) their professional and pedagogical knowledge, skills and competences and thereby empower them...... to ensure the quality of their teaching – and their students’ learning – in the multilingual and multicultural learning space’ (www.intluni.eu; Carroll 2015; Leask 2015). For many universities and other HEIs around the world, the multilingual and multicultural classroom is the new – or no longer quite so...... platform with resources targeted at EDs responsible for advancing faculty development in this area. In this session, the presenters will report on the first outcomes of EQUiiP. Participants will then be invited to interact and explore best practices in the multilingual and multicultural learning space...
Coronary artery disease risk assessment from unstructured electronic health records using text mining.

Science.gov (United States)

Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Chang, Nai-Wen; Dai, Hong-Jie

2015-12-01

Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but there was a significant amount of missing data to calculate the Framingham risk score. A systematic approach for understanding missing data was followed by implementation of imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort has shown that the majority of the diabetic patients are at moderate risk of CAD. Copyright © 2015 Elsevier Inc. All rights reserved.
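A rule-based extractor of the kind described might look like the sketch below. It is illustrative only: the regular expressions, field names and the function extract_risk_factors are assumptions rather than the study's rules, and no Framingham score is computed here. Fields that no rule matches are returned as None, which makes the missing-data problem reported in the abstract explicit.

```python
import re

# Hypothetical patterns for numeric risk factors; each captures one integer.
NUMERIC_RULES = {
    "age": re.compile(r"\b(\d{1,3})\s*-?\s*(?:years?|yrs?|y/o|yo)\b", re.I),
    "total_cholesterol": re.compile(r"\btotal cholesterol\D{0,15}(\d{2,3})", re.I),
    "hdl_c": re.compile(r"\bHDL(?:-C)?\D{0,15}(\d{2,3})", re.I),
    "systolic_bp": re.compile(
        r"\b(?:BP|blood pressure)\D{0,15}(\d{2,3})\s*/\s*\d{2,3}", re.I
    ),
}
GENDER = re.compile(r"\b(female|male)\b", re.I)
NON_SMOKER = re.compile(r"\b(?:non-?smoker|never smoked|denies smoking)\b", re.I)
SMOKER = re.compile(r"\bsmok(?:er|es|ing)\b", re.I)
DIABETES = re.compile(r"\b(?:diabetes|diabetic)\b", re.I)


def extract_risk_factors(note):
    """Return a dict of risk factors found in a clinical note; None marks missing data."""
    factors = {}
    for name, pattern in NUMERIC_RULES.items():
        match = pattern.search(note)
        factors[name] = int(match.group(1)) if match else None
    gender = GENDER.search(note)
    factors["gender"] = gender.group(1).lower() if gender else None
    # Check negated mentions first so "non-smoker" is not read as "smoker".
    if NON_SMOKER.search(note):
        factors["smoker"] = False
    elif SMOKER.search(note):
        factors["smoker"] = True
    else:
        factors["smoker"] = None
    factors["diabetes"] = True if DIABETES.search(note) else None
    return factors
```

Counting the None values per field across a cohort gives a first view of how much imputation a downstream risk calculation would require.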
Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

Science.gov (United States)

Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J

2017-08-01

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic
Rewriting traditional tales as multilingual narratives at elementary school: Problems and progress

Directory of Open Access Journals (Sweden)

Heather Lotherington

2007-08-01

For several years children at Joyce Public School have been rewriting traditional stories from localized cultural and linguistic perspectives, creating innovative, individualized narrative forms with digital technology. Our experimental multiliteracies research project is a collaboration of school and university teachers and researchers following a guided action research paradigm. The study has as one of its stated objectives the development of multilingual story retelling as a means of inexpensively supporting home language maintenance, fostering language awareness and aiding English as a second language learning in a community of high linguistic diversity. This paper tells our story thus far, focusing on how we have approached the creation of multilingual stories in heterogeneous, urban language classes, discussing stumbling blocks that have forced creative problem-solving and showcasing successes.
Why Multilingualism and Multilingual Communication Jeopardize a Common Social Policy for Europe

Directory of Open Access Journals (Sweden)

Marácz László

2017-09-01

This paper studies the consequences of European multilingualism and multilingual communication for a common social policy in the European Union. In the past fifty years, the main focus of the Europeanization project has been on financial-economic developments and less on a common social policy. Even today, there is no common framework for social protection in the European Union. Common minimum income or wages for European citizens are lacking. In this paper, it will be argued that the lack of social protection has to do with Europe’s linguistic diversity. Language is seen as a building block of national communities and their political cultures. The European integration project can only continue if different European political cultures are shared. However, because a neutral lingua franca is lacking, this has been unsuccessful so far. The interaction of social groups that have different language repertoires with the structures of multilevel governance is responsible for the fact that some of these social groups, including the ‘Eurostars’ and national cosmopolitans, benefit from social protection, whereas other groups lacking relevant language skills, such as anti-establishment forces, commoners, and migrants, are excluded from the European power domains. These power configurations can be fruitfully studied in the floral figuration model. Consequently, due to these patterns of inclusion and exclusion, true solidarity among European citizens is not within reach. These claims will be illustrated by a case study on the Netherlands, a country that has been pursuing neoliberal policies counterbalancing Eurozone and economic crises and is trying to assimilate migrants and other newcomers. Apart from assimilatory policies targeting migrants, language games used by competing forces are playing an important role in the discourse in order to set up power structures.
Multilingualism in Southern Africa.

Science.gov (United States)

Peirce, Bonny Norton; Ridge, Stanley G. M.

1997-01-01

Reviews recent research in multilingualism in Southern Africa, focusing on the role of languages in education, sociolinguistics, and language policy. Much of the research is on South Africa. Topics discussed include language of instruction in schools, teacher education, higher education, adult literacy, language contact, gender and linguistic…
Literacy at a Distance in Multilingual Contexts: Issues and Challenges

Directory of Open Access Journals (Sweden)

Christine I. Ofulue

2011-10-01

Literacy is perhaps the most fundamental skill required for effective participation in education (formal and non-formal) for national development. At the same time, the choice of language for literacy is a complex issue in multilingual societies like Nigeria. This paper examines the issues involved, namely language policy, language and teacher development, and the role of distance education and information and communication technologies (ICTs) in making literacy accessible in as many languages as possible. Two distance learning literacy projects are presented as case studies and the lessons learned are discussed. The findings of this study suggest that although there is evidence of growing access to ICTs like mobile phones, their use and success in increasing access to literacy in the users’ languages are yet to be attained and maximised. The lessons learned should be relevant to other multilingual nations that seek to increase access to learning and promote development so as to harvest economic benefits.
Constructions of the literacy competence levels of multilingual students

DEFF Research Database (Denmark)

Holm, Lars

2017-01-01

discourse about the validity of standardised literacy testing of multilingual students. These findings give reason to question and discuss equality-oriented educational programmes and strategies for multilingual students in which standardised literacy testing plays a central role, and to discuss ethical ... issues around the production and use of standardised literacy tests in educational contexts which are characterised by linguistic diversity....
The need for an electronic multilingual dictionary

Directory of Open Access Journals (Sweden)

Anna Kisiel

2014-09-01

The paper analyses the issue of providing adequate equivalents in multilingual dictionaries. Equivalents are adequate if: (1) the scope of meaning of one item is identical to that of its equivalent (cf. drive: drive a nail vs. drive a car); and (2) the collocations of the equivalents overlap. Two significant problems arise when searching for adequate equivalents: the lack of equivalents whose meanings are identical (narrower/wider meanings, partial overlap of meanings, more than one equally good equivalent), and equivalents with homographs in a given language. Because such issues are difficult to resolve in a printed dictionary, we put forward some methods of addressing the problems in an electronic dictionary. The paper offers an example entry from such a dictionary, which presents a suggestion of a layout. We also took into consideration the potential problems which may appear if the entry is presented in this manner: first, one must set a limit for the description (a defined number of lexical units); second, one must avoid circularity, but at the same time also strive for an exhaustive description. Electronic dictionaries offer greater possibilities of presenting modern vocabulary and adding new classifiers (e.g. a classifier of politeness).
Multilingual Children's Interaction with Metafiction in a Postmodern Picture Book

Science.gov (United States)

Daugaard, Line Møller; Johansen, Martin Blok

2014-01-01

When teachers and school librarians choose picture books for multilingual children, they often base their choice on an evaluation of linguistic comprehensibility, content familiarity and cultural appropriateness. This means that postmodern picture books may be excluded. This paper presents a case study of multilingual children's encounter with a…
Working Memory and Short-Term Memory Abilities in Accomplished Multilinguals

Science.gov (United States)

Biedron, Adriana; Szczepaniak, Anna

2012-01-01

The role of short-term memory and working memory in accomplished multilinguals was investigated. Twenty-eight accomplished multilinguals were compared to 36 mainstream philology students. The following instruments were used in the study: three memory subtests of the Wechsler Intelligence Scale (Digit Span, Digit-Symbol Coding, and Arithmetic,…
Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.

Science.gov (United States)

Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin

2017-02-21

To systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods. Genes with the highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the PPI (protein-protein interaction) network. Compared with the unique genes for HBV-HCCs, genes particular to HCV-HCCs were fewer, but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway analysis and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and combining various methods will be more useful.
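The co-occurrence step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `cooccurrence`, the simple token matching, and the three toy abstracts are all assumptions made for illustration, and a real study would need proper gene-name normalisation.

```python
from collections import Counter
from itertools import product
import re


def cooccurrence(abstracts, genes, phenotypes):
    """Count, per (gene, phenotype) pair, the abstracts mentioning both.

    Hypothetical helper: matching is exact, case-insensitive, on
    alphanumeric tokens, which is far cruder than real entity recognition.
    """
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found_genes = [g for g in genes if g.lower() in tokens]
        found_phenotypes = [p for p in phenotypes if p.lower() in tokens]
        # Every gene found in this abstract pairs with every phenotype found.
        counts.update(product(found_genes, found_phenotypes))
    return counts


# Toy abstracts invented for this example; they are not data from the study.
toy_abstracts = [
    "VEGFA promotes angiogenesis and invasion in tumour cells.",
    "MMP9 expression correlates with invasion.",
    "VEGFA and MMP9 were both linked to migration.",
]
result = cooccurrence(
    toy_abstracts,
    genes=["VEGFA", "MMP9"],
    phenotypes=["invasion", "migration", "angiogenesis"],
)
for pair, n in sorted(result.items()):
    print(pair, n)
```

Counting abstracts rather than individual mentions keeps one long abstract from dominating the totals; whether the study counted this way is not stated in the record.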
Is ERASMUS furthering multilingualism?

DEFF Research Database (Denmark)

Petersen, Margrethe

One aim of the ERASMUS program is the furthering of multilingualism in Europe. This paper examines under what conditions the aim is achieved in the case of non-language exchange students coming to Scandinavia. The paper draws on a longitudinal study involving interviews with, and tests done by, 240...
The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis

Science.gov (United States)

Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves

2013-01-01

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071
Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.

Science.gov (United States)

Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica

2017-09-01

One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. To examine the effectiveness of the text mining functionality of the abstract screening tool Rayyan. User experiences were collected. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search results with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.
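The checkpoint figures in this abstract (relevant references found after screening 25%, 50%, and 75% of the abstracts) correspond to recall at a screening fraction. The sketch below shows one way to compute it, assuming the abstracts are already in the order the tool presented them; the function name and the ten-item relevance list are hypothetical and do not come from the study or from Rayyan.

```python
def recall_at_fraction(ranked_relevance, fraction):
    """Share of all relevant items found in the first `fraction` of a ranking.

    `ranked_relevance` holds 1 (relevant) or 0 (not relevant) in screening
    order. Hypothetical helper, not part of any screening tool's API.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("no relevant items, so recall is undefined")
    cutoff = int(len(ranked_relevance) * fraction)  # items screened so far
    return sum(ranked_relevance[:cutoff]) / total_relevant


# Invented ranking of ten abstracts, four of them relevant.
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for checkpoint in (0.25, 0.50, 0.75):
    print(f"{checkpoint:.0%} screened: recall {recall_at_fraction(ranked, checkpoint):.2f}")
```

A ranking that puts relevant abstracts first pushes recall towards 1.0 at early checkpoints, which is the pattern the abstract reports at the halfway point.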
Politics and Policies of Promoting Multilingualism in the European Union

Science.gov (United States)

Romaine, Suzanne

2013-01-01

This article examines the politics of policies promoting multilingualism in the European Union (EU), specifically in light of the recently released European Union Civil Society Platform on Multilingualism. As the most far-reaching and ambitious policy document issued by the European Commission, the Platform warrants close scrutiny at a significant…
The Multilingual Apple: Languages in New York City. Second Edition.

Science.gov (United States)

Garcia, Ofelia, Ed.; Fishman, Joshua A., Ed.

This collection of papers tells the story of how languages other than English have contributed to making New York City a culturally vibrant and linguistically diverse city. Part 1, "Introduction to the Multilingual Apple," features "New York's Multilingualism: World Languages and Their Role in a U.S. City" (Ofelia Garcia). Part…
Opening up towards Children's Languages: Enhancing Teachers' Tolerant Practices towards Multilingualism

Science.gov (United States)

Van Der Wildt, Anouk; Van Avermaet, Piet; Van Houtte, Mieke

2017-01-01

Mainstream teachers struggle with linguistic diversity, often leading to restricting multilingualism. Scientific research, however, recommends including pupils' home languages in school. Various qualitative studies have evaluated implementations in schools and indicated possibilities for improving teachers' attitudes towards multilingualism. This…
English in the multilingual classroom: implications for research, policy and practice

Directory of Open Access Journals (Sweden)

Janina Brutt-Griffler

2017-11-01

Purpose – The shift in the function of English as a medium of instruction, together with its use in knowledge construction and dissemination among scholars, continues to fuel the global demand for high-level proficiency in the language. These components of the global knowledge economy mean that the ability of nations to produce multilinguals with advanced English proficiency alongside their mastery of other languages has become a key to global competitiveness. That need is helping to drive one of the greatest language learning experiments the world has ever known. It carries significant implications for new research agendas and teacher preparation in applied linguistics. Design/methodology/approach – Evidence-based decision-making, whether it pertains to language policy decisions, instructional practices, teacher professional development or curricula/program building, needs to be based on a rigorous and systematically pursued program of research and assessment. Findings – This paper seeks to advance these objectives by identifying new research foci that underscore a student-centered approach. Originality/value – It introduces a new theoretical construct – multilingual proficiency – to underscore the knowledge that the learner develops in the process of language learning that makes for the surest route to the desired high levels of language proficiency. The paper highlights the advantages of a student-centered approach that focuses on multilingual proficiency for teachers and explores the concomitant conclusions for teacher development.
The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.

Science.gov (United States)

Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H

2015-12-01

Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, an investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf War veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
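The odds ratios in this abstract are reported per unit of ACE score. As a worked illustration of what a per-unit odds ratio implies, the sketch below compounds it over several units. It assumes the logistic model is linear in the log-odds of the ACE score, which the abstract does not spell out; the function name is hypothetical, and the three-unit difference is chosen only for the arithmetic.

```python
import math


def odds_ratio_for_difference(or_per_unit, units):
    """Odds ratio implied by a `units`-point difference in a predictor.

    Assumes log-odds are linear in the predictor, so a coefficient beta
    gives exp(beta) per unit and exp(beta * units) for `units` points.
    """
    beta = math.log(or_per_unit)   # recover the log-odds coefficient
    return math.exp(beta * units)  # equivalent to or_per_unit ** units


# Lower bound reported for suicide attempts and serious mental disorders.
print(round(odds_ratio_for_difference(1.84, 3), 2))
```

Under that linearity assumption, a three-point difference in score at OR = 1.84 per unit corresponds to roughly a sixfold difference in odds, not a threefold one, because the per-unit ratios multiply.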

Biography, policy and language teaching practices in a multilingual context: Early childhood classrooms in Mauritius

Directory of Open Access Journals (Sweden)

Aruna Ankiah-Gangadeen

2014-12-01

Language policies in education in multilingual postcolonial contexts are often driven by ideological considerations oriented more towards socio-economic and political viability for the country than towards practicality at the implementation level. Centuries after the advent of colonisation, when culturally and linguistically homogenous countries helped to maintain the dominion of colonisers, the English language still has a stronghold in numerous countries due to the material rewards it offers. How then is the diversity of languages – often with different statuses and functions in society – reconciled in the teaching and learning process? How do teachers deal with the intricacies that are generated within a situation where children are taught in a language that is foreign to them? This paper is based on a study involving pre-primary teachers in Mauritius, a developing multilingual African country. The aim was to understand how their approach to the teaching of English was shaped by their biographical experiences of learning the language. The narrative inquiry methodology offered rich possibilities to foray into these experiences, including the manifestations of negotiating their classroom pedagogy in relation to their own personal historical biographies of language teaching and learning, the policy environment, and the pragmatic classroom specificities of diverse, multilingual learners. These insights become resources for early childhood education and teacher development in multilingual contexts caught within the tensions between language policy and pedagogy.
Multilingual Children Increase Language Differentiation by Indexing Communities of Practice

Science.gov (United States)

O'Shannessy, Carmel

2015-01-01

An area in need of study in child language acquisition is that of complex multilingual contexts in which there is little language separation by interlocutor or domain. Little is known about how multilingual children use language to construct their identities in each language or in both languages. Identity construction in monolingual contexts has…
Emerging Multilingual Awareness in Educational Contexts: From Theory to Practice

Science.gov (United States)

Jessner, Ulrike; Allgäuer-Hackl, Elisabeth; Hofer, Barbara

2016-01-01

The aim of this article is to stress the importance of a dynamic systems or complexity theory approach as a necessary prerequisite to understanding the development of multi-competence in multilingual learners. Selected results from a study on emergent multilingual awareness in children, carried out in South Tyrol, are outlined and discussed. The…
Contextualizing Multilingualism in Morocco

Science.gov (United States)

Daniel, Mayra C.; Ball, Alexis

2009-01-01

This article discusses the educational system of Morocco and the ways the country's multilingual history has influenced and continues to direct the choice of the languages used in schools. Suggestions that will eliminate cultural mismatch and thus facilitate interactions with Moroccan students and their families are included. The research focuses…
Honoring and Building on the Rich Literacy Practices of Young Bilingual and Multilingual Learners

Science.gov (United States)

Souto-Manning, Mariana

2016-01-01

In this article, the author invites teachers of children who are bilingual, multilingual, and at promise for bi-/multilingualism to honor and build on their rich literacy practices. To do so, she challenges ideas and labels that continuously disempower bilingual and multilingual learners. Souto-Manning establishes the understanding that education…
Multilingual educational trends and practices in Lebanon: A case study

Science.gov (United States)

Bahous, Rima; Bacha, Nahla Nola; Nabhani, Mona

2011-12-01

This paper reports on the multilingual background, language education policies and practices in Lebanon. Specifically, it shows how the multilingual make-up in the country is translated into language policies in schools. A survey of 30 private school principals, middle managers and teachers was administered online to obtain their views on school policies, problems, successes, concerns and quality ranking. Results showed that a great deal of work has been done to introduce a language of instruction and a third language as decreed by the Ministry of Education and at the same time keep the national language, Arabic, alive. The main concerns of the participants were the need for teacher training programmes and resources. Although the research implies that the school systems, in keeping up with this multilingual milieu, could be contributing to the death of the national language as well as producing students who are not fluent in any of the languages, there continues to be an attempt to keep alive a quality multilingual educational context which contributes to a cohesive society.
Multilingual and social semiotic perspectives on literacy learning and teaching

DEFF Research Database (Denmark)

Laursen, Helle Pia

In the context of an increasing multilingualism, literacy teaching has become a central and contested issue in public and political debate. International comparisons of levels of literacy have been... On the basis of data from the longitudinal study Signs of Language, I focus on how a social semiotic perspective on literacy learning and teaching can contribute to expanding the conceptualization of literacy to be more sensitive to the complex processes involved in biliterate meaning making and script learning.
Towards A Model Of Knowledge Extraction Of Text Mining For Palliative Care Patients In Panama.

Directory of Open Access Journals (Sweden)

Denis Cedeno Moreno

2015-08-01

Information technology solutions offer an innovative way to manage information on hospice patients in hospitals in Panama. The application of text mining techniques to the medical domain, especially to information from the electronic health records of patients in palliative care, is one of the most recent and promising research areas for the analysis of textual data. Text mining extracts new knowledge from unstructured natural language data. Ontologies may also be created to describe the terminology and knowledge in a given domain. An ontology formalizes the conceptualization of a domain, which may be general or specific. The resulting knowledge can be used for decision making by health specialists or can support research on improving the health system.
Text mining for literature review and knowledge discovery in cancer risk assessment and research.

Directory of Open Access Journals (Sweden)

Anna Korhonen

Research in biomedical text mining is starting to produce technology which can make information in biomedical literature more accessible for bio-scientists. One of the current challenges is to integrate and refine this technology to support real-life scientific tasks in biomedicine, and to evaluate its usefulness in the context of such tasks. We describe CRAB - a fully integrated text mining tool designed to support chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this is highly varied and therefore difficult to harvest from literature databases via manual means. Our tool automates the process by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users. We present a direct and user-based evaluation which shows that the technology integrated in the tool is highly accurate, and report a number of case studies which demonstrate how the tool can be used to support scientific discovery in cancer risk assessment and research. Our work demonstrates the usefulness of a text mining pipeline in facilitating complex research tasks in biomedicine. We discuss further development and application of our technology to other types of chemical risk assessment in the future.
Direct and indirect effects of multilingualism on novel language learning: An integrative review.

Science.gov (United States)

Hirosh, Zoya; Degani, Tamar

2017-05-25

Accumulated recent research suggests that prior knowledge of multiple languages leads to advantages in learning additional languages. In the current article, we review studies examining potential differences between monolingual and multilingual speakers in novel language learning in an effort to uncover the cognitive mechanisms that underlie such differences. We examine the multilingual advantage in children and adults, across a wide array of languages and learner populations. The majority of this literature focused on vocabulary learning, but studies that address phonology, grammar, and literacy learning are also discussed to provide a comprehensive picture of the way in which multilingualism affects novel language learning. Our synthesis indicates two avenues to the multilingual advantage including direct transfer of prior knowledge and prior skills as well as indirect influences that result from multilingual background and include more general changes to the cognitive-linguistic system. Finally, we highlight topics that are in need of future systematic research.
Multilingual Access to Cultural Heritage Resources

Directory of Open Access Journals (Sweden)

Irina Oberländer-Târnoveanu

2005-09-01

For the visitor to the ARENA Portal for Archaeological Records of Europe Networked Access, the first option is to choose the language of the interface: Danish, English, Icelandic, Polish, Norwegian or Romanian. These are the languages of the six partners in the European project developed between 2001 and 2004. We expect a significant number of visitors from these countries, which made the choice of each respective mother tongue a natural one. Is the option of several languages just a courtesy for our public? It is more than that - it is a tool to facilitate access to multilingual archaeological information. Before we were ready for visitors to our sites, we had to understand each other, to index our digital resources using common terms, to find the right equivalents for archaeological realities described in several languages, to explain the concepts behind the words. Language is related to culture, identity and memory. There is a growing concern about the dominance of English as a global language of communication, while probably the majority of known languages are in danger of disappearing and cultural diversity is menaced. If we wish to make cultural heritage resources accessible to more people and to share knowledge, language is a key. My article is an attempt to address these issues. I will explore the role of language in scientific communication, multilingualism on the Internet, language policies, and also have a closer look at terminological tools for cultural heritage, especially for archaeology.
Shaping Tourist LL: Language Display and the Sociolinguistic Background of an International Multilingual Readership

Science.gov (United States)

Bruyèl-Olmedo, Antonio; Juan-Garau, Maria

2015-01-01

Linguistic landscape studies increasingly focus on the variables that intertwine to generate the meaning of texts on display. International tourist resorts, largely multilingual, reveal how languages in signage combine and respond to the sociolinguistic profile of their readership. However, these settings have received scant attention in the…
Turbulence and Dilemma: Implications of Diversity and Multilingualism in Australian Education

Science.gov (United States)

Heugh, Kathleen

2014-01-01

An international interest in multilingualism and multilingual education has burgeoned since the turn of the twenty-first century, accompanying apparently significant changes in the physical and virtual mobilities of people, international frameworks, and commitments and goals for socially just education. It has also accompanied major political…
Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Science.gov (United States)

Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

2014-01-01

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and
Does multilingualism affect the incidence of Alzheimer’s disease?: A worldwide analysis by country

Directory of Open Access Journals (Sweden)

Raymond M. Klein

2016-12-01

It has been suggested that the cognitive requirements associated with bi- and multilingual processing provide a form of mental exercise that, through increases in cognitive reserve and brain fitness, may delay the symptoms of cognitive failure associated with Alzheimer's disease and other forms of dementia. We collected data on a country-by-country basis that might shed light on this suggestion. Using the best available evidence we could find, the somewhat mixed results we obtained provide tentative support for the protective benefits of multilingualism against cognitive decline. But more importantly, this study exposes a critical issue, which is the need for more comprehensive and more appropriate data on the subject. Keywords: Bilingualism, Alzheimer's disease, Dementia, Brain reserve
Mining for associations between text and brain activation in a functional neuroimaging database

DEFF Research Database (Denmark)

Nielsen, Finn Årup; Hansen, Lars Kai; Balslev, D.

2004-01-01

We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...
Translanguaging as a vehicle for epistemic access: cases for reading comprehension and multilingual interactions

Directory of Open Access Journals (Sweden)

Makalela, Leketi

2015-12-01

African multilingualism has always been construed from a monoglossic (i.e., one language at a time) lens despite the pretensions of plural language policies in Sub-Saharan Africa. The study reported in this paper explored the efficacy of alternating languages of input and output in the same lessons in order to offset linguistic fixity that is often experienced in monolingual classrooms. I present two case studies of translanguaging practices, one at an institution of higher learning and another in the intermediate phase (primary school). The results from these cases show that the use of more than one language by multilingual learners in classroom settings provides cognitive and social advantages for them. Using what I refer to as the ubuntu translanguaging model, I make a case that fuzziness and blurring of boundaries between languages in the translanguaging classes are (i) necessary and relevant features of the 21st century to enhance epistemic access for speakers in complex multilingual spaces, and that they are (ii) indexical to the pre-colonial African value system of ubuntu. Useful recommendations for classroom applications and further research are considered at the end of the paper.
Material culture of multilingualism and affectivity

Directory of Open Access Journals (Sweden)

Larissa Aronin

2012-10-01

Full Text Available Affectivity is an important dimension in humans’ social and individual lives. It is either a stimulating or hindering aspect of language learning. This article aims to draw attention to material culture as a powerful, but mostly neglected source of data on the use and acquisition of languages, and demonstrates the close and intricate links between affectivity and material culture. It is hoped that revealing these interrelationships will assist in understanding and managing language diversity. It will allow practitioners and teachers to carry out social and private encounters, events and language teaching with more care, understanding and expertise. Researchers will be encouraged to join the investigation of yet one more important facet of multilingualism – material culture.
Text mining analysis of public comments regarding high-level radioactive waste disposal

International Nuclear Information System (INIS)

Kugo, Akihide; Yoshikawa, Hidekazu; Shimoda, Hiroshi; Wakabayashi, Yasunaga

2005-01-01

In order to narrow the risk perception gap, as seen in social investigations, between the general public and people who are involved in the nuclear industry, a public comment exercise on high-level radioactive waste (HLW) disposal was conducted to find the significant talking points with the general public for constructing an effective risk communication model of social risk information regarding HLW disposal. Text mining was introduced to examine public comments and to identify the core public interest underlying them. The text mining method used clusters specific groups of words with negative meanings and then analyzes public understanding by employing text structural analysis to extract words from subjective expressions. Using these procedures, it was found that the public does not trust the nuclear fuel cycle promotion policy and shows signs of anxiety about the long-lasting technological reliability of waste storage. To develop effective social risk communication of HLW issues, these findings are expected to help experts in the nuclear industry to communicate with the general public more effectively to obtain their trust. (author)
Are Multilingualism, Tolerance of Ambiguity, and Attitudes toward Linguistic Variation Related?

Science.gov (United States)

van Compernolle, Rémi A.

2016-01-01

This article explores the links between multilingualism, the personality trait Tolerance of Ambiguity (TA), and attitudes toward linguistic variation among 379 mono-, bi-, and multilingual adults who completed an online questionnaire. A self-reported high level of proficiency in multiple languages, short- and long-term residence abroad, and high…

Text-based language identification of multilingual names

CSIR Research Space (South Africa)

Giwa, O

2015-11-01

Full Text Available Text-based language identification (T-LID) of isolated words has been shown to be useful for various speech processing tasks, including pronunciation modelling and data categorisation. When the words to be categorised are proper names, the task...
Linguistic diversity and literacy practices in multilingual classrooms

DEFF Research Database (Denmark)

Laursen, Helle Pia

In the context of an increasing multilingualism, literacy teaching has become a central and contested issue in public and political debate. International comparisons of levels of literacy have been interpreted as an indication of a prevailing literacy crisis that demands political actions to avoid... & Leung, 2001). In search of a critical postmodern perspective on classroom studies, as advocated by Lin & Luk (2002), the study 'Signs of language' (2008-2014) aims to investigate the possibilities of restructuring the literacy practices in multilingual classrooms by giving attention to the children's...
From university research to innovation: Detecting knowledge transfer via text mining

Energy Technology Data Exchange (ETDEWEB)

Woltmann, S.; Clemmensen, L.; Alkærsig, L

2016-07-01

Knowledge transfer by universities is a top priority in innovation policy and a primary purpose for public research funding, because it is an important driver of technical change and innovation. Current empirical research on the impact of university research relies mainly on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent text collections (corpora) to identify correlations of university publications and company webpages. One corpus represents the company sites and serves as a sample of the private economy; the second corpus provides the reference to the university research and contains relevant publications. We associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods extrapolate the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using contemporary techniques for the measurement of public research impact. The approach aims to be applicable across universities and thus enable a more holistic comparable assessment. It relies less on formal databases, which is certainly beneficial in terms of data reliability. We seek to provide a supplementary perspective for the detection of the dissemination of university research and hereby enable policy makers to gain additional insights into (informal) contributions of knowledge
Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles.

Directory of Open Access Journals (Sweden)

Young-Ho Eom

Full Text Available How do different cultures evaluate a person? Is a person who is important in one culture also important in another culture? We address these questions via ranking of multilingual Wikipedia articles. With three ranking algorithms based on the network structure of Wikipedia, we assign a ranking to all articles in 9 multilingual editions of Wikipedia and investigate the general ranking structure of PageRank, CheiRank and 2DRank. In particular, we focus on articles related to persons, identify the top 30 persons for each rank among different editions and analyze distinctions of their distributions over activity fields such as politics, art, science, religion and sport for each edition. We find that local heroes are dominant but that global heroes also exist and create an effective network representing entanglement of cultures. The Google matrix analysis of the network of cultures shows signs of the Zipf law distribution. This approach allows one to examine diversity and shared characteristics of knowledge organization between cultures. The developed computational, data-driven approach highlights cultural interconnections in a new perspective. Dated: June 26, 2013.
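The two simpler rankings named in this abstract can be illustrated on a toy link graph. The following is a minimal sketch, not the authors' implementation: it assumes the common definitions in which PageRank is computed by power iteration over outgoing links and CheiRank is PageRank on the same network with all link directions inverted. The damping factor of 0.85, the function names, and the four-node `toy_links` graph are assumptions made for illustration; 2DRank is not sketched.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [target nodes]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # Split this node's damped rank evenly over its outgoing links.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its damped rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


def invert(links):
    """Return the same graph with every link direction reversed."""
    inverted = {}
    for src, targets in links.items():
        inverted.setdefault(src, [])
        for t in targets:
            inverted.setdefault(t, []).append(src)
    return inverted


def cheirank(links, **kwargs):
    """CheiRank, taken here as PageRank on the link-inverted network."""
    return pagerank(invert(links), **kwargs)


# Hypothetical toy network: one "hub" article linking to three others.
toy_links = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
pr = pagerank(toy_links)
cr = cheirank(toy_links)
```

On this toy graph the hub scores low in PageRank (nothing links to it) and highest in CheiRank (it links out the most), which is the contrast between incoming and outgoing links that the two rankings are meant to capture.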
Facilitating Multilingual Tutorials at the University of the Free State

Directory of Open Access Journals (Sweden)

du Buisson Theuns

2017-12-01

Full Text Available Conducting undergraduate studies in the English language, while only a small minority of students speak English at home, poses many problems to learning in the South African context. This article explores how restrictive language policies may influence proper learning and impact negatively on the self-understanding of students. It also explores how multilingualism could help to reduce the continued reliance on English, without doing away with English in its entirety. This is especially relevant in light of English and other colonial languages still being perceived as “languages of power” (Stroud & Kerfoot, 2013, p. 403). Therefore, attention is given to the link between language and power, especially in light of languages often being used to implement, display and preserve power. Language use in the classroom, especially with regard to codeswitching (also called translanguaging), is discussed. Finally, it explores the success that was achieved during multilingual tutorial sessions. In the tutorials, students were encouraged to explore the course work in their native languages, thereby internalising it and getting a better understanding thereof.
Singing as Language Learning Activity in Multilingual Toddler Groups in Preschool

Science.gov (United States)

Kultti, Anne

2013-01-01

This research focused on learning conditions in preschool that support multilingual children's linguistic development. The aim of this paper was to study singing activities through the experiences of ten multilingual children in toddler groups (one to three years of age) in eight Swedish preschools. A sociocultural theoretical approach is used to…
Language Practices in Multilingual Communities: Insights from a Suburban High School

Science.gov (United States)

Willoughby, Louisa

2013-01-01

As a result of globalisation and mass migration, suburbs and schools around the world are becoming increasingly multiethnic, multilingual places. Yet there is still relatively little linguistic research on how language is used in everyday interaction in these multilingual communities. In this paper, I explore the strengths and limitations of…
Multilingualism in Canadian schools: Myths, realities and possibilities

Directory of Open Access Journals (Sweden)

Patricia A. Duff

2007-08-01

Full Text Available Bilingualism and multiculturalism have for four decades been official ideologies and policies in Canada but, as is often the case, the implementation and outcomes of such government policies nationally are less impressive than the rhetoric would suggest. This article reviews the political, theoretical and demographic contexts justifying support for the learning and use of additional languages in contemporary Canadian society and schools, and summarizes research demonstrating that bilingualism and multilingualism are indeed cognitively, socially, and linguistically advantageous for children (and adults), as well as for society. The five studies in this special issue are then previewed with respect to the following themes that run across them: (1) the potential for bilingual synergies and transformations in language awareness activities and crosslinguistic knowledge construction; (2) the role of multiliteracies and multimodality in mediated learning; and (3) the interplay of positioning, identity, and agency in language learning by immigrant youth. The article concludes that more Canadian schools and educators must, like the researchers in this volume, find ways to embrace and build upon students’ prior knowledge, their creativity, their collaborative problem-solving skills, their potential for mastering and manipulating multiple, multilingual semiotic tools, and their desire for inclusion and integration in productive, engaging learning communities.
Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

Science.gov (United States)

Elayavilli, Ravikumar Komandur; Liu, Hongfang

2016-01-01

Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain is an impediment to normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into the ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in ontological
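To make the idea of rule-based extraction of quantitative assertions and its evaluation by F-measure concrete, here is a minimal sketch. It is not the ICEPO system: the single regular-expression rule, the short unit list, the stop-word clean-up, and the example sentence are all assumptions chosen for illustration.

```python
import re

# Hypothetical rule: "<parameter> was|is|of <number> <unit>".
PATTERN = re.compile(
    r"(?P<param>[A-Za-z][A-Za-z\- ]*?)\s+(?:was|is|of)\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>mV|ms|pA|nS)\b"
)
LEADING_STOP_WORDS = {"the", "and", "a", "an"}


def clean(param):
    """Drop leading articles and conjunctions from a captured parameter name."""
    words = param.split()
    while words and words[0].lower() in LEADING_STOP_WORDS:
        words.pop(0)
    return " ".join(words)


def extract(sentence):
    """Return (parameter, value, unit) triples matched by the rule."""
    return [
        (clean(m.group("param")), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


sentence = ("The half-activation voltage was -35.2 mV "
            "and the time constant was 4.1 ms.")
found = extract(sentence)
```

A reported figure such as 68.93% would come from applying `f_measure` (or an equivalent) to system output against an independently annotated gold set; the sketch only shows how the number is defined.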
Towards multilingual access to textual databases in natural language

International Nuclear Information System (INIS)

Radwan, Khaled

1994-01-01

Cross-Lingual Information Retrieval (CLIR), or Multilingual Information Retrieval (MIR), has become a key issue in electronic document management systems in a multinational environment. We propose here a multilingual information retrieval system consisting of a morpho-syntactic analyser, a transfer system from source language to target language and an information retrieval system. A thorough investigation into the system architecture and the transfer mechanisms is presented in this report, using two different performance evaluation methods. (author) [fr
Weighted mining of massive collections of p-values by convex optimization.

Science.gov (United States)

Dobriban, Edgar

2018-06-01

Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of p-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the p-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
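The effect of upweighting and downweighting hypotheses can be seen in the simplest weighted procedure, weighted Bonferroni, which rejects hypothesis i when its p-value is at most alpha times its weight divided by the number of tests, with weights averaging to one. The sketch below shows only that textbook rule; it is not Princessp and does not perform the convex optimization that chooses the weights. The function name and the example values are assumptions.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / n, with weights rescaled to mean 1.

    Rescaling to mean 1 keeps the thresholds summing to alpha, so the
    family-wise error rate is still controlled at level alpha.
    """
    n = len(p_values)
    if len(weights) != n:
        raise ValueError("need exactly one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_weight = sum(weights) / n
    if mean_weight == 0:
        raise ValueError("at least one weight must be positive")
    normalised = [w / mean_weight for w in weights]
    return [p <= alpha * w / n for p, w in zip(p_values, normalised)]


# Hypothetical p-values; the second hypothesis is believed a priori
# to have a larger effect and is therefore upweighted.
p_values = [0.004, 0.02, 0.03, 0.2]
unweighted = weighted_bonferroni(p_values, [1, 1, 1, 1])
weighted = weighted_bonferroni(p_values, [0.5, 2.5, 0.5, 0.5])
```

With equal weights only the first hypothesis clears the threshold of 0.0125; upweighting the second raises its threshold to 0.03125, so it is rejected as well, at the price of stricter thresholds for the downweighted hypotheses.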
Modelling vocabulary development among multilingual children prior to and following the transition to school entry

Science.gov (United States)

MacLeod, Andrea A. N.; Castellanos-Ryan, Natalie; Parent, Sophie; Jacques, Sophie; Séguin, Jean R.

2017-01-01

Differences between monolingual and multilingual vocabulary development have been observed but few studies provide a longitudinal perspective on vocabulary development before and following school entry. This study compares vocabulary growth profiles of 106 multilingual children to 211 monolingual peers before and after school entry to examine whether: (1) school entry coincides with different rates of vocabulary growth compared to prior to school entry, (2) compared to monolingual peers, multilingual children show different vocabulary sizes or rates of vocabulary growth, (3) the age of onset of second-language acquisition for multilingual children is associated with vocabulary size or rate of vocabulary growth, and (4) the sociolinguistic context of the languages spoken by multilingual children is associated with vocabulary size or rate of vocabulary growth. Results showed increases in vocabulary size across time for all children, with a steeper increase prior to school entry. A significant difference between monolingual and multilingual children who speak a minority language was observed with regards to vocabulary size at school entry and vocabulary growth prior to school entry, but growth rate differences were no longer present following school entry. Taken together, results suggest that which languages children speak may matter more than being multilingual per se. PMID:29354017
Modelling vocabulary development among multilingual children prior to and following the transition to school entry.

Science.gov (United States)

MacLeod, Andrea A N; Castellanos-Ryan, Natalie; Parent, Sophie; Jacques, Sophie; Séguin, Jean R

2018-01-01

Differences between monolingual and multilingual vocabulary development have been observed but few studies provide a longitudinal perspective on vocabulary development before and following school entry. This study compares vocabulary growth profiles of 106 multilingual children to 211 monolingual peers before and after school entry to examine whether: (1) school entry coincides with different rates of vocabulary growth compared to prior to school entry, (2) compared to monolingual peers, multilingual children show different vocabulary sizes or rates of vocabulary growth, (3) the age of onset of second-language acquisition for multilingual children is associated with vocabulary size or rate of vocabulary growth, and (4) the sociolinguistic context of the languages spoken by multilingual children is associated with vocabulary size or rate of vocabulary growth. Results showed increases in vocabulary size across time for all children, with a steeper increase prior to school entry. A significant difference between monolingual and multilingual children who speak a minority language was observed with regards to vocabulary size at school entry and vocabulary growth prior to school entry, but growth rate differences were no longer present following school entry. Taken together, results suggest that which languages children speak may matter more than being multilingual per se.
Tracing Knowledge Transfer from Universities to Industry: A Text Mining Approach

DEFF Research Database (Denmark)

Woltmann, Sabrina; Alkærsig, Lars

2017-01-01

This paper identifies transferred knowledge between universities and the industry by proposing the use of a computational linguistic method. Current research on university-industry knowledge exchange often relies on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. We, on the other hand, use the texts from university abstracts to identify university knowledge and compare them with texts from firm webpages. We use these text data to identify common key words and thereby identify overlapping contents among the texts. As a method, we use a well-established word ranking method from the field of information retrieval, term frequency–inverse document frequency (TFIDF), to identify commonalities between texts from university. In examining the outcomes of the TFIDF statistic we find... is the first step to enable the identification of common knowledge and knowledge transfer via text mining to increase its measurability.
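The TFIDF comparison of university abstracts with firm webpages can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the tokenizer, the smoothed inverse document frequency, the use of cosine similarity as the overlap score, and the three toy texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: tf-idf weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed idf so that terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical texts: one university abstract and two firm webpages.
abstract = "wind turbine blade fatigue modelling"
page_a = "we design wind turbine blade coatings"
page_b = "organic coffee roasting and delivery"
vec_abstract, vec_a, vec_b = tfidf_vectors([abstract, page_a, page_b])
sim_a = cosine(vec_abstract, vec_a)
sim_b = cosine(vec_abstract, vec_b)
```

Here `sim_a` is positive because the abstract and the first webpage share key words, while `sim_b` is zero because the second webpage shares none; a ranking of firm pages by such scores is one way to flag candidate knowledge overlap.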
Multilingual students' acquisition of English as their L3

DEFF Research Database (Denmark)

Samal Jalal, Rawand

... with regard to English proficiency. The current study conducted in Denmark investigated multilingual students’ English proficiency compared to their monolingual peers’, and examined which learning strategies proficient L3 learners utilize. The sample was comprised of 9-graders who are monolinguals (N = 82) and multilinguals with Turkish L1 (N = 134). The participants provided basic demographic information, and were tested in their general English proficiency. Out of the 70 multilinguals with Turkish L1, 12 participants were selected for further testing; i.e., the four participants who scored the lowest, four participants with intermediate scores, and the four who scored the highest, on a test of English proficiency. These participants were tested in their L1 (Turkish) and their L2 (Danish) in order to examine whether their proficiency in their L1 and L2 was associated with English proficiency. Furthermore, the 12...
Monolingual versus multilingual acquisition of English morphology: what can we expect at age 3?

Science.gov (United States)

Nicholls, Ruth J; Eadie, Patricia A; Reilly, Sheena

2011-01-01

At least two-thirds of the world's children grow up in environments where more than one language is spoken. Despite the global predominance of multilingualism, much remains unknown regarding the language acquisition of children acquiring multiple languages compared with monolingual children. A greater understanding of multilingualism is crucial for speech-language pathologists given the increasing number of children being raised in linguistically diverse environments. To investigate the expressive morphological abilities of multilingual children acquiring English, compared with monolingual children, at 3 years of age. Participants were 148 children (74 multilingual children; 74 matched monolingual children; mean age of 3 years 4 months) already participating in a larger prospective longitudinal cohort study of language development in Melbourne, Australia. Thirty-one languages in addition to English were represented within the embedded cohort. All participants completed a direct language assessment to measure their expressive abilities across a range of English morphemes. The parents of the multilingual participants completed an interview regarding the children's language backgrounds and experiences. The Multilingual Group typically performed below the Monolingual Group in terms of their accurate use and mastery of English morphemes at 3 years of age, although variable expressive abilities were indicated within each group. The same morphemes were shown to be mastered by relatively higher proportions of each group. Likewise, the same forms were mastered by relatively lower proportions of each group. The results indicated similarities between the children's acquisition of English morphology, regardless of whether they were acquiring English only or in combination with another language(s) at 3 years of age. This study found a range of similarities and differences between multilingual compared with monolingual children's acquisition of English morphology at 3 years of
Practitioner Review: Multilingualism and neurodevelopmental disorders - an overview of recent research and discussion of clinical implications.

Science.gov (United States)

Uljarević, Mirko; Katsos, Napoleon; Hudry, Kristelle; Gibson, Jenny L

2016-11-01

Language and communication skills are essential aspects of child development, which are often disrupted in children with neurodevelopmental disorders. Cutting edge research in psycholinguistics suggests that multilingualism has potential to influence social, linguistic and cognitive development. Thus, multilingualism has implications for clinical assessment, diagnostic formulation, intervention and support offered to families. We present a systematic review and synthesis of the effects of multilingualism for children with neurodevelopmental disorders and discuss clinical implications. We conducted systematic searches for studies on multilingualism in neurodevelopmental disorders. Keywords for neurodevelopmental disorders were based on Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition categories as follows; Intellectual Disabilities, Communication Disorders, Autism Spectrum Disorder (ASD), Attention-Deficit/Hyperactivity Disorder, Specific Learning Disorder, Motor Disorders, Other Neurodevelopmental Disorders. We included only studies based on empirical research and published in peer-reviewed journals. Fifty studies met inclusion criteria. Thirty-eight studies explored multilingualism in Communication Disorders, 10 in ASD and two in Intellectual Disability. No studies on multilingualism in Specific Learning Disorder or Motor Disorders were identified. Studies which found a disadvantage for multilingual children with neurodevelopmental disorders were rare, and there appears little reason to assume that multilingualism has negative effects on various aspects of functioning across a range of conditions. In fact, when considering only those studies which have compared a multilingual group with developmental disorders to a monolingual group with similar disorders, the findings consistently show no adverse effects on language development or other aspects of functioning. In the case of ASD, a positive effect on communication and social functioning has
Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

Energy Technology Data Exchange (ETDEWEB)

Hirdt, J.A. [Department of Mathematics and Computer Science, St. Joseph's College, Patchogue, NY 11772 (United States)]; Brown, D.A., E-mail: dbrown@bnl.gov [National Nuclear Data Center, Brookhaven National Laboratory, Upton, NY 11973-5000 (United States)]

2016-01-15

The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
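The graph construction described here can be sketched with plain adjacency sets. The following is a toy illustration, not the authors' analysis: node labels, edges and dataset counts are invented placeholders rather than EXFOR content, and "important yet understudied" is approximated, as an assumption, by a node with many connections but few datasets.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as {node: set of neighbours}."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def understudied(adjacency, dataset_counts, min_degree=2, max_datasets=1):
    """Nodes that are well connected but backed by few datasets.

    Degree stands in for importance here; this is a simplification,
    and the thresholds are arbitrary illustration values.
    """
    return sorted(
        node
        for node, neighbours in adjacency.items()
        if len(neighbours) >= min_degree
        and dataset_counts.get(node, 0) <= max_datasets
    )


# Placeholder nodes: each stands for one (reaction, quantity) pair.
edges = [
    ("monitor_X", "reaction_A"),
    ("monitor_X", "reaction_B"),
    ("monitor_X", "reaction_C"),
    ("reaction_A", "reaction_B"),
]
dataset_counts = {"monitor_X": 1, "reaction_A": 12, "reaction_B": 7, "reaction_C": 3}
graph = build_graph(edges)
flagged = understudied(graph, dataset_counts)
```

In this toy data the monitor node is used by three other reactions yet has a single dataset, so it is the one flagged, which mirrors the abstract's observation that monitor cross sections dominated the results.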
Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

International Nuclear Information System (INIS)

Hirdt, J.A.; Brown, D.A.

2016-01-01

The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
Text mining to decipher free-response consumer complaints: insights from the NHTSA vehicle owner's complaint database.

Science.gov (United States)

Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D

2014-09-01

This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.

Public reactions to e-cigarette regulations on Twitter: a text mining analysis.

Science.gov (United States)

Lazard, Allison J; Wilcox, Gary B; Tuttle, Hannah M; Glowacki, Elizabeth M; Pikowski, Jessica

2017-12-01

In May 2016, the Food and Drug Administration (FDA) issued a final rule that deemed e-cigarettes to be within their regulatory authority as a tobacco product. News and opinions about the regulation were shared on social media platforms, such as Twitter, which can play an important role in shaping the public's attitudes. We analysed information shared on Twitter for insights into initial public reactions. A text mining approach was used to uncover important topics among reactions to the e-cigarette regulations on Twitter. SAS Text Miner V.12.1 software was used for descriptive text mining to uncover the primary topics from tweets collected from May 1 to May 17 2016 using NUVI software to gather the data. A total of nine topics were generated. These topics reveal initial reactions to whether the FDA's e-cigarette regulations will benefit or harm public health, how the regulations will impact the emerging e-cigarette market and efforts to share the news. The topics were dominated by negative or mixed reactions. In the days following the FDA's announcement of the new deeming regulations, the public reaction on Twitter was largely negative. Public health advocates should consider using social media outlets to better communicate the policy's intentions, reach and potential impact for public good to create a more balanced conversation.
Multilingual processing in the brain

NARCIS (Netherlands)

Noort, M.W.M.L. van den; Struys, E.; Kim, K.Y.; Bosch, M.P.C.; Mondt, K.; Kralingen, R.B.A.S. van; Lee, M.Y.; Craen, P. van de

2014-01-01

In this paper, in contrast to previous neuroimaging literature reviews on first language (L1) and second language (L2), the focus was only on neuroimaging studies that were directly conducted on multilingual participants. In total, 14 neuroimaging studies were included in our study such as 10
Multilingual event extraction for epidemic detection.

Science.gov (United States)

Lejeune, Gaël; Brixtel, Romain; Doucet, Antoine; Lucas, Nadine

2015-10-01

This paper presents a multilingual news surveillance system applied to tele-epidemiology. It has been shown that multilingual approaches improve timeliness in detection of epidemic events across the globe, eliminating the wait for local news to be translated into major languages. We present here a system to extract epidemic events in potentially any language, provided a Wikipedia seed for common disease names exists. The Daniel system presented herein relies on properties that are common to news writing (the journalistic genre), the most useful being repetition and saliency. Wikipedia is used to screen common disease names to be matched with repeated character strings. Language variations, such as declensions, are handled by processing text at the character level, rather than at the word level. This additionally makes it possible to handle various writing systems in a similar fashion. As no multilingual ground truth existed to evaluate the Daniel system, we built a multilingual corpus from the Web, and collected annotations from native speakers of Chinese, English, Greek, Polish and Russian, with no connection to or interest in the Daniel system. This data set is freely available online, and can be used for the evaluation of other event extraction systems. Experiments for 5 languages out of 17 tested are detailed in this paper: Chinese, English, Greek, Polish and Russian. The Daniel system achieves an average F-measure of 82% in these 5 languages. It reaches 87% on BEcorpus, the state-of-the-art corpus in English, slightly below top-performing systems, which are tailored with numerous language-specific resources. The consistent performance of Daniel on multiple languages is an important contribution to the reactivity and the coverage of epidemiological event detection systems. Most event extraction systems rely on extensive resources that are language-specific. While their sophistication induces excellent results (over 90% precision and recall), it restricts their
Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

NARCIS (Netherlands)

He, Qiwei; Veldkamp, Bernard P.; Eggen, T.J.H.M.; Veldkamp, B.P.

2012-01-01

Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful
Mediating Multilingual Children's Language Resources

Science.gov (United States)

Potts, D.; Moran, M. J.

2013-01-01

The everyday reality of children's multilingualism is a significant resource for expanding students' perspectives on the world, but many questions remain regarding the negotiation of these resources in mainstream classrooms. Drawing on research from a long-term Canadian study of multiliterate pedagogies, this paper explores mediation of home…
Multilingualism and Education for Democracy

Science.gov (United States)

Biseth, Heidi

2009-01-01

This essay attempts to show the importance of linguistic issues in education for democracy and the close relationship between democracy and multilingualism. Increasingly nation-states are having to adapt to linguistic diversity within their borders and to recognize that democracy requires the participation of all citizens, including those…
Affordances Theory in Multilingualism Studies

Science.gov (United States)

Aronin, Larissa; Singleton, David

2012-01-01

The concept of affordances originating in Gibson's work (Gibson, 1977) is gaining ground in multilingualism studies (cf. Aronin and Singleton, 2010; Singleton and Aronin, 2007; Dewaele, 2010). Nevertheless, studies investigating affordances in respect of teaching, learning or using languages are still somewhat rare and tend to treat isolated…
An Enhanced Text-Mining Framework for Extracting Disaster Relevant Data through Social Media and Remote Sensing Data Fusion

Science.gov (United States)

Scheele, C. J.; Huang, Q.

2016-12-01

In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management is one such application, leveraging the massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, an enhanced text-mining framework is proposed that incorporates location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
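The text-only baseline mentioned in this abstract can be illustrated with a small multinomial Naive Bayes classifier. The following is only a minimal sketch under stated assumptions: the training posts, the labels `relevant`/`irrelevant`, and the class name `NaiveBayesText` are all hypothetical, and the spatiotemporal and remote sensing components of the framework are not modelled here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


class NaiveBayesText:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy posts, invented purely for illustration.
train_texts = [
    "flood water rising on main street",
    "tornado warning take shelter now",
    "hurricane winds knocked out power",
    "great coffee this morning",
    "watching the game tonight",
    "new phone arrived today",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

clf = NaiveBayesText().fit(train_texts, train_labels)
print(clf.predict("flood water near the shelter"))
```

A classifier of this kind sees only the words of a post, which is the limitation the abstract attributes to text-only methods; the proposed framework adds location and remote sensing evidence on top of such a score.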
Literacy education, reading engagement, and library use in multilingual classes

OpenAIRE

Tonne, Ingebjørg; Pihl, Joron

2012-01-01

The topic of this paper is literacy education and reading engagement in multilingual classes. What facilitates reading engagement in the language of instruction in multilingual classes? In this paper, we analyze reading engagement in a literature-based literacy program in Norway (2007–2011). The design was a research and development project in which teachers, researchers, and librarians collaborated within literacy education. We present pedagogical interventions within the project and analyze...
Antinomies of Ideologies and Situationality of Education Language Politics in Multilingual Contexts

Science.gov (United States)

Odugu, Desmond Ikenna

2015-01-01

Widespread scholarly and political attention to language-related inequities in the 20th century precipitated a spate of orientations to language planning in multilingual societies. While various orientations indicate a shift from earlier deficit to affirmative views of multilingualism, vigorous debates persist about the logical and pragmatic…
Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

Directory of Open Access Journals (Sweden)

Saeeda Naz

2016-04-01

Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for the development of a multilingual Arabic script OCR (Optical Character Recognition) and lexicon reduction for Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset of Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set, i.e. the character sets differ in diacritics and in writing styles such as Naskh or Nasta'liq. Based on the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by up to approximately 20 times. The proposed multilingual Arabic script OCR approach has been evaluated for online Arabic and its derivative languages, such as Urdu, using BPNN. The results showed that the proposed method not only reduces the lexicon but also helps to develop a multilanguage character recognition system for Arabic script.
A Multilingual Perspective on Translanguaging

Science.gov (United States)

MacSwan, Jeff

2017-01-01

Translanguaging is a new term in bilingual education; it supports a heteroglossic language ideology, which views bilingualism as valuable in its own right. Some translanguaging scholars have questioned the existence of discrete languages, further concluding that multilingualism does not exist. I argue that the political use of language names can…
Multilingual children between real and imaginary worlds

DEFF Research Database (Denmark)

Laursen, Helle Pia; Kolstrup, Kirsten Lundgaard

2017-01-01

This article analyzes how a group of multilingual children in their early adolescence use various forms of language play and position themselves symbolically through involvement in signifying practices. By developing a conceptual framework that combines insights on language play (Cook 2000) and the signifying self (Kramsch 2009), it demonstrates how the children as sign makers and symbolic subjects (re)signify their own learning space. The analysis reveals how, during a reading and joint text construction activity in Danish, they explore the symbolic possibilities of signs and subjectivities, while moving in and out of the text and back and forth between imagined and real worlds. These findings illustrate how the children’s interest both shapes their playful interaction and takes shape through it. It furthermore shows how language play contributes to paving the way for a resignification...
Modalities to Implement the Multilinguality in Web DYNPRO ABAP

Directory of Open Access Journals (Sweden)

Ana Daniela CRISTEA

2010-08-01

The integrated platform SAP Netweaver offers support for building Web business applications that use the Model View Controller (MVC) concept, multilinguality being a property of this platform. The purpose of this article is to highlight the ways to internationalize a Web Dynpro ABAP project. The techniques used for the internationalization of a Web Dynpro ABAP application are: the OTR (Online Text Repository) translations, the implementation of the assistance class and the technique of information internationalization in a database. The case study was performed on the trial “SAP Netweaver 7.0 Application Server ABAP”, which offered the possibility to log in in English and German.
Integrated, Not Isolated: Defining Typological Proximity in an Integrated Multilingual Architecture

Directory of Open Access Journals (Sweden)

Michael T. Putnam

2018-01-01

On the surface, bi- and multilingualism would seem to be an ideal context for exploring questions of typological proximity. The obvious intuition is that the more closely related two languages are, the easier it should be to implement the two languages in one mind. This is the starting point adopted here, but we immediately run into the difficulty that the overwhelming majority of cognitive, computational, and linguistic research on bi- and multilingualism exhibits a monolingual bias (i.e., where monolingual grammars are used as the standard of comparison for outputs from bilingual grammars). The primary questions so far have focused on how bilinguals balance and switch between their two languages, but our perspective on typology leads us to consider the nature of bi- and multi-lingual systems as a whole. Following an initial proposal from Hsin (2014), we conjecture that bilingual grammars are neither isolated, nor (completely) conjoined with one another in the bilingual mind, but rather exist as integrated source grammars that are further mitigated by a common, combined grammar (Cook, 2016; Goldrick et al., 2016a,b; Putnam and Klosinski, 2017). Here we conceive such a combined grammar in a parallel, distributed, and gradient architecture implemented in a shared vector-space model that employs compression through routinization and dimensionality reduction. We discuss the emergence of such representations and their function in the minds of bilinguals. This architecture aims to be consistent with empirical results on bilingual cognition and memory representations in computational cognitive architectures.
[Exploring the clinical characters of Shugan Jieyu capsule through text mining].

Science.gov (United States)

Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

2017-09-01

The study aimed to explore the clinical characteristics of Shugan Jieyu capsule through text mining. Data sets on Shugan Jieyu capsule were downloaded from the CMCC database by literature retrieval covering May 2009 to Jan 2016. Rules of Chinese medical patterns, diseases, symptoms and combination treatment were mined using a data slicing algorithm and presented in frequency tables and a two-dimensional network. In total, 190 publications were included. The results suggested that Shugan Jieyu capsule was most frequently correlated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression followed by physical diseases, concomitant depression followed by schizophrenia and functional dyspepsia were the main diseases treated by Shugan Jieyu capsule. Symptoms like low mood, psychic anxiety, somatic anxiety and autonomic nerve dysfunction were mainly relieved by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that syndrome types and mining results of Shugan Jieyu capsule were almost the same as its instructions. Syndrome of malnutrition of heart spirit was the potential Chinese medical pattern of Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression followed by physical diseases, and postpartum depression were potential diseases treated by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. Copyright© by the Chinese Pharmaceutical Association.
DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

Science.gov (United States)

Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

2016-01-01

The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
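The mutation-mention component described above can be approximated, very roughly, with a regular expression. The sketch below is an assumption-laden illustration and not DiMeX's actual patterns: the names `MUTATION_RE` and `find_mutations` are hypothetical, and only simple protein point mutations of the form `V600E` or `p.Arg175His` are covered.

```python
import re

# Standard one-letter and three-letter amino acid codes.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Wild-type residue, position, mutant residue, with an optional "p." prefix.
# Hypothetical pattern for illustration; real systems use many more patterns.
MUTATION_RE = re.compile(
    r"\b(?:p\.)?(?:"
    rf"(?P<wt1>[{ONE_LETTER}])(?P<pos1>[1-9]\d*)(?P<mt1>[{ONE_LETTER}])"
    r"|"
    rf"(?P<wt3>{THREE_LETTER})(?P<pos3>[1-9]\d*)(?P<mt3>{THREE_LETTER})"
    r")\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples for each mention found."""
    mentions = []
    for match in MUTATION_RE.finditer(text):
        wild_type = match.group("wt1") or match.group("wt3")
        position = match.group("pos1") or match.group("pos3")
        mutant = match.group("mt1") or match.group("mt3")
        mentions.append((wild_type, int(position), mutant))
    return mentions


# Hypothetical example sentence.
sentence = "The BRAF V600E mutation and TP53 p.Arg175His were associated with melanoma."
print(find_mutations(sentence))
```

A pattern like this only finds the mention; associating it with a gene and a disease, as the abstract describes, would require the additional syntactic and semantic patterns that are not reproduced here.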
Law and Language in a Multilingual Society

Directory of Open Access Journals (Sweden)

Judge Louis Harms

2012-08-01

Terence McKenna, in Wild Ducks Flying Backwards, said that he did not believe that the world is made of quarks or electro-magnetic waves, or stars, or planets, or of any such things. ‘I believe’ he said, ‘the world is made of language.’ It would have been more correct to have said that the world is made of languages, many of them. The subject, Law and Language in a Multilingual Society, raises critical issues not only for us in this country but also for others because language is part – the greater part – of one's culture. A people without a culture is said to be like a zebra without stripes. Culture, and not race, nationality, religion or border (natural or political), determines one's identity. As one of the founding fathers of the Afrikaans language, Rev SJ du Toit, wrote in 1891: language is a portrait of the soul and life of a nation; and it mirrors the character and intellectual development of a people (my translation). Unfortunately language tends to divide, more particularly, a multilingual society. Law is supposed to close the divide but more often than not widens it and is used to deepen divisions. This is because the ruler determines the law and, consequently, the language of the law, in the belief that the use of language can be enforced from above. Law and language, like oil and water, do not mix although the former is dependent on the latter.
Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

Science.gov (United States)

Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

2015-06-06

Advances in next generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on the sentence level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against the sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating
Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

DEFF Research Database (Denmark)

Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

2014-01-01

, lipids and nutrients. In this work, we applied text mining and Naïve Bayes classification to assemble the knowledge space of food-phytochemical and food-disease associations, where we distinguish between disease prevention/amelioration and disease progression. We subsequently searched for frequently...

Marketing, Management and Performance: Multilingualism as Commodity in a Tourism Call Centre

Science.gov (United States)

Duchene, Alexandre

2009-01-01

This paper focuses on the ways an institution of the new economy--a tourism call centre in Switzerland--markets, manages and performs multilingual services. In particular, it explores the ways multilingualism operates as a strategic and managerial tool within tourism call centres and how the institutional regulation of language practices…
Literacy for All? Using multilingual reading stories for literacy development in a Grade One classroom in the Western Cape

Directory of Open Access Journals (Sweden)

Prosper, Ancyfrida

2016-12-01

This paper reports on a literacy pilot project which investigated the use of multilingual reading books and the pedagogical strategies that were employed by one bilingual teacher and her assistant to teach literacy in a linguistically diverse Grade 1 classroom in a primary school in the Western Cape, South Africa. Data were collected by means of classroom observations and semi-structured interviews to understand the teacher’s literacy instruction, reflecting her understanding of the multilingual pedagogical approach as a means of fostering learners’ biliteracy skills. Through the lens of the social constructivist theory and the notion of biliteracy, this paper argues that bilingual competence does not necessarily translate to biliteracy if the teaching approaches and learning materials are not systematically and adequately used to support learners’ listening, oral, reading and writing skills in different languages in an integrated and holistic manner in multilingual classrooms. It concludes that, despite the progressive South African Language-in-Education Policy which supports additive multilingualism, classroom practices continue to reinforce monolingualism in English, which deprives the majority of learners of meaningful access to literacy in different languages as they do not exploit the socio-cultural and cognitive capital embedded in the learners’ home languages for additive bilingual and biliteracy competence.
Towards the multilingual semantic web principles, methods and applications

CERN Document Server

Buitelaar, Paul

2014-01-01

To date, the relation between multilingualism and the Semantic Web has not yet received enough attention in the research community. One major challenge for the Semantic Web community is to develop architectures, frameworks and systems that can help in overcoming national and language barriers, facilitating equal access to information produced in different cultures and languages. As such, this volume aims at documenting the state-of-the-art with regard to the vision of a Multilingual Semantic Web, in which semantic information will be accessible in and across multiple languages. The Multiling
Multilingual trends in a globalized world prospects and challenges

CERN Document Server

Singh, Navin Kumar

2013-01-01

This book presents evolving language education trends by drawing examples and case studies from around the world. Over the past few decades, significant economic and political changes have taken place around the world which have had a significant impact on language teaching and learning practices across the globe. With globalization, the focus of language education has shifted from monolingualism towards bilingualism and multilingualism, in that multilingual practices have become the norm rather than the exception in most parts of the world. This book brings together some of the latest controversi
Information Retrieval and Text Mining Technologies for Chemistry.

Science.gov (United States)

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

Science.gov (United States)

Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

2015-04-01

Knowledge of protein-protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable for understanding the mechanisms of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
pubmed.mineR: An R package with text-mining algorithms to ...

Indian Academy of Sciences (India)

2016-08-26

Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...
The Accommodation of Multilingualism through Blended Learning in Two Information Technology Classes

Science.gov (United States)

Olivier, Jako

2013-01-01

The South African society can be described as culturally diverse and multilingual. However, despite the advantages of mother-tongue education, English is often chosen as the language of learning and teaching at the cost of the other official languages. This article proposes that multilingualism, through the use of languages other than English in…
Multilingual Aspects of Speech Sound Disorders in Children. Communication Disorders across Languages

Science.gov (United States)

McLeod, Sharynne; Goldstein, Brian

2012-01-01

Multilingual Aspects of Speech Sound Disorders in Children explores both multilingual and multicultural aspects of children with speech sound disorders. The 30 chapters have been written by 44 authors from 16 different countries about 112 languages and dialects. The book is designed to translate research into clinical practice. It is divided into…
Multilingual Federated Searching Across Heterogeneous Collections.

Science.gov (United States)

Powell, James; Fox, Edward A.

1998-01-01

Describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. Details Searchable Database Markup Language (SearchDB-ML) for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages. (Author)
The WONP-NURT corpus as nuclear knowledge base for text mining in the INIS database

International Nuclear Information System (INIS)

Guerra Valdes, R.

2011-01-01

In the present work the WONP-NURT corpus is taken as a knowledge base for text mining in the INIS database. The main components of the information processing system, as well as computational methods for content analysis of INIS database record files, are described. Results of the content analysis of the WONP-NURT corpus are reported. Furthermore, results of two comparative text mining studies in the INIS database are also shown. The first one explores 10 research areas in the more familiar nearest range of the WONP-NURT corpus, while the second one surveys 15 regions in the more exotic far range. The results provide new elements to assess the significance of the WONP-NURT corpus in the context of the current state of nuclear science and technology research areas. (Author)
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

Science.gov (United States)

Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

2015-12-01

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
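For readers unfamiliar with the F-score reported above, the sketch below shows how precision, recall and their harmonic mean (F1) are computed from counts of true positives, false positives and false negatives. The function name and the counts in the usage line are hypothetical; they are not taken from the i2b2/UTHealth evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is one reason class imbalance in a training set can depress the score.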
More Delusions May Be Observed in Low-Proficient Multilingual Alzheimer's Disease Patients.

Science.gov (United States)

Liu, Yi-Chien; Liu, Yen-Ying; Yip, Ping-Keung; Akanuma, Kyoko; Meguro, Kenichi

2015-01-01

Language impairment and behavioral symptoms are both common phenomena in dementia patients. In this study, we investigated the behavioral symptoms in dementia patients with different language backgrounds. Through this, we aimed to propose a possible connection between language and delusion. We recruited 21 patients with Alzheimer's disease (AD), according to the DSM-IV and NINCDS-ADRDA criteria, from the memory clinic of the Cardinal Tien Hospital in Taipei, Taiwan. They were classified into two groups: 11 multilinguals who could speak Japanese, Taiwanese and Mandarin Chinese, and 10 bilinguals who only spoke Taiwanese and Mandarin Chinese. There were no differences in age, education, disease duration, disease severity, environment or medical care between these two groups. Comprehensive neuropsychological examinations, including Clinical Dementia Rating (CDR), Mini-Mental Status Examination (MMSE), Cognitive Abilities Screening Instrument (CASI), Verbal fluency, Chinese version of the Boston naming test (BNT) and the Behavioral Pathology in Alzheimer's Disease Rating Scale (BEHAVE-AD), were administered. The multilingual group showed worse results on the Boston naming test. Other neuropsychological tests, including the MMSE, CASI and Verbal fluency, were not significantly different. More delusions were noted in the multilingual group. Three pairs of subjects were identified for further examination of their differences. These three cases presented the typical scenario of how language misunderstanding may cause delusions in multilingual dementia patients. Consequently, more emotion and distorted ideas may be induced in the multilinguals compared with the MMSE-matched controls. Inappropriate mixing of language or conflict between cognition and emotion may cause more delusions in these multilingual patients. This reminds us that delusion is not a pure biological outcome of brain degeneration. Although the cognitive performance was not significantly different
Facilitating Reading Acquisition in Multilingual Environments in India (FRAME-India). Final Report

Science.gov (United States)

Nakamura, Pooja; de Hoop, Thomas

2014-01-01

Most of the world is multilingual--multilingual at the national level (policies), at the community and family level (practices), and at the individual level (cognitive)--and each of these has implications for teaching and learning. Yet, at present, most reading decisions are not based on empirical research of how children learn to read in…
The Gradience of Multilingualism in Typical and Impaired Language Development: Positioning Bilectalism within Comparative Bilingualism.

Science.gov (United States)

Grohmann, Kleanthes K; Kambanaros, Maria

2016-01-01

A multitude of factors characterizes bi- and multilingual compared to monolingual language acquisition. Two of the most prominent viewpoints have recently been put in perspective and enriched by a third (Tsimpli, 2014): age of onset of children's exposure to their native languages, the role of the input they receive, and the timing in monolingual first language development of the phenomena examined in bi- and multilingual children's performance. This article picks up a fourth potential factor (Grohmann, 2014b): language proximity, that is, the closeness between the two or more grammars a multilingual child acquires. It is a first attempt to flesh out the proposed gradient scale of multilingualism within the approach dubbed "comparative bilingualism." The empirical part of this project comes from three types of research: (i) the acquisition and subsequent development of pronominal object clitic placement in two closely related varieties of Greek by bilectal, binational, bilingual, and multilingual children; (ii) the performance on executive control tasks by monolingual, bilectal, and bi- or multilingual children; and (iii) the role of comparative bilingualism in children with a developmental language impairment for both the diagnosis and subsequent treatment as well as the possible avoidance or weakening of how language impairment presents.
Multilingual Institutional Discourses of Negotiation and Intertextuality in Writing Center Interactions in Macao

Science.gov (United States)

Lee, Alice Shu-Ju

2017-01-01

This dissertation explores the identity enactments (Bucholtz & Hall, 2005) of 14 multilingual university writing center tutors and multilingual student writers who use English and Putonghua to negotiate their interactions. The study is situated within sociocultural theory (Vygotsky, 1978) and uses ethnographic methods such as observation,…
Multilingual school starters

DEFF Research Database (Denmark)

Laursen, Helle Pia

Multilingual school starters: social semiotics perspectives on second language and literacy learning in education Helle Pia Laursen The starting point for this paper is the still increasing role of literacy in educational settings. Often primary education is seen as almost being synonymous...... of globalisation. Furthermore, this perception of literacy entails that the student’s possible insights into other ways of adding signs to language than those we know from a specific version of the Latin alphabet, fall outside the interests of research and teaching. From this perspective and with a social semiotic...
Multilingual Practices in Contemporary and Historical Contexts: Interfaces between Code-Switching and Translation

Science.gov (United States)

Kolehmainen, Leena; Skaffari, Janne

2016-01-01

This article serves as an introduction to a collection of four articles on multilingual practices in speech and writing, exploring both contemporary and historical sources. It not only introduces the articles but also discusses the scope and definitions of code-switching, attitudes towards multilingual interaction and, most pertinently, the…
Multilingual Competences and Family Language Practices

NARCIS (Netherlands)

Duarte, Joana; Gogolin, Ingrid; Klinger, Thorsten; Schnoor, Birger

2014-01-01

In this paper we examine the role of family-induced linguistic input as a predictor for proficiencies in written language production of multilingual children aged 11. Our study considers their proficiencies in majority language (German) as well as in their family languages. Given that in most cases
Grammatical gender in the discourse of multilingual children's acquisition of German

Directory of Open Access Journals (Sweden)

Montanari, Elke

2014-03-01

The acquisition of grammatical gender by multilingual pre-school children (aged six) was investigated by observing their narration and discourse. It emerged that only three of the 17 children actually used gender to classify nouns. Grammatical agreement is acknowledged as a key feature of gender acquisition, and it reflects developmental steps. Children growing up with mostly bilingual German input at a low proficiency level had the greatest difficulties in acquiring gender and agreement in the group investigated.

Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

Science.gov (United States)

Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

2016-04-26

Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast amount of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or diverge. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.
Reconceptualizing Practice with Multilingual Children with Speech Sound Disorders: People, Practicalities and Policy

Science.gov (United States)

Verdon, Sarah; McLeod, Sharynne; Wong, Sandie

2015-01-01

Background: The speech and language therapy profession is required to provide services to increasingly multilingual caseloads. Much international research has focused on the challenges of speech and language therapists' (SLTs) practice with multilingual children. Aims: To draw on the experience and knowledge of experts in the field to: (1)…
Current trends in multilingual speech processing

Indian Academy of Sciences (India)

In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging ... and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce.
Linguistic Minorities and the Multilingual Turn: Constructing Language Ownership through Affect in Cultural Production

Science.gov (United States)

McLaughlin, Mireille

2016-01-01

The "multilingual turn" brings questions of language ownership to the forefront of debates about linguistic minority governance. Acadian minority cultural producers construct language ownership using multiple languages and targeting multilingual publics, but use ideologies of monolingualism to situate Acadian authenticity in place and…
Towards Technological Approaches for Concept Maps Mining from Text

Directory of Open Access Journals (Sweden)

Camila Zacche Aguiar

2018-04-01

Concept maps are resources for the representation and construction of knowledge. They show, through concepts and relationships, how knowledge about a subject is organized. Technological advances have boosted the development of approaches for the automatic construction of concept maps, to facilitate and provide the benefits of that resource more broadly. Due to the need to better identify and analyze the functionalities and characteristics of those approaches, we conducted a detailed study on technological approaches for automatic construction of concept maps published between 1994 and 2016 in the IEEE Xplore, ACM and Elsevier Science Direct databases. From this study, we elaborate a categorization defined on two perspectives, Data Source and Graphic Representation, and fourteen categories. That study collected 30 relevant articles, which were applied to the proposed categorization to identify the main features and limitations of each approach. A detailed view of these approaches, their characteristics and techniques is presented, enabling a quantitative analysis. In addition, the categorization has given us objective conditions to establish new specification requirements for a new technological approach aiming at concept maps mining from texts.
Potential Lessons for Teaching in Multilingual Mathematics Classrooms in Australia and Southeast Asia

Science.gov (United States)

Clarkson, Philip C.

2009-01-01

Multilingual classrooms are the normal learning contexts for most children throughout the world. However not all such contexts are identical. This distinction is not always made in the literature. In this paper the multilingual context for classrooms in many urban classrooms in Australia is described before exploring a possible model that might be…
Towards Multilingual Higher Education in South Africa: The University of Cape Town's Experience

Science.gov (United States)

Madiba, Mbulungeni

2010-01-01

South African universities are required by the Language Policy for Higher Education adopted by the government on 6 November 2002 to implement multilingualism in their learning and teaching programmes. Multilingualism is recommended in this policy as a means to ensure equity of access and success in higher education, in contrast to past colonial…
Multilingual Interaction and Minority Languages: Proficiency and Language Practices in Education and Society

Science.gov (United States)

Gorter, Durk

2015-01-01

In this plenary speech I examine multilingual interaction in a number of European regions in which minority languages are being revitalized. Education is a crucial variable, but the wider society is equally significant. The context of revitalization is no longer bilingual but increasingly multilingual. I draw on the results of a long-running…
Stopping Antidepressants and Anxiolytics as Major Concerns Reported in Online Health Communities: A Text Mining Approach.

Science.gov (United States)

Abbe, Adeline; Falissard, Bruno

2017-10-23

The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in a patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms and effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to those of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
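The two techniques named in this abstract, word frequency in a document-term matrix and word co-occurrence as network edges, can be illustrated with a minimal sketch. The titles below are invented for illustration and are not data from the study; the function names are hypothetical, and the authors' actual preprocessing and tooling are not described in the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical discussion titles (assumption: NOT data from the study).
titles = [
    "escitalopram withdrawal symptoms",
    "venlafaxine withdrawal effect",
    "paroxetine weight gain",
    "escitalopram weight gain effect",
]

def tokenize(title):
    """Lowercase and split on whitespace; real preprocessing would do more."""
    return title.lower().split()

def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in vocab])
    return vocab, rows

def term_frequencies(vocab, rows):
    """Column sums of the document-term matrix: total count per term."""
    return {word: sum(row[j] for row in rows) for j, word in enumerate(vocab)}

def cooccurrence(docs):
    """Count how many documents contain each unordered pair of distinct words.

    Each pair is a weighted edge of a co-occurrence network.
    """
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(tokenize(doc))), 2):
            pairs[(a, b)] += 1
    return pairs

vocab, dtm = document_term_matrix(titles)
freq = term_frequencies(vocab, dtm)
edges = cooccurrence(titles)
```

Sorting `freq` by count gives the most frequent title words, and the heaviest entries of `edges` are the word pairs that would be drawn as the strongest links in a co-occurrence network.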
Lifelong exposure to multilingualism: new evidence to support cognitive reserve hypothesis.

Science.gov (United States)

Perquin, Magali; Vaillant, Michel; Schuller, Anne-Marie; Pastore, Jessica; Dartigues, Jean-François; Lair, Marie-Lise; Diederich, Nico

2013-01-01

To investigate the protective effect of multilingualism on cognition in seniors. As part of the MemoVie study conducted on 232 non-demented volunteers aged 65 and over, neurogeriatric and neuropsychological evaluations were performed. Participants were classified as presenting either cognitive impairment without dementia (CIND) or being free of any cognitive impairment (CIND-free). Language practices, socio-demographic data and lifestyle habits were recorded. In this retrospective nested case-control design, we used as proxies of multilingualism: number of languages practiced, age of acquisition and duration of practice, emphasizing the temporal pattern of acquisition, and the resulting practice of several languages sequentially or concomitantly during various periods of life. This angle gave our work a particularly original and innovative dimension. 44 subjects (19%) had CIND, the others were cognitively normal. All practiced from 2 to 7 languages. When compared with bilinguals, participants who practiced more than 2 languages presented a lower risk of CIND, after adjustment for education and age (odds ratio (OR) = 0.30, 95% confidence limits (95%CL) = [0.10-0.92]). Progressing from 2 to 3 languages, instead of staying bilingual, was associated with a 7-fold protection against CIND (OR = 0.14, 95%CL = [0.04-0.45], p = 0.0010). A one-year delay to reach multilingualism (3 languages practiced being the threshold) multiplied the risk of CIND by 1.022 (OR = 1.022, 95%CL = [1.01-1.04], p = 0.0044). Also noteworthy, just as for multilingualism, an impact of cognitively stimulating activities on the occurrence of CIND was found as well (OR = 0.979, 95%CL = [0.961-0.998], p = 0.033). The study did not show independence of multilingualism and CIND. Rather it seems to show a strong association toward a protection against CIND. Practicing multilingualism from early life on, and/or learning it at a fast pace
Anxiety and EFL: Does Multilingualism Matter?

Science.gov (United States)

Thompson, Amy S.; Lee, Junkyu

2013-01-01

The current study is motivated by the gap in the current literature about foreign language classroom anxiety, namely the underlying construct of FL anxiety with regard to the understudied relationship between anxiety, proficiency, and multilingualism. The evidence for the effect of language anxiety on achievement is well-documented. More recently,…
Multilingual natural language generation as part of a medical terminology server.

Science.gov (United States)

Wagner, J C; Solomon, W D; Michel, P A; Juge, C; Baud, R H; Rector, A L; Scherrer, J R

1995-01-01

Re-usable and sharable, and therefore language-independent concept models are of increasing importance in the medical domain. The GALEN project (Generalized Architecture for Languages Encyclopedias and Nomenclatures in Medicine) aims at developing language-independent concept representation systems as the foundations for the next generation of multilingual coding systems. For use within clinical applications, the content of the model has to be mapped to natural language. A so-called Multilingual Information Module (MM) establishes the link between the language-independent concept model and different natural languages. This text generation software must be versatile enough to cope at the same time with different languages and with different parts of a compositional model. It has to meet, on the one hand, the properties of the language as used in the medical domain and, on the other hand, the specific characteristics of the underlying model and its representation formalism. We propose a semantic-oriented approach to natural language generation that is based on linguistic annotations to a concept model. This approach is realized as an integral part of a Terminology Server, built around the concept model and offering different terminological services for clinical applications.
Mnohojazyčnost jako dílčí cíl výuky cizích jazyků a možnosti její podpory / Multilingualism as a particular goal of foreign language education and possibilities for its support

Directory of Open Access Journals (Sweden)

Miroslav Janík

2014-06-01

The study deals with the issue of multilingualism as a particular goal of language education. The author aims to offer an insight into additional language teaching at primary/secondary schools in the context of multilingualism. The first part of the paper provides a definition of the concept of multilingualism, which forms the theoretical background of our study (i.e. multilingualism as a particular goal of language teaching). As we define multilingualism as pupils’ ability to speak three or more languages, we propose a system for ordering and labelling these languages. The next part of the paper focuses on the acquisition of additional (second, third etc.) languages and its characteristics. The fourth part of the paper deals with the didactic approach to the concept of multilingualism and its possible implementation into additional language teaching. Last but not least, we focus on teachers and the competencies that they should have in order to meaningfully support multilingualism in instruction.
More Delusions May Be Observed in Low-Proficient Multilingual Alzheimer’s Disease Patients

Science.gov (United States)

Liu, Yi-Chien; Liu, Yen-Ying; Yip, Ping-Keung; Akanuma, Kyoko; Meguro, Kenichi

2015-01-01

Background: Language impairment and behavioral symptoms are both common phenomena in dementia patients. In this study, we investigated the behavioral symptoms in dementia patients with different language backgrounds. Through this, we aimed to propose a possible connection between language and delusion. Methods: We recruited 21 patients with Alzheimer’s disease (AD), according to the DSM-IV and NINCDS-ADRDA criteria, from the memory clinic of the Cardinal Tien Hospital in Taipei, Taiwan. They were classified into two groups: 11 multilinguals who could speak Japanese, Taiwanese and Mandarin Chinese, and 10 bilinguals who only spoke Taiwanese and Mandarin Chinese. There were no differences between age, education, disease duration, disease severity, environment and medical care between these two groups. Comprehensive neuropsychological examinations, including Clinical Dementia Rating (CDR), Mini-Mental Status Examination (MMSE), Cognitive Abilities Screening Instrument (CASI), Verbal fluency, Chinese version of the Boston naming test (BNT) and the Behavioral Pathology in Alzheimer’s Disease Rating Scale (BEHAVE-AD), were administered. Results: The multilingual group showed worse results on the Boston naming test. Other neuropsychological tests, including the MMSE, CASI and Verbal fluency, were not significantly different. More delusions were noted in the multilingual group. Three pairs of subjects were identified for further examination of their differences. These three cases presented the typical scenario of how language misunderstanding may cause delusions in multilingual dementia patients. Consequently, more emotion and distorted ideas may be induced in the multilinguals compared with the MMSE-matched controls. Conclusion: Inappropriate mixing of language or conflict between cognition and emotion may cause more delusions in these multilingual patients. This reminds us that delusion is not a pure biological outcome of brain degeneration. Although the cognitive
Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques

Science.gov (United States)

Jiang, Feng; McComas, William F.

2014-01-01

This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve as a supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…
A multilingual, multicultural and explanatory music education ...

African Journals Online (AJOL)

A multilingual, multicultural and explanatory music education dictionary for South Africa - using Wiegand's metalexicography to establish its purposes, functions ... dictionary, it will have to contain elements of different types of dictionaries, such as explanatory dictionaries, translation dictionaries, and learner's dictionaries.
Educational Trajectories at the Crossroads: The Making and Unmaking of Multilingual Communities of Learners

Science.gov (United States)

Budach, Gabriele

2014-01-01

This article investigates the educational trajectories of young multilingual learners in Germany. Drawing on previous ethnographic research in a primary bilingual German-Italian Two-Way-Immersion classroom, this study examines the continuity and fragmentation of multilingual learning as they occur in the transition from primary to secondary…
Introducing discussion into multilingual mathematics classrooms: An issue of code switching?

Directory of Open Access Journals (Sweden)

Lyn Webb

2008-10-01

The Department of Education in South Africa advocates collaborative and constructivist learning; however, observations indicate that little discussion occurs in most multilingual mathematics classes. In this paper we draw on a pilot study set in the Eastern Cape where teachers were introduced to the theory and practice of exploratory talk, and then tasked to perform an action research project on introducing discussion in their own multilingual mathematics classrooms. The results of the study suggest some successes in terms of teachers initiating exploratory talk and highlight the fact that these successes were only achieved where code switching between English and isiXhosa formed an integral part of the process.
The Moroccan Educational Context: Evolving Multilingualism

Science.gov (United States)

Daniel, Mayra C.; Ball, Alexis

2010-01-01

This article begins an investigation of the educational system of Morocco and its context of language diversity. It examines the Moroccan cultural environment and the ways the multilingualism and education of its people have been and continue to be influenced by geography, colonization periods, religion, and history. The effects of the Educational…
Drupal 7 Multilingual Sites

CERN Document Server

Pol, Kristen

2012-01-01

A practical book with plenty of screenshots to guide you through the many features of multilingual Drupal. A demo ecommerce site is provided if you want to practice on a sample site, although you can apply the techniques learnt in the book directly to your site too. Any Drupal users who know the basics of building a Drupal site and are familiar with the Drupal UI, will benefit from this book. No previous knowledge of localization or internationalization is required.

Theoretical-and-Methodological Substantiation of Multilingual Model Activity in Kazakhstan Higher School Education System

Science.gov (United States)

Ospanova, Bikesh Revovna; Azimbayeva, Zhanat Amantayevna; Timokhina, Tatyana Vladimirovna; Seydakhmetova, Zergul Koblandiyevna

2016-01-01

The need of implementing the model of professional development in training an expert in the conditions of multilingualism is considered. The possibility of using the multilingual approach in the context of present day education with the use of innovative technologies of training is substantiated, the definition of "multilingual…
Social Inclusion through Multilingual Ideologies, Policies and Practices: A Case Study of a Minority Church

Science.gov (United States)

Han, Huamei

2011-01-01

Adopting a materialist and processual approach to language and specifically multilingualism, this paper explores what language ideologies a minority, non-educational institution embraced and how this facilitated social inclusion through constructing institutional multilingualism within societal monolingualism. Specifically, I document how a…
A SKOS-based multilingual thesaurus of geological time scale for interoperability of online geological maps

NARCIS (Netherlands)

Ma, X.; Carranza, E.J.M.; Wu, C.; Meer, F.D. van der; Liu, G.

2011-01-01

The usefulness of online geological maps is hindered by linguistic barriers. Multilingual geoscience thesauri alleviate linguistic barriers of geological maps. However, the benefits of multilingual geoscience thesauri for online geological maps are less studied. In this regard, we developed a
Rastafarian-herbalists' enregisterment of multilingual voices in an ...

African Journals Online (AJOL)

Kate H

issues of a religious nature are talked about and debated by multilingual speakers. ..... commuters through Bellstar Junction provides important business for the .... movement the Rastafari is guided by the following religious ethical practices ...
Data mining of text as a tool in authorship attribution

Science.gov (United States)

Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu

2001-03-01

Text documents are commonly characterized and classified by keywords that their authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or a part of an extracted, interesting text. This prototype is matched with the document database of the monitored document flow. The new methodology is capable of extracting the meaning of the document to a certain degree. Our claim is that the new methodology is also capable of authenticating the authorship. To verify this claim two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test three authors were selected. The selected authors were William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was used in turn as a prototype. The two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors is examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, for Shakespeare and for Poe all cases were successful. For Shaw one text was confused with Poe. In the second test the Reuters-21578 financial news reports were attributed to their authors relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.
Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text

Science.gov (United States)

Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan

2017-01-01

Text mining for the Indonesian language remains an interesting research topic. Representing text as multiple-word sequences is claimed to preserve the meaning of a text better than a bag of words. In this paper, we compare several sequential pattern algorithms, namely BIDE (BIDirectional Extension), PrefixSpan, and TRuleGrowth. All of these algorithms produce frequent word sequences that preserve the meaning of the text. However, the experimental results, with 14,006 Indonesian tweets from Twitter, show that BIDE can produce more efficient frequent word sequences than PrefixSpan and TRuleGrowth without losing the meaning of the text. The average processing time of PrefixSpan is lower than that of BIDE and TRuleGrowth. On the other hand, PrefixSpan and TRuleGrowth use memory more efficiently than BIDE.
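To make "frequent word sequence" concrete, here is a minimal PrefixSpan-style sketch for sequences of single words. It is an illustration under stated assumptions, not the authors' implementation: the tokenized tweets are invented, the function name is hypothetical, and neither BIDE nor TRuleGrowth is covered.

```python
def frequent_word_sequences(sequences, min_support):
    """Mine word subsequences (gaps allowed) found in >= min_support sequences.

    PrefixSpan-style pattern growth: each frequent prefix is extended using
    only the suffixes that follow its first occurrence in each sequence.
    Returns a dict mapping pattern (tuple of words) -> support count.
    """
    patterns = {}

    def grow(prefix, projected):
        # projected: list of (sequence index, position where the suffix starts)
        support = {}
        for sid, start in projected:
            for word in set(sequences[sid][start:]):
                support[word] = support.get(word, 0) + 1
        for word in sorted(support):
            if support[word] < min_support:
                continue
            pattern = prefix + (word,)
            patterns[pattern] = support[word]
            # Project onto what remains after the first occurrence of `word`.
            next_projected = []
            for sid, start in projected:
                suffix = sequences[sid][start:]
                if word in suffix:
                    next_projected.append((sid, start + suffix.index(word) + 1))
            grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return patterns

# Hypothetical tokenized tweets (assumption: NOT from the paper's corpus).
tweets = [
    ["harga", "bbm", "naik"],
    ["harga", "beras", "naik"],
    ["bbm", "naik", "lagi"],
]
found = frequent_word_sequences(tweets, min_support=2)
```

With `min_support=2`, the pattern `("harga", "naik")` is found even though a different word separates the two in each tweet, which is the sense in which a word sequence carries more than a bag of words.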
Intercultural Contact and Multilingualism in an Intimate Relationship in the Austro-Hungarian Littoral.

Science.gov (United States)

Martinis, Anja Iveković

2016-09-01

The paper presents a case study of multilingualism in private correspondence in turn-of-the-century Austro-Hungarian Istria. Language attitudes and use of German, Italian and Slovenian are analyzed, with results indicating the compatibility of national feelings with an appreciation of multilingualism, as well as the important role that intimate intercultural relationships play in this regard in a culturally mixed region.
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

Science.gov (United States)

Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

2014-01-01

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.
Preparing pre-service teachers for multilingual classrooms ...

African Journals Online (AJOL)

This article addresses the challenge and process of the curriculum design using the classic ADDIE model. It also documents student reaction to the compulsory module as well as their experience of language learning. Keywords: curriculum + multilingual classrooms, instructional design and development, language learning ...
Appraising the Corporate Sustainability Reports - Text Mining and Multi-Discriminatory Analysis

Science.gov (United States)

Modapothala, J. R.; Issac, B.; Jayamani, E.

The voluntary disclosure of sustainability reports by companies attracts wider stakeholder groups. Diversity in these reports poses a challenge to the users of information and to regulators. This study appraises corporate sustainability reports as per the GRI (Global Reporting Initiative) guidelines (the most widely accepted and used) across all industrial sectors. Text mining is adopted to carry out the initial analysis with a large sample size of 2650 reports. Statistical analyses were performed for further investigation. The results indicate that the disclosures made by the companies differ across the industrial sectors. Multivariate Discriminant Analysis (MDA) shows that the environmental variable contributes more significantly than the other variables to explaining the sustainability reports.
Text mining applications in psychiatry: a systematic literature review.

Science.gov (United States)

Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno

2016-06-01

The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses), (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.
From Bilingualism to Multilingualism in the Workplace: The Case of the Basque Autonomous Community

Science.gov (United States)

van der Worp, Karin; Cenoz, Jasone; Gorter, Durk

2017-01-01

In this article we discuss the outcomes of a study into the languages of the workplace of internationally operating companies. Our aim is to contribute to studies of multilingualism in the workplace by adopting a holistic approach that focuses on several languages and relates the competences and attitudes of multilingual professionals to the…
Mining concepts of health responsibility using text mining and exploratory graph analysis.

Science.gov (United States)

Kjellström, Sofia; Golino, Hudson

2018-05-24

Occupational therapists need to know about people's beliefs about personal responsibility for health to help them pursue everyday activities. The study aims to employ state-of-the-art quantitative approaches to understand people's views of health and responsibility at different ages. A mixed method approach was adopted, using text mining to extract information from 233 interviews with participants aged 5 to 96 years, and then exploratory graph analysis to estimate the number of latent variables. The fit of the structure estimated via the exploratory graph analysis was verified using confirmatory factor analysis. Exploratory graph analysis estimated three dimensions of health responsibility: (1) creating good health habits and feeling good; (2) thinking about one's own health and wanting to improve it; and (3) adopting explicitly normative attitudes to take care of one's health. The comparison between the three dimensions among age groups showed, in general, that children and adolescents, as well as the old elderly (>73 years old) expressed ideas about personal responsibility for health less than young adults, adults and young elderly. Occupational therapists' knowledge of the concepts of health responsibility is of value when working with a patient's health, but an identified challenge is how to engage children and older persons.
Examining Mobile Learning Trends 2003-2008: A Categorical Meta-Trend Analysis Using Text Mining Techniques

Science.gov (United States)

Hung, Jui-Long; Zhang, Ke

2012-01-01

This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…
Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

Directory of Open Access Journals (Sweden)

Allan Peter Davis

The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.
Adverse Event extraction from Structured Product Labels using the Event-based Text-mining of Health Electronic Records (ETHER) system.

Science.gov (United States)

Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard

2018-01-01

Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated the ability of the Event-based Text-mining of Health Electronic Records system to extract and encode Adverse Event terms from Structured Product Labels, which may potentially support multiple pharmacoepidemiological tasks.
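The precision and recall figures reported above follow the usual set-based definitions. The sketch below is only an illustration of those definitions: the function name and the toy term sets are hypothetical and are not taken from the ETHER study or its data.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted terms that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard terms that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy label (assumption for illustration only).
gold_terms = {"nausea", "headache", "rash", "dizziness"}
extracted_terms = {"nausea", "headache", "rash", "dizziness", "fatigue"}
p, r = precision_recall(extracted_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case one spurious term lowers precision while recall stays perfect, which is the same direction of trade-off as the reported 81 percent precision and 92 percent recall.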
Functional MRI of Multilingual Subjects

International Nuclear Information System (INIS)

Cho, Jae Min; Ryoo, Jae Wook; Choi, Dae Seob; Shin, Tae Beom; Chung, Sung Hoon; Kim, Ji Eun; Han, Heon; Kim, Sam Soo; Jeon, Yong Hwan

2009-01-01

To evaluate brain activation areas during the processing of languages in multilingual volunteers by functional MRI and to examine the differences between the mother and foreign languages. Nine multilingual (Korean, French, and English speaking) Korean individuals were enrolled in this study. Functional images were acquired during a lexical decision task (LDT) and picture naming task (PNT) in each of the Korean, French and English languages. The areas activated were analyzed topographically in each language and task, and compared between languages. Activation was noted in Broca's area, supramarginal gyrus, and fusiform gyrus during the LDT. During the PNT, activation was noted in Broca's area, left prefrontal area, cerebellum, and right extrastriate cortex. While Broca's area activation was observed for all languages during LDT, there was more activation in Broca's area and additional activation in the right prefrontal area with foreign languages. During the PNT, there was more activation in the left prefrontal area with foreign languages. Broca's area, which is known as a major language region, was activated by all languages and tasks. The brain activation areas largely overlapped between the mother and foreign languages. However, there were wider areas of activation and additional different activation areas with foreign languages. These results suggest more cerebral effort during foreign language processing.
Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics

Science.gov (United States)

Hung, Jui-long

2012-01-01

This study investigated the longitudinal trends of e-learning research using text mining techniques. Six hundred and eighty-nine (689) refereed journal articles and proceedings were retrieved from the Science Citation Index/Social Science Citation Index database in the period from 2000 to 2008. All e-learning publications were grouped into two…
Seqenv: linking sequences to environments through text mining

Directory of Open Access Journals (Sweden)

Lucas Sinclair

2016-12-01

Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the “nt” nucleotide database provided by NCBI and, out of every hit, extracts, if it is available, the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Science.gov (United States)

Singhal, Ayush; Simmons, Michael; Lu, Zhiyong

2016-11-01

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease

Text mining of rheumatoid arthritis and diabetes mellitus to understand the mechanisms of Chinese medicine in different diseases with same treatment.

Science.gov (United States)

Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping

2018-01-09

To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which an herbal formula for the treatment of both RA and DM, identified via text mining, was used as the intervention. Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.
Data Mining of Acupoint Characteristics from the Classical Medical Text: DongUiBoGam of Korean Medicine

Directory of Open Access Journals (Sweden)

Taehyung Lee

2014-01-01

Throughout the history of East Asian medicine, different kinds of acupuncture treatment experiences have been accumulated in classical medical texts. Reexamining knowledge from classical medical texts is expected to provide meaningful information that could be utilized in current medical practices. In this study, we used data mining methods to analyze the association between acupoints and patterns of disorder with the classical medical book DongUiBoGam of Korean medicine. Using the term frequency-inverse document frequency (tf-idf) method, we quantified the significance of acupoints to their target patterns and, conversely, the significance of patterns to acupoints. Through these processes, we extracted characteristics of each acupoint based on the patterns it treats. We also drew practical information for selecting acupoints for certain patterns according to their association. Data analysis on DongUiBoGam’s acupuncture treatment gave us an insight into the main idea of DongUiBoGam. We strongly believe that our approach can provide a novel understanding of unknown characteristics of acupoint and pattern identification from the classical medical text using data mining methods.
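As a rough illustration of the tf-idf weighting mentioned above, the following sketch treats each pattern of disorder as a "document" and each acupoint as a "term". The abstract does not state which tf-idf variant was used, so the formula here (relative term frequency times log(N/df)) is an assumption, and the pattern and point names are invented placeholders rather than data from DongUiBoGam.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a document name to a non-empty list of terms.

    Returns {doc: {term: weight}} with tf = count / len(doc)
    and idf = log(N / document frequency). Assumed variant.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per document
    weights = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        weights[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical patterns and acupoints (placeholders, not real records).
patterns = {
    "pattern_1": ["point_A", "point_B", "point_A"],
    "pattern_2": ["point_C", "point_A"],
    "pattern_3": ["point_D", "point_C"],
}
weights = tf_idf(patterns)
for point, weight in sorted(weights["pattern_1"].items()):
    print(point, round(weight, 3))
```

Here point_B scores higher than point_A for pattern_1 even though point_A is mentioned more often, because point_A also appears under another pattern. That is the sense in which tf-idf highlights points that are characteristic of a pattern rather than merely frequent.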
A Study on Environmental Research Trends Using Text-Mining Method - Focus on Spatial information and ICT -

Science.gov (United States)

Lee, M. J.; Oh, K. Y.; Joung-ho, L.

2016-12-01

Recently, much research in various fields has analysed the interaction between entities through text mining. In this paper, we aimed to quantitatively analyse research trends in environmental research relating to either spatial information or ICT (Information and Communications Technology) by text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis, and association rules to find meaningful associative patterns among key words that frequently appeared in the articles. As the authors suppose that KCI (Korea Citation Index) articles reflect academic demands, a total of 1228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved KCI articles from the NDSL (National Discovery for Science Leaders) site. We then pre-processed the key words selected from their abstracts and classified them into separable sectors. We investigated the appearance rates and association rules of key words for articles in the two fields: spatial information and ICT. In order to detect historic trends, the analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010, 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused upon such fields as 'GIS (35%)', 'Remote-Sensing (25%)', 'environmental theme map (15.7%)'. Next, 'ICT technology (23.6%)', 'ICT service (5.4%)', 'mobile (24%)', 'big data (10%)', 'AI (7%)' are primarily emerging from environmental research relating to ICT. Thus, from the analysis results, this paper asserts that research trends and academic progress are well structured for reviewing recent spatial information and ICT technology, and that the outcomes of the analysis can be adequate guidelines for establishing environmental policies and strategies. KEY WORDS: Big data, Text-mining, Environmental research, Spatial-information, ICT Acknowledgements: The
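The association-rule step described above can be illustrated with the two standard rule measures, support and confidence. The study itself used R; the Python sketch below is only a minimal illustration, and the function name and the keyword sets are invented for the example rather than drawn from the KCI articles.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    transactions is a list of keyword sets, one per article.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical keyword sets for four articles (assumption for illustration).
articles = [
    {"GIS", "remote sensing"},
    {"GIS", "big data"},
    {"GIS", "remote sensing", "mobile"},
    {"mobile", "big data"},
]
support, confidence = rule_metrics(articles, {"GIS"}, {"remote sensing"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Support measures how often the two key words co-occur across all articles, while confidence measures how often the consequent appears given the antecedent.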
From Opposition to Transcendence: The Language Practices and Ideologies of Students in a Multilingual University

Science.gov (United States)

Gu, Mingyue

2014-01-01

This article explores language ideologies and language uses in a multilingual university in Hong Kong by exploring the voices and experiences of both mainland Chinese and Hong Kong students. Drawing on the notions of language ideologies, separate multilingualism, and translanguaging, the research illustrates how students' linguistic ideologies are…
Alkemio: association of chemicals with biomedical topics by text and data mining.

Science.gov (United States)

Gijón-Correas, José A; Andrade-Navarro, Miguel A; Fontaine, Jean F

2014-07-01

The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated to eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high performing web tool ranking chemicals for any biomedical topic and it is free to non-commercial users. http://cbdm.mdc-berlin.de/~medlineranker/cms/alkemio. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
A method for extracting design rationale knowledge based on Text Mining

Directory of Open Access Journals (Sweden)

Liu Jihong

2017-01-01

Capturing design rationale (DR) knowledge and presenting it to designers in a good form is of great significance for design reuse and design innovation. Since design rationale research began to develop in the 1970s, many teams have developed their own design rationale systems. However, DR acquisition systems are not intelligent enough and still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge, but it has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: the design intent layer, the design decision layer and the design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.
Internet of Things in Health Trends Through Bibliometrics and Text Mining.

Science.gov (United States)

Konstantinidis, Stathis Th; Billis, Antonis; Wharrad, Heather; Bamidis, Panagiotis D

2017-01-01

Recently a new buzzword has slowly but surely emerged, namely the Internet of Things (IoT). The importance of IoT is identified worldwide both by organisations and governments and the scientific community with an incremental number of publications during the last few years. IoT in Health is one of the main pillars of this evolution, but limited research has been performed on future visions and trends. Thus, in this study we investigate the longitudinal trends of Internet of Things in Health through bibliometrics and use of text mining. Seven hundred seventy eight (778) articles were retrieved from the Web of Science database from 1998 to 2016. The publications are grouped into thirty (30) clusters based on abstract text analysis, resulting in some eight (8) trends of IoT in Health. Research in this field is obviously obtaining a worldwide character with specific trends, which are worth delineating to be in favour of some areas.
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

Science.gov (United States)

Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

2018-08-01

Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Directory of Open Access Journals (Sweden)

Ayush Singhal

2016-11-01

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets
Structural brain differences between monolingual and multilingual patients with mild cognitive impairment and Alzheimer disease: Evidence for cognitive reserve.

Science.gov (United States)

Duncan, Hilary D; Nikelski, Jim; Pilon, Randi; Steffener, Jason; Chertkow, Howard; Phillips, Natalie A

2018-01-31

Two independent lines of research provide evidence that speaking more than one language may 1) contribute to increased grey matter in healthy younger and older adults and 2) delay cognitive symptoms in mild cognitive impairment (MCI) or Alzheimer disease (AD). We examined cortical thickness and tissue density in monolingual and multilingual MCI and AD patients matched (within Diagnosis Groups) on demographic and cognitive variables. In medial temporal disease-related (DR) areas, we found higher tissue density in multilingual MCIs versus monolingual MCIs, but similar or lower tissue density in multilingual AD versus monolingual AD, a pattern consistent with cognitive reserve in AD. In areas related to language and cognitive control (LCC), both multilingual MCI and AD patients had thicker cortex than the monolinguals. Results were largely replicated in our native-born Canadian MCI participants, ruling out immigration as a potential confound. Finally, multilingual patients showed a correlation between cortical thickness in LCC regions and performance on episodic memory tasks. Given that multilinguals and monolinguals were matched on memory functioning, this suggests that increased gray matter in these regions may provide support to memory functioning. Our results suggest that being multilingual may contribute to increased gray matter in LCC areas and may also delay the cognitive effects of disease-related atrophy. Copyright © 2017 Elsevier Ltd. All rights reserved.
Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining.

Science.gov (United States)

Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R

2018-01-01

The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
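The two rules of the filter described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' algorithm: the function name, the gene identifiers, the evidence-source labels and the threshold of two independent sources are all hypothetical.

```python
def filter_novel_associations(candidates, known_pairs, min_sources=2):
    """Keep candidate gene pairs that are not already known and that are
    supported by at least min_sources independent evidence sources.

    candidates maps (gene_a, gene_b) to a set of evidence source names.
    """
    # Treat pairs as unordered so (a, b) and (b, a) match.
    known = {frozenset(pair) for pair in known_pairs}
    novel = {}
    for pair, sources in candidates.items():
        if frozenset(pair) in known:
            continue  # rule 1: skip relationships established in the literature
        if len(sources) >= min_sources:
            novel[pair] = sources  # rule 2: require multiple independent evidences
    return novel


# Hypothetical gene names and evidence sources (placeholders only).
candidates = {
    ("geneA", "geneB"): {"coexpression", "text_mining"},
    ("geneA", "geneC"): {"coexpression"},
    ("geneD", "geneA"): {"coexpression", "text_mining", "interaction_screen"},
}
known_pairs = [("geneA", "geneD")]
print(filter_novel_associations(candidates, known_pairs))
```

Only the geneA-geneB pair survives in this toy input: geneA-geneC has a single evidence source, and geneD-geneA is already known.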
No free lunch

KAUST Repository

Ture, Ferhan; Elsayed, Tamer; Lin, Jimmy

2011-01-01

This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this problem are of general interest for text mining in the multilingual
The State Language of Kazakhstan and Multilingualism

Directory of Open Access Journals (Sweden)

Nazilyz M. Abduova

2017-06-01

The article deals with some topical problems of real bilingualism (polylingualism) in modern Kazakhstan. One of the most important aspects of the economic and social modernization of Kazakhstani society is language policy. In the modern world, multilingual and multicultural, the problem of the interrelation of languages is more urgent than ever, as is the search for effective and viable language programs for the consolidation of societies. Integration of Kazakhstan into the world community depends today on the recognition and realization of a simple truth: the world is open to those who can master new knowledge through mastering the dominant languages. In Kazakhstan the notions of “bilingualism” and “polylingualism” mean the equality of languages. It is quite natural that bilingualism (polylingualism) gains more and more importance in our republic.
VisualUrText: A Text Analytics Tool for Unstructured Textual Data

Science.gov (United States)

Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

2018-05-01

The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text Mining is a well-known technique for discovering interesting patterns and trends, that is, non-trivial knowledge, from massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and in visualizing the cleaned text data in multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
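As an illustration of the Document Term Matrix (DTM) mentioned above, the sketch below builds one row of term counts per document using only the Python standard library. It does not reproduce VisualUrText itself; the function name `document_term_matrix` and the lowercase-letters-only tokenizer are assumptions chosen to keep the example short.

```python
import re
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in docs[i]."""
    # Assumed tokenizer: lowercase runs of ASCII letters
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

docs = ["Text mining finds patterns.", "Mining text, mining trends."]
vocab, dtm = document_term_matrix(docs)
```

Column sums of such a matrix give the term frequencies behind a frequency graph or word cloud, and distances between its rows are what a dendrogram would typically be clustered on.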
Exploring Linguistic Identity in Young Multilingual Learners

Science.gov (United States)

Dressler, Roswita

2014-01-01

This article explores the linguistic identity of young multilingual learners through the use of a Language Portrait Silhouette. Examples from a research study of children aged 6-8 years in a German bilingual program in Canada provide teachers with an understanding that linguistic identity comprises expertise, affiliation, and inheritance. This…
EuroGOV: Engineering a Multilingual Web Corpus

NARCIS (Netherlands)

Sigurbjörnsson, B.; Kamps, J.; de Rijke, M.

2005-01-01

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites.
Lexicographical Resources in A Multilingual Environment: An ...

African Journals Online (AJOL)

This article considers dictionaries as lexical information / knowledge sources to be derived from a deeper, underlying, lexical database. These dictionary-tokens or -instantiations are inter alia specified by the users' needs. As a case in point of such a derivation meeting the needs of a multilingual society, a bidirectional ...
Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

Directory of Open Access Journals (Sweden)

Christos Iraklis Tsatsoulis

2013-08-01

Full Text Available We investigate the use of unsupervised text mining methods for the analysis of prose literature works, using Thomas Pynchon's novel 'V.' as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel’s structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.
Text Mining of the Classical Medical Literature for Medicines That Show Potential in Diabetic Nephropathy

Directory of Open Access Journals (Sweden)

Lei Zhang

2014-01-01

Full Text Available Objectives. To apply modern text-mining methods to identify candidate herbs and formulae for the treatment of diabetic nephropathy. Methods. The method we developed includes three steps: (1) identification of candidate ancient terms; (2) systemic search and assessment of medical records written in classical Chinese; (3) preliminary evaluation of the effect and safety of candidates. Results. Ancient terms Xia Xiao, Shen Xiao, and Xiao Shen were determined as the most likely to correspond with diabetic nephropathy and used in text mining. A total of 80 Chinese formulae for treating conditions congruent with diabetic nephropathy recorded in medical books from Tang Dynasty to Qing Dynasty were collected. Sao si tang (also called Reeling Silk Decoction) was chosen to show the process of preliminary evaluation of the candidates. It had promising potential for development as a new agent for the treatment of diabetic nephropathy. However, further investigations about the safety to patients with renal insufficiency are still needed. Conclusions. The methods developed in this study offer a targeted approach to identifying traditional herbs and/or formulae as candidates for further investigation in the search for new drugs for modern disease. However, more effort is still required to improve our techniques, especially with regard to compound formulae.
Is mommy talking to daddy or to me? Exploring parental estimates of child language exposure using the Multilingual Infant Language Questionnaire

NARCIS (Netherlands)

Liu, L.; Kager, R.W.J.

2017-01-01

Language input is a key factor in bi-/multilingual research. It roots in the definition of bi-/multilingualism and influences infant cognitive development since and even before birth. The methods used to assess language exposure among bi-/multilingual infants vary across studies. This paper

Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

Science.gov (United States)

Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W

2018-03-09

The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements
iSentenizer-μ: Multilingual Sentence Boundary Detection Model

Directory of Open Access Journals (Sweden)

Derek F. Wong

2014-01-01

Full Text Available A sentence boundary detection (SBD) system is normally quite sensitive to the genre of the data it is trained on. Genre here often refers to shifts in text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation process has to be restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-μ) for Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ the i+Learning algorithm, an incremental tree learning architecture, for constructing the system. iSentenizer-μ, under the incremental learning framework, is adaptable to text of different topics and Roman-alphabet languages, by merging new data into the existing model to learn the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and has been compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.
Innovation in learning and development in multilingual and multicultural contexts: Principles learned from a higher educational study programme in Luxembourg

Science.gov (United States)

Ziegler, Gudrun

2011-12-01

Multilingualism in education is a conceptual as well as a pedagogical challenge of the 21st century. Luxembourg, with its three statutory official languages (Luxembourgish, French and German), is an especially complex setting. The gap between traditional principles of language education on the one hand and the challenging impacts of today's multilingualisms on the other led the University of Luxembourg (founded in 2003) to set up a developmentally-driven Master's programme in 2007, entitled "Learning and Development in Multilingual and Multicultural Contexts". After a presentation of the general multilingual settings in Luxembourg, this paper discusses the constellation of the multilingual University's staff and students and provides an analysis of the concept of the course by outlining its innovative approach, its principles and lessons learned with regard to running a trilingual higher education programme.
The Exposure Advantage: Early Exposure to a Multilingual Environment Promotes Effective Communication.

Science.gov (United States)

Fan, Samantha P; Liberman, Zoe; Keysar, Boaz; Kinzler, Katherine D

2015-07-01

Early language exposure is essential to developing a formal language system, but may not be sufficient for communicating effectively. To understand a speaker's intention, one must take the speaker's perspective. Multilingual exposure may promote effective communication by enhancing perspective taking. We tested children on a task that required perspective taking to interpret a speaker's intended meaning. Monolingual children failed to interpret the speaker's meaning dramatically more often than both bilingual children and children who were exposed to a multilingual environment but were not bilingual themselves. Children who were merely exposed to a second language performed as well as bilingual children, despite having lower executive-function scores. Thus, the communicative advantages demonstrated by the bilinguals may be social in origin, and not due to enhanced executive control. For millennia, multilingual exposure has been the norm. Our study shows that such an environment may facilitate the development of perspective-taking tools that are critical for effective communication. © The Author(s) 2015.
Positive Cognitive Effects of Bilingualism and Multilingualism on Cerebral Function: a Review.

Science.gov (United States)

Quinteros Baumgart, Cibel; Billick, Stephen Bates

2018-06-01

A review of the current literature regarding bilingualism demonstrates that bilingualism is linked to higher levels of controlled attention and inhibition in executive control and can protect against the decline of executive control in aging by contributing to cognitive reserve. Bilinguals may also have smaller vocabulary size and slower lexical retrieval for each language. The joint activation theory is proposed to explain these results. Older trilingual adults experience more protection against cognitive decline and children and young adults showed similar cognitive advantages to bilinguals in inhibitory control. Second language learners do not yet show cognitive changes associated with multilingualism. The Specificity Principle states that the acquisition of multiple languages is moderated by multiple factors and varies between experiences. Bilingualism and multilingualism are both associated with immigration but different types of multilingualism can develop depending on the situation. Cultural cues and language similarity also play a role in language switching and multiple language acquisition.
Multilingualism and Education for Democracy

Science.gov (United States)

Biseth, Heidi

2009-01-01

This essay attempts to show the importance of linguistic issues in education for democracy and the close relationship between democracy and multilingualism. Increasingly nation-states are having to adapt to linguistic diversity within their borders and to recognize that democracy requires the participation of all citizens, including those belonging to linguistic minorities. Democracy also requires that all linguistic groups share a sense of community. The author argues the need for educational policies that address these challenges.
Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

Science.gov (United States)

Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

2013-01-01

The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from this corpus and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
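To make the evaluation idea above concrete, the following sketch ranks articles by a relevancy score and computes average precision for that ranking; mean average precision is then the mean of this value over several rankings (for example, one per chemical). The identifiers, scores and relevance labels are invented for illustration and are not CTD data, and the way CTD actually computes the DRS is not shown here.

```python
def average_precision(ranked_relevance):
    """Average precision of a ranked list of booleans (True = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

# Hypothetical articles: (identifier, relevancy score, curator judged relevant?)
articles = [
    ("article_a", 0.91, True),
    ("article_b", 0.15, False),
    ("article_c", 0.30, True),
    ("article_d", 0.40, False),
]
ranked = sorted(articles, key=lambda a: a[1], reverse=True)
ap = average_precision([relevant for _, _, relevant in ranked])
```

Here the relevant articles land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2; a scoring scheme that pushed both to the top would reach 1.0.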
Multilingual Language and Literacy Practices and Social Identities in Sunni Madrassahs in Mauritius: A Case Study

Science.gov (United States)

Owodally, Ambarin Mooznah Auleear

2011-01-01

This study analyzes the connections among multilingual language practices, multilingual literacy practices, and social identities in two Sunni madrassahs in Mauritius. The study is framed by sociolinguistic and poststructuralist perspectives on language and identity, and social practice views of literacy. Data collection and analysis involved…
Towards a democratisation of new media spaces in multilingual ...

African Journals Online (AJOL)

Towards a democratisation of new media spaces in multilingual/multicultural Africa: A ... and can lead to misinterpretation and misunderstanding of the messages. ... also demonstrate that language and 'culture' are products of social activities.
Multilingualism as Utopia: Fashioning Non-Racial Selves

Science.gov (United States)

Stroud, Christopher; Williams, Quentin

2017-01-01

The challenge of contemporary South Africa is that of building a (post)nation of postracial equity in a fragmented world of a globalized ethical, economic and ecological meltdown. In this paper, we seek to explore the idea of multilingualism as a technology in the conceptualization of alternative, competing futures. We suggest that multilingualism…
Language Policy, Multilingual Encounters, and Transnational Families

Science.gov (United States)

King, Kendall A.

2016-01-01

The study of what has come to be known as family language policy has evolved and expanded significantly over the last hundred years, from its early beginnings in the diary studies of Ronjat and Leopold, to the interdisciplinary and transnational research found in this thematic issue of the "Journal of Multilingual and Multicultural…
Multilingualism remixed: Sampling, braggadocio and the stylisation ...

African Journals Online (AJOL)

Remixing multilingualism and hip-hop in times of globalisation ... only the case that “[…] ... The remixing of global hip-hop in Cape Town started in an era defined by apartheid ..... refers to his socio-economic condition and attempts to relate to audience ..... Conversational sampling, race trafficking, and the invocation of the.
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track

Science.gov (United States)

2015-11-20

Paul N. Bennett, Microsoft Research, Redmond, USA pauben... anchor text graph has proven useful in the general realm of query reformulation [2], we sought to quantify the value of extracting key phrases from... anchor text in the broader setting of the task understanding track. Given a query, our approach considers a simple method for identifying a relevant
Argo: an integrative, interactive, text mining-based workbench supporting curation

Science.gov (United States)

Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

2012-01-01

Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in
PubstractHelper: A Web-based Text-Mining Tool for Marking Sentences in Abstracts from PubMed Using Multiple User-Defined Keywords.

Science.gov (United States)

Chen, Chou-Cheng; Ho, Chung-Liang

2014-01-01

While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, with each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should be a handy tool for their research. http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
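The principle stated above, that keywords appearing in the same sentence are likely to be related, can be sketched as follows. This is only an illustration under simple assumptions, not the PubstractHelper code: the sentence splitter is a naive regular expression, matching is case-insensitive substring matching, and the function name, group names and sample abstract are hypothetical.

```python
import re

def sentences_with_cooccurrence(abstract, keyword_groups, min_groups=2):
    """Return (sentence, matched group names) for every sentence that
    contains keywords from at least `min_groups` different groups."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    marked = []
    for sentence in sentences:
        lowered = sentence.lower()
        matched = [
            name
            for name, words in keyword_groups.items()
            if any(word.lower() in lowered for word in words)
        ]
        if len(matched) >= min_groups:
            marked.append((sentence, matched))
    return marked

abstract = (
    "TP53 is mutated in many tumours. "
    "Loss of TP53 promotes apoptosis resistance in lung cancer. "
    "Further work is needed."
)
keyword_groups = {"gene": ["TP53"], "disease": ["lung cancer", "glioma"]}
marked = sentences_with_cooccurrence(abstract, keyword_groups)
```

A highlighting front end could then assign one color per matched group combination when rendering the marked sentences.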
Towards cross-lingual alerting for bursty epidemic events

Directory of Open Access Journals (Sweden)

Collier Nigel

2011-10-01

Full Text Available Background: Online news reports are increasingly becoming a source for event-based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatio-temporal events. Results: In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129-day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. Conclusions: The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup.
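For readers unfamiliar with temporal aberration detection, the sketch below shows a C2-style detector as it is commonly described for the Early Aberration Reporting System: today's count is compared with the mean and standard deviation of a short baseline window that ends a couple of days earlier. The window length (7 days), guard band (2 days), threshold (3 standard deviations) and the floor on the standard deviation are assumptions for illustration; the abstract does not state the parameters used in the BioCaster implementation, and the counts are invented.

```python
from statistics import mean, stdev

def c2_alerts(counts, baseline=7, guard=2, threshold=3.0, min_sigma=0.2):
    """Return the indices of days whose count exceeds the baseline mean
    by more than `threshold` standard deviations (C2-style rule)."""
    alerts = []
    for t in range(baseline + guard, len(counts)):
        # Baseline window ends `guard` days before day t
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        sigma = max(stdev(window), min_sigma)  # assumed floor avoids division by zero
        if (counts[t] - mu) / sigma > threshold:
            alerts.append(t)
    return alerts

# Hypothetical daily counts of news events for one disease-country pair
daily_counts = [2, 3, 2, 4, 3, 2, 3, 3, 2, 15, 3]
alert_days = c2_alerts(daily_counts)
```

The guard band keeps the first days of an emerging outbreak out of the baseline, so a gradual rise is less likely to mask itself.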
E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter

OpenAIRE

Lazard, Allison J; Saffer, Adam J; Wilcox, Gary B; Chung, Arnold DongWoo; Mackert, Michael S; Bernhardt, Jay M

2016-01-01

Background: As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. Objective: This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Methods: Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptiv...
Supporting the education evidence portal via text mining

Science.gov (United States)

Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

2010-01-01

The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679
Multi-Paradigm and Multi-Lingual Information Extraction as Support for Medical Web Labelling Authorities

Directory of Open Access Journals (Sweden)

Martin Labsky

2010-10-01

Full Text Available Until recently, quality labelling of medical web content has been a predominantly manual activity. However, the advances in automated text processing opened the way to computerised support of this activity. The core enabling technology is information extraction (IE). However, the heterogeneity of websites offering medical content imposes particular requirements on the IE techniques to be applied. In the paper we discuss these requirements and describe a multi-paradigm approach to IE addressing them. Experiments on multi-lingual data are reported. The research has been carried out within the EU MedIEQ project.
Text mining for search term development in systematic reviewing: A discussion of some methods and challenges.

Science.gov (United States)

Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James

2017-09-01

Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
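Term frequency-inverse document frequency (TF-IDF), one of the technologies named above, can suggest candidate search terms by scoring words that are frequent in one record but rare across the collection. The sketch below is a bare-bones standard-library version with an assumed tokenizer and the plain `tf * log(N / df)` weighting; tools such as Termine work differently, and the sample records are invented.

```python
import math
import re
from collections import Counter

def tfidf_terms(docs, top_n=3):
    """For each document, return its `top_n` terms ranked by TF-IDF."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    ranked = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_n])
    return ranked

records = [
    "school meals and school attendance",
    "school uniforms and attendance",
    "community gardens and nutrition",
]
candidate_terms = tfidf_terms(records)
```

Words that occur in every record, such as "and" here, score zero and drop to the bottom of the ranking, which is why TF-IDF tends to surface topic-specific vocabulary rather than common words.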

Language diagnostics in multilingual settings with respect to continuous procedures as accompaniment of individualized learning and teaching

Directory of Open Access Journals (Sweden)

Drorit Lengyel

2011-07-01

Full Text Available This study provides an introduction to language diagnostics in multilingual educational settings, with particular reference to the needs of children and adolescents from migrant backgrounds. It summarises the objectives and functions of language diagnostics and the principles that govern diagnostics including formative assessment, considered to be an integral part of continuous language education that emphasises individualised teaching and learning. From a theoretical perspective, diagnostic procedures in multilingual settings treat language learning as a socio-cultural activity.
Whole field tendencies in transcranial magnetic stimulation: A systematic review with data and text mining.

Science.gov (United States)

Dias, Alvaro Machado; Mansur, Carlos Gustavo; Myczkowski, Martin; Marcolin, Marco

2011-06-01

Transcranial magnetic stimulation (TMS) has played an important role in the fields of psychiatry, neurology and neuroscience, since its emergence in the mid-1980s; and several high quality reviews have been produced since then. Most high quality reviews serve as powerful tools in the evaluation of predefined tendencies, but they cannot actually uncover new trends within the literature. However, special statistical procedures to 'mine' the literature have been developed which aid in achieving such a goal. This paper aims to uncover patterns within the literature on TMS as a whole, as well as specific trends in the recent literature on TMS for the treatment of depression. Data mining and text mining. Currently there are 7299 publications, which can be clustered in four essential themes. Considering the frequency of the core psychiatric concepts within the indexed literature, the main results are: depression is present in 13.5% of the publications; Parkinson's disease in 2.94%; schizophrenia in 2.76%; bipolar disorder in 0.158%; and anxiety disorder in 0.142% of all the publications indexed in PubMed. Several other perspectives are discussed in the article. Copyright © 2011 Elsevier B.V. All rights reserved.
A text-based data mining and toxicity prediction modeling system for a clinical decision support in radiation oncology: A preliminary study

Science.gov (United States)

Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie

2017-08-01

This preliminary study integrates text-based data mining with a toxicity prediction modeling system, with the aim of supporting a big-data clinical decision support system in radiation oncology. Structured and unstructured data were prepared from treatment plans, and unstructured data were extracted by pattern recognition of dose-volume data images from prostate cancer research articles crawled from the internet. We modeled an artificial neural network to build a predictor system for toxicity in organs at risk. We used a text-based data mining approach to build the artificial neural network model for bladder and rectum complication predictions. The pattern recognition method was used to mine the unstructured dose-volume toxicity data, with a detection accuracy of 97.9%. The confusion matrix and training model of the neural network were obtained with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. Of the 50 modeled plans, 32 could cause complications and 18 were classified as non-complication. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. The results show that a preprocessing analysis using text-based data mining and prediction modeling can be expanded to personalized patient treatment decision support based on big data.
Is Mommy Talking to Daddy or to Me? Exploring Parental Estimates of Child Language Exposure Using the Multilingual Infant Language Questionnaire

Science.gov (United States)

Liu, Liquan; Kager, René

2017-01-01

Language input is a key factor in bi-/multilingual research. It is rooted in the definition of bi-/multilingualism and influences infant cognitive development from, and even before, birth. The methods used to assess language exposure among bi-/multilingual infants vary across studies. This paper discusses the parental report patterns of the…
ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

Science.gov (United States)

2012-01-01

Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088
Operationalizing Multilingualism: Language Learning Motivation in Turkey

Science.gov (United States)

Thompson, Amy S.; Erdil-Moody, Zeynep

2016-01-01

This study is an examination of language learning motivation and multilingual status in the Turkish English as a foreign language (EFL) context. Using Dörnyei's L2 Motivational Self System (L2MSS) framework, specifically the ideal and ought-to L2 selves, this study examines the relationship between motivation and two operationalizations of…
Language policy implementation in multilingual Nigeria: French and ...

African Journals Online (AJOL)

By all standards, Nigeria is a multilingual and multicultural state parading more than five hundred indigenous languages existing alongside the English language as an official language and the French language as the de facto second official language. Choosing a national language among the existing indigenous languages has ...
Translanguaging Practices and Perspectives of Four Multilingual Teens

Science.gov (United States)

Daniel, Shannon M.; Pacheco, Mark B.

2016-01-01

Increasingly, educational research suggests that translanguaging pedagogies can provide meaningful supports for English language learners. Yet, few studies examine how multilingual teens in English-dominant settings independently translanguage to make sense of school and achieve their goals. In this study, we review definitions of translanguaging…
Multilingualism and Teacher Preparation for the Universal Basic ...

African Journals Online (AJOL)

Although the National Policy on Education stipulates the learning of at least three languages on completion of Junior Secondary School by each learner, the Nigerian linguistic environment which is evidently multilingual poses some challenges and setbacks for language teaching and learning among ethnic minorities.
Language, Power, Multilingual and Non-Verbal Multicultural Communication

NARCIS (Netherlands)

Marácz, L.; Zhuravleva, E.A.

2014-01-01

Due to developments in internal migration and mobility there is a proliferation of linguistic diversity, multilingual and non-verbal multicultural communication. At the same time the recognition of the use of one’s first language receives more and more support in international political, legal and
iSentenizer-μ: multilingual sentence boundary detection model.

Science.gov (United States)

Wong, Derek F; Chao, Lidia S; Zeng, Xiaodong

2014-01-01

Sentence boundary detection (SBD) systems are normally quite sensitive to the genre of the data on which they are trained. Genre here often refers to shifts in text topic and to new language domains. Although detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation process restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-μ) for Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ the i+Learning algorithm, an incremental tree learning architecture, to construct the system. Under the incremental learning framework, iSentenizer-μ is adaptable to text of different topics and Roman-alphabet languages: it merges new data into the existing model and learns the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and has been compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.
Multimodal Literacy Practices in the Indigenous Sámi Classroom: Children Navigating in a Complex Multilingual Setting

Science.gov (United States)

Pietikäinen, Sari; Pitkänen-Huhta, Anne

2013-01-01

This article explores multimodal literacy practices in a transforming multilingual context of an indigenous and endangered Sámi language classroom. Looking at literacy practices as embedded in a complex and shifting terrain of language ideologies, language norms, and individual experiences and attitudes, we examined how multilingual Sámi children…
Analysing Customer Opinions with Text Mining Algorithms

Science.gov (United States)

Consoli, Domenico

2009-08-01

Knowing what the customer thinks of a particular product or service helps top management to introduce improvements in processes and products, thus differentiating the company from its competitors and gaining competitive advantages. The customers, with their preferences, determine the success or failure of a company. To learn the opinions of customers, we can use technologies available from Web 2.0 (blogs, wikis, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.
Requests for Help in a Multilingual Professional Environment Testimonies and Actantial Models

Directory of Open Access Journals (Sweden)

Lejot Eve

2017-12-01

Professional multilingual environments using English as a lingua franca are prone to imbalances in communication, linguistic insecurity and rising tension. Non-native English speakers develop avoidance strategies in order to lessen their apprehension. To overcome these imbalances, this research aims to understand the relationships formed around languages focusing on the dynamics of integration and the requests for help. Guided by the actantial models of Greimas (1966), this qualitative study employs semiolinguistics and discourse analysis, including 19 narrative interviews with employees of Airbus and UNESCO in Hamburg, Germany in 2013. This methodology draws on actors connected through relationships of power and/or collaboration. The actantial models applied seek linguistic input through designational paradigms, shifters and modal occurrences. The actantial models illustrate how a good language competence provides a better understanding of one’s direct as well as passive environment. The learning process is shown to be a conduit to integration. The actantial model and discourse analysis shed light on the complex situation of multilingual communication settings by highlighting the influence of individuals’ linguistic skills. As a matter of fact, depending on the role of each individual in a given situation, lending a helping hand sometimes equates to upsetting the balance.
Multilingualism and later life: a sociolinguistic perspective on age and aging.

Science.gov (United States)

Divita, David

2014-08-01

In this paper, I contribute to subjective accounts of aging by focusing on a population that has been largely overlooked in social gerontology: individuals in later life who are multilingual. How do such individuals experience and make sense of their multilingualism? What role does language play in the way they experience and make sense of their lives? To answer these questions I take a life story approach to three women who experienced similar sociohistorical circumstances but arrived at different linguistic outcomes: born in Spain around the time of the civil war (1936-1939), they migrated to Paris in the 1960s to pursue social and economic mobility. Although they arrived in France as monolingual Spanish speakers, they have since acquired French and now practice their multilingualism in distinct ways. I juxtapose their life stories to illustrate how the acquisition and use of language are informed by a confluence of personal, social, and historical factors. Focusing on the linguistic dimension of the life course I thus introduce a new perspective on the heterogeneity obtained among individuals at this stage of their biographical trajectories. Copyright © 2014 Elsevier Inc. All rights reserved.
Age of second language acquisition in multilinguals has an impact on grey matter volume in language-associated brain areas

Directory of Open Access Journals (Sweden)

Anelis Kaiser

2015-06-01

Numerous structural studies have established that experience shapes and reshapes the brain throughout a lifetime. The impact of early development, however, is still a matter of debate. Further clues may come from studying multilinguals who acquired their second language at different ages. We investigated adult multilinguals who spoke three languages fluently, where the third language was learned in classroom settings, not before the age of 9 years. Multilinguals exposed to 2 languages simultaneously from birth (SiM) were contrasted with multilinguals who acquired their first two languages successively (SuM). Whole brain voxel based morphometry revealed that, relative to SuM, SiM have significantly lower grey matter volume in several language-associated cortical areas in both hemispheres: bilaterally in medial and inferior frontal gyrus, in the right medial temporal gyrus and inferior posterior parietal gyrus, as well as in the left inferior frontal gyrus. Thus, as shown by others, successive language learning increases the volume of language-associated cortical areas. In brains exposed early on and simultaneously to more than one language, however, learning of additional languages seems to have less impact. We conclude that, at least with respect to language acquisition, early developmental influences are maintained and influence experience-dependent plasticity well into adulthood.
Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

Science.gov (United States)

Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

2000-01-01

These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
International aspirations for speech-language pathologists' practice with multilingual children with speech sound disorders: development of a position paper.

Science.gov (United States)

McLeod, Sharynne; Verdon, Sarah; Bowen, Caroline

2013-01-01

A major challenge for the speech-language pathology profession in many cultures is to address the mismatch between the "linguistic homogeneity of the speech-language pathology profession and the linguistic diversity of its clientele" (Caesar & Kohler, 2007, p. 198). This paper outlines the development of the Multilingual Children with Speech Sound Disorders: Position Paper created to guide speech-language pathologists' (SLPs') facilitation of multilingual children's speech. An international expert panel was assembled comprising 57 researchers (SLPs, linguists, phoneticians, and speech scientists) with knowledge about multilingual children's speech, or children with speech sound disorders. Combined, they had worked in 33 countries and used 26 languages in professional practice. Fourteen panel members met for a one-day workshop to identify key points for inclusion in the position paper. Subsequently, 42 additional panel members participated online to contribute to drafts of the position paper. A thematic analysis was undertaken of the major areas of discussion using two data sources: (a) face-to-face workshop transcript (133 pages) and (b) online discussion artifacts (104 pages). Finally, a moderator with international expertise in working with children with speech sound disorders facilitated the incorporation of the panel's recommendations. The following themes were identified: definitions, scope, framework, evidence, challenges, practices, and consideration of a multilingual audience. The resulting position paper contains guidelines for providing services to multilingual children with speech sound disorders (http://www.csu.edu.au/research/multilingual-speech/position-paper). The paper is structured using the International Classification of Functioning, Disability and Health: Children and Youth Version (World Health Organization, 2007) and incorporates recommendations for (a) children and families, (b) SLPs' assessment and intervention, (c) SLPs' professional
Mine Water Treatment in Hongai Coal Mines

Directory of Open Access Journals (Sweden)

Dang Phuong Thao

2018-01-01

Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions during coal mining. Until 2009, mine drainage in Hongai coal mines was not treated, leading to harmful effects on humans, animals and the aquatic ecosystem. This report examines the acid mine drainage problem and techniques for acid mine drainage treatment in Hongai coal mines. In addition, the selection of and design criteria for the treatment systems are presented.
The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

Science.gov (United States)

Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

2013-01-01

The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

University students' perceptions of multilingual education: A case ...

African Journals Online (AJOL)

to implement multilingual education for purposes of teaching and learning ... higher education system, very little (excluding a range of stereotypes) is known about the ... learning of one additional or supportive language of tuition (CHE 2001:11). ..... state their gender, age, home language, household economic status and the ...
Legitimating Multilingual Teacher Identities in the Mainstream Classroom

Science.gov (United States)

Higgins, Christina; Ponte, Eva

2017-01-01

This article explores the identities of a group of elementary teachers who participated in a professional development (PD) project on multilingual language learners. We study how the participating teachers drew on different aspects of their identities to respond to encouragement to increase their attention to students' diverse multilingual…
Multilingualism and Language Learning: The Rome City Report

Science.gov (United States)

Menghini, Michela

2016-01-01

This article illustrates the findings on multilingualism related to the educational sphere in the city of Rome, within the scope and theoretical framework of the international project LUCIDE (Languages in Urban Communities--Integration and Diversity for Europe). Particularly, it describes the type of linguistic and cultural support offered to…
Admitted or Denied: Multilingual Writers Negotiate Admissions Essays

Science.gov (United States)

Wight, Shauna

2017-01-01

This article presents data from a collection of yearlong case studies on resident multilingual writers' college admissions essays. The focal student in this piece revealed the challenges that such writers face in presenting themselves to college admissions officers. Exploring these cultural and linguistic conflicts, this analysis uses Goffman's…
Imagining a multilingual academy: rethinking language in higher ...

African Journals Online (AJOL)

Language affects all aspects of the academy as a platform for the cultivation of critical thinking. The extension of multilingualism (ie "more people using more languages in more registers and in more domains") in higher education would contribute significantly to "improving the quality of the higher education sector".
MultiFarm: A Benchmark for Multilingual Ontology Matching

NARCIS (Netherlands)

Meilicke, C.; García-Castro, R.; Freitas, F.; van Hage, W.R.; Montiel-Ponsoda, E.; Ribeiro de Azevedo, R.; Stuckenschmidt, H.; Svab-Zamazal, O.; Svatek, V.; Tamalin, A.; Wang, S.

2012-01-01

In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated into different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm
Multilingual Writing and Pedagogical Cooperation in Virtual Learning Environments

DEFF Research Database (Denmark)

Mousten, Birthe; Vandepitte, Sonia; Arnó Macà, Elisabet

Multilingual Writing and Pedagogical Cooperation in Virtual Learning Environments is a critical scholarly resource that examines experiences with virtual networks and their advantages for universities and students in the domains of writing, translation, and usability testing. Featuring coverage o...
Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.

Directory of Open Access Journals (Sweden)

Nicholas J Leeper

Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
multilingualism and the ethnic identity of the ette people

African Journals Online (AJOL)

chy

Bilingualism/multilingualism will be discussed in section four; while section ... In the past, our ancestors moved from one place to the other in search of ... The concept of ethnicity, however, encompasses a lot of meaning so that its definition is.
Language Status and Literacy Trend in a Multilingual Society - Singapore

Science.gov (United States)

Kuo, Eddie C. Y.

1974-01-01

Using data from census reports and educational statistics, this paper analyzes the language status and literacy trends in multilingual Singapore, where the four official languages are Malay, Chinese, Tamil and English. (CK)
Assertions of Japanese Websites for and Against Cancer Screening: a Text Mining Analysis

Science.gov (United States)

Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masahumi; Kato, Mio; Kiuchi, Takahiro

2017-04-01

Background: Cancer screening rates are lower in Japan than in Western countries such as the United States and the United Kingdom. While health professionals publish pro-cancer-screening messages online to encourage proactive seeking for screening, anti-screening activists use the same medium to warn readers against following guidelines. Contents of pro- and anti-cancer-screening sites may contribute to readers’ acceptance of one or the other position. We aimed to use a text-mining method to examine frequently appearing contents on sites for and against cancer screening. Methods: We conducted online searches in December 2016 using two major search engines in Japan (Google Japan and Yahoo! Japan). Targeted websites were classified as “pro”, “anti”, or “neutral” depending on their claims, with the author(s) classified as “health professional”, “mass media”, or “layperson”. Text-mining analyses were conducted, and statistical analysis was performed using the chi-square test. Results: Of the 169 websites analyzed, the top-three most frequently appearing content topics in pro sites were reducing mortality via cancer screening, benefits of early detection, and recommendations for obtaining detailed examination. The top three most frequent in anti-sites were harm from radiation exposure, non-efficacy of cancer screening, and lack of necessity of early detection. Anti-sites also frequently referred to a well-known Japanese radiologist, Makoto Kondo, who rejects the standard forms of cancer care. Conclusion: Our findings should enable authors of pro-cancer-screening sites to write to counter misleading anti-cancer-screening messages and facilitate dissemination of accurate information.
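The abstract above reports that a chi-square test was used but does not give the underlying counts. As a rough illustration of how such a test statistic is computed, here is a minimal sketch in Python. The 2x2 table (site stance against whether a topic is mentioned) and every number in it are invented for illustration and are not taken from the study.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts. Pearson's chi-square
    compares each observed count with the count expected under
    independence: expected = row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (NOT from the study):
# rows = pro sites, anti sites; columns = topic mentioned, topic not mentioned.
HYPOTHETICAL_COUNTS = [
    [10, 20],
    [20, 10],
]

statistic, dof = chi_square(HYPOTHETICAL_COUNTS)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

A larger statistic for a given number of degrees of freedom indicates a stronger departure from independence. Converting it to a p-value requires the chi-square distribution, which a statistics library would normally supply.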
Multilingual Data Selection for Low Resource Speech Recognition

Science.gov (United States)

2016-09-12

output of this feature frontend is a multilingual representation from the bottleneck layer of the second network. In our training framework, we use 40...16] M. Harper, “IARPA Babel Program,” http://www.iarpa.gov/index.php/research-programs/babel, [Online; accessed 2016-03-25]. [17] “Assamese Babel
Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

Science.gov (United States)

He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

2017-03-01

Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
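To make the combination of a unigram representation with one of the listed classifiers concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is not the authors' system or their product score model. The toy narratives, the labels, and the function names are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization: each token is one unigram."""
    return text.lower().split()


def train(samples):
    """Count unigrams per label from a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of unigram frequencies
    doc_counts = Counter()    # label -> number of training texts
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        # Log prior plus add-one-smoothed log likelihood of each known unigram.
        score = math.log(doc_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # unseen words are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy narratives (NOT clinical data).
TOY_SAMPLES = [
    ("nightmares and flashbacks every night", "positive"),
    ("flashbacks keep me awake and afraid", "positive"),
    ("slept well and felt calm today", "negative"),
    ("felt calm after a quiet walk", "negative"),
]

model = train(TOY_SAMPLES)
print(predict(model, "flashbacks and nightmares"))
print(predict(model, "calm quiet walk"))
```

Extending `tokenize` to emit adjacent word pairs as well would give a simple bigram ("multigram") representation of the kind the abstract mentions.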
A mediation model for the translation of radio news texts in a ...

African Journals Online (AJOL)

Broadcast journalists in South Africa are media workers, editors and translators simultaneously producing news for bilingual or multilingual audiences. News texts are translated from English into one or more of the other official languages, depending on the target audience of the broadcaster. This article aims to indicate how ...
Problèmes linguistiques dans le système multilingues Linguistic Problems in Multilingual Systems

Directory of Open Access Journals (Sweden)

Moureau M.

2006-11-01

The scientific community is international. The oil industry is international. Problems of communication and of language are encountered there every day. That is why, although it departs quite sharply from the usual subjects of the Revue de l'Institut Français du Pétrole, the article by Magdeleine Moureau and Gerald Brace on linguistic problems did not seem too remote from our readers' concerns to be presented to them. The aim of this paper is to discuss the theoretical impossibilities of translation, and then to describe the practical ways of actually translating, and to apply these ways to the task of studying problems inherent in elaborating a multilingual documentary language.
A universal multilingual weightless neural network tagger via quantitative linguistics.

Science.gov (United States)

Carneiro, Hugo C C; Pedreira, Carlos E; França, Felipe M G; Lima, Priscila M V

2017-07-01

In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
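The abstract above relates tagger parameters to the lexical diversity of a language but does not say how diversity was measured. One common and simple measure is the type-token ratio, sketched below. Using this particular measure is an assumption made here for illustration, and the sample sentence is invented.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by total tokens.

    Values near 1.0 mean most tokens are unique; lower values mean
    the same forms recur often.
    """
    if not tokens:
        raise ValueError("type-token ratio is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


# Invented sample text (NOT from any corpus used in the study).
SAMPLE = "the cat saw the dog and the dog saw the cat".split()

print(f"{len(set(SAMPLE))} types / {len(SAMPLE)} tokens = {type_token_ratio(SAMPLE):.3f}")
```

The ratio depends strongly on text length, so comparisons across languages are normally made on samples of equal size.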
Mandarin, Suzhou Dialect and English: Multilingualism in Suzhou

Science.gov (United States)

Xu, Sibing

2015-01-01

This paper explains the situation of Mandarin, Suzhou dialect and English in Suzhou, the relation between language policy and language use, discusses the positive and negative consequences that multilingualism in Suzhou might have for the society, and focuses on the change of language use in Suzhou and provides suggestions for the maintenance of…
Researching Multilingualism and Superdiversity: Grassroots Actions and Responsibilities

Science.gov (United States)

Wei, Li

2014-01-01

The articles in this thematic issue document studies of grassroots actions in promoting multilingualism across different sectors of society as well as in different social and professional domains. In doing so, the contributors raise issues of the relevance of the notion of community in the age of superdiversity and the researcher's…
A critical analysis of multilingual dictionaries | Prinsloo | Lexikos

African Journals Online (AJOL)

This article evaluates the lexicographic value of multilingual dictionaries. Dictionaries covering three or more languages spoken in South Africa are taken as a case in point. An attempt will be made to reflect on their merits and shortcomings as reference works and learning tools but the focus will be on presumed ...
Automated assessment of patients' self-narratives for posttraumatic stress disorder screening using natural language processing and text mining

NARCIS (Netherlands)

He, Qiwei; Veldkamp, Bernard P.; Glas, Cornelis A.W.; de Vries, Theo

2017-01-01

Patients’ narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four

DDMGD: the database of text-mined associations between genes methylated in diseases from different species

KAUST Repository

Raies, A. B.

2014-11-14

Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is also supported. A comparison of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.
Book Review: Backhaus, Peter (2007): Linguistic Landscapes: A Comparative Study of Urban Multilingualism in Tokyo. Clevedon: Multilingual Matters; 158 pages, ISBN 9781853599460

Directory of Open Access Journals (Sweden)

Omar Alomoush

2015-12-01

Backhaus examines urban multilingualism in the linguistic landscape of Tokyo, the capital city of Japan. In this monograph, the linguistic landscape is seen as a sub-discipline of sociolinguistics. The significance of this monograph to linguistic landscape research is that it represents the first comprehensive approach tackling multilingualism in the linguistic landscape and overcoming a range of methodological problems facing former studies. In this sense, Backhaus’s approach to data collection and analysis may help linguistic landscapers and researchers to undertake research on multilingualism in the linguistic landscape. The current work comprises acknowledgements, a foreword by Bernard Spolsky, six chapters, an appendix, references, and an index. While the first three chapters represent an introduction and theoretical background, the fourth chapter paves the way for the empirical study of Tokyo’s linguistic landscape carried out in chapter five. Chapter one discusses the examination of written language in the public space of metropolises, which forms the bulk of Backhaus’s work. In this respect, the author (p.1) refers to previous studies such as Halliday (1972), who considers the city not only a place of talk, but also a place of writing and reading. At the same time, this work focuses on ‘urban language contact in the written medium: the languages of the signs’. Backhaus (p.1) holds: ‘Every urban environment is a myriad of written messages on public display: office and shop signs, billboards, and neon advertisements, traffic signs, topographic information and area maps, emergency guidance and political poster campaigns, stone inscriptions, and enigmatic graffiti discourse.’ The author maintains that these messages contribute to the making of the linguistic landscape of any given place. In chapter two, Semiotic Background and Terminology, Backhaus gives an introduction to the main features of language use on signs
‘I can’t really think in English’: Translation as literacy mediation in multilingual/multicultural learning contexts

Directory of Open Access Journals (Sweden)

Banda, Felix

2003-12-01

The article explores some aspects of a study which investigates translation as academic literacy mediation in South Africa’s multilingual/multicultural contexts. The focus is on learners’ translations of academic texts between the L2 and L1, and vice-versa, as a strategy to cope with ESL academic tasks. Using reflection discourse from one-on-one and focus group interviews as well as study group discussion texts, the study uses the New Literacy Studies model of literacy as social practice and aspects of critical discourse analysis to identify some pedagogical implications. One of the conclusions is that although learners are able to ‘translate’ in the sense of swapping labels between the L2 and L1 for the same concept, they are unable to successfully ‘translate’ in the sense of transferring knowledge/cognitive skills between the L2 and L1, and the reverse. The need for functional use of the L1 and L2, critical cross-cultural awareness and language socialisation, as well as for trained bilingual teachers and literacy mediators, is explored as a way to promote positive difference, and help learners develop strategies to transform/recontextualise knowledge/cognitive skills between the L2 and L1, and vice-versa, in multilingual/multicultural contexts.
Modelling vocabulary development among multilingual children prior to and following the transition to school entry

OpenAIRE

MacLeod, Andrea A. N.; Castellanos-Ryan, Natalie; Parent, Sophie; Jacques, Sophie; Séguin, Jean R.

2017-01-01

Differences between monolingual and multilingual vocabulary development have been observed but few studies provide a longitudinal perspective on vocabulary development before and following school entry. This study compares vocabulary growth profiles of 106 multilingual children to 211 monolingual peers before and after school entry to examine whether: (1) school entry coincides with different rates of vocabulary growth compared to prior to school entry, (2) compared to monolingual peers, mult...
A Text Matching Method to Facilitate the Validation of Frequent Order Sets Obtained Through Data Mining

OpenAIRE

Che, Chengjian; Rocha, Roberto A.

2006-01-01

In order to compare order sets discovered using a data mining algorithm with existing order sets, we developed an order matching tool based on Oracle Text. The tool includes both automated searching and manual review processes. The comparison between the automated process and the manual review process indicates that the sensitivity of the automated matching is 81% and the specificity is 84%.
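The two figures reported above are standard confusion-matrix ratios. As a minimal sketch, the following shows how they would be computed when the manual review is treated as the reference. The function name and the counts are hypothetical, chosen only so that the ratios reproduce the reported 81% and 84%; the abstract does not give the underlying counts.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: true matches found / missed by the automated process.
    tn/fp: true non-matches correctly rejected / wrongly accepted.
    """
    sensitivity = tp / (tp + fn)  # share of true matches that were found
    specificity = tn / (tn + fp)  # share of non-matches that were rejected
    return sensitivity, specificity


# Hypothetical counts (not from the paper), picked to mirror the reported ratios.
sens, spec = sensitivity_specificity(tp=81, fn=19, tn=84, fp=16)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```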
Text mining with emergent self organizing maps and multi-dimensional scaling: a comparative study on domestic violence

NARCIS (Netherlands)

Poelmans, J.; van Hulle, M.M.; Viaene, S.; Elzinga, P.; Dedene, G.

2011-01-01

In this paper we compare the usability of ESOM and MDS as text exploration instruments in police investigations. We combine them with traditional classification instruments such as the SVM and Naïve Bayes. We perform a case of real-life data mining using a dataset consisting of police reports
English-Plus Multilingualism as the New Linguistic Capital? Implications of University Students' Attitudes towards Languages of Instruction in a Multilingual Environment

Science.gov (United States)

Klapwijk, Nanda; Van der Walt, Christa

2016-01-01

This article investigates university students' attitudes and perceptions about language in a multilingual country where most instruction is in English and annual national literacy results have been declining for at least 15 years. Despite this decline, English seems to be entrenched as the language of instruction, and at university it seems a…
PubMed-EX: a web browser extension to enhance PubMed search with text mining features.

Science.gov (United States)

Tsai, Richard Tzong-Han; Dai, Hong-Jie; Lai, Po-Ting; Huang, Chi-Hsin

2009-11-15

PubMed-EX is a browser extension that marks up PubMed search results with additional text-mining information. PubMed-EX's page mark-up, which includes section categorization and gene/disease and relation mark-up, can help researchers to quickly focus on key terms and provide additional information on them. All text processing is performed server-side, freeing up user resources. PubMed-EX is freely available at http://bws.iis.sinica.edu.tw/PubMed-EX and http://iisr.cse.yzu.edu.tw:8000/PubMed-EX/.
Multilingual education in the light of diversity : Lessons learned

NARCIS (Netherlands)

Herzog-Punzenberger, Barbara; Le Pichon, E.M.M.; Siarova, Hanna

2017-01-01

While multilingualism and diversity have always been an integral part of Europe, they also became important characteristics of many national education systems in the past two decades. The linguistic diversity of modern classrooms is shaped by 1. the presence of historical non-dominant language
A network of tongues: African languages, multilingualism and global ...

African Journals Online (AJOL)

Some scholars have noted that globalization portends 'a global common language' that offers unprecedented possibilities for mutual understanding and thus enables us to find fresh opportunities for international co-operation. Yet, others have argued in favour of multilingualism, described as an alternative, fundamental ...
Texts and data mining and their possibilities applied to the process of news production

Directory of Open Access Journals (Sweden)

Walter Teixeira Lima Jr

2008-06-01

The aim of this essay is to discuss the challenges of representing, in a formal computational process, the knowledge that journalists use to articulate news values when selecting and ranking news. It discusses how to bridge this empirically acquired knowledge with the foundations of computer science, in the area of storing, retrieving and linking data in a database, which must reflect the way human brains treat information obtained through their sensory system. Systematizing and automating part of the journalistic process in a database contributes to eliminating distortions and faults and to applying, in an efficient manner, data and/or text mining techniques which, by definition, permit the discovery of nontrivial relations.
Texts and data mining and their possibilities applied to the process of news production

Directory of Open Access Journals (Sweden)

Walter Teixeira Lima Jr

2011-02-01

The aim of this essay is to discuss the challenges of representing, in a formal computational process, the knowledge that journalists use to articulate news values when selecting and ranking news. It discusses how to bridge this empirically acquired knowledge with the foundations of computer science, in the area of storing, retrieving and linking data in a database, which must reflect the way human brains treat information obtained through their sensory system. Systematizing and automating part of the journalistic process in a database contributes to eliminating distortions and faults and to applying, in an efficient manner, data and/or text mining techniques which, by definition, permit the discovery of nontrivial relations.
Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed.

Science.gov (United States)

Vellay, S G P; Latimer, N E Miller; Paillard, G

2009-06-01

Text mining has become an integral part of all research in the medical field. Many text analysis software platforms support particular use cases and only those. We show an example of a bibliographic tool that can be used to support virtually any use case in an agile manner. Here we focus on a Pipeline Pilot web-based application that interactively analyzes and reports on PubMed search results. This will be of interest to any scientist to help identify the most relevant papers in a topical area more quickly and to evaluate the results of query refinement. Links with Entrez databases help both the biologist and the chemist alike. We illustrate this application with Leishmaniasis, a neglected tropical disease, as a case study.
Semantic transference for enriching multilingual biomedical knowledge resources.

Science.gov (United States)

Pérez, María; Berlanga, Rafael

2015-12-01

Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to take maximum profit from existing multilingual biomedical KRs lexicons to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad hoc method specially devised for multilingual KRs. Then, resulting alignments are used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842). Copyright © 2015 Elsevier Inc. All rights reserved.
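To put the reported concept counts in proportion, the short calculation below derives the growth factor for each language from the figures in the abstract. Only the variable and function names are introduced here for illustration. It shows roughly fivefold growth for the Spanish KR and roughly thirteenfold growth for the French and German KRs.

```python
# Concept counts as reported in the abstract: (original, enriched upper bound).
counts = {
    "Spanish": (286_659, 1_514_217),
    "French": (83_119, 1_104_968),
    "German": (86_842, 1_136_020),
}


def growth_factor(original: int, enriched: int) -> float:
    """Ratio of the enriched concept count to the original one."""
    return enriched / original


factors = {lang: growth_factor(o, e) for lang, (o, e) in counts.items()}

for lang, factor in factors.items():
    print(f"{lang}: x{factor:.1f}")
```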
Towards cross-lingual alerting for bursty epidemic events.

Science.gov (United States)

Collier, Nigel

2011-10-06

Online news reports are increasingly becoming a source for event-based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatio-temporal events. In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129 day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup.
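The abstract names the C2 algorithm without describing it. As a rough illustration of temporal aberration detection, the sketch below follows the commonly described EARS C2 formulation: a daily count is compared against the mean and standard deviation of a 7-day baseline that ends 2 days earlier, and an alert is raised when the standardized difference exceeds 3. The function name, the parameter defaults, the `min_sigma` floor and the sample counts are all assumptions for illustration, not details taken from the paper or the BioCaster implementation.

```python
from statistics import mean, stdev
from typing import List, Sequence


def c2_alerts(
    counts: Sequence[float],
    baseline: int = 7,       # assumed baseline length in days
    guard: int = 2,          # assumed guard band between baseline and test day
    threshold: float = 3.0,  # assumed alert threshold in standard deviations
    min_sigma: float = 0.5,  # assumed floor to avoid division by zero
) -> List[int]:
    """Return the indices of days flagged as aberrations."""
    alerts = []
    for t in range(baseline + guard, len(counts)):
        # Baseline window: the `baseline` days ending `guard` days before day t.
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        sigma = max(stdev(window), min_sigma)
        if (counts[t] - mu) / sigma > threshold:
            alerts.append(t)
    return alerts


# Hypothetical daily counts of text-mined event reports for one disease and country.
daily_reports = [2, 3, 2, 4, 3, 2, 3, 3, 2, 15, 3]
flagged = c2_alerts(daily_reports)
print(flagged)  # → [9]
```

The guard band keeps the most recent days out of the baseline, so a spike does not immediately inflate the mean and variance used to judge the days that follow it.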
The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.

Science.gov (United States)

Hao, Haijing; Zhang, Kunpeng

2016-05-10

Many Web-based health care platforms allow patients to evaluate physicians by posting open-end textual reviews based on their experiences. These reviews are helpful resources for other patients to choose high-quality doctors, especially in countries like China where no doctor referral systems exist. Analyzing such a large amount of user-generated content to understand the voice of health consumers has attracted much attention from health care providers and health care researchers. The aim of this paper is to automatically extract hidden topics from Web-based physician reviews using text-mining techniques to examine what Chinese patients have said about their doctors and whether these topics differ across various specialties. This knowledge will help health care consumers, providers, and researchers better understand this information. We conducted two-fold analyses on the data collected from the "Good Doctor Online" platform, the largest online health community in China. First, we explored all reviews from 2006-2014 using descriptive statistics. Second, we applied the well-known topic extraction algorithm Latent Dirichlet Allocation to more than 500,000 textual reviews from over 75,000 Chinese doctors across four major specialty areas to understand what Chinese health consumers said online about their doctor visits. On the "Good Doctor Online" platform, 112,873 out of 314,624 doctors had been reviewed at least once by April 11, 2014. Among the 772,979 textual reviews, we chose to focus on four major specialty areas that received the most reviews: Internal Medicine, Surgery, Obstetrics/Gynecology and Pediatrics, and Chinese Traditional Medicine. Among the doctors who received reviews from those four medical specialties, two-thirds of them received more than two reviews and in a few extreme cases, some doctors received more than 500 reviews. Across the four major areas, the most popular topics reviewers found were the experience of finding doctors, doctors' technical
Automatic creation of specialised multilingual dictionaries in new subject areas

Directory of Open Access Journals (Sweden)

Joaquim Moré

2009-05-01

This article presents a tool to automatically generate specialised dictionaries of multilingual equivalents in new subject areas. The tool uses resources that are available on the web to search for equivalents and verify their reliability. These resources are, on the one hand, the Wikipedias, which can be freely downloaded and processed, and, on the other, the materials that terminological institutions of reference make available. This tool is of use to teachers producing teaching materials and researchers preparing theses, articles or reference manuals. It is also of use to translators and terminologists working on terminological standardisation in a new subject area in a given language, as it helps them in their work to pinpoint concepts that have yet to receive a standardised denomination.
The Compilation of Multilingual Concept Literacy Glossaries at the ...

African Journals Online (AJOL)

account for the multilingual concept literacy glossaries being compiled under the auspices of .... a theory, i.e. the set of premises, arguments and conclusions required for explaining ... fully address cognitive and communicative needs, especially of laypersons. ..... tion at UCT, and in indigenous languages as auxiliary media.
Foreign Language Anxiety in Turkey: The Role of Multilingualism

Science.gov (United States)

Thompson, Amy S.; Khawaja, Anastasia J.

2016-01-01

As part of a larger study on individual differences and language learning in Turkey, this study explores the relationship between foreign language anxiety and two operationalisations of multilingualism: any experience with a third language and Perceived Positive Language Interaction; it also illuminates connections among the aforementioned…
Snapshots of the Universe: A Multilingual Astronomy Book

Science.gov (United States)

Beaton, R. L.; Sokal, K. R.; Liss, S. E.; Johnson, K. E.

2015-11-01

Dark Skies, Bright Kids! (DSBK) is an outreach organization at the University of Virginia, focused on enhancing elementary level science education in under-served communities. Early in the program, DSBK volunteers encountered difficulties connecting with English as a second language (ESL) students. To meet that challenge, DSBK volunteers created story-book style art with short descriptions of astronomical objects in both Spanish and English to help communicate basic astronomy concepts to these students. Building on this initial success, our simple project has evolved into a full multilingual children's book targeted at 2nd-5th grade students. Though originally in Spanish and English, a partnership with the University of Alberta (Canada) has produced a French translation of the text, broadening the outreach potential of the book. In this contribution, we describe Snapshots of the Universe (Instantáneas del Universo) and reflect upon the process of creating this unique resource.
