WorldWideScience

Sample records for automated natural language

  1. Automation of a problem list using natural language processing

    Directory of Open Access Journals (Sweden)

    Haug Peter J

    2005-08-01

    Background: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. Methods: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list. Results: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms to detect first document sections, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences. Conclusion: The global aim of our project is to automate the process of creating and maintaining a problem list.
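    A minimal sketch of the three-stage detection idea above (sections, then sentences, then candidate problems); the header pattern, sentence splitter, and target-problem dictionary are illustrative assumptions, not the system's actual algorithms.

      import re

      # Hypothetical three-stage pipeline: sections -> sentences -> problems.
      TARGET_PROBLEMS = {"atrial fibrillation", "congestive heart failure", "hypertension"}
      SECTION_HEADER = re.compile(r"^[A-Z][A-Z /]+:\s*$", re.MULTILINE)

      def split_sections(document):
          """Split a free-text note on all-caps header lines such as 'IMPRESSION:'."""
          return [p.strip() for p in SECTION_HEADER.split(document) if p.strip()]

      def split_sentences(section):
          """Naive splitter; the real system uses a dedicated detection algorithm."""
          return [s.strip() for s in re.split(r"(?<=[.!?])\s+", section) if s.strip()]

      def find_problems(sentence):
          """Dictionary lookup of target problems; negation handling is omitted."""
          lowered = sentence.lower()
          return {p for p in TARGET_PROBLEMS if p in lowered}

      note = "IMPRESSION:\nChronic hypertension. New atrial fibrillation noted today."
      proposed = set()
      for section in split_sections(note):
          for sentence in split_sentences(section):
              proposed |= find_problems(sentence)
      print(proposed)  # {'hypertension', 'atrial fibrillation'}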

  2. Automated Radiology Report Summarization Using an Open-Source Natural Language Processing Pipeline.

    Science.gov (United States)

    Goff, Daniel J; Loehfelm, Thomas W

    2017-10-30

    Diagnostic radiologists are expected to review and assimilate findings from prior studies when constructing their overall assessment of the current study. Radiology information systems facilitate this process by presenting the radiologist with a subset of prior studies that are more likely to be relevant to the current study, usually by comparing anatomic coverage of both the current and prior studies. It is incumbent on the radiologist to review the full text report and/or images from those prior studies, a process that is time-consuming and confers substantial risk of overlooking a relevant prior study or finding. This risk is compounded when patients have dozens or even hundreds of prior imaging studies. Our goal is to assess the feasibility of natural language processing techniques to automatically extract asserted and negated disease entities from free-text radiology reports as a step towards automated report summarization. We compared automatically extracted disease mentions to a gold-standard set of manual annotations for 50 radiology reports from CT abdomen and pelvis examinations. The automated report summarization pipeline found perfect or overlapping partial matches for 86% of the manually annotated disease mentions (sensitivity 0.86, precision 0.66, accuracy 0.59, F1 score 0.74). The performance of the automated pipeline was good, and the overall accuracy was similar to the interobserver agreement between the two manual annotators.
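    As a quick consistency check, the F1 score reported above is the harmonic mean of the reported precision and recall (sensitivity); a minimal sketch in Python:

      # F1 as the harmonic mean of precision and recall, using the quoted values.
      precision, recall = 0.66, 0.86
      f1 = 2 * precision * recall / (precision + recall)
      print(round(f1, 3))  # 0.747; the record reports 0.74, likely from unrounded inputs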

  3. A semi-automated approach for generating natural language requirements documents based on business process models

    NARCIS (Netherlands)

    Aysolmaz, Banu; Leopold, Henrik; Reijers, Hajo A.; Demirörs, Onur

    2018-01-01

    Context: The analysis of requirements for business-related software systems is often supported by using business process models. However, the final requirements are typically still specified in natural language. This means that the knowledge captured in process models must be consistently

  4. Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

    Science.gov (United States)

    He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

    2017-03-01

    Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients at an early stage.
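    A minimal analogue of this setup, using one of the listed classifiers (naive Bayes) over unigram features; the narratives, labels, and pipeline below are toy assumptions, and the authors' product score model is not reproduced here.

      # Unigram features + naive Bayes over toy self-narratives.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      narratives = [
          "I keep reliving the crash and avoid driving at night",
          "I sleep well and enjoy spending time with my family",
      ]
      labels = [1, 0]  # 1 = screen positive, 0 = screen negative (toy labels)

      model = make_pipeline(CountVectorizer(ngram_range=(1, 1)), MultinomialNB())
      model.fit(narratives, labels)
      print(model.predict(["nightmares about the accident wake me at night"]))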

  5. Measuring information acquisition from sensory input using automated scoring of natural-language descriptions.

    Directory of Open Access Journals (Sweden)

    Daniel R Saunders

    Information acquisition, the gathering and interpretation of sensory information, is a basic function of mobile organisms. We describe a new method for measuring this ability in humans, using free-recall responses to sensory stimuli which are scored objectively using a "wisdom of crowds" approach. As an example, we demonstrate this metric using perception of video stimuli. Immediately after viewing a 30 s video clip, subjects responded to a prompt to give a short description of the clip in natural language. These responses were scored automatically by comparison to a dataset of responses to the same clip by normally-sighted viewers (the crowd). In this case, the normative dataset consisted of responses to 200 clips by 60 subjects who were stratified by age (range 22 to 85 y) and viewed the clips in the lab, for 2,400 responses, and by 99 crowdsourced participants (age range 20 to 66 y) who viewed clips in their Web browser, for 4,000 responses. We compared different algorithms for computing these similarities and found that a simple count of the words in common had the best performance. It correctly matched 75% of the lab-sourced and 95% of crowdsourced responses to their corresponding clips. We validated the measure by showing that when the amount of information in the clip was degraded using defocus lenses, the shared word score decreased across the five predetermined visual-acuity levels, demonstrating a dose-response effect (N = 15). This approach, of scoring open-ended immediate free recall of the stimulus, is applicable not only to video, but also to other situations where a measure of the information that is successfully acquired is desirable. Information acquired will be affected by stimulus quality, sensory ability, and cognitive processes, so our metric can be used to assess each of these components when the others are controlled.
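    A sketch of the winning scoring rule (a simple count of words shared with the crowd's responses); the tokenization and the normative data below are simplified assumptions.

      # Match a response to the clip whose crowd responses share the most words.
      def shared_word_score(response, crowd_responses):
          words = set(response.lower().split())
          return sum(len(words & set(r.lower().split())) for r in crowd_responses)

      norms = {  # toy stand-in for the normative response dataset
          "clip_a": ["a dog chases a ball in the park", "dog playing fetch outside"],
          "clip_b": ["two people cook dinner in a kitchen", "a couple preparing food"],
      }
      response = "a dog runs after a ball"
      print(max(norms, key=lambda clip: shared_word_score(response, norms[clip])))  # clip_a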

  6. Automated chart review utilizing natural language processing algorithm for asthma predictive index.

    Science.gov (United States)

    Kaur, Harsheen; Sohn, Sunghwan; Wi, Chung-Il; Ryu, Euijung; Park, Miguel A; Bachman, Kay; Kita, Hirohito; Croghan, Ivana; Castro-Rodriguez, Jose A; Voge, Gretchen A; Liu, Hongfang; Juhn, Young J

    2018-02-13

    Thus far, no algorithms have been developed to automatically extract patients who meet Asthma Predictive Index (API) criteria from electronic health records (EHR). Our objective is to develop and validate a natural language processing (NLP) algorithm to identify patients who meet API criteria. This is a cross-sectional study nested in a birth cohort study in Olmsted County, MN. Asthma status ascertained by manual chart review based on API criteria served as the gold standard. NLP-API was developed on a training cohort (n = 87) and validated on a test cohort (n = 427). Criterion validity was measured by sensitivity, specificity, positive predictive value and negative predictive value of the NLP algorithm against manual chart review for asthma status. Construct validity was determined by associations of asthma status defined by NLP-API with known risk factors for asthma. Among the eligible 427 subjects of the test cohort, 48% were males and 74% were White. Median age was 5.3 years (interquartile range 3.6-6.8). 35 (8%) had a history of asthma by NLP-API vs. 36 (8%) by abstractor, with 31 identified by both approaches. NLP-API predicted asthma status with sensitivity 86%, specificity 98%, positive predictive value 88%, negative predictive value 98%. Asthma status by both NLP and manual chart review was significantly associated with known asthma risk factors, such as history of allergic rhinitis, eczema, family history of asthma, and maternal history of smoking during pregnancy (p value < 0.05 for both NLP-API and abstractor), and the effect sizes were similar between the two reviews (4.4 vs 4.2, respectively). NLP-API was able to ascertain asthma status in children by mining EHRs and has the potential to enhance asthma care and research through population management and large-scale studies when identifying children who meet API criteria.
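    The validity metrics above can be reproduced from the reported counts; the 2x2 cell values below are inferred from the abstract (31 positive by both methods, 35 by NLP-API, 36 by chart review, 427 total), so small rounding differences remain.

      # Criterion validity from an inferred confusion matrix.
      tp, fp, fn = 31, 35 - 31, 36 - 31
      tn = 427 - tp - fp - fn
      print(tp / (tp + fn))  # sensitivity ~0.861 (reported 86%)
      print(tn / (tn + fp))  # specificity ~0.990 (reported as 98%)
      print(tp / (tp + fp))  # PPV ~0.886 (reported 88%)
      print(tn / (tn + fn))  # NPV ~0.987 (reported 98%)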

  7. Natural Language Processing

    OpenAIRE

    Preeti; Brahmaleen Kaur Sidhu

    2013-01-01

    Natural language processing (NLP) work began more than sixty years ago; it is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language. Natural language processing holds great promise for making computer interfaces easier for people to use, since people would be able to talk to the computer in their own language rather than learn a specialized language of computer commands. Natural language processing techniques can make possi...

  8. Design automation, languages, and simulations

    CERN Document Server

    Chen, Wai-Kai

    2003-01-01

    As the complexity of electronic systems continues to increase, the micro-electronic industry depends upon automation and simulations to adapt quickly to market changes and new technologies. Compiled from chapters contributed to CRC's best-selling VLSI Handbook, this volume covers a broad range of topics relevant to design automation, languages, and simulations. These include a collaborative framework that coordinates distributed design activities through the Internet, an overview of the Verilog hardware description language and its use in a design environment, hardware/software co-design, syst

  9. Advances in natural language processing.

    Science.gov (United States)

    Hirschberg, Julia; Manning, Christopher D

    2015-07-17

    Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.

  10. Automated identification of wound information in clinical notes of patients with heart diseases: Developing and validating a natural language processing application.

    Science.gov (United States)

    Topaz, Maxim; Lai, Kenneth; Dowding, Dawn; Lei, Victor J; Zisberg, Anna; Bowles, Kathryn H; Zhou, Li

    2016-12-01

    Electronic health records are being increasingly used by nurses, with up to 80% of the health data recorded as free text. However, only a few studies have developed nursing-relevant tools that help busy clinicians to identify information they need at the point of care. This study developed and validated one of the first automated natural language processing applications to extract wound information (wound type, pressure ulcer stage, wound size, anatomic location, and wound treatment) from free text clinical notes. First, two human annotators manually reviewed a purposeful training sample (n=360) and random test sample (n=1100) of clinical notes (including 50% discharge summaries and 50% outpatient notes), identified wound cases, and created a gold standard dataset. We then trained and tested our natural language processing system (known as MTERMS) to process the wound information. Finally, we assessed our automated approach by comparing system-generated findings against the gold standard. We also compared the prevalence of wound cases identified from free-text data with coded diagnoses in the structured data. The testing dataset included 101 notes (9.2%) with wound information. The overall system performance was good (F-measure, a combined measure of the system's accuracy, of 92.7%), with best results for wound treatment (F-measure=95.7%) and poorest results for wound size (F-measure=81.9%). Only 46.5% of wound notes had a structured code for a wound diagnosis. The natural language processing system achieved good performance on a subset of randomly selected discharge summaries and outpatient notes. In more than half of the wound notes, there were no coded wound diagnoses, which highlights the significance of using natural language processing to enrich clinical decision making. Our future steps will include expansion of the application's information coverage to other relevant wound factors and validation of the model with external data. Copyright © 2016 Elsevier Ltd. All rights reserved.
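    A sketch of comparing system-generated spans against gold-standard annotations with overlap credit, the kind of evaluation described above; spans are (start, end) character offsets and the data are toy values, not the study's annotations.

      # Count predicted mentions that exactly match or overlap a gold span.
      def overlaps(a, b):
          return a[0] < b[1] and b[0] < a[1]

      def evaluate(pred, gold):
          tp = sum(any(overlaps(p, g) for g in gold) for p in pred)
          fp = len(pred) - tp
          fn = sum(not any(overlaps(g, p) for p in pred) for g in gold)
          precision = tp / (tp + fp) if pred else 0.0
          recall = tp / (tp + fn) if gold else 0.0
          f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          return precision, recall, f

      print(evaluate(pred=[(0, 12), (40, 48)], gold=[(0, 12), (60, 70)]))  # (0.5, 0.5, 0.5)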

  11. Natural Language Sourcebook.

    Science.gov (United States)

    Baker, Eva; And Others

    This sourcebook is intended to provide researchers and users of natural language computer systems with a classification scheme to describe language-related problems associated with such systems. Methods from the disciplines of artificial intelligence (AI), education, linguistics, psychology, anthropology, and psychometrics were applied in an…

  12. Natural language generation

    Science.gov (United States)

    Maybury, Mark T.

    The goal of natural language generation is to replicate human writers or speakers: to generate fluent, grammatical, and coherent text or speech. Produced language, using both explicit and implicit means, must clearly and effectively express some intended message. This demands the use of a lexicon and a grammar together with mechanisms which exploit semantic, discourse and pragmatic knowledge to constrain production. Furthermore, special processors may be required to guide focus, extract presuppositions, and maintain coherency. As with interpretation, generation may require knowledge of the world, including information about the discourse participants as well as knowledge of the specific domain of discourse. All of these processes and knowledge sources must cooperate to produce well-written, unambiguous language. Natural language generation has received less attention than language interpretation due to the nature of language: it is important to interpret all the ways of expressing a message but we need to generate only one. Furthermore, the generative task can often be accomplished by canned text (e.g., error messages or user instructions). The advent of more sophisticated computer systems, however, has intensified the need to express multisentential English.

  13. Natural language modeling

    Energy Technology Data Exchange (ETDEWEB)

    Sharp, J.K. [Sandia National Labs., Albuquerque, NM (United States)

    1997-11-01

    This seminar describes a process and methodology that uses structured natural language to enable the construction of precise information requirements directly from users, experts, and managers. The main focus of this natural language approach is to create the precise information requirements and to do it in such a way that the business and technical experts are fully accountable for the results. These requirements can then be implemented using appropriate tools and technology. This requirement set is also a universal learning tool because it has all of the knowledge that is needed to understand a particular process (e.g., expense vouchers, project management, budget reviews, tax, laws, machine function).

  14. Mixed deep learning and natural language processing method for fake-food image recognition and standardization to help automated dietary assessment.

    Science.gov (United States)

    Mezgec, Simon; Eftimov, Tome; Bucher, Tamara; Koroušić Seljak, Barbara

    2018-04-06

    The present study tested the combination of an established and a validated food-choice research method (the 'fake food buffet') with a new food-matching technology to automate the data collection and analysis. The methodology combines fake-food image recognition using deep learning and food matching and standardization based on natural language processing. The former is specific because it uses a single deep learning network to perform both the segmentation and the classification at the pixel level of the image. To assess its performance, measures based on the standard pixel accuracy and Intersection over Union were applied. Food matching firstly describes each of the recognized food items in the image and then matches the food items with their compositional data, considering both their food names and their descriptors. The final accuracy of the deep learning model trained on fake-food images acquired by 124 study participants and providing fifty-five food classes was 92·18 %, while the food matching was performed with a classification accuracy of 93 %. The present findings are a step towards automating dietary assessment and food-choice research. The methodology outperforms other approaches in pixel accuracy, and since it is the first automatic solution for recognizing the images of fake foods, the results could be used as a baseline for possible future studies. As the approach enables a semi-automatic description of recognized food items (e.g. with respect to FoodEx2), these can be linked to any food composition database that applies the same classification and description system.
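    A minimal sketch of the two segmentation measures named above, pixel accuracy and Intersection over Union, on toy binary masks:

      import numpy as np

      # Pixel accuracy and IoU for a single-class segmentation mask (toy data).
      pred = np.array([[1, 1, 0], [0, 1, 0]])
      true = np.array([[1, 0, 0], [0, 1, 1]])

      pixel_accuracy = (pred == true).mean()                                    # ~0.667
      iou = np.logical_and(pred, true).sum() / np.logical_or(pred, true).sum()  # 0.5
      print(pixel_accuracy, iou)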

  15. Laboratory automation in a functional programming language.

    Science.gov (United States)

    Runciman, Colin; Clare, Amanda; Harkness, Rob

    2014-12-01

    After some years of use in academic and research settings, functional languages are starting to enter the mainstream as an alternative to more conventional programming languages. This article explores one way to use Haskell, a functional programming language, in the development of control programs for laboratory automation systems. We give code for an example system, discuss some programming concepts that we need for this example, and demonstrate how the use of functional programming allows us to express and verify properties of the resulting code. © 2014 Society for Laboratory Automation and Screening.

  16. Security Classification Using Automated Learning (SCALE): Optimizing Statistical Natural Language Processing Techniques to Assign Security Labels to Unstructured Text

    Science.gov (United States)

    2010-12-01

    © Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2010. Abstract: Automating the... based on their experience and the security policies. To effectively label all the data available on the networks of the... although automatic categorization of data by subject has been studied in depth, little research has focused on the evaluation...

  17. The NCL natural constraint language

    CERN Document Server

    Zhou, Jianyang

    2012-01-01

    This book presents the Natural Constraint Language (NCL), a description language in conventional mathematical logic for modeling and solving constraint satisfaction problems. It uses illustrations and tutorials to detail NCL and its applications.

  18. Teaching natural language to computers

    OpenAIRE

    Corneli, Joseph; Corneli, Miriam

    2016-01-01

    "Natural Language," whether spoken and attended to by humans, or processed and generated by computers, requires networked structures that reflect creative processes in semantic, syntactic, phonetic, linguistic, social, emotional, and cultural modules. Being able to produce novel and useful behavior following repeated practice gets to the root of both artificial intelligence and human language. This paper investigates the modalities involved in language-like applications that computers -- and ...

  19. Handbook of Natural Language Processing

    CERN Document Server

    Indurkhya, Nitin

    2010-01-01

    Provides a comprehensive, modern reference of practical tools and techniques for implementing natural language processing in computer systems. This title covers classical methods, empirical and statistical techniques, and various applications. It describes how the techniques can be applied to European and Asian languages as well as English

  20. Henkin semantics for reasoning with natural language

    Directory of Open Access Journals (Sweden)

    Michael Hahn

    2016-02-01

    The frequency of intensional and non-first-order definable operators in natural languages constitutes a challenge for automated reasoning with the kind of logical translations that are deemed adequate by formal semanticists. Whereas linguists employ expressive higher-order logics in their theories of meaning, the most successful logical reasoning strategies with natural language to date rely on sophisticated first-order theorem provers and model builders. In order to bridge the fundamental mathematical gap between linguistic theory and computational practice, we present a general translation from a higher-order logic frequently employed in the linguistics literature, two-sorted Type Theory, to first-order logic under Henkin semantics. We investigate alternative formulations of the translation, discuss their properties, and evaluate the availability of linguistically relevant inferences with standard theorem provers in a test suite of inference problems stated in English. The results of the experiment indicate that translation from higher-order logic to first-order logic under Henkin semantics is a promising strategy for automated reasoning with natural languages. The paper is accompanied by the source code (cf. supplementary files) of the grammar and reasoning architecture described in the paper.
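    The flavor of such a translation can be sketched as follows (a generic relational encoding under Henkin semantics, not necessarily the paper's exact definition): each higher-order type becomes a first-order sort, application is mediated by an explicit symbol, and quantification over properties becomes first-order quantification over the corresponding sort.

      \[
        \mathit{tr}(F\,x) \;=\; \mathrm{app}_{\langle e,t\rangle}\bigl(\mathit{tr}(F),\,\mathit{tr}(x)\bigr)
        \qquad
        \mathit{tr}\bigl(\forall P_{\langle e,t\rangle}.\,\varphi\bigr) \;=\; \forall p^{\,\sigma_{\langle e,t\rangle}}.\;\mathit{tr}(\varphi)
      \]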

  21. Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development.

    Science.gov (United States)

    McEntire, Robin; Szalkowski, Debbie; Butler, James; Kuo, Michelle S; Chang, Meiping; Chang, Man; Freeman, Darren; McQuay, Sarah; Patel, Jagruti; McGlashen, Michael; Cornell, Wendy D; Xu, Jinghai James

    2016-05-01

    External content sources such as MEDLINE(®), National Institutes of Health (NIH) grants and conference websites provide access to the latest breaking biomedical information, which can inform pharmaceutical and biotechnology company pipeline decisions. The value of the sites for industry, however, is limited by the use of the public internet, the limited synonyms, the rarity of batch searching capability and the disconnected nature of the sites. Fortunately, many sites now offer their content for download and we have developed an automated internal workflow that uses text mining and tailored ontologies for programmatic search and knowledge extraction. We believe such an efficient and secure approach provides a competitive advantage to companies needing access to the latest information for a range of use cases and complements manually curated commercial sources. Copyright © 2016. Published by Elsevier Ltd.

  22. A Portable Natural Language Interface.

    Science.gov (United States)

    1987-09-01

    and that would integrate graphics, mouse deixis, and natural language. Although the project was originally intended to last several years, it has been... planning program, an expert system used to plan air attack missions for the Air Force. This interface combined English with graphics and mouse deixis.

  23. Natural language processing with Java

    CERN Document Server

    Reese, Richard M

    2015-01-01

    If you are a Java programmer who wants to learn about the fundamental tasks underlying natural language processing, this book is for you. You will be able to identify and use NLP tasks for many common problems, and integrate them in your applications to solve more difficult problems. Readers should be familiar/experienced with Java software development.

  24. New trends in natural language processing: statistical natural language processing.

    OpenAIRE

    Marcus, M

    1995-01-01

    The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the rece...

  25. Natural language processing in psychiatry. Artificial intelligence technology and psychopathology.

    Science.gov (United States)

    Garfield, D A; Rapp, C; Evens, M

    1992-04-01

    The potential benefit of artificial intelligence (AI) technology as a tool of psychiatry has not been well defined. In this essay, the technology of natural language processing and its position with regard to the two main schools of AI are clearly outlined. Past experiments utilizing AI techniques in understanding psychopathology are reviewed. Natural language processing can automate the analysis of transcripts and can be used in modeling theories of language comprehension. In these ways, it can serve as a tool in testing psychological theories of psychopathology and can be used as an effective tool in empirical research on verbal behavior in psychopathology.

  26. Artificial intelligence, expert systems, computer vision, and natural language processing

    Science.gov (United States)

    Gevarter, W. B.

    1984-01-01

    An overview of artificial intelligence (AI), its core ingredients, and its applications is presented. The knowledge representation, logic, problem solving approaches, languages, and computers pertaining to AI are examined, and the state of the art in AI is reviewed. The use of AI in expert systems, computer vision, natural language processing, speech recognition and understanding, speech synthesis, problem solving, and planning is examined. Basic AI topics, including automation, search-oriented problem solving, knowledge representation, and computational logic, are discussed.

  27. Natural Language Processing and the Language-Impaired.

    Science.gov (United States)

    Ward, R. D.

    1986-01-01

    Describes ideas for making the best use of simple language processing interfaces in computer-based learning activities. These ideas are based on classroom observations of hearing-impaired, language-impaired, and unimpaired children using programs with a natural language interface which allows them to communicate with the computer about…

  28. Natural language processing: an introduction.

    Science.gov (United States)

    Nadkarni, Prakash M; Ohno-Machado, Lucila; Chapman, Wendy W

    2011-01-01

    To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design. This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art. We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field.

  29. Visualizing Natural Language Descriptions: A Survey

    OpenAIRE

    Hassani, Kaveh; Lee, Won-Sook

    2016-01-01

    A natural language interface exploits the conceptual simplicity and naturalness of the language to create a high-level user-friendly communication channel between humans and machines. One of the promising applications of such interfaces is generating visual interpretations of the semantic content of a given natural language input, which can then be visualized either as a static scene or a dynamic animation. This survey discusses requirements and challenges of developing such systems and reports 26 graphi...

  30. Knowledge representation and natural language processing

    Energy Technology Data Exchange (ETDEWEB)

    Weischedel, R.M.

    1986-07-01

    In principle, natural language and knowledge representation are closely related. This paper investigates this connection by demonstrating how several natural language phenomena, such as definite reference, ambiguity, ellipsis, ill-formed input, figures of speech, and vagueness, require diverse knowledge sources and reasoning. The breadth of kinds of knowledge needed to represent morphology, syntax, semantics, and pragmatics is surveyed. Furthermore, several current issues in knowledge representation, such as logic versus semantic nets, general-purpose versus special-purpose reasoners, adequacy of first-order logic, wait-and-see strategies, and default reasoning, are illustrated in terms of their relation to natural language processing and how natural language impacts these issues.

  31. Mobile speech and advanced natural language solutions

    CERN Document Server

    Markowitz, Judith

    2013-01-01

    Mobile Speech and Advanced Natural Language Solutions provides a comprehensive and forward-looking treatment of natural speech in the mobile environment. This fourteen-chapter anthology brings together lead scientists from Apple, Google, IBM, AT&T, Yahoo! Research and other companies, along with academicians, technology developers and market analysts.  They analyze the growing markets for mobile speech, new methodological approaches to the study of natural language, empirical research findings on natural language and mobility, and future trends in mobile speech.  Mobile Speech opens with a challenge to the industry to broaden the discussion about speech in mobile environments beyond the smartphone, to consider natural language applications across different domains.   Among the new natural language methods introduced in this book are Sequence Package Analysis, which locates and extracts valuable opinion-related data buried in online postings; microintonation as a way to make TTS truly human-like; and se...

  32. Generating natural language under pragmatic constraints

    CERN Document Server

    Hovy, Eduard H

    2013-01-01

    Recognizing that the generation of natural language is a goal-driven process, where many of the goals are pragmatic (i.e., interpersonal and situational) in nature, this book provides an overview of the role of pragmatics in language generation. Each chapter states a problem that arises in generation, develops a pragmatics-based solution, and then describes how the solution is implemented in PAULINE, a language generator that can produce numerous versions of a single underlying message, depending on its setting.

  33. Emerging Approach of Natural Language Processing in Opinion Mining: A Review

    Science.gov (United States)

    Kim, Tai-Hoon

    Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. This paper outlines a framework for using computer and natural language techniques to help learners at various levels learn foreign languages in a computer-based learning environment. We propose some ideas for using the computer as a practical tool for learning a foreign language, where most of the courseware is generated automatically. We then describe how to build computer-based learning tools, discuss their effectiveness, and conclude with some possibilities for using on-line resources.

  34. Policy-Based Management Natural Language Parser

    Science.gov (United States)

    James, Mark

    2009-01-01

    The Policy-Based Management Natural Language Parser (PBEM) is a rules-based approach to enterprise management that can be used to automate certain management tasks. This parser simplifies the management of a given endeavor by establishing policies to deal with situations that are likely to occur. Policies are operating rules that can be referred to as a means of maintaining order, security, consistency, or other ways of successfully furthering a goal or mission. PBEM provides a way of managing configuration of network elements, applications, and processes via a set of high-level rules or business policies rather than managing individual elements, thus switching the control to a higher level. This software allows unique management rules (or commands) to be specified and applied to a cross-section of the Global Information Grid (GIG). This software embodies a parser that is capable of recognizing and understanding conversational English. Because all possible dialect variants cannot be anticipated, a unique capability was developed that parses based on conversational intent rather than the exact way the words are used. This software can increase productivity by enabling a user to converse with the system in conversational English to define network policies. PBEM can be used in both manned and unmanned science-gathering programs. Because policy statements can be domain-independent, this software can be applied equally to a wide variety of applications.
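    A toy illustration of parsing on intent rather than exact wording; the cue sets, rule format, and example are hypothetical stand-ins, not PBEM's actual grammar.

      # Map an utterance to a policy rule via intent cues, not exact phrasing.
      INTENTS = {
          "block": {"block", "deny", "refuse", "drop"},
          "allow": {"allow", "permit", "accept"},
      }

      def parse_policy(utterance):
          words = set(utterance.lower().replace(",", " ").split())
          action = next((name for name, cues in INTENTS.items() if words & cues), None)
          target = "ftp" if "ftp" in words else "any"
          return {"action": action, "target": target}

      print(parse_policy("Please deny all FTP traffic after hours"))
      # {'action': 'block', 'target': 'ftp'}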

  35. Applications of Natural Language Processing in Biodiversity Science

    Directory of Open Access Journals (Sweden)

    Anne E. Thessen

    2012-01-01

    A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.
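    One of the tasks listed above, identification of taxonomic names in text, has a classic regex baseline for Latin binomials; real tools add curated dictionaries and learned models, so this is only illustrative.

      import re

      # Candidate "Genus species" binomials: capitalized genus, lowercase epithet.
      BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s([a-z]{3,})\b")

      text = "Specimens of Homo sapiens and Escherichia coli were compared."
      print(BINOMIAL.findall(text))  # [('Homo', 'sapiens'), ('Escherichia', 'coli')]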

  36. Automated sensitivity analysis using the GRESS language

    International Nuclear Information System (INIS)

    Pin, F.G.; Oblow, E.M.; Wright, R.Q.

    1986-04-01

    An automated procedure for performing large-scale sensitivity studies based on the use of computer calculus is presented. The procedure is embodied in a FORTRAN precompiler called GRESS, which automatically processes computer models and adds derivative-taking capabilities to the normal calculated results. In this report, the GRESS code is described, tested against analytic and numerical test problems, and then applied to a major geohydrological modeling problem. The SWENT nuclear waste repository modeling code is used as the basis for these studies. Results for all problems are discussed in detail. Conclusions are drawn as to the applicability of GRESS in the problems at hand and for more general large-scale modeling sensitivity studies.
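    GRESS itself preprocesses FORTRAN source, but the underlying idea of carrying derivatives alongside calculated values can be sketched with dual numbers; this operator-overloading analogue is an illustration of the technique, not GRESS.

      # Forward-mode differentiation via dual numbers (value, derivative).
      class Dual:
          def __init__(self, value, deriv=0.0):
              self.value, self.deriv = value, deriv

          def __add__(self, other):
              return Dual(self.value + other.value, self.deriv + other.deriv)

          def __mul__(self, other):
              return Dual(self.value * other.value,
                          self.deriv * other.value + self.value * other.deriv)

      def model(x):
          return x * x + x  # f(x) = x^2 + x, so f'(x) = 2x + 1

      y = model(Dual(3.0, 1.0))  # seed dx/dx = 1
      print(y.value, y.deriv)    # 12.0 7.0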

  37. A Natural Logic for Natural-Language Knowledge Bases

    DEFF Research Database (Denmark)

    Andreasen, Troels; Styltsvig, Henrik Bulskov; Jensen, Per Anker

    2017-01-01

    We describe a natural logic for computational reasoning with a regimented fragment of natural language. The natural logic comes with intuitive inference rules enabling deductions and with an internal graph representation facilitating conceptual path finding between pairs of terms, as an approach to... non-conservative constructs in order to approach scientific use of natural language. Finally, we outline a prototype system addressing life science for the natural logic knowledge base setup, which is under continuous development.

  38. Blurring the Inputs: A Natural Language Approach to Sensitivity Analysis

    Science.gov (United States)

    Kleb, William L.; Thompson, Richard A.; Johnston, Christopher O.

    2007-01-01

    To document model parameter uncertainties and to automate sensitivity analyses for numerical simulation codes, a natural-language-based method to specify tolerances has been developed. With this new method, uncertainties are expressed in a natural manner, i.e., as one would on an engineering drawing, namely, 5.25 +/- 0.01. This approach is robust and readily adapted to various application domains because it does not rely on parsing the particular structure of input file formats. Instead, tolerances of a standard format are added to existing fields within an input file. As a demonstration of the power of this simple, natural language approach, a Monte Carlo sensitivity analysis is performed for three disparate simulation codes: fluid dynamics (LAURA), radiation (HARA), and ablation (FIAT). Effort required to harness each code for sensitivity analysis was recorded to demonstrate the generality and flexibility of this new approach.
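    A sketch of the annotation style described above: a field such as "5.25 +/- 0.01" is parsed in place and replaced with a random draw for each Monte Carlo run. The tolerance syntax mirrors the paper's example, but the parsing code and the uniform sampling are illustrative assumptions.

      import random
      import re

      # Replace every "value +/- tol" with a draw from the tolerance interval.
      TOL = re.compile(r"(-?\d+\.?\d*)\s*\+/-\s*(\d+\.?\d*)")

      def sample_line(line):
          def draw(match):
              value, tol = float(match.group(1)), float(match.group(2))
              return f"{random.uniform(value - tol, value + tol):.6f}"
          return TOL.sub(draw, line)

      print(sample_line("wall_temperature = 5.25 +/- 0.01"))
      # e.g. "wall_temperature = 5.247313"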

  39. Arabic Natural Language Processing System Code Library

    Science.gov (United States)

    2014-06-01

    This library provides code for Arabic (and also English) natural language processing (NLP), containing code for training and applying the Arabic NLP system described in Stephen Tratz's paper "...Detection, Affix Labeling, POS Tagging, and Dependency Parsing," presented at the Fourth Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL).

  40. Bayesian natural language semantics and pragmatics

    CERN Document Server

    Zeevat, Henk

    2015-01-01

    The contributions in this volume focus on the Bayesian interpretation of natural languages, which is widely used in areas of artificial intelligence, cognitive science, and computational linguistics. This is the first volume to take up topics in Bayesian Natural Language Interpretation and make proposals based on information theory, probability theory, and related fields. The methodologies offered here extend to the target semantic and pragmatic analyses of computational natural language interpretation. Bayesian approaches to natural language semantics and pragmatics are based on methods from signal processing and the causal Bayesian models pioneered especially by Pearl. In signal processing, the Bayesian method finds the most probable interpretation by finding the one that maximizes the product of the prior probability and the likelihood of the interpretation. It thus stresses the importance of a production model for interpretation, as in Grice's contributions to pragmatics or in interpretation by abduction.

  41. Natural Language Description of Emotion

    Science.gov (United States)

    Kazemzadeh, Abe

    2013-01-01

    This dissertation studies how people describe emotions with language and how computers can simulate this descriptive behavior. Although many non-human animals can express their current emotions as social signals, only humans can communicate about emotions symbolically. This symbolic communication of emotion allows us to talk about emotions that we…

  42. Trainable Methods for Surface Natural Language Generation

    OpenAIRE

    Ratnaparkhi, Adwait

    2000-01-01

    We present three systems for surface natural language generation that are trainable from annotated corpora. The first two systems, called NLG1 and NLG2, require a corpus marked only with domain-specific semantic attributes, while the last system, called NLG3, requires a corpus marked with both semantic attributes and syntactic dependency information. All systems attempt to produce a grammatical natural language phrase from a domain-specific semantic representation. NLG1 serves as a baseline syst...

  43. Evolution, brain, and the nature of language.

    Science.gov (United States)

    Berwick, Robert C; Friederici, Angela D; Chomsky, Noam; Bolhuis, Johan J

    2013-02-01

    Language serves as a cornerstone for human cognition, yet much about its evolution remains puzzling. Recent research on this question parallels Darwin's attempt to explain both the unity of all species and their diversity. What has emerged from this research is that the unified nature of human language arises from a shared, species-specific computational ability. This ability has identifiable correlates in the brain and has remained fixed since the origin of language approximately 100 thousand years ago. Although songbirds share with humans a vocal imitation learning ability, with a similar underlying neural organization, language is uniquely human. Copyright © 2012 Elsevier Ltd. All rights reserved.

  44. Biocoder: A programming language for standardizing and automating biology protocols.

    Science.gov (United States)

    Ananthanarayanan, Vaishnavi; Thies, William

    2010-11-08

    Published descriptions of biology protocols are often ambiguous and incomplete, making them difficult to replicate in other laboratories. However, there is increasing benefit to formalizing the descriptions of protocols, as laboratory automation systems (such as microfluidic chips) are becoming increasingly capable of executing them. Our goal in this paper is to improve both the reproducibility and automation of biology experiments by using a programming language to express the precise series of steps taken. We have developed BioCoder, a C++ library that enables biologists to express the exact steps needed to execute a protocol. In addition to being suitable for automation, BioCoder converts the code into a readable, English-language description for use by biologists. We have implemented over 65 protocols in BioCoder; the most complex of these was successfully executed by a biologist in the laboratory using BioCoder as the only reference. We argue that BioCoder exposes and resolves ambiguities in existing protocols, and could provide the software foundations for future automation platforms. BioCoder is freely available for download at http://research.microsoft.com/en-us/um/india/projects/biocoder/. BioCoder represents the first practical programming system for standardizing and automating biology protocols. Our vision is to change the way that experimental methods are communicated: rather than publishing a written account of the protocols used, researchers will simply publish the code. Our experience suggests that this practice is tractable and offers many benefits. We invite other researchers to leverage BioCoder to improve the precision and completeness of their protocols, and also to adapt and extend BioCoder to new domains.
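    The protocol-as-code pattern can be sketched in a few lines of Python: the same step records drive both execution and a readable English rendering. This stand-in only illustrates the idea and is not BioCoder's actual C++ API.

      # Record protocol steps as data, then render them as English instructions.
      steps = []

      def add(volume_ul, reagent, target):
          steps.append(f"Add {volume_ul} uL of {reagent} to {target}.")

      def incubate(target, temp_c, minutes):
          steps.append(f"Incubate {target} at {temp_c} C for {minutes} min.")

      add(50, "lysis buffer", "the sample tube")
      incubate("the sample tube", 37, 30)
      print("\n".join(f"{i + 1}. {step}" for i, step in enumerate(steps)))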

  45. Using natural language processing techniques to inform research on nanotechnology

    Directory of Open Access Journals (Sweden)

    Nastassja A. Lewinski

    2015-07-01

    Literature in the field of nanotechnology is exponentially increasing, with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics.

  46. Developing Formal Correctness Properties from Natural Language Requirements

    Science.gov (United States)

    Nikora, Allen P.

    2006-01-01

    This viewgraph presentation reviews the rationale of a program to transform natural language specifications into formal notation; specifically, to automate the generation of Linear Temporal Logic (LTL) correctness properties from natural language temporal specifications. There are several reasons for this approach: (1) model-based techniques are becoming more widely accepted; (2) analytical verification techniques (e.g., model checking, theorem proving) are significantly more effective at detecting certain types of specification design errors (e.g., race conditions, deadlock) than manual inspection; (3) many requirements are still written in natural language, which results in a high learning curve for specification languages and associated tools, while increased schedule and budget pressure on projects reduces training opportunities for engineers; and (4) formulating correctness properties for system models can be a difficult problem. This has relevance to NASA in that it would simplify the development of formal correctness properties, lead to more widespread use of model-based specification and design techniques, assist in earlier identification of defects, and reduce residual defect content for space mission software systems. The presentation also discusses potential applications, accomplishments and/or technological transfer potential, and next steps.
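    The kind of mapping being automated can be sketched with a single rewrite rule: a stylized temporal requirement is matched and emitted as an LTL formula. The sentence pattern and proposition names below are illustrative assumptions, not the project's grammar.

      import re

      # "Whenever p holds, q shall eventually hold"  ->  G(p -> F q)
      RULE = re.compile(r"whenever (?P<p>\w+) holds, (?P<q>\w+) shall eventually hold",
                        re.IGNORECASE)

      def to_ltl(requirement):
          match = RULE.search(requirement)
          return f"G({match['p']} -> F {match['q']})" if match else None

      print(to_ltl("Whenever overheat holds, shutdown shall eventually hold"))
      # G(overheat -> F shutdown)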

  47. Research in Natural Language Understanding

    Science.gov (United States)

    1978-08-31

    of the lexical material to explain how many actions there were, how many actors, etc., and the nature of the map from actor onto action, etc. For... direction and make a measurement there, or may scan from the current focus in a specified "direction" (or by some other specification of a trajectory...

  48. Semantic structures: advances in natural language processing

    CERN Document Server

    Waltz, David L

    2014-01-01

    Natural language understanding is central to the goals of artificial intelligence. Any truly intelligent machine must be capable of carrying on a conversation: dialogue, particularly clarification dialogue, is essential if we are to avoid disasters caused by the misunderstanding of the intelligent interactive systems of the future. This book is an interim report on the grand enterprise of devising a machine that can use natural language as fluently as a human. What has really been achieved since this goal was first formulated in Turing's famous test? What obstacles still need to be overcome?

  49. The social impact of natural language processing

    DEFF Research Database (Denmark)

    Hovy, Dirk; Spruit, Shannon

    Research in natural language processing (NLP) used to be mostly performed on anonymous corpora, with the goal of enriching linguistic analysis. Authors were either largely unknown or public figures. As we increasingly use more data from social media, this situation has changed: users are now...

  50. Natural Language Navigation Support in Virtual Reality

    NARCIS (Netherlands)

    van Luin, J.; Nijholt, Antinus; op den Akker, Hendrikus J.A.; Giagourta, V.; Strintzis, M.G.

    2001-01-01

    We describe our work on designing a natural language accessible navigation agent for a virtual reality (VR) environment. The agent is part of an agent framework, which means that it can communicate with other agents. Its navigation task consists of guiding the visitors in the environment and to

  51. Theoretical approaches to natural language understanding

    Energy Technology Data Exchange (ETDEWEB)

    1985-01-01

    This book discusses the following: Computational Linguistics, Artificial Intelligence, Linguistics, Philosophy, and Cognitive Science and the current state of natural language understanding. Three topics form the focus for discussion; these topics include aspects of grammars, aspects of semantics/pragmatics, and knowledge representation.

  52. DPMine Graphical Language for Automation of Experiments in Process Mining

    Directory of Open Access Journals (Sweden)

    S. A. Shershakov

    2014-01-01

    Process mining is a new direction in the field of modeling and analysis of processes, where the use of information from event logs describing the history of the system behavior plays an important role. Methods and approaches used in process mining are often based on various heuristics, and experiments with large event logs are crucial for the study and comparison of the developed methods and algorithms. Such experiments are very time consuming, so automation of experiments is an important task in the field of process mining. This paper presents the DPMine language, developed specifically to describe and carry out experiments on the discovery and analysis of process models. The basic concepts of the DPMine language as well as the principles and mechanisms of its extension are described. Ways of integrating the DPMine language into the VTMine modeling tool as dynamically loaded components are considered. An illustrating example of an experiment for building a fuzzy model of the process discovered from log data stored in a normalized database is given.

  53. Brain readiness and the nature of language.

    Science.gov (United States)

    Bouchard, Denis

    2015-01-01

    To identify the neural components that make a brain ready for language, it is important to have well defined linguistic phenotypes, to know precisely what language is. There are two central features to language: the capacity to form signs (words), and the capacity to combine them into complex structures. We must determine how the human brain enables these capacities. A sign is a link between a perceptual form and a conceptual meaning. Acoustic elements and content elements, are already brain-internal in non-human animals, but as categorical systems linked with brain-external elements. Being indexically tied to objects of the world, they cannot freely link to form signs. A crucial property of a language-ready brain is the capacity to process perceptual forms and contents offline, detached from any brain-external phenomena, so their "representations" may be linked into signs. These brain systems appear to have pleiotropic effects on a variety of phenotypic traits and not to be specifically designed for language. Syntax combines signs, so the combination of two signs operates simultaneously on their meaning and form. The operation combining the meanings long antedates its function in language: the primitive mode of predication operative in representing some information about an object. The combination of the forms is enabled by the capacity of the brain to segment vocal and visual information into discrete elements. Discrete temporal units have order and juxtaposition, and vocal units have intonation, length, and stress. These are primitive combinatorial processes. So the prior properties of the physical and conceptual elements of the sign introduce combinatoriality into the linguistic system, and from these primitive combinatorial systems derive concatenation in phonology and combination in morphosyntax. Given the nature of language, a key feature to our understanding of the language-ready brain is to be found in the mechanisms in human brains that enable the unique

  54. Brain readiness and the nature of language

    Directory of Open Access Journals (Sweden)

    Denis eBouchard

    2015-09-01

    To identify the neural components that make a brain ready for language, it is important to have well defined linguistic phenotypes, to know precisely what language is. There are two central features to language: the capacity to form signs (words), and the capacity to combine them into complex structures. We must determine how the human brain enables these capacities. A sign is a link between a perceptual form and a conceptual meaning. Acoustic elements and content elements, are already brain-internal in non-human animals, but as categorical systems linked with brain-external elements. Being indexically tied to objects of the world, they cannot freely link to form signs. A crucial property of a language-ready brain is the capacity to process perceptual forms and contents offline, detached from any brain-external phenomena, so their representations may be linked into signs. These brain systems appear to have pleiotropic effects on a variety of phenotypic traits and not to be specifically designed for language. Syntax combines signs, so the combination of two signs operates simultaneously on their meaning and form. The operation combining the meanings long antedates its function in language: the primitive mode of predication operative in representing some information about an object. The combination of the forms is enabled by the capacity of the brain to segment vocal and visual information into discrete elements. Discrete temporal units have order and juxtaposition, and vocal units have intonation, length, and stress. These are primitive combinatorial processes. So the prior properties of the physical and conceptual elements of the sign introduce combinatoriality into the linguistic system, and from these primitive combinatorial systems derive concatenation in phonology and combination in morphosyntax. Given the nature of language, a key feature to our understanding of the language-ready brain is to be found in the mechanisms in human brains that

  55. Automated genome mining for natural products

    Directory of Open Access Journals (Sweden)

    Zajkowski James

    2009-06-01

    Background: Discovery of new medicinal agents from natural sources has largely been an adventitious process based on screening of plant and microbial extracts combined with bioassay-guided identification and natural product structure elucidation. Increasingly rapid and more cost-effective genome sequencing technologies coupled with advanced computational power have converged to transform this trend toward a more rational and predictive pursuit. Results: We have developed a rapid method of scanning genome sequences for multiple polyketide, nonribosomal peptide, and mixed combination natural products, with output in a text format that can be readily converted to two- and three-dimensional structures using conventional software. Our open-source and web-based program can assemble various small molecules composed of twenty standard amino acids and twenty-two other chain-elongation intermediates used in nonribosomal peptide systems, and four acyl-CoA extender units incorporated into polyketides, by reading a hidden Markov model of DNA. This process evaluates and selects the substrate specificities along the assembly line of nonribosomal synthetases and modular polyketide synthases. Conclusion: Using this approach we have predicted the structures of natural products from a diverse range of bacteria based on a limited number of signature sequences. In accelerating direct DNA-to-metabolomic analysis, this method bridges the interface between chemists and biologists and enables rapid scanning for compounds with potential therapeutic value.

  17. Natural Language Processing Technologies in Radiology Research and Clinical Applications.

    Science.gov (United States)

    Cai, Tianrun; Giannopoulos, Andreas A; Yu, Sheng; Kelil, Tatiana; Ripley, Beth; Kumamaru, Kanako K; Rybicki, Frank J; Mitsouras, Dimitrios

    2016-01-01

    The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications. ©RSNA, 2016.
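
    To make the "free text to structured format" step concrete, here is a hedged sketch (not the authors' method) that turns a report sentence into structured findings with a NegEx-style negation flag. The finding lexicon, negation cues, and 30-character context window are assumptions for the example.

        import re

        FINDINGS = ["pneumothorax", "effusion", "nodule"]      # assumed lexicon
        NEGATIONS = ["no ", "without ", "negative for "]       # assumed cues

        def extract(text):
            """Return structured finding/presence pairs for one report."""
            s = text.lower()
            results = []
            for finding in FINDINGS:
                for match in re.finditer(finding, s):
                    window = s[max(0, match.start() - 30):match.start()]
                    negated = any(cue in window for cue in NEGATIONS)
                    results.append({"finding": finding, "present": not negated})
            return results

        print(extract("No pneumothorax. Small right pleural effusion."))
        # [{'finding': 'pneumothorax', 'present': False},
        #  {'finding': 'effusion', 'present': True}]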

  18. Learning procedures from interactive natural language instructions

    Science.gov (United States)

    Huffman, Scott B.; Laird, John E.

    1994-01-01

    Despite its ubiquity in human learning, very little work has been done in artificial intelligence on agents that learn from interactive natural language instructions. In this paper, the problem of learning procedures from interactive, situated instruction is examined in which the student is attempting to perform tasks within the instructional domain, and asks for instruction when it is needed. Presented is Instructo-Soar, a system that behaves and learns in response to interactive natural language instructions. Instructo-Soar learns completely new procedures from sequences of instruction, and also learns how to extend its knowledge of previously known procedures to new situations. These learning tasks require both inductive and analytic learning. Instructo-Soar exhibits a multiple execution learning process in which initial learning has a rote, episodic flavor, and later executions allow the initially learned knowledge to be generalized properly.

  19. Natural language processing tools for computer assisted language learning

    Directory of Open Access Journals (Sweden)

    Vandeventer Faltin, Anne

    2003-01-01

    Full Text Available This paper illustrates the usefulness of natural language processing (NLP) tools for computer assisted language learning (CALL) through the presentation of three NLP tools integrated within a CALL software for French. These tools are (i) a sentence structure viewer; (ii) an error diagnosis system; and (iii) a conjugation tool. The sentence structure viewer helps language learners grasp the structure of a sentence, by providing lexical and grammatical information. This information is derived from a deep syntactic analysis. Two different outputs are presented. The error diagnosis system is composed of a spell checker, a grammar checker, and a coherence checker. The spell checker makes use of alpha-codes, phonological reinterpretation, and some ad hoc rules to provide correction proposals. The grammar checker employs constraint relaxation and phonological reinterpretation as diagnosis techniques. The coherence checker compares the underlying "semantic" structures of a stored answer and of the learners' input to detect semantic discrepancies. The conjugation tool is a resource with enhanced capabilities when put on an electronic format, enabling searches from inflected and ambiguous verb forms.
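
    A hedged sketch of the correction-proposal idea behind such a spell checker: candidate corrections are drawn from a lexicon by string similarity, standing in for the paper's alpha-codes and phonological reinterpretation. The toy French lexicon is an assumption.

        import difflib

        LEXICON = ["bonjour", "manger", "mangeons", "mangez", "maison"]  # assumed

        def proposals(word, n=3):
            """Propose up to n lexicon entries similar to the input word."""
            return difflib.get_close_matches(word.lower(), LEXICON, n=n, cutoff=0.6)

        print(proposals("manjons"))    # e.g. ['mangeons', 'mangez', 'manger']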

  20. Traitement automatique et apprentissage des langues (Automated Discourse Analysis and Language Teaching).

    Science.gov (United States)

    Garrigues, Mylene

    1992-01-01

    Issues in computerized analysis of language usage are discussed, focusing on the problems encountered as computers, linguistics, and language teaching converge. The tools of automated language and error analysis are outlined and specific problems are illustrated in several types of classroom exercise. (MSE)

  1. Natural Language Question Answering in Open Domains

    Directory of Open Access Journals (Sweden)

    Dan Tufis

    2011-10-01

    Full Text Available With the ever-growing volume of information on the web, the traditional search engines, returning hundreds or thousands of documents per query, become more and more demanding of the user's patience in satisfying his/her information needs. Question Answering in Open Domains is a top research and development topic in current language technology. Unlike the standard search engines, based on the latest Information Retrieval (IR) methods, open domain question-answering systems are expected to deliver not a list of documents that might be relevant for the user's query, but a sentence or a paragraph answering the question asked in natural language. This paper reports on the construction and testing of a Question Answering (QA) system which builds on several web services developed at the Research Institute for Artificial Intelligence (ICIA/RACAI). The evaluation of the system was done independently by the organizers of the ResPubliQA 2009 exercise, and the system was rated the best performing one, with the highest improvement due to natural language processing technology over a baseline state-of-the-art IR system. The system was trained on a specific corpus, but its functionality is independent of the linguistic register of the training data.

  2. Natural Language Generation in Health Care

    Science.gov (United States)

    Cawsey, Alison J.; Webber, Bonnie L.; Jones, Ray B.

    1997-01-01

    Abstract Good communication is vital in health care, both among health care professionals and between health care professionals and their patients. And well-written documents, describing and/or explaining the information in structured databases, may be easier to comprehend, more edifying, and even more convincing than the structured data, even when presented in tabular or graphic form. Documents may be automatically generated from structured data, using techniques from the field of natural language generation. These techniques are concerned with how the content, organization and language used in a document can be dynamically selected, depending on the audience and context. They have been used to generate health education materials, explanations and critiques in decision support systems, and medical reports and progress notes. PMID:9391935
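
    A minimal sketch of the underlying idea, generating text from structured data with content and wording selected for the audience; the record fields and templates are invented for illustration, not taken from the systems surveyed.

        RECORD = {"name": "A. Patient", "hba1c": 8.9, "target": 7.0}   # assumed

        def report(record, audience):
            """Render one structured observation as prose for an audience."""
            if audience == "clinician":
                return (f"HbA1c {record['hba1c']}% exceeds the agreed target "
                        f"of {record['target']}%.")
            return (f"{record['name']}, your average blood sugar is above the "
                    f"level you and your care team agreed on.")

        print(report(RECORD, "clinician"))
        print(report(RECORD, "patient"))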

  3. On the Relationship between a Computational Natural Logic and Natural Language

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Nilsson, Jørgen Fischer

    2016-01-01

    This paper makes a case for adopting appropriate forms of natural logic as target language for computational reasoning with descriptive natural language. Natural logics are stylized fragments of natural language where reasoning can be conducted directly by natural reasoning rules reflecting intui...

  4. Constructing Concept Schemes From Astronomical Telegrams Via Natural Language Clustering

    Science.gov (United States)

    Graham, Matthew; Zhang, M.; Djorgovski, S. G.; Donalek, C.; Drake, A. J.; Mahabal, A.

    2012-01-01

    The rapidly emerging field of time domain astronomy is one of the most exciting and vibrant new research frontiers, ranging in scientific scope from studies of the Solar System to extreme relativistic astrophysics and cosmology. It is being enabled by a new generation of large synoptic digital sky surveys - LSST, PanStarrs, CRTS - that cover large areas of sky repeatedly, looking for transient objects and phenomena. One of the biggest challenges facing these surveys is the automated classification of transient events, a process that needs machine-processible astronomical knowledge. Semantic technologies enable the formal representation of concepts and relations within a particular domain. ATELs (http://www.astronomerstelegram.org) are a commonly used means for reporting and commenting upon new astronomical observations of transient sources (supernovae, stellar outbursts, blazar flares, etc). However, they are loose and unstructured and employ scientific natural language for description: this makes automated processing of them - a necessity within the next decade with petascale data rates - a challenge. Nevertheless they represent a potentially rich corpus of information that could lead to new and valuable insights into transient phenomena. This project lies in the cutting-edge field of astrosemantics, a branch of astroinformatics, which applies semantic technologies to astronomy. The ATELs have been used to develop an appropriate concept scheme - a representation of the information they contain - for transient astronomy using hierarchical clustering of processed natural language. This allows us to automatically organize ATELs based on the vocabulary used. We conclude that we can use simple algorithms to process and extract meaning from astronomical textual data.
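
    A hedged sketch of the pipeline this record describes, clustering short reports hierarchically by vocabulary; the four toy texts stand in for real ATELs, and the paper's actual processing is more elaborate.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from scipy.cluster.hierarchy import linkage, fcluster

        atels = [                      # invented stand-ins for ATEL texts
            "supernova discovered in nearby galaxy",
            "new supernova candidate in spiral galaxy",
            "blazar flare detected at high energy",
            "gamma-ray flare from known blazar",
        ]

        X = TfidfVectorizer().fit_transform(atels).toarray()
        Z = linkage(X, method="ward")                   # hierarchical clustering
        print(fcluster(Z, t=2, criterion="maxclust"))   # e.g. [1 1 2 2]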

  5. Manual versus Automated Narrative Analysis of Agrammatic Production Patterns: The Northwestern Narrative Language Analysis and Computerized Language Analysis

    Science.gov (United States)

    Hsu, Chien-Ju; Thompson, Cynthia K.

    2018-01-01

    Purpose: The purpose of this study is to compare the outcomes of the manually coded Northwestern Narrative Language Analysis (NNLA) system, which was developed for characterizing agrammatic production patterns, and the automated Computerized Language Analysis (CLAN) system, which has recently been adopted to analyze speech samples of individuals…

  6. Natural Language Processing in Radiology: A Systematic Review.

    Science.gov (United States)

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. (©) RSNA, 2016 Online supplemental material is available for this article.

  7. Two interpretive systems for natural language?

    Science.gov (United States)

    Frazier, Lyn

    2015-02-01

    It is proposed that humans have available to them two systems for interpreting natural language. One system is familiar from formal semantics. It is a type based system that pairs a syntactic form with its interpretation using grammatical rules of composition. This system delivers both plausible and implausible meanings. The other proposed system is one that uses the grammar together with knowledge of how the human production system works. It is token based and only delivers plausible meanings, including meanings based on a repaired input when the input might have been produced as a speech error.

  8. The language of Islamic extremism: Towards an automated identification of beliefs, motivations and justifications

    NARCIS (Netherlands)

    Prentice, S.; Rayson, P.; Taylor, Paul J

    2012-01-01

    Recent studies have sought to understand individuals' motivations for terrorism through terrorist material content. To date, these studies have not capitalised on automated language analysis techniques, particularly those of corpus linguistics. In this paper, we demonstrate how applying three

  9. Inselect: Automating the Digitization of Natural History Collections.

    Directory of Open Access Journals (Sweden)

    Lawrence N Hudson

    Full Text Available The world's natural history collections constitute an enormous evidence base for scientific research on the natural world. To facilitate these studies and improve access to collections, many organisations are embarking on major programmes of digitization. This requires automated approaches to mass-digitization that support rapid imaging of specimens and associated data capture, in order to process the tens of millions of specimens common to most natural history collections. In this paper we present Inselect, a modular, easy-to-use, cross-platform suite of open-source software tools that supports the semi-automated processing of specimen images generated by natural history digitization programmes. The software is made up of a Windows, Mac OS X, and Linux desktop application, together with command-line tools that are designed for unattended operation on batches of images. Blending image visualisation algorithms that automatically recognise specimens together with workflows to support post-processing tasks such as barcode reading, label transcription and metadata capture, Inselect fills a critical gap to increase the rate of specimen digitization.

  10. The social impact of natural language processing

    DEFF Research Database (Denmark)

    Hovy, Dirk; Spruit, Shannon

    Research in natural language processing (NLP) used to be mostly performed on anonymous corpora, with the goal of enriching linguistic analysis. Authors were either largely unknown or public figures. As we increasingly use more data from social media, this situation has changed: users are now individually identifiable, and the outcome of NLP experiments and applications can have a direct effect on their lives. This change should spawn a debate about the ethical implications of NLP, but until now, the internal discourse in the field has not followed the technological development. This position paper identifies a number of social implications that NLP research may have, and discusses their ethical significance, as well as ways to address them.

  11. Landscape Design and the language of Nature

    Directory of Open Access Journals (Sweden)

    Stephen Perry

    2008-07-01

    Full Text Available Recognition that we need to live in a more ecologically sustainable way and that the physical forms of designed landscapes are an expression of the social values and cultural drivers of the time has underpinned the call by some landscape design professionals for a new design aesthetic - one that reflects modern ecological concerns. However, for an 'ecological aesthetic' to be accepted, it must be capable of generating landscape forms that are pleasurable to the general public, as it is the general public who will be responsible for delivering ecological sustainability in the long term. The growth in understanding of the mathematical properties of natural systems and processes has led some authors to suggest that fractal geometry, called the language of nature, could play a role in developing such an aesthetic. This is supported by recent research that suggests human perceptual systems have evolved to process fractal patterning and that we have a visual preference for images with certain fractal qualities. However, how fractal geometry can be used, and what form an aesthetic based on this geometry might take, remains elusive and undefined. To develop an aesthetic based on fractal geometry it is necessary to understand why fractal geometry should be considered as a potential tool and whether the application of fractal analysis can differentiate between the types of landscape forms encountered every day.
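
    Fractal analysis of landscape form is often operationalized via the box-counting dimension. The sketch below (an illustration, not from the article) estimates it for a binary image by counting occupied boxes at several scales and fitting the log-log slope.

        import numpy as np

        def box_counting_dimension(img, sizes=(2, 4, 8, 16)):
            """Estimate the fractal dimension of a 2-D boolean array."""
            counts = []
            for s in sizes:
                h, w = img.shape[0] // s * s, img.shape[1] // s * s
                blocks = img[:h, :w].reshape(h // s, s, w // s, s)
                counts.append(blocks.any(axis=(1, 3)).sum())  # occupied boxes
            slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                                  np.log(counts), 1)
            return slope

        # A completely filled square should come out near dimension 2.
        print(round(box_counting_dimension(np.ones((64, 64), dtype=bool)), 2))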

  12. Capturing and Modeling Domain Knowledge Using Natural Language Processing Techniques

    National Research Council Canada - National Science Library

    Auger, Alain

    2005-01-01

    .... Initiated in 2004 at Defense Research and Development Canada (DRDC), the SACOT knowledge engineering research project is currently investigating, developing and validating innovative natural language processing (NLP...

  14. The Islamic State Battle Plan: Press Release Natural Language Processing

    Science.gov (United States)

    2016-06-01

    we apply Natural Language Processing (NLP) tools to a unique database constructed from approximately 3,000 English-translated press releases...in the English language. It denies any bias introduced by limiting sources to English-language media reports. IBC critics claim that its body counts...added benefit to the understanding of the text. There are variations of stopwords for each language. The System for the Mechanical Analysis and

  15. Natural language solution to a Tuff problem

    International Nuclear Information System (INIS)

    Langkopf, B.S.; Mallory, L.H.

    1984-01-01

    A scientific data base, the Tuff Data Base, is being created at Sandia National Laboratories on the Cyber 170/855, using System 2000. It is being developed for use by scientists and engineers investigating the feasibility of locating a high-level radioactive waste repository in tuff (a type of volcanic rock) at Yucca Mountain on and adjacent to the Nevada Test Site. This project, the Nevada Nuclear Waste Storage Investigations (NNWSI) Project, is managed by the Nevada Operations Office of the US Department of Energy. A user-friendly interface, PRIMER, was developed that uses the Self-Contained Facility (SCF) command SUBMIT and System 2000 Natural Language functions and parametric strings that are schema resident. The interface was designed to: (1) allow users, with or without computer experience or keyboard skill, to sporadically access data in the Tuff Data Base; (2) produce retrieval capabilities for the user quickly; and (3) acquaint the users with the data in the Tuff Data Base. This paper gives a brief description of the Tuff Data Base Schema and the interface, PRIMER, which is written in Fortran V. 3 figures

  16. Natural language metaphors covertly influence reasoning.

    Science.gov (United States)

    Thibodeau, Paul H; Boroditsky, Lera

    2013-01-01

    Metaphors pervade discussions of social issues like climate change, the economy, and crime. We ask how natural language metaphors shape the way people reason about such social issues. In previous work, we showed that describing crime metaphorically as a beast or a virus led people to generate different solutions to a city's crime problem. In the current series of studies, instead of asking people to generate a solution on their own, we provided them with a selection of possible solutions and asked them to choose the best ones. We found that metaphors influenced people's reasoning even when they had a set of options available to compare and select among. These findings suggest that metaphors can influence not just what solution comes to mind first, but also which solution people think is best, even when given the opportunity to explicitly compare alternatives. Further, we tested whether participants were aware of the metaphor. We found that very few participants thought the metaphor played an important part in their decision. Further, participants who had no explicit memory of the metaphor were just as much affected by the metaphor as participants who were able to remember the metaphorical frame. These findings suggest that metaphors can act covertly in reasoning. Finally, we examined the role of political affiliation on reasoning about crime. The results confirm our previous findings that Republicans are more likely to generate enforcement and punishment solutions for dealing with crime, and are less swayed by metaphor than are Democrats or Independents.

  17. Natural Language Video Description using Deep Recurrent Neural Networks

    Science.gov (United States)

    2015-11-23

    language with a single deep neural network. We use deep recurrent nets (RNNs), which have recently demonstrated strong results for machine translation (MT). [The rest of the record is citation residue: Donahue, Rohrbach, Mooney, and Saenko, "Translating videos to natural language using deep recurrent neural networks," NAACL 2015; Subhashini Venugopalan, University of Texas at Austin, vsub@cs.utexas.edu.]

  18. Cognitive Neuroscience of Natural Language Use

    NARCIS (Netherlands)

    Willems, R.M.

    2015-01-01

    When we think of everyday language use, the first things that come to mind include colloquial conversations, reading and writing e-mails, sending text messages or reading a book. But can we study the brain basis of language as we use it in our daily lives? As a topic of study, the cognitive

  19. Do neural nets learn statistical laws behind natural language?

    Directory of Open Access Journals (Sweden)

    Shuntaro Takahashi

    Full Text Available The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
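
    A hedged sketch of how the two laws can be checked on any corpus, including text sampled from a language model; the looped toy corpus is only a stand-in, as the comments note.

        import numpy as np
        from collections import Counter

        text = ("the cat sat on the mat and the dog sat on the log " * 50).split()

        # Zipf's law: frequency ~ 1/rank, i.e. slope near -1 on log-log axes.
        freqs = np.sort(np.array(list(Counter(text).values()), float))[::-1]
        ranks = np.arange(1, len(freqs) + 1)
        zipf_slope = np.polyfit(np.log(ranks), np.log(freqs), 1)[0]

        # Heaps' law: vocabulary grows like n**beta (beta ~ 0.4-0.6 on real
        # corpora; this tiny looped corpus saturates almost immediately).
        seen, vocab_sizes = set(), []
        for w in text:
            seen.add(w)
            vocab_sizes.append(len(seen))
        ns = np.arange(1, len(text) + 1)
        heaps_beta = np.polyfit(np.log(ns), np.log(vocab_sizes), 1)[0]

        print(f"Zipf slope ~ {zipf_slope:.2f}, Heaps beta ~ {heaps_beta:.2f}")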

  20. Whole language and deaf bilingual-bicultural education--naturally!

    Science.gov (United States)

    Mason, D; Ewoldt, C

    1996-10-01

    This position paper discusses how the tenets of Whole Language and Deaf Bilingual-Bicultural Education complement each other. It stresses that Whole Language is based on natural processes through which children can translate their constructs of personal experiences, observations, and perspectives into modes of communication that include written language and, in the present case, American Sign Language. The paper is based on two emphases: (a) Whole Language emphasizes a two-way teaching/learning process, teachers learning from children, and vice versa; and (b) Deaf Bilingual-Bicultural Education emphasizes American Sign Language as a language of instruction and builds on mutual respect for the similarities and differences in the sociocultural and socioeducational experiences and values of Deaf and hearing people. Both Whole Language and Deaf Bilingual-Bicultural Education attempt to authenticate curriculum by integrating Deaf persons' worldviews as part of educational experience.

  1. Some Uses of Natural Language Interfaces in Computer Assisted Language Learning.

    Science.gov (United States)

    Ward, R. D.

    1989-01-01

    Presents a theoretical rationale for the idea that computer programs simulating written conversation, and using natural language, could be effective in language teaching and remediation, and reports empirical studies of its potential. Studies with 10- to 14-year-old language-impaired children are described, software is explained, and future…

  2. Natural language computing an English generative grammar in Prolog

    CERN Document Server

    Dougherty, Ray C

    2013-01-01

    This book's main goal is to show readers how to use the linguistic theory of Noam Chomsky, called Universal Grammar, to represent English, French, and German on a computer using the Prolog computer language. In so doing, it presents a follow-the-dots approach to natural language processing, linguistic theory, artificial intelligence, and expert systems. The basic idea is to introduce meaningful answers to significant problems involved in representing human language data on a computer. The book offers a hands-on approach to anyone who wishes to gain a perspective on natural language

  3. Finite-state pre-processing for natural language analysis

    NARCIS (Netherlands)

    Prins, Robbert Paul

    2005-01-01

    Wide-coverage natural language parsers are typically not very efficient. Finite-state techniques are less powerful, but offer the advantage of being very fast, and good at representing language locally. This dissertation constitutes empirical research into the construction and use of a finite-state

  4. Understanding and Representing Natural Language Meaning.

    Science.gov (United States)

    1982-12-01

    Pragmatics, in press. Collins, A. and M. R. Quillian, "Experiments on Semantic Memory and Language Comprehension," in L. W. Gregg (Ed.), Cognition in Learning...ed Anaphora in Basque," Proceedings of the 8th Annual Meeting of the Berkeley Linguistics Society, Berkeley, CA, 1982. (2) Azkarate, M., D. Far

  5. Natural-Language Parser for PBEM

    Science.gov (United States)

    James, Mark

    2010-01-01

    A computer program called "Hunter" accepts, as input, a colloquial-English description of a set of policy-based-management rules, and parses that description into a form useable by policy-based enterprise management (PBEM) software. PBEM is a rules-based approach suitable for automating some management tasks. PBEM simplifies the management of a given enterprise through establishment of policies addressing situations that are likely to occur. Hunter was developed to have a unique capability to extract the intended meaning instead of focusing on parsing the exact ways in which individual words are used.

  6. Natural Language Assistant: A Dialog System for Online Product Recommendation

    OpenAIRE

    Chai, Joyce; Horvath, Veronika; Nicolov, Nicolas; Stys, Margo; Kambhatla, Nanda; Zadrozny, Wlodek; Melville, Prem

    2002-01-01

    With the emergence of electronic-commerce systems, successful information access on electronic-commerce web sites becomes essential. Menu-driven navigation and keyword search currently provided by most commercial sites have considerable limitations because they tend to overwhelm and frustrate users with lengthy, rigid, and ineffective interactions. To provide an efficient solution for information access, we have built the NATURAL language ASSISTANT (NLA), a web-based natural language dialog sy...

  7. State of the Art of Natural Language Processing

    Science.gov (United States)

    1987-11-15

    computers. [Footnote: Noam Chomsky, Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press, 1965).] One of the earliest attempts at Natural Language...of computers that a machine which understood natural languages was highly desirable. It also was evident from the work of Chomsky and others that...20 years. All the interviewees were educated to the Ph.D. level and most had extensively published in AI literature. The interviewees were evenly

  8. Finite-State Methodology in Natural Language Processing

    Directory of Open Access Journals (Sweden)

    Michal Korzycki

    2001-01-01

    Full Text Available Recent mathematical and algorithmic results in the field of finite-state technology, as well as the increase in computing power, have created the basis for a new approach in natural language processing. However, the task of creating an appropriate model that would describe the phenomena of natural language is still to be achieved. In this paper I present some notions related to the finite-state modelling of syntax and morphology.
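
    A minimal illustration (not from the paper) of finite-state modelling of morphology: an explicit transition table built from a toy lexicon accepts each stem alone or followed by a plural "s". The lexicon and suffix rule are assumptions.

        STEMS = {"cat", "dog"}                      # assumed toy lexicon

        def build_dfa(stems):
            """Build a character-level acceptor for stem(+s)? as a table."""
            trans, accept, counter = {}, set(), [0]
            def new_state():
                counter[0] += 1
                return counter[0]
            for stem in stems:
                state = 0
                for ch in stem:
                    state = trans.setdefault((state, ch), new_state())
                accept.add(state)                   # bare stem accepted
                accept.add(trans.setdefault((state, "s"), new_state()))
            return trans, accept

        def accepts(word, dfa):
            """Simulate the automaton character by character."""
            trans, accept = dfa
            state = 0
            for ch in word:
                if (state, ch) not in trans:
                    return False
                state = trans[(state, ch)]
            return state in accept

        DFA = build_dfa(STEMS)
        print([w for w in ["cat", "cats", "dogz", "ca"] if accepts(w, DFA)])
        # ['cat', 'cats']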

  9. The nature of written language deficits in children with SLI.

    Science.gov (United States)

    Mackie, Clare; Dockrell, Julie E

    2004-12-01

    Children with specific language impairment (SLI) have associated difficulties in reading decoding and reading comprehension. To date, few research studies have examined the children's written language. The aim of the present study was to (a) evaluate the nature and extent of the children's difficulties with writing and (b) investigate the relationship between oral and written language. Eleven children with SLI were identified (mean age = 11 years) and were compared with a group of children matched for chronological age (CA; mean age = 11;2 [years;months]) and language age (LA; mean CA = 7;3). All groups completed standardized measures of language production, writing, and reading decoding. The writing assessment revealed that the SLI group wrote fewer words and produced proportionately more syntax errors than the CA group, but they did not differ on a measure of content of written language or on the proportion of spelling errors. The SLI group also produced proportionately more syntax errors than the LA group. The relationships among oral language, reading, and writing differed for the 3 groups. The nature and extent of the children's written language problems are considered in the context of difficulties with spoken language.

  10. Naturalizing language: human appraisal and (quasi) technology

    DEFF Research Database (Denmark)

    Cowley, Stephen

    2013-01-01

    Using contemporary science, the paper builds on Wittgenstein's views of human language. Rather than ascribing reality to inscription-like entities, it links embodiment with distributed cognition. The verbal or (quasi) technological aspect of language is traced not to action, but to human-specific interactivity. This species-specific form of sense-making sustains, among other things, using texts, making/construing phonetic gestures and thinking. Human action is thus grounded in appraisals or sense-saturated coordination. To illustrate interactivity at work, the paper focuses on a case study. Over 11 s, a crime scene investigator infers that she is probably dealing with an inside job: she uses not words, but intelligent gaze. This connects professional expertise to circumstances and the feeling of thinking. It is suggested that, as for other species, human appraisal is based in synergies. However, since

  11. Handbook of natural language processing and machine translation DARPA global autonomous language exploitation

    CERN Document Server

    Olive, Joseph P; McCary, John

    2011-01-01

    This comprehensive handbook, written by leading experts in the field, details the groundbreaking research conducted under the breakthrough GALE program - The Global Autonomous Language Exploitation within the Defense Advanced Research Projects Agency (DARPA), while placing it in the context of previous research in the fields of natural language and signal processing, artificial intelligence and machine translation. The most fundamental contrast between GALE and its predecessor programs was its holistic integration of previously separate or sequential processes. In earlier language research pro

  12. Sociolinguistically Informed Natural Language Processing: Automating Irony Detection

    Science.gov (United States)

    2015-04-13

    this dataset to empirically demonstrate that human annotators require context to infer irony. Moreover, we have shown that the classification errors ...derides Senator Cruz (e.g., writing “Ted Cruz is no Ronald Reagan. They aren’t even close.”). From this contextual information, then, we can reasonably...annotators tend to request context. Indeed we have shown that annotators rely on contextual cues (in addition to word and grammatical features) to discern

  13. A Natural Logic for Natural-language Knowledge Bases

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Jensen, Per Anker

    2017-01-01

    to semantic querying. Our core natural logic proposal covers formal ontologies and generative extensions thereof. It further provides means of expressing general relationships between classes in an application. We discuss extensions of the core natural logic with various conservative as well as non-conservative...

  15. Statistical Language Models and Information Retrieval: Natural Language Processing Really Meets Retrieval

    NARCIS (Netherlands)

    Hiemstra, Djoerd; de Jong, Franciska M.G.

    2001-01-01

    Traditionally, natural language processing techniques for information retrieval have always been studied outside the framework of formal models of information retrieval. In this article, we introduce a new formal model of information retrieval based on the application of statistical language models.
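
    A hedged sketch of one standard instantiation of this idea, query-likelihood retrieval: each document is scored by the probability that its smoothed unigram language model generates the query. The toy documents and the Jelinek-Mercer smoothing weight are assumptions, and the sketch presumes every query word occurs somewhere in the collection.

        import math
        from collections import Counter

        docs = ["natural language processing for retrieval",
                "statistical language models",
                "formal models of information retrieval"]
        col = Counter(" ".join(docs).split())        # collection statistics
        col_len = sum(col.values())

        def score(query, doc, lam=0.5):
            """log P(query | doc) under a Jelinek-Mercer smoothed model."""
            tf, dlen = Counter(doc.split()), len(doc.split())
            return sum(math.log(lam * tf[w] / dlen +
                                (1 - lam) * col[w] / col_len)
                       for w in query.split())

        q = "language models"
        print(max(docs, key=lambda d: score(q, d)))  # 'statistical language models'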

  16. ROPE: Recoverable Order-Preserving Embedding of Natural Language

    Energy Technology Data Exchange (ETDEWEB)

    Widemann, David P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wang, Eric X. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Thiagarajan, Jayaraman J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-02-11

    We present a novel Recoverable Order-Preserving Embedding (ROPE) of natural language. ROPE maps natural language passages from sparse concatenated one-hot representations to distributed vector representations of predetermined fixed length. We use Euclidean distance to return search results that are both grammatically and semantically similar. ROPE is based on a series of random projections of distributed word embeddings. We show that our technique typically forms a dictionary with sufficient incoherence such that sparse recovery of the original text is possible. We then show how our embedding allows for efficient and meaningful natural search and retrieval on Microsoft’s COCO dataset and the IMDB Movie Review dataset.
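
    A hedged reconstruction of the embedding step as described: a passage's sparse concatenated one-hot representation is mapped by a random projection to a fixed-length vector, and retrieval uses Euclidean distance. The tiny vocabulary, fixed sequence length, and dimensions are simplifying assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        VOCAB = {w: i for i, w in enumerate(
            "a cat sat on the mat dog ran in park".split())}
        SEQ_LEN, DIM = 4, 8                               # assumed fixed sizes
        P = rng.normal(size=(DIM, SEQ_LEN * len(VOCAB)))  # random projection

        def embed(passage):
            """Concatenate one-hot vectors for SEQ_LEN words, then project."""
            x = np.zeros(SEQ_LEN * len(VOCAB))
            for pos, w in enumerate(passage.split()[:SEQ_LEN]):
                x[pos * len(VOCAB) + VOCAB[w]] = 1.0
            return P @ x

        corpus = ["the cat sat on", "the dog ran in", "a cat in the"]
        E = np.stack([embed(p) for p in corpus])
        q = embed("the cat sat on")
        print(corpus[int(np.argmin(np.linalg.norm(E - q, axis=1)))])
        # 'the cat sat on'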

  17. From language to nature: The semiotic metaphor in biology

    DEFF Research Database (Denmark)

    Emmeche, Claus; Hoffmeyer, Jesper Normann

    1991-01-01

    The development of form in living organisms continues to challenge biological research. The concept of biological information encoded in the genetic program that controls development forms a major part of the semiotic metaphor in biology. Development is here seen in analogy to an execution of a program, written in a formal language, in the computer. Other versions of the semiotic or "nature-as-language" metaphor use other formal or informal aspects of language to comprehend the specific structural relations in nature as explored by molecular and evolutionary biology. This intuitively appealing complex of related ideas, which has a long history in the philosophy of nature and biology, is critically reviewed. The general nature of metaphor in science is considered, and different levels of metaphorical transfer of signification are distinguished. It is argued that the metaphors may

  18. Techniques for Automated Testing of Lola Industrial Robot Language Parser

    Directory of Open Access Journals (Sweden)

    M. M. Lutovac

    2014-06-01

    Full Text Available The accuracy of parsing execution directly affects the accuracy of semantic analysis, optimization and object code generation. Therefore, parser testing represents the basis of compiler testing. It should include tests for correct and expected cases, but also for unexpected and invalid ones. Techniques for testing the parser, as well as algorithms and tools for test sentence generation, are discussed in this paper. The methodology for initial testing of a newly developed compiler is proposed. Generation of negative test sentences by modifying the original language grammar is described. Positive and negative test cases generated by Grow, Purdom's algorithm with and without length control, the CDRC-P algorithm, and the CDRC-P algorithm with length control are applied to the testing of the L-IRL robot programming language. For this purpose, two different tools for generation of test sentences are used. Based on the presented analysis of possible solutions, the appropriate method can be chosen for testing the parser for smaller grammars with many recursive rules.
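
    A minimal sketch of the Grow-style positive test generation named above: sentences are produced by randomly expanding rules from the start symbol, with a depth cap standing in for length control. The toy grammar is an assumption, not the L-IRL grammar.

        import random

        GRAMMAR = {                    # invented CFG; terminals in lowercase
            "STMT": [["move", "EXPR"], ["if", "COND", "then", "STMT"]],
            "EXPR": [["num"], ["num", "+", "EXPR"]],
            "COND": [["EXPR", "<", "EXPR"]],
        }

        def grow(symbol="STMT", depth=0, max_depth=6):
            """Randomly expand one derivation; force short rules when deep."""
            if symbol not in GRAMMAR:
                return [symbol]                     # terminal symbol
            rules = GRAMMAR[symbol]
            rule = (min(rules, key=len) if depth >= max_depth
                    else random.choice(rules))
            return [t for part in rule for t in grow(part, depth + 1, max_depth)]

        random.seed(1)
        for _ in range(3):
            print(" ".join(grow()))    # e.g. "move num + num"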

  19. Semiotic Nature of Language Teaching Methods in Foreign Language Learning and Teaching

    Directory of Open Access Journals (Sweden)

    İsmail ERTON

    2006-04-01

    Full Text Available This paper aims to cover the semiotic nature of language teaching methods, and their sample applications in the language classroom. The verbal and the non-verbal aspects of language teaching should not be kept separate since they are closely interrelated and interdependent. The use of signs, symbols and visual aids by the teachers helps the enhancement of the learning capacity of the language learner both at cognitive and meta-cognitive levels as they listen and try to learn a foreign language component in the classroom.

  20. LAIR: A Language for Automated Semantics-Aware Text Sanitization based on Frame Semantics

    DEFF Research Database (Denmark)

    Hedegaard, Steffen; Houen, Søren; Simonsen, Jakob Grue

    2009-01-01

    We present \\lair{}: A domain-specific language that enables users to specify actions to be taken upon meeting specific semantic frames in a text, in particular to rephrase and redact the textual content. While \\lair{} presupposes superficial knowledge of frames and frame semantics, it requires only...... limited prior programming experience. It neither contain scripting or I/O primitives, nor does it contain general loop constructions and is not Turing-complete. We have implemented a \\lair{} compiler and integrated it in a pipeline for automated redaction of web pages. We detail our experience...... with automated redaction of web pages for subjectively undesirable content; initial experiments suggest that using a small language based on semantic recognition of undesirable terms can be highly useful as a supplement to traditional methods of text sanitization....

  1. Natural Language Direction Following for Robots in Unstructured Unknown Environments

    Science.gov (United States)

    2015-01-15

    "...music is not to be found in the notes." (Gustav Mahler) Our approach so far has only considered the user's natural language command as a specification... [The rest of the record is citation residue: an Electronic Lexical Database (Language, Speech, and Communication, 1998); Dave Ferguson and Anthony Stentz, "Field D*: An interpolation-based..."; Christian Landsiedel, Roderick De Nijs, Kolja Kuhnlenz, Dirk Wollherr, and Martin Buss, "Route description..."]

  2. Natural language processing and the Now-or-Never bottleneck.

    Science.gov (United States)

    Gómez-Rodríguez, Carlos

    2016-01-01

    Researchers, motivated by the need to improve the efficiency of natural language processing tools to handle web-scale data, have recently arrived at models that remarkably match the expected features of human language processing under the Now-or-Never bottleneck framework. This provides additional support for said framework and highlights the research potential in the interaction between applied computational linguistics and cognitive science.

  3. Clinical Natural Language Processing in languages other than English: opportunities and challenges.

    Science.gov (United States)

    Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre

    2018-03-30

    Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  4. Design Automation Using Script Languages. High-Level CAD Templates in Non-Parametric Programs

    Science.gov (United States)

    Moreno, R.; Bazán, A. M.

    2017-10-01

    The main purpose of this work is to study the advantages offered by applying traditional techniques of technical drawing to design automation with non-parametric CAD programs equipped with scripting languages. Given that an example drawing can be solved with traditional step-by-step detailed procedures, it is possible to do the same with CAD applications and to generalize it later, incorporating references. In today's CAD applications, there are striking absences of solutions for building engineering: oblique projections (military and cavalier), 3D modelling of complex stairs, roofs, furniture, and so on. The use of geometric references (using variables in script languages) and their incorporation into high-level CAD templates allows the automation of processes. Instead of repeatedly creating similar designs or modifying their data, users should be able to use these templates to generate future variations of the same design. This paper presents the automation process of several complex drawing examples based on CAD script files aided with parametric geometry calculation tools. The proposed method allows us to solve complex geometry designs not currently supported by CAD applications and to subsequently create new derivatives without user intervention. Automation in the generation of complex designs not only saves time but also increases the quality of the presentations and reduces the possibility of human errors.
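
    A hedged illustration of a high-level template driven by geometric references: a script derives the full geometry of a straight stair from a few parameters, so a new variant is a parameter change rather than a redraw. The function name and the 2-D simplification are assumptions for the example.

        def stair_profile(total_rise, total_run, n_steps):
            """Return the 2-D polyline (x, y) of a straight stair section."""
            rise, run = total_rise / n_steps, total_run / n_steps
            pts, x, y = [(0.0, 0.0)], 0.0, 0.0
            for _ in range(n_steps):
                y += rise; pts.append((x, y))   # riser
                x += run;  pts.append((x, y))   # tread
            return pts

        # Regenerating a variant without user intervention:
        print(stair_profile(total_rise=2.7, total_run=3.6, n_steps=9)[:4])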

  5. Computing an Ontological Semantics for a Natural Language Fragment

    DEFF Research Database (Denmark)

    Szymczak, Bartlomiej Antoni

    The key objective of the research that has been carried out has been to establish theoretically sound connections between the following two areas: • Computational processing of texts in natural language by means of logical methods • Theories and methods for engineering of formal ontologies. We have tried to establish a domain-independent “ontological semantics” for relevant fragments of natural language. The purpose of this research is to develop methods and systems for taking advantage of formal ontologies for the purpose of extracting the meaning contents of texts. This functionality is desirable e.g. for future content-based search systems, in contrast to today’s keyword-based search systems (viz., Google), which rely chiefly on recognition of stated keywords in the targeted text. Logical methods were introduced into semantic theories for natural language already during the 60’s in what

  6. Quick location determination based on geographic keywords of natural language

    Science.gov (United States)

    Guo, Danhuai; Cui, Weihong

    2007-06-01

    In location determination based on natural language, it is common to find a location by describing the relationship between the undetermined position and one or several determined positions. This indicates that the uncertainty of location determination derives from the uncertainty of natural language processing, of spatial position description, and of spatial relationship description. Most current research and regular GIS software take certainty as a prerequisite and try to avoid uncertainty and its influence. The research reported in this paper is an attempt to create a new method combining Artificial Intelligence (AI), fuzzy set theory, and spatial information science, named Quickly Location Determination based on Geographic Keywords (QLDGK), to rise to the challenge of location searching based on natural language. QLDGK has two technical foundations. The first is a geographic-keywords library and a special natural-language-separation model library that increase language processing efficiency. The second is a fuzzy-theory-based definition of spatial relationship, spatial metric, and spatial orientation that extends the searching scope and assigns variant confidences to variant searching outcomes. QLDGK considers both higher query efficiency and a lower omission rate. The method has been proven workable and efficient in a QLDGK prototype system, which was tested on about 12,000 emergency call reports from K-city, southwest China, and achieved 78% accuracy at the highest confidence level and an 8% omission rate.

  7. Learning to rank for information retrieval and natural language processing

    CERN Document Server

    Li, Hang

    2014-01-01

    Learning to rank refers to machine learning techniques for training a model in a ranking task. Learning to rank is useful for many applications in information retrieval, natural language processing, and data mining. Intensive studies have been conducted on its problems recently, and significant progress has been made. This lecture gives an introduction to the area including the fundamental problems, major approaches, theories, applications, and future work.The author begins by showing that various ranking problems in information retrieval and natural language processing can be formalized as tw
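
    A hedged sketch of one of the formalizations the book covers, the pairwise approach: labelled query-document feature vectors are turned into preference pairs and a linear scorer is fit on their differences. The features and labels are invented.

        import numpy as np
        from itertools import combinations

        # Toy features per document: [match score, clicks]; higher label = better.
        X = np.array([[2.0, 5.0], [1.0, 3.0], [0.5, 1.0]])
        y = np.array([2, 1, 0])                      # graded relevance

        # Each pair with unequal labels becomes a difference vector that the
        # learned weights w should score positively.
        pairs = [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]
        D = np.array([X[i] - X[j] if y[i] > y[j] else X[j] - X[i]
                      for i, j in pairs])

        w = np.zeros(X.shape[1])                     # perceptron-style fit
        for _ in range(100):
            for d in D:
                if w @ d <= 0:
                    w += d

        print(np.argsort(-(X @ w)))                  # ranking: [0 1 2]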

  8. System reliability analysis with natural language and expert's subjectivity

    International Nuclear Information System (INIS)

    Onisawa, T.

    1996-01-01

    This paper introduces natural language expressions and expert's subjectivity to system reliability analysis. To this end, this paper defines a subjective measure of reliability and presents the method of the system reliability analysis using the measure. The subjective measure of reliability corresponds to natural language expressions of reliability estimation, which is represented by a fuzzy set defined on [0,1]. The presented method deals with the dependence among subsystems and employs parametrized operations of subjective measures of reliability which can reflect expert's subjectivity towards the analyzed system. The analysis results are also expressed by linguistic terms. Finally, this paper gives an example of the system reliability analysis by the presented method.

  9. Second Language Acquisition and The Development through Nature-Nurture

    Directory of Open Access Journals (Sweden)

    Syahfitri Purnama

    2017-10-01

    Full Text Available There are some factors regarding which aspects of second language acquisition are affected by individual learner factors: age, learning style, aptitude, motivation, and personality. This research is about the English language acquisition of a four-year-old child through nature and nurture. The child acquired her second language at home and also at a course in Jakarta. She was schooled by her parents so that she would be able to speak English well as a target language for her future. The purpose of this paper is to examine individual learner differences, especially in using English as a second language. This study is library research; the data were collected, recorded, transcribed, and analyzed descriptively. The results can be summarized as follows: the child is able to communicate well, to construct simple sentences, complex sentences, statements, and question phrases, and to explain something when her teacher asks her at school. She is able to communicate by making well-formed simple or compound sentences (two or three clauses), even though she does not yet consistently use the past tense form and sometimes forgets to put the bound morpheme -s on third person singular verbs, but she can use turn-taking in her utterances. Second language acquisition is a very long process for the child. The family and teacher should participate and assist the child; the evidence shows a child can learn a first and a second language at the same time.

  10. Natural Language Processing for the Swiss German Dialect Area

    OpenAIRE

    Scherrer, Yves; Rambow, Owen

    2010-01-01

    This paper discusses work on data collection for Swiss German dialects taking into account the continuous nature of the dialect landscape, and proposes to integrate these data into natural language processing models. We present knowledge-based models for machine translation into any Swiss German dialect, for dialect identification, and for multi-dialectal parsing. In a dialect continuum, rules cannot be applied uniformly, but have restricted validity in well-defined geographic areas. Therefor...

  11. A Tutorial on Techniques and Applications for Natural Language Processing

    Science.gov (United States)

    1983-10-17

    machines through natural language. The emphasis is pragmatic. It is less important in applied NLP whether the machine "understands" its natural...between man and machine or communication between two people, entails discourse phenomena that transcend individual sentences. • Anaphora - Pronouns and...identifying the referents of these place-holder words. Interactive dialogues invite the use of anaphora, much more than simpler data base query situations

  12. Classifying free-text triage chief complaints into syndromic categories with natural language processing.

    Science.gov (United States)

    Chapman, Wendy W; Christensen, Lee M; Wagner, Michael M; Haug, Peter J; Ivanov, Oleg; Dowling, John N; Olszewski, Robert T

    2005-01-01

    Develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance. Much of the input data for artificial intelligence applications in the medical field are free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems, the free-text must be translated into encoded form. We implemented a biosurveillance detection system from Pennsylvania to monitor the 2002 Winter Olympic Games. Because input data was in free-text format, we used a natural language processing text classifier to automatically classify free-text triage chief complaints into syndromic categories used by the biosurveillance system. The classifier was trained on 4700 chief complaints from Pennsylvania. We evaluated the ability of the classifier to classify free-text chief complaints into syndromic categories with a test set of 800 chief complaints from Utah. The classifier produced the following areas under the ROC curve: Constitutional = 0.95; Gastrointestinal = 0.97; Hemorrhagic = 0.99; Neurological = 0.96; Rash = 1.0; Respiratory = 0.99; Other = 0.96. Using information stored in the system's semantic model, we extracted from the Respiratory classifications lower respiratory complaints and lower respiratory complaints with fever with a precision of 0.97 and 0.96, respectively. Results suggest that a trainable natural language processing text classifier can accurately extract data from free-text chief complaints for biosurveillance.
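
    In the same spirit, a hedged sketch of a trainable chief-complaint classifier (the study used its own Bayesian NLP classifier; this substitutes an off-the-shelf naive Bayes pipeline, and the tiny training set is invented).

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        complaints = ["cough and fever", "shortness of breath", "vomiting",
                      "diarrhea and nausea", "rash on arms", "fever and chills"]
        syndromes = ["respiratory", "respiratory", "gastrointestinal",
                     "gastrointestinal", "rash", "constitutional"]

        clf = make_pipeline(CountVectorizer(), MultinomialNB())
        clf.fit(complaints, syndromes)
        print(clf.predict(["persistent cough", "nausea since morning"]))
        # e.g. ['respiratory' 'gastrointestinal']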

  13. Modeling virtual organizations with Latent Dirichlet Allocation: a case for natural language processing.

    Science.gov (United States)

    Gross, Alexander; Murthy, Dhiraj

    2014-10-01

    This paper explores a variety of methods for applying the Latent Dirichlet Allocation (LDA) automated topic modeling algorithm to the modeling of the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, an increase in the scale of social data available presents new challenges which are not tackled by merely scaling up hardware and software. Rather, they necessitate new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint in these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and ultimately make the argument that natural language processing is a critical interdisciplinary methodology to make better sense of social 'Big Data', and we were able to successfully model nested discussion topics from forums and blog posts using LDA. Importantly, we found that LDA can move us beyond the state-of-the-art in conventional Social Network Analysis techniques. Copyright © 2014 Elsevier Ltd. All rights reserved.
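
    A hedged sketch of applying LDA to forum-style posts with a standard implementation (the abstract does not name the authors' exact tooling; the documents here are invented).

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        posts = ["telescope observing run scheduling",
                 "grant proposal deadline budget",
                 "observing telescope calibration data",
                 "budget travel funding proposal"]

        vec = CountVectorizer()
        X = vec.fit_transform(posts)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

        words = vec.get_feature_names_out()
        for k, topic in enumerate(lda.components_):
            top = topic.argsort()[-3:][::-1]         # highest-weight terms
            print(f"topic {k}:", [words[i] for i in top])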

  14. Spatial Extent Models for Natural Language Phrases Involving Directional Containment

    NARCIS (Netherlands)

    Singh, G.; de By, R.A.

    2015-01-01

    We study the problem of assigning a spatial extent to a text phrase such as 'central northern California', with the objective of allowing spatial interpretations of natural language, and consistency testing of complex utterances that involve multiple phrases from which spatial extent can be derived.

  15. Generating natural language descriptions using speaker-dependent information

    NARCIS (Netherlands)

    Castro Ferreira, Thiago; Paraboni, Ivandré

    2017-01-01

    This paper discusses the issue of human variation in natural language referring expression generation. We introduce a model of content selection that takes speaker-dependent information into account to produce descriptions that closely resemble those produced by each individual, as seen in a number

  16. Perspectives on Bayesian Natural Language Semantics and Pragmatics

    NARCIS (Netherlands)

    Zeevat, H.; Zeevat, H.; Schmitz, H.-C.

    2015-01-01

    Bayesian interpretation is a technique in signal processing and its application to natural language semantics and pragmatics (BNLSP from here on and BNLI if there is no particular emphasis on semantics and pragmatics) is basically an engineering decision. It is a cognitive science hypothesis that

  17. Recurrent Artificial Neural Networks and Finite State Natural Language Processing.

    Science.gov (United States)

    Moisl, Hermann

    It is argued that pessimistic assessments of the adequacy of artificial neural networks (ANNs) for natural language processing (NLP) on the grounds that they have a finite state architecture are unjustified, and that their adequacy in this regard is an empirical issue. First, arguments that counter standard objections to finite state NLP on the…

  18. Spinoza II: Conceptual Case-Based Natural Language Analysis.

    Science.gov (United States)

    Schank, Roger C.; And Others

    This paper presents the theoretical changes that have developed in Conceptual Dependency Theory and their ramifications in computer analysis of natural language. The major items of concern are: the elimination of reliance on "grammar rules" for parsing with the emphasis given to conceptual rule based parsing; the development of a…

  19. CITE NLM: Natural-Language Searching in an Online Catalog.

    Science.gov (United States)

    Doszkocs, Tamas E.

    1983-01-01

    The National Library of Medicine's Current Information Transfer in English public access online catalog offers unique subject search capabilities--natural-language query input, automatic medical subject headings display, closest match search strategy, ranked document output, dynamic end user feedback for search refinement. References, description…

  20. Orwell's 1984: Natural Language Searching and the Contemporary Metaphor.

    Science.gov (United States)

    Dadlez, Eva M.

    1984-01-01

    Describes a natural language searching strategy for retrieving current material which has bearing on George Orwell's "1984," and identifies four main themes (technology, authoritarianism, press and psychological/linguistic implications of surveillance, political oppression) which have emerged from cross-database searches of the "Big…

  1. The Nature of Object Marking in American Sign Language

    Science.gov (United States)

    Gokgoz, Kadir

    2013-01-01

    In this dissertation, I examine the nature of object marking in American Sign Language (ASL). I investigate object marking by means of directionality (the movement of the verb towards a certain location in signing space) and by means of handling classifiers (certain handshapes accompanying the verb). I propose that object marking in ASL is…

  2. Paired structures in logical and semiotic models of natural language

    DEFF Research Database (Denmark)

    Rodríguez, J. Tinguaro; Franco, Camilo; Montero, Javier

    2014-01-01

    The evidence coming from cognitive psychology and linguistics shows that pairs of reference concepts (as e.g. good/bad, tall/short, nice/ugly, etc.) play a crucial role in the way we everyday use and understand natural languages in order to analyze reality and make decisions. Different situations...

  3. Use of information-retrieval languages in automated retrieval of experimental data from long-term storage

    Science.gov (United States)

    Khovanskiy, Y. D.; Kremneva, N. I.

    1975-01-01

    Problems and methods of automating information retrieval operations in a data bank used for long-term storage and retrieval of data from scientific experiments are discussed. Existing information retrieval languages are analyzed along with those being developed. The results of studies discussing the application of the descriptive 'Kristall' language used in the 'ASIOR' automated information retrieval system are presented. The development and use of a specialized language of the classification-descriptive type, using universal decimal classification indices as the main descriptors, is described.

  4. Medical problem and document model for natural language understanding.

    Science.gov (United States)

    Meystre, Stephanie; Haug, Peter J

    2003-01-01

    We are developing tools to help maintain a complete, accurate and timely problem list within a general purpose Electronic Medical Record system. As a part of this project, we have designed a system to automatically retrieve medical problems from free-text documents. Here we describe an information model based on XML (eXtensible Markup Language) and compliant with the CDA (Clinical Document Architecture). This model is used to ease the exchange of clinical data between the Natural Language Understanding application that retrieves potential problems from narrative document, and the problem list management application.
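
    The record does not reproduce the model's schema; the fragment below is a loose, hypothetical sketch of exchanging one extracted problem as CDA-flavored XML. The element and attribute names are illustrative only, not actual CDA definitions.

      # Hypothetical CDA-flavored XML for one NLP-extracted medical problem;
      # element/attribute names are illustrative, not the actual CDA schema.
      import xml.etree.ElementTree as ET

      problem = ET.Element("observation", classCode="COND")
      ET.SubElement(problem, "code", code="401.9",
                    codeSystemName="ICD-9-CM",
                    displayName="Essential hypertension")
      src = ET.SubElement(problem, "text")
      src.text = "Pt has longstanding HTN, poorly controlled."   # source sentence
      ET.SubElement(problem, "status", value="proposed")         # awaiting physician review

      print(ET.tostring(problem, encoding="unicode"))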

  5. Managing Fieldwork Data with Toolbox and the Natural Language Toolkit

    Directory of Open Access Journals (Sweden)

    Stuart Robinson

    2007-06-01

    This paper shows how fieldwork data can be managed using the program Toolbox together with the Natural Language Toolkit (NLTK) for the Python programming language. It provides background information about Toolbox and describes how it can be downloaded and installed. The basic functionality of the program for lexicons and texts is described, and its strengths and weaknesses are reviewed. Its underlying data format is briefly discussed, and the Toolbox processing capabilities of NLTK are introduced, showing ways in which it can be used to extend the functionality of Toolbox. This is illustrated with a few simple scripts that demonstrate basic data management tasks relevant to language documentation, such as printing out the contents of a lexicon as HTML.
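
    In the spirit of the paper's example scripts, and assuming the small Toolbox sample ('rotokas.dic') that ships with NLTK's downloadable corpus data, a lexicon can be rendered as HTML rows like this:

      # Sketch of reading a Toolbox (SFM) lexicon with NLTK and emitting HTML;
      # assumes the 'toolbox' corpus sample has been downloaded.
      import nltk
      nltk.download("toolbox", quiet=True)
      from nltk.corpus import toolbox

      lexicon = toolbox.xml("rotokas.dic")      # SFM file parsed to an ElementTree
      rows = []
      for record in lexicon.findall("record")[:10]:
          lx = record.findtext("lx") or ""      # \lx lexeme field
          ge = record.findtext("ge") or ""      # \ge English gloss field
          rows.append(f"<tr><td>{lx}</td><td>{ge}</td></tr>")
      print("<table>\n" + "\n".join(rows) + "\n</table>")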

  6. Use of the PASKAL' language for programming in experiment automation systems

    International Nuclear Information System (INIS)

    Ostrovnoj, A.I.

    1985-01-01

    A complex of standard solutions for the main functions of any experiment automation system is suggested. These functions include: recording and accumulation of experimental data; visualization and preliminary processing of incoming data; interaction with the operator and system control; and data filing. It is advisable to use standard software, to represent data processing algorithms as parallel processes, and to apply the PASCAL language for programming. Programming with CAMAC equipment is supported by a complex of procedures similar to the set of subprograms in the FORTRAN language. Use of a simple data file in the accumulation and processing programs ensures a unified representation of experimental data and uniform access to them by a large number of programs operating in both on-line and off-line regimes. The suggested approach was applied in developing systems based on the SM-3, SM-4 and MERA-60 computers with the RAFOS operating system.

  7. Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized versus Common Languages

    Science.gov (United States)

    Jarman, Jay

    2011-01-01

    This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…

  8. Conclusiveness of natural languages and recognition of images

    Energy Technology Data Exchange (ETDEWEB)

    Wojcik, Z.M.

    1983-01-01

    Conclusiveness is investigated using recognition processes and a one-one correspondence between expressions of a natural language and graphs representing events. The graphs, as conceived in psycholinguistics, are obtained as a result of perception processes. It is possible to generate and process the graphs automatically, using computers, and then to convert the resulting graphs into expressions of a natural language. Correctness and conclusiveness of the graphs and sentences are investigated using the fundamental condition for event representation processes. Some consequences of conclusiveness are discussed, e.g. undecidability of arithmetic, human brain asymmetry, and the correctness of statistical calculations and operations research. It is suggested that group theory should be imposed on mathematical models of any real system. A proof of the fundamental condition is also presented. 14 references.

  9. Exploiting Lexical Regularities in Designing Natural Language Systems.

    Science.gov (United States)

    1988-04-01

    This paper presents the lexical component of the START Question Answering system developed at the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139.

  10. Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

    OpenAIRE

    Gorrell, Genevieve

    2006-01-01

    The current surge of interest in search and comparison tasks in natural language processing has brought with it a focus on vector space approaches and vector space dimensionality reduction techniques. Presenting data as points in hyperspace provides opportunities to use a variety of welldeveloped tools pertinent to this representation. Dimensionality reduction allows data to be compressed and generalised. Eigen decomposition and related algorithms are one category of approaches to dimensional...
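
    The abstract above is truncated, but the Generalized Hebbian Algorithm itself (Sanger's rule) is compact; a minimal numpy sketch, with arbitrarily chosen learning rate and epoch count, recovers the leading principal directions of centered data:

      # Generalized Hebbian Algorithm (Sanger's rule):
      #   dW = lr * ( y x^T - LT[y y^T] W ),  y = W x,  LT = lower triangle
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(400, 2))
      X[:, 0] *= 3.0                    # anisotropic data: first axis dominates
      X -= X.mean(axis=0)

      W = rng.normal(scale=0.1, size=(2, X.shape[1]))   # two components
      lr = 1e-3
      for _ in range(100):                               # epochs
          for x in X:
              y = W @ x
              W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

      # Rows of W should align (up to sign) with the covariance eigenvectors.
      _, eigvecs = np.linalg.eigh(np.cov(X.T))
      print("GHA directions:\n", W / np.linalg.norm(W, axis=1, keepdims=True))
      print("eig directions:\n", eigvecs[:, ::-1].T)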

  11. ARSENAL: Automatic Requirements Specification Extraction from Natural Language

    OpenAIRE

    Ghosh, Shalini; Elenius, Daniel; Li, Wenchao; Lincoln, Patrick; Shankar, Natarajan; Steiner, Wilfried

    2014-01-01

    Requirements are informal and semi-formal descriptions of the expected behavior of a complex system from the viewpoints of its stakeholders (customers, users, operators, designers, and engineers). However, for the purpose of design, testing, and verification for critical systems, we can transform requirements into formal models that can be analyzed automatically. ARSENAL is a framework and methodology for systematically transforming natural language (NL) requirements into analyzable formal mo...

  12. Anaphora and Logical Form: On Formal Meaning Representations for Natural Language. Technical Report No. 36.

    Science.gov (United States)

    Nash-Webber, Bonnie; Reiter, Raymond

    This paper describes a computational approach to certain problems of anaphora in natural language and argues in favor of formal meaning representation languages (MRLs) for natural language. After presenting arguments in favor of formal meaning representation languages, appropriate MRLs are discussed. Minimal requirements include provisions for…

  13. Discovery of Kolmogorov Scaling in the Natural Language

    Directory of Open Access Journals (Sweden)

    Maurice H. P. M. van Putten

    2017-05-01

    We consider the rate R and variance σ² of Shannon information in snippets of text based on word frequencies in the natural language. We empirically identify Kolmogorov's scaling law σ² ∝ k^(−1.66 ± 0.12) (95% c.l.) as a function of k = 1/N, measured by word count N. This result highlights a potential association of information flow in snippets, analogous to energy cascade in turbulent eddies in fluids at high Reynolds numbers. We propose R and σ² as robust utility functions for objective ranking of concordances in efficient search for maximal information seamlessly across different languages and as a starting point for artificial attention.
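
    The snippet statistics can be mimicked on synthetic data; the sketch below samples a Zipfian word stream and prints the mean information rate R and the snippet-to-snippet variance for several k = 1/N. The Zipf exponent and sizes are arbitrary choices, so the fitted slope will not reproduce the paper's value.

      # Shannon information rate R and its variance over N-word snippets,
      # computed on a synthetic Zipfian "language".
      import numpy as np

      rng = np.random.default_rng(0)
      V = 2000
      p = 1.0 / np.arange(1, V + 1)
      p /= p.sum()                            # Zipfian unigram distribution
      words = rng.choice(V, size=200_000, p=p)

      info = -np.log2(p[words])               # per-word information content (bits)
      for N in (25, 50, 100, 200, 400):
          k = 1.0 / N
          chunks = info[: len(info) // N * N].reshape(-1, N)
          R = chunks.mean(axis=1)             # information rate per snippet
          print(f"k={k:.4f}  mean R={R.mean():.3f}  var(R)={R.var():.5f}")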

  14. 'Fly Like This': Natural Language Interface for UAV Mission Planning

    Science.gov (United States)

    Chandarana, Meghan; Meszaros, Erica L.; Trujillo, Anna; Allen, B. Danette

    2017-01-01

    With the increasing presence of unmanned aerial vehicles (UAVs) in everyday environments, the user base of these powerful and potentially intelligent machines is expanding beyond exclusively highly trained vehicle operators to include non-expert system users. Scientists seeking to augment costly and often inflexible methods of data collection historically used are turning towards lower cost and reconfigurable UAVs. These new users require more intuitive and natural methods for UAV mission planning. This paper explores two natural language interfaces - gesture and speech - for UAV flight path generation through individual user studies. Subjects who participated in the user studies also used a mouse-based interface for a baseline comparison. Each interface allowed the user to build flight paths from a library of twelve individual trajectory segments. Individual user studies evaluated performance, efficacy, and ease-of-use of each interface using background surveys, subjective questionnaires, and observations on time and correctness. Analysis indicates that natural language interfaces are promising alternatives to traditional interfaces. The user study data collected on the efficacy and potential of each interface will be used to inform future intuitive UAV interface design for non-expert users.

  15. A semantic-based approach for querying linked data using natural language

    KAUST Repository

    Paredes-Valverde, Mario Andrés

    2016-01-11

    The semantic Web aims to provide Web information with a well-defined meaning and make it understandable not only by humans but also by computers, thus allowing the automation, integration and reuse of high-quality information across different applications. However, current information retrieval mechanisms for semantic knowledge bases are intended to be used only by expert users. In this work, we propose a natural language interface that allows non-expert users access to this kind of information through formulating queries in natural language. The present approach uses a domain-independent ontology model to represent the question's structure and context. This model also allows determination of the answer type expected by the user, based on a proposed question classification. To prove the effectiveness of our approach, we have conducted an evaluation in the music domain using LinkedBrainz, an effort to provide the MusicBrainz information as structured data on the Web by means of Semantic Web technologies. Our proposal obtained encouraging results based on the F-measure metric, ranging from 0.74 to 0.82 for a corpus of questions generated by a group of real-world end users. © The Author(s) 2015.
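
    The paper's LinkedBrainz setup is not reproduced here; as a stand-in sketch of only the final stage (a question already mapped to SPARQL and posted to a public endpoint), the example below hard-codes the query and uses DBpedia, assuming the endpoint is reachable and that this particular property is populated:

      # Hypothetical last stage of an NL interface: the hand-mapped SPARQL
      # query for "Who composed the opera Carmen?" sent over plain HTTP.
      import json
      import urllib.parse
      import urllib.request

      query = """
      PREFIX dbr:  <http://dbpedia.org/resource/>
      PREFIX dbo:  <http://dbpedia.org/ontology/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?name WHERE {
        dbr:Carmen dbo:composer ?c .
        ?c rdfs:label ?name .
        FILTER (lang(?name) = 'en')
      }"""
      url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode(
          {"query": query, "format": "application/sparql-results+json"})
      with urllib.request.urlopen(url) as resp:
          data = json.load(resp)
      for binding in data["results"]["bindings"]:
          print(binding["name"]["value"])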

  16. Deviations in the Zipf and Heaps laws in natural languages

    International Nuclear Information System (INIS)

    Bochkarev, Vladimir V; Lerner, Eduard Yu; Shevlyakova, Anna V

    2014-01-01

    This paper is devoted to verifying the empirical Zipf and Heaps laws in natural languages using Google Books Ngram corpus data. The connection between the Zipf and Heaps laws is discussed; the Heaps law predicts a power dependence of vocabulary size on text size. In fact, the Heaps exponent in this dependence varies as the text corpus grows. To explain this, the obtained results are compared with a probability model of text generation. Quasi-periodic variations with characteristic time periods of 60-100 years were also found.
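
    Both laws are straightforward to check on any tokenized corpus; in the sketch below a synthetic Zipfian stream stands in for the Google Books data, and the Heaps exponent is fitted by a log-log least-squares slope:

      # Empirical Zipf rank-frequency and Heaps vocabulary-growth check.
      import math
      import random
      from collections import Counter

      random.seed(0)
      V = 5000
      weights = [1.0 / r for r in range(1, V + 1)]      # Zipfian weights
      words = random.choices(range(V), weights=weights, k=100_000)

      # Zipf: frequency should fall off roughly as 1/rank.
      freqs = sorted(Counter(words).values(), reverse=True)
      print("rank 1..5 frequencies:", freqs[:5])

      # Heaps: vocabulary size V(n) ~ n^beta; fit beta on log-log points.
      seen, points = set(), []
      for i, w in enumerate(words, 1):
          seen.add(w)
          if i % 5000 == 0:
              points.append((i, len(seen)))
      xs = [math.log(n) for n, v in points]
      ys = [math.log(v) for n, v in points]
      n = len(points)
      beta = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
             (n * sum(x * x for x in xs) - sum(xs) ** 2)
      print("fitted Heaps exponent beta ~", round(beta, 3))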

  17. Context and Natural Language in Formal Concept Analysis

    DEFF Research Database (Denmark)

    Wray, Tim; Eklund, Peter

    2017-01-01

    CollectionWeb is a framework that uses Formal Concept Analysis (FCA) to link contextually related objects within museum collections. These connections are used to drive a number of user interactions that are intended to promote exploration and discovery. The idea is based on museological perspectives ... narratives based on conceptual pathways. The framework has been applied to a number of user-facing applications and provides insights on how FCA and natural language pipelines can be used to provide contextual, linked navigation within museum collections.

  18. VnCoreNLP: A Vietnamese Natural Language Processing Toolkit

    OpenAIRE

    Vu, Thanh; Nguyen, Dat Quoc; Nguyen, Dai Quoc; Dras, Mark; Johnson, Mark

    2018-01-01

    We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP. Our VnCoreNLP is open-source under GPL...

  19. Harnessing QbD, Programming Languages, and Automation for Reproducible Biology.

    Science.gov (United States)

    Sadowski, Michael I; Grant, Chris; Fell, Tim S

    2016-03-01

    Building robust manufacturing processes from biological components is a highly complex task that requires sophisticated tools to describe processes, inputs, and measurements, and to administer knowledge, data, and materials. We argue that for bioengineering to fully access biological potential, it will require the application of statistically designed experiments to derive detailed empirical models of underlying systems. This in turn requires the execution of large-scale structured experimentation, for which laboratory automation is necessary, and the development of expressive, high-level languages that allow reusability of protocols, characterization of their reliability, and a change in focus from implementation details to functional properties. We review recent developments in these areas and identify what we believe is an exciting trend that promises to revolutionize biotechnology. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  20. Does textual feedback hinder spoken interaction in natural language?

    Science.gov (United States)

    Le Bigot, Ludovic; Terrier, Patrice; Jamet, Eric; Botherel, Valerie; Rouet, Jean-Francois

    2010-01-01

    The aim of the study was to determine the influence of textual feedback on the content and outcome of spoken interaction with a natural language dialogue system. More specifically, the assumption that textual feedback could disrupt spoken interaction was tested in a human-computer dialogue situation. In total, 48 adult participants, familiar with the system, had to find restaurants based on simple or difficult scenarios using a real natural language service system in a speech-only (phone), speech plus textual dialogue history (multimodal) or text-only (web) modality. The linguistic contents of the dialogues differed as a function of modality, but were similar whether the textual feedback was included in the spoken condition or not. These results add to burgeoning research efforts on multimodal feedback, in suggesting that textual feedback may have little or no detrimental effect on information searching with a real system. STATEMENT OF RELEVANCE: The results suggest that adding textual feedback to interfaces for human-computer dialogue could enhance spoken interaction rather than create interference. The literature currently suggests that adding textual feedback to tasks that depend on the visual sense benefits human-computer interaction. The addition of textual output when the spoken modality is heavily taxed by the task was investigated.

  1. Suicide Note Classification Using Natural Language Processing: A Content Analysis

    Directory of Open Access Journals (Sweden)

    John Pestian

    2010-08-01

    Suicide is the second leading cause of death among 25–34 year olds and the third leading cause of death among 15–25 year olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient’s thoughts, as represented by suicide notes. We focus on developing methods of natural language processing that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data used are comprised of suicide notes from 33 suicide completers and matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide if a note was genuine or elicited. Their decisions were compared to nine different machine-learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals accurately classified notes 63% of the time, and the best machine learning algorithm accurately classified the notes 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicidal notes.

  2. Suicide Note Classification Using Natural Language Processing: A Content Analysis.

    Science.gov (United States)

    Pestian, John; Nasrallah, Henry; Matykiewicz, Pawel; Bennett, Aurora; Leenaars, Antoon

    2010-08-04

    Suicide is the second leading cause of death among 25-34 year olds and the third leading cause of death among 15-25 year olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient's thoughts, as represented by suicide notes. We focus on developing methods of natural language processing that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data used are comprised of suicide notes from 33 suicide completers and matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide if a note was genuine or elicited. Their decisions were compared to nine different machine-learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals accurately classified notes 63% of the time, and the best machine learning algorithm accurately classified the notes 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicidal notes.

  3. Advanced applications of natural language processing for performing information extraction

    CERN Document Server

    Rodrigues, Mário

    2015-01-01

    This book explains how to create information extraction (IE) applications that are able to tap the vast amount of relevant information available in natural language sources: Internet pages, official documents such as laws and regulations, books and newspapers, and the social web. Readers are introduced to the problem of IE and its current challenges and limitations, supported with examples. The book discusses the need to fill the gap between documents, data, and people, and provides a broad overview of the technology supporting IE. The authors present a generic architecture for developing systems that are able to learn how to extract relevant information from natural language documents, and illustrate how to implement working systems using state-of-the-art and freely available software tools. The book also discusses concrete applications illustrating IE uses.   ·         Provides an overview of state-of-the-art technology in information extraction (IE), discussing achievements and limitations for t...

  4. Nature of phonological delay in children with specific language impairment.

    Science.gov (United States)

    Orsolini, M; Sechi, E; Maronato, C; Bonvino, E; Corcelli, A

    2001-01-01

    This study investigated the nature of phonological delay in a group of children with specific language impairment. It was asked whether phonological errors in this group of children were generated by a slow but normal language learning process or whether they reflected a selective impairment in some representations that enhance normal acquisition and use of a language phonology. A group of 10 children with SLI (mean age = 5.1) was compared with three groups of normal children who were matched in age (age control group, mean age = 5.1), in sentence comprehension and recalling (grammar control group, mean age = 3.7), or who exhibited a phonological performance lower than the age average (group with low phonological performance, mean age = 4.4). The four groups of children were assessed in terms of: (1) responses to a mispronunciation detection task; and (2) error profiles with complex and simple syllabic structures. Performance on the mispronunciation detection task showed that the group with SLI could distinguish a target lexical item from acoustic non-word stimuli that were highly similar to it in terms of phonetic characteristics. An analysis of overall error rate at this task showed, however, that four children with SLI had a much lower performance than normal children of the same age, even when the auditory stimuli were tokens of the target word, or non-words that were phonetically different from the target. A difficulty in coordinating vocal actions in an articulatory plan accounted for error profiles with simple syllabic structures both for some children with SLI and normal children with phonological performance lower than the age average. A severe difficulty with representing complex syllabic structures was a homogeneous characteristic of the group with SLI and worked as the main indicator of impaired, rather than simply slow, phonological development.

  5. Neurolinguistics and psycholinguistics as a basis for computer acquisition of natural language

    Energy Technology Data Exchange (ETDEWEB)

    Powers, D.M.W.

    1983-04-01

    Research into natural language understanding systems for computers has concentrated on implementing particular grammars and grammatical models of the language concerned. This paper presents a rationale for research into natural language understanding systems based on neurological and psychological principles. Important features of the approach are that it seeks to place the onus of learning the language on the computer, and that it seeks to make use of the vast wealth of relevant psycholinguistic and neurolinguistic theory. 22 references.

  6. Connectionist natural language parsing with BrainC

    Science.gov (United States)

    Mueller, Adrian; Zell, Andreas

    1991-08-01

    A close examination of pure neural parsers shows that they either could not guarantee the correctness of their derivations or had to hard-code seriality into the structure of the net. The authors therefore decided to use a hybrid architecture, consisting of a serial parsing algorithm and a trainable net. The system fulfills the following design goals: (1) parsing of sentences without length restriction, (2) soundness and completeness for any context-free language, and (3) learning the applicability of parsing rules with a neural network to increase the efficiency of the whole system. BrainC (backtracking and backpropagation in C) combines the well-known shift-reduce parsing technique with backtracking and a backpropagation network to learn and represent typical structures of the trained natural language grammars. The system has been implemented as a subsystem of the Rochester Connectionist Simulator (RCS) on SUN workstations and was tested with several grammars for English and German. The design of the system is discussed, followed by the results.
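
    NLTK happens to include a plain shift-reduce parser, which makes the non-learning half of such a hybrid easy to illustrate; BrainC's actual contributions (backtracking plus a trained network that ranks rule applications) are not part of this sketch:

      # Plain shift-reduce parsing over a toy CFG with NLTK; unlike BrainC,
      # this parser neither backtracks nor learns rule preferences.
      from nltk import CFG
      from nltk.parse import ShiftReduceParser

      grammar = CFG.fromstring("""
      S  -> NP VP
      NP -> Det N
      VP -> V NP
      Det -> 'the'
      N  -> 'dog' | 'cat'
      V  -> 'chased'
      """)
      parser = ShiftReduceParser(grammar)
      for tree in parser.parse("the dog chased the cat".split()):
          tree.pretty_print()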

  7. Natural language acquisition in large scale neural semantic networks

    Science.gov (United States)

    Ealey, Douglas

    This thesis puts forward the view that a purely signal-based approach to natural language processing is both plausible and desirable. By questioning the veracity of symbolic representations of meaning, it argues for a unified, non-symbolic model of knowledge representation that is both biologically plausible and, potentially, highly efficient. Processes to generate a grounded, neural form of this model, dubbed the semantic filter, are discussed. The combined effects of local neural organisation, coincident with perceptual maturation, are used to hypothesise its nature. This theoretical model is then validated in light of a number of fundamental neurological constraints and milestones. The mechanisms of semantic and episodic development that the model predicts are then used to explain linguistic properties, such as propositions and verbs, syntax and scripting. To mimic the growth of locally densely connected structures upon an unbounded neural substrate, a system is developed that can grow arbitrarily large, data-dependent structures composed of individual self-organising neural networks. The maturational nature of the data used results in a structure in which the perception of concepts is refined by the networks, but demarcated by subsequent structure. As a consequence, the overall structure shows significant memory and computational benefits, as predicted by the cognitive and neural models. Furthermore, the localised nature of the neural architecture also avoids the increasing error sensitivity and redundancy of traditional systems as the training domain grows.

  8. Behind the scenes: A medical natural language processing project.

    Science.gov (United States)

    Wu, Joy T; Dernoncourt, Franck; Gehrmann, Sebastian; Tyler, Patrick D; Moseley, Edward T; Carlson, Eric T; Grant, David W; Li, Yeran; Welt, Jonathan; Celi, Leo Anthony

    2018-04-01

    Advancement of Artificial Intelligence (AI) capabilities in medicine can help address many pressing problems in healthcare. However, AI research endeavors in healthcare may not be clinically relevant, may have unrealistic expectations, or may not be explicit enough about their limitations. A diverse and well-functioning multidisciplinary team (MDT) can help identify appropriate and achievable AI research agendas in healthcare, and advance medical AI technologies by developing AI algorithms as well as addressing the shortage of appropriately labeled datasets for machine learning. In this paper, our team of engineers, clinicians and machine learning experts share their experience and lessons learned from their two-year-long collaboration on a natural language processing (NLP) research project. We highlight specific challenges encountered in cross-disciplinary teamwork, dataset creation for NLP research, and expectation setting for current medical AI technologies. Copyright © 2017. Published by Elsevier B.V.

  9. Natural language processing in biomedicine: a unified system architecture overview.

    Science.gov (United States)

    Doan, Son; Conway, Mike; Phuong, Tu Minh; Ohno-Machado, Lucila

    2014-01-01

    In contemporary electronic medical records much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) are not provided in structured data fields but rather are encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general architecture in an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, the challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.

  10. Natural Language Based Multimodal Interface for UAV Mission Planning

    Science.gov (United States)

    Chandarana, Meghan; Meszaros, Erica L.; Trujillo, Anna; Allen, B. Danette

    2017-01-01

    As the number of viable applications for unmanned aerial vehicle (UAV) systems increases at an exponential rate, interfaces that reduce the reliance on highly skilled engineers and pilots must be developed. Recent work aims to make use of common human communication modalities such as speech and gesture. This paper explores a multimodal natural language interface that uses a combination of speech and gesture input modalities to build complex UAV flight paths by defining trajectory segment primitives. Gesture inputs are used to define the general shape of a segment while speech inputs provide additional geometric information needed to fully characterize a trajectory segment. A user study is conducted in order to evaluate the efficacy of the multimodal interface.

  11. Pattern Recognition and Natural Language Processing: State of the Art

    Directory of Open Access Journals (Sweden)

    Mirjana Kocaleva

    2016-05-01

    Development of information technologies is growing steadily. With the latest software developments and the application of artificial intelligence and machine learning methods embedded in computers, the expectation is that in the near future computers will be able to solve problems themselves, as people do. Artificial intelligence emulates human behavior on computers. Rather than executing instructions one by one as programmed, machine learning employs prior experience/data in the process of training a system. In this state-of-the-art paper, common methods in AI, such as machine learning, pattern recognition and natural language processing (NLP), are discussed. The standard architecture of an NLP processing system is given, along with the levels needed for understanding NLP. Lastly, statistical NLP processing and multi-word expressions are described.

  12. Second-language instinct and instruction effects: nature and nurture in second-language acquisition.

    Science.gov (United States)

    Yusa, Noriaki; Koizumi, Masatoshi; Kim, Jungho; Kimura, Naoki; Uchida, Shinya; Yokoyama, Satoru; Miura, Naoki; Kawashima, Ryuta; Hagiwara, Hiroko

    2011-10-01

    Adults seem to have greater difficulties than children in acquiring a second language (L2) because of the alleged "window of opportunity" around puberty. Postpuberty Japanese participants learned a new English rule with simplex sentences during one month of instruction, and then they were tested on "uninstructed complex sentences" as well as "instructed simplex sentences." The behavioral data show that they can acquire more knowledge than is instructed, suggesting the interweaving of nature (universal principles of grammar, UG) and nurture (instruction) in L2 acquisition. The comparison in the "uninstructed complex sentences" between post-instruction and pre-instruction using functional magnetic resonance imaging reveals a significant activation in Broca's area. Thus, this study provides new insight into Broca's area, where nature and nurture cooperate to produce L2 learners' rich linguistic knowledge. It also shows neural plasticity of adult L2 acquisition, arguing against a critical period hypothesis, at least in the domain of UG.

  13. Classifying a Person's Degree of Accessibility From Natural Body Language During Social Human-Robot Interactions.

    Science.gov (United States)

    McColl, Derek; Jiang, Chuan; Nejat, Goldie

    2017-02-01

    For social robots to be successfully integrated and accepted within society, they need to be able to interpret human social cues that are displayed through natural modes of communication. In particular, a key challenge in the design of social robots is developing the robot's ability to recognize a person's affective states (emotions, moods, and attitudes) in order to respond appropriately during social human-robot interactions (HRIs). In this paper, we present and discuss social HRI experiments we have conducted to investigate the development of an accessibility-aware social robot able to autonomously determine a person's degree of accessibility (rapport, openness) toward the robot based on the person's natural static body language. In particular, we present two one-on-one HRI experiments to: 1) determine the performance of our automated system in being able to recognize and classify a person's accessibility levels and 2) investigate how people interact with an accessibility-aware robot which determines its own behaviors based on a person's speech and accessibility levels.

  14. The Nature of Spanish versus English Language Use at Home

    Science.gov (United States)

    Branum-Martin, Lee; Mehta, Paras D.; Carlson, Coleen D.; Francis, David J.; Goldenberg, Claude

    2014-01-01

    Home language experiences are important for children's development of language and literacy. However, the home language context is complex, especially for Spanish-speaking children in the United States. A child's use of Spanish or English likely ranges along a continuum, influenced by preferences of particular people involved, such as parents,…

  15. A Classification of Sentences Used in Natural Language Processing in the Military Services.

    Science.gov (United States)

    Wittrock, Merlin C.

    Concepts in cognitive psychology are applied to the language used in military situations, and a sentence classification system for use in analyzing military language is outlined. The system is designed to be used, in part, in conjunction with a natural language query system that allows a user to access a database. The discussion of military…

  16. Understanding the Nature of Learners' Out-of-Class Language Learning Experience with Technology

    Science.gov (United States)

    Lai, Chun; Hu, Xiao; Lyu, Boning

    2018-01-01

    Out-of-class learning with technology comprises an essential context of second language development. Understanding the nature of out-of-class language learning with technology is the initial step towards safeguarding its quality. This study examined the types of learning experiences that language learners engaged in outside the classroom and the…

  17. A natural language processing program effectively extracts key pathologic findings from radical prostatectomy reports.

    Science.gov (United States)

    Kim, Brian J; Merchant, Madhur; Zheng, Chengyi; Thomas, Anil A; Contreras, Richard; Jacobsen, Steven J; Chien, Gary W

    2014-12-01

    Natural language processing (NLP) software programs have been widely developed to transform complex free text into simplified organized data. Potential applications in the field of medicine include automated report summaries, physician alerts, patient repositories, electronic medical record (EMR) billing, and quality metric reports. Despite these prospects and the recent widespread adoption of EMR, NLP has been relatively underutilized. The objective of this study was to evaluate the performance of an internally developed NLP program in extracting select pathologic findings from radical prostatectomy specimen reports in the EMR. An NLP program was generated by a software engineer to extract key variables from prostatectomy reports in the EMR within our healthcare system, which included the TNM stage, Gleason grade, presence of a tertiary Gleason pattern, histologic subtype, size of dominant tumor nodule, seminal vesicle invasion (SVI), perineural invasion (PNI), angiolymphatic invasion (ALI), extracapsular extension (ECE), and surgical margin status (SMS). The program was validated by comparing NLP results to a gold standard compiled by two blinded manual reviewers for 100 random pathology reports. NLP demonstrated 100% accuracy for identifying the Gleason grade, presence of a tertiary Gleason pattern, SVI, ALI, and ECE. It also demonstrated near-perfect accuracy for extracting histologic subtype (99.0%), PNI (98.9%), TNM stage (98.0%), SMS (97.0%), and dominant tumor size (95.7%). The overall accuracy of NLP was 98.7%. NLP generated a result in report. This novel program demonstrated high accuracy and efficiency in identifying key pathologic details from the prostatectomy report within an EMR system. NLP has the potential to assist urologists by summarizing and highlighting relevant information from verbose pathology reports. It may also facilitate future urologic research through the rapid and automated creation of large databases.
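
    The health system's program is internal; the fragment below only sketches the flavor of rule-based extraction over an invented report, with patterns that are illustrative rather than the validated system:

      # Illustrative regex extraction of selected prostatectomy variables;
      # the report text and patterns are invented for demonstration.
      import re

      report = """Radical prostatectomy specimen. Gleason score 3+4=7.
      Extraprostatic extension: present. Seminal vesicle invasion: absent.
      Perineural invasion is identified. Surgical margins are negative."""

      def grab(pattern):
          m = re.search(pattern, report, re.IGNORECASE)
          return m.group(1) if m else None

      extracted = {
          "gleason": grab(r"gleason score\s*(\d\s*\+\s*\d)"),
          "ece":     grab(r"extraprostatic extension:\s*(present|absent)"),
          "svi":     grab(r"seminal vesicle invasion:\s*(present|absent)"),
          "pni":     "present" if re.search(r"perineural invasion is identified",
                                            report, re.IGNORECASE) else "absent",
          "margins": grab(r"margins are\s*(negative|positive)"),
      }
      print(extracted)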

  18. Automation in the Teaching of Descriptive Geometry and CAD. High-Level CAD Templates Using Script Languages

    Science.gov (United States)

    Moreno, R.; Bazán, A. M.

    2017-10-01

    The main purpose of this work is to study improvements to the learning of technical drawing and descriptive geometry by solving exercises, traditionally worked manually, through automated processes assisted by high-level CAD templates (HLCts). Just as an exercise can be solved step by step with traditional procedures, as detailed in technical drawing and descriptive geometry manuals, CAD applications allow us to do the same and later generalize the solution by incorporating references. Traditional teaching has become obsolete and has been relegated in current curricula; however, it can still be applied in certain automation processes. The use of geometric references (as variables in script languages) and their incorporation into HLCts allows the automation of drawing processes, as sketched below. Instead of repeatedly creating similar exercises or modifying data in the same exercises, users should be able to use HLCts to generate future variants of these exercises. This paper introduces the automation process for generating exercises based on CAD script files, aided by parametric geometry calculation tools. The proposed method allows us to design new exercises without user intervention. The integration of CAD, mathematics, and descriptive geometry facilitates their joint learning. Automating the generation of exercises not only saves time but also increases the quality of the problem statements and reduces the possibility of human error.
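
    As a toy version of an HLCt, the script below computes a parametric construction in Python and emits it as an AutoCAD-style .scr command stream; the triangle exercise and its parameters are invented, and the one-token-per-input .scr convention is assumed:

      # Generate a parametric descriptive-geometry exercise as a CAD script:
      # a triangle from base length b and the two base angles (degrees).
      import math

      def triangle_script(b, alpha_deg, beta_deg):
          a, be = math.radians(alpha_deg), math.radians(beta_deg)
          # Apex from the two base rays: y = x*tan(a) and y = (b - x)*tan(be)
          x = b * math.tan(be) / (math.tan(a) + math.tan(be))
          apex = (x, x * math.tan(a))
          pts = [(0.0, 0.0), (b, 0.0), apex, (0.0, 0.0)]
          coords = " ".join(f"{px:.4f},{py:.4f}" for px, py in pts)
          return f"_LINE {coords}\n\n"        # blank line ends the LINE command

      with open("exercise.scr", "w") as f:
          for b, alpha, beta in [(80, 30, 45), (100, 50, 35)]:   # exercise variants
              f.write(triangle_script(b, alpha, beta))
      with open("exercise.scr") as f:
          print(f.read())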

  19. Automatic retrieval of bone fracture knowledge using natural language processing.

    Science.gov (United States)

    Do, Bao H; Wu, Andrew S; Maley, Joan; Biswal, Sandip

    2013-08-01

    Natural language processing (NLP) techniques to extract data from unstructured text into formal computer representations are valuable for creating robust, scalable methods to mine data in medical documents and radiology reports. As voice recognition (VR) becomes more prevalent in radiology practice, there is opportunity for implementing NLP in real time for decision-support applications such as context-aware information retrieval. For example, as the radiologist dictates a report, an NLP algorithm can extract concepts from the text and retrieve relevant classification or diagnosis criteria or calculate disease probability. NLP can work in parallel with VR to potentially facilitate evidence-based reporting (for example, automatically retrieving the Bosniak classification when the radiologist describes a kidney cyst). For these reasons, we developed and validated an NLP system which extracts fracture and anatomy concepts from unstructured text and retrieves relevant bone fracture knowledge. We implement our NLP in an HTML5 web application to demonstrate a proof-of-concept feedback NLP system which retrieves bone fracture knowledge in real time.

  20. Intelligent Performance Analysis with a Natural Language Interface

    Science.gov (United States)

    Juuso, Esko K.

    2017-09-01

    Performance improvement is taken as the primary goal in the asset management. Advanced data analysis is needed to efficiently integrate condition monitoring data into the operation and maintenance. Intelligent stress and condition indices have been developed for control and condition monitoring by combining generalized norms with efficient nonlinear scaling. These nonlinear scaling methodologies can also be used to handle performance measures used for management since management oriented indicators can be presented in the same scale as intelligent condition and stress indices. Performance indicators are responses of the process, machine or system to the stress contributions analyzed from process and condition monitoring data. Scaled values are directly used in intelligent temporal analysis to calculate fluctuations and trends. All these methodologies can be used in prognostics and fatigue prediction. The meanings of the variables are beneficial in extracting expert knowledge and representing information in natural language. The idea of dividing the problems into the variable specific meanings and the directions of interactions provides various improvements for performance monitoring and decision making. The integrated temporal analysis and uncertainty processing facilitates the efficient use of domain expertise. Measurements can be monitored with generalized statistical process control (GSPC) based on the same scaling functions.

  1. Arabic text preprocessing for the natural language processing applications

    International Nuclear Information System (INIS)

    Awajan, A.

    2007-01-01

    A new approach for processing vowelized and unvowelized Arabic texts in order to prepare them for Natural Language Processing (NLP) purposes is described. The developed approach is rule-based and made up of four phases: text tokenization, word light stemming, word morphological analysis and text annotation. The first phase preprocesses the input text in order to isolate the words and represent them in a formal way. The second phase applies a light stemmer in order to extract the stem of each word by eliminating the prefixes and suffixes. The third phase is a rule-based morphological analyzer that determines the root and the morphological pattern for each extracted stem. The last phase produces an annotated text where each word is tagged with its morphological attributes. The preprocessor presented in this paper is capable of dealing with vowelized and unvowelized words, and provides the input words along with relevant linguistic information needed by different applications. It is designed to be used with different NLP applications such as machine translation, text summarization, text correction, information retrieval and automatic vowelization of Arabic text. (author)
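
    Two of the four phases can be sketched directly; the tokenizer below isolates Arabic-letter runs and the light stemmer strips at most one prefix and one suffix, with affix lists that are illustrative rather than the paper's rule set:

      # Sketch of Arabic tokenization plus light stemming; the affix lists are
      # illustrative only, in the spirit of "light"-style Arabic stemmers.
      import re

      PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]   # longest first
      SUFFIXES = ["ها", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

      def tokenize(text):
          return re.findall(r"[\u0621-\u064A]+", text)   # Arabic letter block

      def light_stem(word):
          for p in PREFIXES:
              if word.startswith(p) and len(word) - len(p) >= 3:
                  word = word[len(p):]
                  break
          for s in SUFFIXES:
              if word.endswith(s) and len(word) - len(s) >= 3:
                  word = word[: -len(s)]
                  break
          return word

      for w in tokenize("المكتبات والمدارس"):
          print(w, "->", light_stem(w))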

  2. Crowdsourcing and curation: perspectives from biology and natural language processing.

    Science.gov (United States)

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

    Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging 'the crowd'; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9-11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives. © The Author(s) 2016. Published by Oxford University Press.

  3. A common type system for clinical natural language processing

    Directory of Open Access Journals (Sweden)

    Wu Stephen T

    2013-01-01

    Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

  4. A common type system for clinical natural language processing.

    Science.gov (United States)

    Wu, Stephen T; Kaggal, Vinod C; Dligach, Dmitriy; Masanz, James J; Chen, Pei; Becker, Lee; Chapman, Wendy W; Savova, Guergana K; Liu, Hongfang; Chute, Christopher G

    2013-01-03

    One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

  5. The language of nature matters: we need a more public ecology

    Science.gov (United States)

    Bruce R. Hull; David P. Robertson

    2000-01-01

    The language we use to describe nature matters. It is used by policy analysts to set goals for ecological restoration and management, by scientists to describe the nature that did, does, or could exist, and by all of us to imagine possible and acceptable conditions of environmental quality. Participants in environmental decision making demand a lot of the language and...

  6. Natural Language Understanding Systems Within the A. I. Paradigm: A Survey and Some Comparisons.

    Science.gov (United States)

    Wilks, Yorick

    The paper surveys the major projects on the understanding of natural language that fall within what may now be called the artificial intelligence paradigm of natural language systems. Some space is devoted to arguing that the paradigm is now a reality and different in significant respects from the generative paradigm of present-day linguistics.…

  7. On the neurolinguistic nature of language abnormalities in Huntington's disease.

    Science.gov (United States)

    Wallesch, C W; Fehrenbach, R A

    1988-03-01

    Spontaneous language of 18 patients suffering from Huntington's disease and 15 dysarthric controls suffering from Friedreich's ataxia were investigated. In addition, language functions in various modalities were assessed with the Aachen Aphasia Test (AAT). The Huntington patients exhibited deficits in the syntactical complexity of spontaneous speech and in the Token Test, confrontation naming, and language comprehension subtests of the AAT, which are interpreted as resulting from their dementia. Errors affecting word access mechanisms and production of syntactical structures as such were not encountered.

  8. Sign language: its history and contribution to the understanding of the biological nature of language.

    Science.gov (United States)

    Ruben, Robert J

    2005-05-01

    The development of conceptualization of a biological basis of language during the 20th century has come about, in part, through the appreciation of the central nervous system's ability to utilize varied sensory inputs, and particularly vision, to develop language. Sign language has been a part of the linguistic experience from prehistory to the present day. Data suggest that human language may have originated as a visual language and became primarily auditory with the later development of our voice/speech tract. Sign language may be categorized into two types. The first is used by individuals who have auditory/oral language and the signs are used for special situations, such as communication in a monastery in which there is a vow of silence. The second is used by those who do not have access to auditory/oral language, namely the deaf. The history of the two forms of sign language and the development of the concept of the biological basis of language are reviewed from the fourth century BC to the present day. Sign languages of the deaf have been recognized since at least the fourth century BC. The codification of a monastic sign language occurred in the seventh to eighth centuries AD. Probable synergy between the two forms of sign language occurred in the 16th century. Among other developments, the Abbé de l'Épée introduced, in the 18th century, an oral syntax, French, into a sign language based upon indigenous signs of the deaf and newly created signs. During the 19th century, the concept of a "critical" period for the acquisition of language developed; this was an important stimulus for the exploration of the biological basis of language. The introduction of techniques, e.g. evoked potentials and functional MRI, during the 20th century allowed study of the brain functions associated with language.

  9. Natural language processing of clinical notes for identification of critical limb ischemia.

    Science.gov (United States)

    Afzal, Naveed; Mallipeddi, Vishnu Priya; Sohn, Sunghwan; Liu, Hongfang; Chaudhry, Rajeev; Scott, Christopher G; Kullo, Iftikhar J; Arruda-Olson, Adelaide M

    2018-03-01

    Critical limb ischemia (CLI) is a complication of advanced peripheral artery disease (PAD) with diagnosis based on the presence of clinical signs and symptoms. However, automated identification of cases from electronic health records (EHRs) is challenging due to absence of a single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. In this study, we extend a previously validated natural language processing (NLP) algorithm for PAD identification to develop and validate a subphenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. We compared performance of the CLI-NLP algorithm with CLI-related ICD-9 billing codes. The gold standard for validation was human abstraction of clinical notes from EHRs. Compared to billing codes, the CLI-NLP algorithm had higher positive predictive value (PPV) (CLI-NLP 96%, billing codes 67%, p < 0.001). Such NLP algorithms may enable automated case-identification tools and support a learning healthcare system. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review.

    Science.gov (United States)

    Luo, Yuan; Thompson, William K; Herr, Timothy M; Zeng, Zexian; Berendsen, Mark A; Jonnalagadda, Siddhartha R; Carson, Matthew B; Starren, Justin

    2017-11-01

    The goal of pharmacovigilance is to detect, monitor, characterize and prevent adverse drug events (ADEs) with pharmaceutical products. This article is a comprehensive structured review of recent advances in applying natural language processing (NLP) to electronic health record (EHR) narratives for pharmacovigilance. We review methods of varying complexity and problem focus, summarize the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions. The ability to accurately capture both semantic and syntactic structures in clinical narratives becomes increasingly critical to enable efficient and accurate ADE detection. Significant progress has been made in algorithm development and resource construction since 2000. Since 2012, statistical analysis and machine learning methods have gained traction in automation of ADE mining from EHR narratives. Current state-of-the-art methods for NLP-based ADE detection from EHRs show promise regarding their integration into production pharmacovigilance systems. In addition, integrating multifaceted, heterogeneous data sources has shown promise in improving ADE detection and has become increasingly adopted. On the other hand, challenges and opportunities remain across the frontier of NLP application to EHR-based pharmacovigilance, including proper characterization of ADE context, differentiation between off- and on-label drug-use ADEs, recognition of the importance of polypharmacy-induced ADEs, better integration of heterogeneous data sources, creation of shared corpora, and organization of shared-task challenges to advance the state-of-the-art.

  11. On the nature of language – Heidegger and African Philosophy ...

    African Journals Online (AJOL)

    My contention is that Heidegger's daring phenomenology of language is also found and even radicalised within the framework of African philosophy, particularly the philosophy of myth. I argue that the exploration of the relation between these views of language offers the possibility not only to expand on the conventional ...

  12. "Homo Pedagogicus": The Evolutionary Nature of Second Language Teaching

    Science.gov (United States)

    Atkinson, Dwight

    2017-01-01

    Second language (SL) teacher educators tirelessly teach others how to teach. But how often do we actually define teaching? Without explicit definitional activity on this fundamental concept in second language teaching (SLT), it remains implicit and intuitive--the opposite of clear, productive understanding. I therefore explore the question,…

  13. Comparison Between Manual Auditing and a Natural Language Process With Machine Learning Algorithm to Evaluate Faculty Use of Standardized Reports in Radiology.

    Science.gov (United States)

    Guimaraes, Carolina V; Grzeszczuk, Robert; Bisset, George S; Donnelly, Lane F

    2018-03-01

    When implementing or monitoring department-sanctioned standardized radiology reports, feedback about individual faculty performance has been shown to be a useful driver of faculty compliance. Most commonly, these data are derived from manual audit, which can be both time-consuming and subject to sampling error. The purpose of this study was to evaluate whether a software program using natural language processing and machine learning could accurately audit radiologist compliance with the use of standardized reports compared with manual audits. Radiology reports from a 1-month period were loaded into such a software program, and faculty compliance with use of standardized reports was calculated. For that same period, manual audits were performed (25 reports audited for each of 42 faculty members). The mean compliance rates calculated by automated auditing were then compared with the confidence interval of the mean rate by manual audit. The mean compliance rate for use of standardized reports as determined by manual audit was 91.2% with a confidence interval between 89.3% and 92.8%. The mean compliance rate calculated by automated auditing was 92.0%, within that confidence interval. This study shows that by use of natural language processing and machine learning algorithms, an automated analysis can accurately define whether reports are compliant with use of standardized report templates and language, compared with manual audits. This may avoid significant labor costs related to conducting the manual auditing process. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
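
    As a rough illustration of the statistical comparison this record describes, the following minimal Python sketch checks whether an automated compliance rate falls inside the manual audit's 95% confidence interval. The figures come from the abstract; the Wald interval is an assumption about how the CI was computed.

    ```python
    import math

    def proportion_ci(p_hat, n, z=1.96):
        """95% Wald confidence interval for a proportion (normal approximation)."""
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - z * se, p_hat + z * se

    n_audited = 42 * 25      # 25 reports manually audited for each of 42 faculty
    manual_rate = 0.912      # mean compliance by manual audit
    automated_rate = 0.920   # mean compliance by automated auditing

    low, high = proportion_ci(manual_rate, n_audited)
    print(f"manual-audit 95% CI: {low:.3f} to {high:.3f}")            # ~0.895-0.929
    print("automated rate inside CI:", low <= automated_rate <= high)  # True
    ```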

  14. A grammar-based semantic similarity algorithm for natural language sentences.

    Science.gov (United States)

    Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to "artificial language", such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.
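
    The record gives only the outline of the algorithm, so the following Python fragment is a much-simplified stand-in: WordNet path similarity (via NLTK, with the WordNet corpus downloaded) replaces the paper's corpus-based ontology, and a symmetric best-match aggregation replaces its grammar rules.

    ```python
    from itertools import product
    from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

    def word_sim(w1, w2):
        """Best WordNet path similarity over all synset pairs (0 if no synsets)."""
        sims = [s1.path_similarity(s2) or 0.0
                for s1, s2 in product(wn.synsets(w1), wn.synsets(w2))]
        return max(sims, default=0.0)

    def sentence_sim(sent1, sent2):
        """Average best-match word similarity, symmetrized over both directions."""
        t1, t2 = sent1.lower().split(), sent2.lower().split()
        def directed(a, b):
            return sum(max(word_sim(w, v) for v in b) for w in a) / len(a)
        return 0.5 * (directed(t1, t2) + directed(t2, t1))

    print(sentence_sim("a dog chased the cat", "the hound pursued a kitten"))
    ```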

  15. Using Neural Networks to Generate Inferential Roles for Natural Language

    Directory of Open Access Journals (Sweden)

    Peter Blouw

    2018-01-01

    Full Text Available Neural networks have long been used to study linguistic phenomena spanning the domains of phonology, morphology, syntax, and semantics. Of these domains, semantics is somewhat unique in that there is little clarity concerning what a model needs to be able to do in order to provide an account of how the meanings of complex linguistic expressions, such as sentences, are understood. We argue that one thing such models need to be able to do is generate predictions about which further sentences are likely to follow from a given sentence; these define the sentence's “inferential role.” We then show that it is possible to train a tree-structured neural network model to generate very simple examples of such inferential roles using the recently released Stanford Natural Language Inference (SNLI) dataset. On an empirical front, we evaluate the performance of this model by reporting entailment prediction accuracies on a set of test sentences not present in the training data. We also report the results of a simple study that compares human plausibility ratings for both human-generated and model-generated entailments for a random selection of sentences in this test set. On a more theoretical front, we argue in favor of a revision to some common assumptions about semantics: understanding a linguistic expression is not only a matter of mapping it onto a representation that somehow constitutes its meaning; rather, understanding a linguistic expression is mainly a matter of being able to draw certain inferences. Inference should accordingly be at the core of any model of semantic cognition.
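
    The tree-structured network itself is not reproduced in the record; the toy sketch below only illustrates the SNLI-style task format (premise, hypothesis, label), with a bag-of-words classifier standing in for the paper's model and invented example pairs.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # invented premise/hypothesis pairs in the SNLI two-label style
    pairs = [
        ("a man is playing a guitar", "a person is making music", "entailment"),
        ("a man is playing a guitar", "a man is sleeping", "contradiction"),
        ("a dog runs through snow", "an animal is outside", "entailment"),
        ("a dog runs through snow", "a cat sits indoors", "contradiction"),
    ]
    texts = [premise + " ||| " + hypothesis for premise, hypothesis, _ in pairs]
    labels = [label for _, _, label in pairs]

    vectorizer = CountVectorizer()
    clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

    test = "a woman plays violin ||| someone is making music"
    print(clf.predict(vectorizer.transform([test]))[0])   # predicted label
    ```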

  16. One grammar or two? Sign Languages and the Nature of Human Language.

    Science.gov (United States)

    Lillo-Martin, Diane C; Gajewski, Jon

    2014-07-01

    Linguistic research has identified abstract properties that seem to be shared by all languages-such properties may be considered defining characteristics. In recent decades, the recognition that human language is found not only in the spoken modality but also in the form of sign languages has led to a reconsideration of some of these potential linguistic universals. In large part, the linguistic analysis of sign languages has led to the conclusion that universal characteristics of language can be stated at an abstract enough level to include languages in both spoken and signed modalities. For example, languages in both modalities display hierarchical structure at sub-lexical and phrasal level, and recursive rule application. However, this does not mean that modality-based differences between signed and spoken languages are trivial. In this article, we consider several candidate domains for modality effects, in light of the overarching question: are signed and spoken languages subject to the same abstract grammatical constraints, or is a substantially different conception of grammar needed for the sign language case? We look at differences between language types based on the use of space, iconicity, and the possibility for simultaneity in linguistic expression. The inclusion of sign languages does support some broadening of the conception of human language-in ways that are applicable for spoken languages as well. Still, the overall conclusion is that one grammar applies for human language, no matter the modality of expression. WIREs Cogn Sci 2014, 5:387-401. doi: 10.1002/wcs.1297 This article is categorized under: Linguistics > Linguistic Theory. © 2014 The Authors. WIREs Cognitive Science published by John Wiley & Sons, Ltd.

  17. From quantum foundations via natural language meaning to a theory of everything

    OpenAIRE

    Coecke, Bob

    2016-01-01

    In this paper we argue for a paradigmatic shift from `reductionism' to `togetherness'. In particular, we show how interaction between systems in quantum theory naturally carries over to modelling how word meanings interact in natural language. Since meaning in natural language, depending on the subject domain, encompasses discussions within any scientific discipline, we obtain a template for theories such as social interaction, animal behaviour, and many others.

  18. Optimizing annotation resources for natural language de-identification via a game theoretic framework.

    Science.gov (United States)

    Li, Muqun; Carrell, David; Aberdeen, John; Hirschman, Lynette; Kirby, Jacqueline; Li, Bo; Vorobeychik, Yevgeniy; Malin, Bradley A

    2016-06-01

    Electronic medical records (EMRs) are increasingly repurposed for activities beyond clinical care, such as to support translational research and public policy analysis. To mitigate privacy risks, healthcare organizations (HCOs) aim to remove potentially identifying patient information. A substantial quantity of EMR data is in natural language form and there are concerns that automated tools for detecting identifiers are imperfect and leak information that can be exploited by ill-intentioned data recipients. Thus, HCOs have been encouraged to invest as much effort as possible to find and detect potential identifiers, but such a strategy assumes the recipients are sufficiently incentivized and capable of exploiting leaked identifiers. In practice, such an assumption may not hold true and HCOs may overinvest in de-identification technology. The goal of this study is to design a natural language de-identification framework, rooted in game theory, which enables an HCO to optimize their investments given the expected capabilities of an adversarial recipient. We introduce a Stackelberg game to balance risk and utility in natural language de-identification. This game represents a cost-benefit model that enables an HCO with a fixed budget to minimize their investment in the de-identification process. We evaluate this model by assessing the overall payoff to the HCO and the adversary using 2100 clinical notes from Vanderbilt University Medical Center. We simulate several policy alternatives using a range of parameters, including the cost of training a de-identification model and the loss in data utility due to the removal of terms that are not identifiers. In addition, we compare policy options where, when an attacker is fined for misuse, a monetary penalty is paid to the publishing HCO as opposed to a third party (e.g., a federal regulator). Our results show that when an HCO is forced to exhaust a limited budget (set to $2000 in the study), the precision and recall of the
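
    A minimal numeric sketch of the Stackelberg structure described above; every payoff, fine, and leak rate here is invented for illustration, and the paper's actual cost-benefit model is far richer. The leader (HCO) picks a de-identification effort knowing the follower (adversary) will attack only when attacking is profitable.

    ```python
    import numpy as np

    efforts = np.linspace(0.0, 1.0, 101)   # leader's de-identification investment
    recall = 0.80 + 0.19 * efforts         # identifiers removed rises with effort
    utility = 1.00 - 0.30 * efforts        # over-scrubbing erodes data utility
    leak = 1.0 - recall                    # residual identifiers per record

    GAIN, COST, FINE, P_CAUGHT, LOSS = 500.0, 10.0, 200.0, 0.3, 100.0  # invented

    def hco_payoff(i):
        # follower best response: attack only if expected profit is positive
        adversary_profit = leak[i] * GAIN - COST - P_CAUGHT * FINE
        attacked = adversary_profit > 0
        return utility[i] - (LOSS * leak[i] if attacked else 0.0)

    best = max(range(len(efforts)), key=hco_payoff)
    print(f"optimal effort = {efforts[best]:.2f}, payoff = {hco_payoff(best):.3f}")
    # the optimum sits just past the point where attacking stops paying off
    ```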

  19. Three-dimensional grammar in the brain: Dissociating the neural correlates of natural sign language and manually coded spoken language.

    Science.gov (United States)

    Jednoróg, Katarzyna; Bola, Łukasz; Mostowski, Piotr; Szwed, Marcin; Boguszewski, Paweł M; Marchewka, Artur; Rutkowski, Paweł

    2015-05-01

    In several countries natural sign languages were considered inadequate for education. Instead, new sign-supported systems were created, based on the belief that spoken/written language is grammatically superior. One such system called SJM (system językowo-migowy) preserves the grammatical and lexical structure of spoken Polish and since the 1960s has been extensively employed in schools and on TV. Nevertheless, the Deaf community avoids using SJM for everyday communication, its preferred language being PJM (polski język migowy), a natural sign language, structurally and grammatically independent of spoken Polish and featuring classifier constructions (CCs). Here, for the first time, we use fMRI to compare the neural bases of natural vs. devised communication systems. Deaf signers were presented with three types of signed sentences (SJM and PJM with/without CCs). Consistent with previous findings, PJM with CCs compared to either SJM or PJM without CCs recruited the parietal lobes. The reverse comparison revealed activation in the anterior temporal lobes, suggesting increased semantic combinatory processes in lexical sign comprehension. Finally, PJM compared with SJM engaged left posterior superior temporal gyrus and anterior temporal lobe, areas crucial for sentence-level speech comprehension. We suggest that activity in these two areas reflects greater processing efficiency for naturally evolved sign language. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Towards multilingual access to textual databases in natural language

    International Nuclear Information System (INIS)

    Radwan, Khaled

    1994-01-01

    Cross-Lingual Information Retrieval (CLIR), or Multilingual Information Retrieval (MIR), has become a key issue for electronic document management systems in multinational environments. We propose here a multilingual information retrieval system consisting of a morpho-syntactic analyser, a transfer system from source language to target language, and an information retrieval system. A thorough investigation into the system architecture and the transfer mechanisms is presented in this report, using two different performance evaluation methods. (author) [fr]

  1. Applications Associated With Morphological Analysis And Generation In Natural Language Processing

    Directory of Open Access Journals (Sweden)

    Neha Yadav

    2017-08-01

    Full Text Available Natural Language Processing is one of the fastest-developing research fields. In most applications related to Natural Language Processing, the findings of Morphological Analysis and Morphological Generation are very important. Morphological analysis is the technique for recognizing a word, and its output is used in later processing stages. Keeping this importance in view, this paper describes how Morphological Analysis and Morphological Generation form an important part of various Natural Language Processing applications such as spell checking and machine translation.
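
    As a toy illustration of the analysis/generation pair the abstract refers to, the sketch below implements a deliberately tiny suffix-stripping analyzer and its inverse generator; the rule set is invented and far short of a real morphology.

    ```python
    SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

    def analyze(word):
        """Return (stem, suffix) using the first matching suffix rule."""
        for suffix, replacement in SUFFIX_RULES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)] + replacement, suffix
        return word, ""

    def generate(stem, suffix):
        """Inverse of analyze for the same rule set."""
        for suf, replacement in SUFFIX_RULES:
            if suf == suffix:
                base = stem[: -len(replacement)] if replacement else stem
                return base + suf
        return stem

    print(analyze("studies"))        # ('study', 'ies')
    print(generate("study", "ies"))  # 'studies'
    ```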

  2. Induction of the morphology of natural language : unsupervised morpheme segmentation with application to automatic speech recognition

    OpenAIRE

    Creutz, Mathias

    2006-01-01

    In order to develop computer applications that successfully process natural language data (text and speech), one needs good models of the vocabulary and grammar of as many languages as possible. According to standard linguistic theory, words consist of morphemes, which are the smallest individually meaningful elements in a language. Since an immense number of word forms can be constructed by combining a limited set of morphemes, the capability of understanding and producing new word forms dep...

  3. A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

    Science.gov (United States)

    Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure. PMID:24982952

  4. A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

    Directory of Open Access Journals (Sweden)

    Ming Che Lee

    2014-01-01

    Full Text Available This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.

  5. Computational Nonlinear Morphology with Emphasis on Semitic Languages. Studies in Natural Language Processing.

    Science.gov (United States)

    Kiraz, George Anton

    This book presents a tractable computational model that can cope with complex morphological operations, especially in Semitic languages, and less complex morphological systems present in Western languages. It outlines a new generalized regular rewrite rule system that uses multiple finite-state automata to cater to root-and-pattern morphology,…

  6. Dynamic changes in network activations characterize early learning of a natural language.

    Science.gov (United States)

    Plante, Elena; Patterson, Dianne; Dailey, Natalie S; Kyle, R Almyrde; Fridriksson, Julius

    2014-09-01

    Those who are initially exposed to an unfamiliar language have difficulty separating running speech into individual words, but over time will recognize both words and the grammatical structure of the language. Behavioral studies have used artificial languages to demonstrate that humans are sensitive to distributional information in language input, and can use this information to discover the structure of that language. This is done without direct instruction and learning occurs over the course of minutes rather than days or months. Moreover, learners may attend to different aspects of the language input as their own learning progresses. Here, we examine processing associated with the early stages of exposure to a natural language, using fMRI. Listeners were exposed to an unfamiliar language (Icelandic) while undergoing four consecutive fMRI scans. The Icelandic stimuli were constrained in ways known to produce rapid learning of aspects of language structure. After approximately 4 min of exposure to the Icelandic stimuli, participants began to differentiate between correct and incorrect sentences at above chance levels, with significant improvement between the first and last scan. An independent component analysis of the imaging data revealed four task-related components, two of which were associated with behavioral performance early in the experiment, and two with performance later in the experiment. This outcome suggests dynamic changes occur in the recruitment of neural resources even within the initial period of exposure to an unfamiliar natural language. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. The Nature of Chinese Language Classroom Learning Environments in Singapore Secondary Schools

    Science.gov (United States)

    Chua, Siew Lian; Wong, Angela F. L.; Chen, Der-Thanq V.

    2011-01-01

    This article reports findings from a classroom environment study which was designed to investigate the nature of Chinese Language classroom environments in Singapore secondary schools. We used a perceptual instrument, the Chinese Language Classroom Environment Inventory, to investigate teachers' and students' perceptions towards their Chinese…

  8. Using the Natural Language Paradigm (NLP) to Increase Vocalizations of Older Adults with Cognitive Impairments

    Science.gov (United States)

    LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…

  9. Where humans meet machines innovative solutions for knotty natural-language problems

    CERN Document Server

    Markowitz, Judith

    2013-01-01

    Where Humans Meet Machines: Innovative Solutions for Knotty Natural-Language Problems brings humans and machines closer together by showing how linguistic complexities that confound the speech systems of today can be handled effectively by sophisticated natural-language technology. Some of the most vexing natural-language problems that are addressed in this book entail   recognizing and processing idiomatic expressions, understanding metaphors, matching an anaphor correctly with its antecedent, performing word-sense disambiguation, and handling out-of-vocabulary words and phrases. This fourteen-chapter anthology consists of contributions from industry scientists and from academicians working at major universities in North America and Europe. They include researchers who have played a central role in DARPA-funded programs and developers who craft real-world solutions for corporations. These contributing authors analyze the role of natural language technology in the global marketplace; they explore the need f...

  10. From Monologue to Dialogue: Natural Language Generation in OVIS

    NARCIS (Netherlands)

    Theune, Mariet; Freedman, R.; Callaway, C.

    This paper describes how a language generation system that was originally designed for monologue generation, has been adapted for use in the OVIS spoken dialogue system. To meet the requirement that in a dialogue, the system’s utterances should make up a single, coherent dialogue turn, several

  11. Evolutionary explanations for natural language: criteria from evolutionary biology

    NARCIS (Netherlands)

    Zuidema, W.; de Boer, B.

    2008-01-01

    Theories of the evolutionary origins of language must be informed by empirical and theoretical results from a variety of different fields. Complementing recent surveys of relevant work from linguistics, animal behaviour and genetics, this paper surveys the requirements on evolutionary scenarios that

  12. Evolutionary Developmental Linguistics: Naturalization of the Faculty of Language

    Science.gov (United States)

    Locke, John L.

    2009-01-01

    Since language is a biological trait, it is necessary to investigate its evolution, development, and functions, along with the mechanisms that have been set aside, and are now recruited, for its acquisition and use. It is argued here that progress toward each of these goals can be facilitated by new programs of research, carried out within a new…

  13. Population-Based Analysis of Histologically Confirmed Melanocytic Proliferations Using Natural Language Processing.

    Science.gov (United States)

    Lott, Jason P; Boudreau, Denise M; Barnhill, Ray L; Weinstock, Martin A; Knopp, Eleanor; Piepkorn, Michael W; Elder, David E; Knezevich, Steven R; Baer, Andrew; Tosteson, Anna N A; Elmore, Joann G

    2018-01-01

    Population-based information on the distribution of histologic diagnoses associated with skin biopsies is unknown. Electronic medical records (EMRs) enable automated extraction of pathology report data to improve our epidemiologic understanding of skin biopsy outcomes, specifically those of melanocytic origin. To determine population-based frequencies and distribution of histologically confirmed melanocytic lesions. A natural language processing (NLP)-based analysis of EMR pathology reports of adult patients who underwent skin biopsies at a large integrated health care delivery system in the US Pacific Northwest from January 1, 2007, through December 31, 2012. Skin biopsy procedure. The primary outcome was histopathologic diagnosis, obtained using an NLP-based system to process EMR pathology reports. We determined the percentage of diagnoses classified as melanocytic vs nonmelanocytic lesions. Diagnoses classified as melanocytic were further subclassified using the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis (MPATH-Dx) reporting schema into the following categories: class I (nevi and other benign proliferations such as mildly dysplastic lesions typically requiring no further treatment), class II (moderately dysplastic and other low-risk lesions that may merit narrow reexcision), and classes III through V covering progressively higher-risk lesions up to invasive melanoma. Skin biopsies performed on 47 529 patients were examined. Nearly 1 in 4 skin biopsies were of melanocytic lesions (23%; n = 18 715), which were distributed according to MPATH-Dx categories as follows: class I, 83.1% (n = 15 558); class II, 8.3% (n = 1548); class III, 4.5% (n = 842); class IV, 2.2% (n = 405); and class V, 1.9% (n = 362). Approximately one-quarter of skin biopsies resulted in diagnoses of melanocytic proliferations. These data provide the first population-based estimates across the spectrum of melanocytic lesions ranging from benign through dysplastic to malignant. These results may serve as a foundation for future

  14. Mining peripheral arterial disease cases from narrative clinical notes using natural language processing.

    Science.gov (United States)

    Afzal, Naveed; Sohn, Sunghwan; Abram, Sara; Scott, Christopher G; Chaudhry, Rajeev; Liu, Hongfang; Kullo, Iftikhar J; Arruda-Olson, Adelaide M

    2017-06-01

    Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001) and outperformed both billing code models on the further reported performance measures (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001; NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). The NLP algorithm for automatic ascertainment of PAD cases from clinical notes thus had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Studies in using a universal exchange and inference language for evidence based medicine. Semi-automated learning and reasoning for PICO methodology, systematic review, and environmental epidemiology.

    Science.gov (United States)

    Robson, Barry

    2016-12-01

    The Q-UEL language of XML-like tags and the associated software applications are providing a valuable toolkit for Evidence Based Medicine (EBM). In this paper the already existing applications, data bases, and tags are brought together with new ones. The particular Q-UEL embodiment used here is the BioIngine. The main challenge is one of bringing together the methods of symbolic reasoning and calculative probabilistic inference that underlie EBM and medical decision making. Some space is taken to review this background. The unification is greatly facilitated by Q-UEL's roots in the notation and algebra of Dirac, and by extending Q-UEL into the Wolfram programming environment. Further, the overall problem of integration is also a relatively simple one because of the nature of Q-UEL as a language for interoperability in healthcare and biomedicine, while the notion of workflow is facilitated because of the EBM best practice known as PICO. What remains difficult is achieving a high degree of overall automation because of a well-known difficulty in capturing human expertise in computers: the Feigenbaum bottleneck. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. The Language Faculty that Wasn't: A Usage-Based Account of Natural Language Recursion

    Directory of Open Access Journals (Sweden)

    Morten H Christiansen

    2015-08-01

    Full Text Available In the generative tradition, the language faculty has been shrinking—perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty.

  17. Semantic similarity from natural language and ontology analysis

    CERN Document Server

    Harispe, Sébastien; Janaqi, Stefan

    2015-01-01

    Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex tasks, most of which demand high cognitive skills (e.g. learning or decision processes). Central to this quest is giving machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli. In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language, e.g. words, sentences, or concepts and instances def

  18. Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking.

    Science.gov (United States)

    Jayaseelan, Kalai Vanii; Steinbeck, Christoph

    2014-07-05

    In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass spectrum is usually not sufficient to establish the identity of a hitherto unknown compound. When a suite of spectra from 1D and 2D NMR experiments supplemented with a molecular formula are available, the successful elucidation of the chemical structure for candidates with up to 30 heavy atoms has been reported previously by one of the authors. In high-throughput metabolomics, usually 1D NMR or mass spectrometry experiments alone are conducted for rapid analysis of samples. This method subsequently requires that the spectral patterns are analyzed automatically to quickly identify known and unknown structures. In this study, we investigated whether additional existing knowledge, such as the fact that the unknown compound is a natural product, can be used to improve the ranking of the correct structure in the result list after the structure elucidation process. To identify unknowns using as little spectroscopic information as possible, we implemented an evolutionary algorithm-based CASE mechanism to elucidate candidates in a fully automated fashion, with input of the molecular formula and 13C NMR spectrum of the isolated compound. We also tested how filters like natural product-likeness, a measure that calculates the similarity of the compounds to known natural product space, might enhance the performance and quality of the structure elucidation. The evolutionary algorithm is implemented within the SENECA package for CASE reported previously, and is available for free download under artistic license at http://sourceforge.net/projects/seneca/. The natural product-likeness calculator is incorporated as a plugin within SENECA and is available as a GUI client and

  19. Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF

    Science.gov (United States)

    Shermis, Mark D.; Mao, Liyang; Mulholland, Matthew; Kieftenbeld, Vincent

    2017-01-01

    This study uses the feature sets employed by two automated scoring engines to determine if a "linguistic profile" could be formulated that would help identify items that are likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students where demographic information…

  20. Automated extraction of natural drainage density patterns for the conterminous United States through high performance computing

    Science.gov (United States)

    Stanislawski, Larry V.; Falgout, Jeff T.; Buttenfield, Barbara P.

    2015-01-01

    Hydrographic networks form an important data foundation for cartographic base mapping and for hydrologic analysis. Drainage density patterns for these networks can be derived to characterize local landscape, bedrock and climate conditions, and further inform hydrologic and geomorphological analysis by indicating areas where too few headwater channels have been extracted. But natural drainage density patterns are not consistently available in existing hydrographic data for the United States because compilation and capture criteria historically varied, along with climate, during the period of data collection over the various terrain types throughout the country. This paper demonstrates an automated workflow that is being tested in a high-performance computing environment by the U.S. Geological Survey (USGS) to map natural drainage density patterns at the 1:24,000-scale (24K) for the conterminous United States. Hydrographic network drainage patterns may be extracted from elevation data to guide corrections for existing hydrographic network data. The paper describes three stages in this workflow including data pre-processing, natural channel extraction, and generation of drainage density patterns from extracted channels. The workflow is concurrently implemented by executing procedures on multiple subbasin watersheds within the U.S. National Hydrography Dataset (NHD). Pre-processing defines parameters that are needed for the extraction process. Extraction proceeds in standard fashion: filling sinks, developing flow direction and weighted flow accumulation rasters. Drainage channels with assigned Strahler stream order are extracted within a subbasin and simplified. Drainage density patterns are then estimated with 100-meter resolution and subsequently smoothed with a low-pass filter. The extraction process is found to be of better quality in higher slope terrains. Concurrent processing through the high performance computing environment is shown to facilitate and refine
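
    The full workflow depends on GIS tooling and elevation data, but the last two stages (channel extraction by thresholding flow accumulation, then smoothing a density surface with a low-pass filter) can be sketched with NumPy/SciPy; the flow-accumulation raster below is random stand-in data, and the threshold and window are illustrative.

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    rng = np.random.default_rng(0)
    # stand-in for the weighted flow-accumulation raster the workflow derives
    flow_acc = rng.gamma(shape=0.5, scale=200.0, size=(500, 500))

    CHANNEL_THRESHOLD = 500.0   # contributing-area threshold, in cells (invented)
    CELL_KM = 0.1               # 100 m cells, matching the paper's resolution

    channels = flow_acc > CHANNEL_THRESHOLD
    # density: fraction of channel cells in a moving window, read as km of
    # channel per square km (each channel cell contributes ~CELL_KM of length)
    density = uniform_filter(channels.astype(float), size=21) / CELL_KM

    print(f"channel cells: {channels.sum()}")
    print(f"mean drainage density: {density.mean():.3f} km/km^2")
    ```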

  1. Análise de inteligibilidade textual por meio de ferramentas de processamento automático do português: avaliação da Coleção Literatura para Todos = Text readability analysis with Natural Language Processing Tools: assessment of the “Literatura para Todos” Collection

    Directory of Open Access Journals (Sweden)

    Rodrigues, Erica dos Santos

    2013-01-01

    Full Text Available This paper presents research results on the readability of the books in the Coleção Literatura para Todos 1, published by MEC/SECAD (2006) and distributed to newly literate young people and adults. The investigation of text readability sought to combine assumptions from psycholinguistics with tools for the automatic processing of Portuguese. We used readability criteria reported in the psycholinguistic literature and tried to capture objectively the degree of linguistic complexity of the books through computational tools: the morphosyntactic parser PALAVRAS and the Coh-Metrix Port program. Our results suggest that the books in this collection are complex for the intended audience. Assuming that new readers are at the initial stage of the literacy process, it is clear that these books demand a decoding effort that is beyond their capacity

  2. Neural systems language: a formal modeling language for the systematic description, unambiguous communication, and automated digital curation of neural connectivity.

    Science.gov (United States)

    Brown, Ramsay A; Swanson, Larry W

    2013-09-01

    Systematic description and the unambiguous communication of findings and models remain among the unresolved fundamental challenges in systems neuroscience. No common descriptive frameworks exist to describe systematically the connective architecture of the nervous system, even at the grossest level of observation. Furthermore, the accelerating volume of novel data generated on neural connectivity outpaces the rate at which this data is curated into neuroinformatics databases to synthesize digitally systems-level insights from disjointed reports and observations. To help address these challenges, we propose the Neural Systems Language (NSyL). NSyL is a modeling language to be used by investigators to encode and communicate systematically reports of neural connectivity from neuroanatomy and brain imaging. NSyL engenders systematic description and communication of connectivity irrespective of the animal taxon described, experimental or observational technique implemented, or nomenclature referenced. As a language, NSyL is internally consistent, concise, and comprehensible to both humans and computers. NSyL is a promising development for systematizing the representation of neural architecture, effectively managing the increasing volume of data on neural connectivity and streamlining systems neuroscience research. Here we present similar precedent systems, how NSyL extends existing frameworks, and the reasoning behind NSyL's development. We explore NSyL's potential for balancing robustness and consistency in representation by encoding previously reported assertions of connectivity from the literature as examples. Finally, we propose and discuss the implications of a framework for how NSyL will be digitally implemented in the future to streamline curation of experimental results and bridge the gaps among anatomists, imagers, and neuroinformatics databases. Copyright © 2013 Wiley Periodicals, Inc.

  3. Deciphering the language of nature: cryptography, secrecy, and alterity in Francis Bacon.

    Science.gov (United States)

    Clody, Michael C

    2011-01-01

    The essay argues that Francis Bacon's considerations of parables and cryptography reflect larger interpretative concerns of his natural philosophic project. Bacon describes nature as having a language distinct from those of God and man, and, in so doing, establishes a central problem of his natural philosophy—namely, how can the language of nature be accessed through scientific representation? Ultimately, Bacon's solution relies on a theory of differential and duplicitous signs that conceal within them the hidden voice of nature, which is best recognized in the natural forms of efficient causality. The "alphabet of nature"—those tables of natural occurrences—consequently plays a central role in his program, as it renders nature's language susceptible to a process of decryption that mirrors the model of the bilateral cipher. It is argued that while the writing of Bacon's natural philosophy strives for literality, its investigative process preserves a space for alterity within scientific representation that is made accessible to those with the interpretative key.

  4. Evaluation of uncertainty in the measurement of sense of natural language constructions

    Directory of Open Access Journals (Sweden)

    Bisikalo Oleg V.

    2017-01-01

    Full Text Available The task of evaluating uncertainty in the measurement of sense in natural language constructions (NLCs) was investigated through formalization of the notion of the language image, of artificial cognitive systems (ACSs), and of units of meaning. The method for measuring the sense of natural language constructions incorporates fuzzy relations of meaning, which ensures that information about the links between lemmas of the text is taken into account and permits the evaluation of two types of measurement uncertainty of sense characteristics. Using purpose-built application programs, experiments were conducted to investigate the proposed method for identifying the informative characteristics of a text. The experiments yielded parameter dependencies that allow the Pareto distribution law to be used to define relations between lemmas; analysis of these dependencies identifies the exponent of the average number of connections of the language image as the most informative characteristic of a text.

  5. Development of Autonomous Boat-Type Robot for Automated Velocity Measurement in Straight Natural River

    Science.gov (United States)

    Sanjou, Michio; Nagasaka, Tsuyoshi

    2017-11-01

    The present study describes an automated system for measuring river flow velocity. A combination of a camera-tracking system and Proportional/Integral/Derivative (PID) control enables the boat-type robot to hold its position against the mainstream; the mean velocity can then be evaluated from the duty ratio, which corresponds to the rotation speed of the screw propeller. A laser range finder module was installed to measure the local water depth. Laboratory experiments with the prototype boat robot and electromagnetic velocimetry were conducted to obtain a calibration curve connecting the duty ratio and the mean current velocity. The station-keeping accuracy at the target point was also examined quantitatively: the fluctuation in the spanwise direction is within half of the robot length, so the robot remains well within the target region. Two-dimensional navigation tests confirmed that the prototype moved smoothly to the target points and successfully measured streamwise velocity profiles across the mainstream. Moreover, the robot moved successfully not only in the laboratory flume but also in a small natural river: it traveled smoothly from the starting point near the operator's site to the target point where the velocity was measured, and it could evaluate the cross-sectional discharge.
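
    A minimal simulation of the station-keeping loop described above: a PID controller turns the camera-tracked position error into a propeller duty ratio, and the equilibrium duty ratio is read back through a calibration curve to estimate the current velocity. The gains, plant dynamics, and linear calibration are all invented for the sketch.

    ```python
    class PID:
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral, self.prev_error = 0.0, 0.0

        def step(self, error):
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    def duty_to_velocity(duty):
        """Hypothetical calibration curve from the flume experiments."""
        return 1.2 * duty   # boat speed through water (m/s) at duty ratio 0..1

    dt = 0.05
    pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=dt)
    position, current = 0.0, 0.6          # streamwise offset (m) vs 0.6 m/s flow
    for _ in range(2000):                 # 100 s of simulated time
        duty = min(max(pid.step(-position), 0.0), 1.0)
        velocity = duty_to_velocity(duty) - current   # net drift over ground
        position += velocity * dt

    print(f"final offset {position:.3f} m, duty {duty:.2f}, "
          f"estimated current {duty_to_velocity(duty):.2f} m/s")
    ```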

  6. Methods for automated identification of informative behaviors in natural bioptic driving.

    Science.gov (United States)

    Luo, Gang; Peli, Eli

    2012-06-01

    Visually impaired people may legally drive if wearing bioptic telescopes in some developed countries. To address the controversial safety issue of the practice, we have developed a low-cost in-car recording system that can be installed in study participants' own vehicles to record their daily driving activities. We also developed a set of automated identification techniques of informative behaviors to facilitate efficient manual review of important segments submerged in the vast amount of uncontrolled data. Here, we present the methods and quantitative results of the detection performance for six types of driving maneuvers and behaviors that are important for bioptic driving: bioptic telescope use, turns, curves, intersections, weaving, and rapid stops. The testing data were collected from one normally sighted and two visually impaired subjects across multiple days. The detection rates ranged from 82% up to 100%, and the false discovery rates ranged from 0% to 13%. In addition, two human observers were able to interpret about 80% of targets viewed through the telescope. These results indicate that with appropriate data processing the low-cost system is able to provide reliable data for natural bioptic driving studies.
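
    One of the six detectors, reduced to a sketch: rapid stops are flagged where the speed trace shows hard deceleration followed shortly by near-standstill. The thresholds and the synthetic speed trace are invented, not the authors' actual criteria.

    ```python
    import numpy as np

    def detect_rapid_stops(speed_mps, dt=0.1, decel_thresh=-3.0,
                           stop_speed=0.5, horizon_s=4.0):
        """Indices of hard-deceleration samples followed by near-standstill."""
        accel = np.gradient(speed_mps, dt)
        near_stop = speed_mps < stop_speed
        horizon = int(horizon_s / dt)
        return [i for i in np.flatnonzero(accel < decel_thresh)
                if near_stop[i : i + horizon].any()]

    # synthetic trace: cruise at 13 m/s, then brake at 4 m/s^2 to a standstill
    t = np.arange(0, 20, 0.1)
    speed = np.where(t < 10, 13.0, np.maximum(13.0 - 4.0 * (t - 10), 0.0))

    hits = detect_rapid_stops(speed)
    print(f"{len(hits)} samples flagged, first at t = {hits[0] * 0.1:.1f} s")
    ```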

  7. Linguistic fundamentals for natural language processing 100 essentials from morphology and syntax

    CERN Document Server

    Bender, Emily M

    2013-01-01

    Many NLP tasks have at their core a subtask of extracting the dependencies (who did what to whom) from natural language sentences. This task can be understood as the inverse of the problem solved in different ways by diverse human languages, namely, how to indicate the relationship between different parts of a sentence. Understanding how languages solve the problem can be extremely useful in both feature design and error analysis in the application of machine learning to NLP. Likewise, understanding cross-linguistic variation can be important for the design of MT systems and other multilingual a

  8. The Nature of Automated Jobs and Their Educational and Training Requirements.

    Science.gov (United States)

    Fine, S.A.

    Objective information concerning the impact of automation on educational and training requirements was obtained for 132 employees engaged in electron tube, computer, and steel manufacturing processes through management questionnaire responses, analysis of job functions, and employer interviews before and after the introduction of automation. The…

  9. Automated Microscopy: Macro Language Controlling a Confocal Microscope and its External Illumination: Adaptation for Photosynthetic Organisms

    Czech Academy of Sciences Publication Activity Database

    Steinbach, Gabor; Kaňa, Radek

    2016-01-01

    Roč. 22, č. 2 (2016), s. 258-263 ISSN 1431-9276 R&D Projects: GA ČR GAP501/12/0304; GA MŠk EE2.3.30.0059; GA MŠk ED2.1.00/03.0110; GA MŠk(CZ) LO1416 Institutional support: RVO:61388971 Keywords : automated microscopy * remote controlled microscopy * confocal microscopy Subject RIV: BH - Optics, Masers, Lasers Impact factor: 1.891, year: 2016

  10. Automated reference librarians for program libraries and their interaction with language based editors

    Energy Technology Data Exchange (ETDEWEB)

    Shilling, J.J.

    1986-08-01

    The design of an automated reference librarian system for computer program libraries is presented and discussed from the point of view of the user, the host, and the system itself. The design is modular and includes 4 components: the managing module, the selection module, the use module, and the database module. A prototype implementation of a reference librarian is presented. (DWL). 62 refs., 6 figs.

  11. Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.

    Directory of Open Access Journals (Sweden)

    Marnix H Medema

    2014-09-01

    Full Text Available Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.
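
    Pep2Path's Bayesian model is not reproduced in the record; the toy sketch below only conveys the core matching idea: each gene cluster's predicted monomer sequence is scored by how well it explains a mass-spectrometric sequence tag whose positions may be ambiguous. The probabilities, the tag, and the clusters are all invented.

    ```python
    MATCH_P, MISMATCH_P = 0.9, 0.02   # invented per-position probabilities

    def cluster_score(tag, predicted_monomers):
        """P(tag | cluster) under independent per-position match probabilities."""
        if len(tag) != len(predicted_monomers):
            return 0.0
        p = 1.0
        for candidates, predicted in zip(tag, predicted_monomers):
            p *= MATCH_P if predicted in candidates else MISMATCH_P
        return p

    # sequence tag from mass shifts; isobaric residues stay ambiguous
    tag = [{"Leu", "Ile"}, {"Val"}, {"Ser"}, {"Gln", "Lys"}]
    clusters = {
        "clusterA": ["Leu", "Val", "Ser", "Gln"],
        "clusterB": ["Phe", "Val", "Ser", "Lys"],
    }

    for name in sorted(clusters, key=lambda c: -cluster_score(tag, clusters[c])):
        print(name, cluster_score(tag, clusters[name]))
    ```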

  12. Stochastic Model for the Vocabulary Growth in Natural Languages

    Science.gov (United States)

    Gerlach, Martin; Altmann, Eduardo G.

    2013-04-01

    We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core words, which have higher frequency and do not affect the probability of a new word to be used, and (ii) the remaining virtually infinite number of noncore words, which have lower frequency and, once used, reduce the probability of a new word to be used in the future. Our model relies on a careful analysis of the Google Ngram database of books published in the last centuries, and its main consequence is the generalization of Zipf’s and Heaps’ laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English.
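
    A small simulation in the spirit of the two-class model (the parameters are illustrative, not the fitted values from the Google Ngram analysis): core words are reused with fixed probability, while the chance of coining a new noncore word decays as the noncore vocabulary grows, giving Heaps-like sublinear vocabulary growth.

    ```python
    import random

    random.seed(1)
    N_CORE, P_CORE, ALPHA = 1000, 0.7, 0.6   # invented parameter values
    noncore = []

    for step in range(1, 200001):
        if random.random() < P_CORE:
            _ = random.randrange(N_CORE)              # reuse a core word
        elif not noncore or random.random() < (len(noncore) + 1) ** -ALPHA:
            noncore.append(len(noncore))              # coin a new noncore word
        else:
            _ = random.choice(noncore)                # reuse an old noncore word
        if step % 50000 == 0:
            # vocabulary grows sublinearly in the number of tokens (Heaps-like)
            print(step, N_CORE + len(noncore))
    ```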

  13. Ontology-Based Controlled Natural Language Editor Using CFG with Lexical Dependency

    Science.gov (United States)

    Namgoong, Hyun; Kim, Hong-Gee

    In recent years, CNL (Controlled Natural Language) has received much attention with regard to ontology-based knowledge acquisition systems. CNLs, as subsets of natural languages, can be useful for both humans and computers by eliminating ambiguity of natural languages. Our previous work, OntoPath [10], proposed to edit natural language-like narratives that are structured in RDF (Resource Description Framework) triples, using a domain-specific ontology as their language constituents. However, our previous work and other systems employing CFG for grammar definition have difficulties in enlarging the expression capacity. A newly developed editor, which we propose in this paper, permits grammar definitions through CFG-LD (Context-Free Grammar with Lexical Dependency) that includes sequential and semantic structures of the grammars. With CFG describing the sequential structure of grammar, lexical dependencies between sentence elements can be designated in the definition system. Through the defined grammars, the implemented editor guides users' narratives in more familiar expressions with a domain-specific ontology and translates the content into RDF triples.
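
    A minimal sketch of the CFG-LD idea under an invented grammar and vocabulary: an ordinary CFG (here via NLTK) fixes the sequential structure, while a separate lexical-dependency table maps the parsed subject/verb/object slots onto an RDF-style triple.

    ```python
    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> 'aspirin' | 'headache' | 'fever'
    V -> 'treats' | 'causes'
    """)
    parser = nltk.ChartParser(grammar)

    # lexical dependencies: which ontology property each verb maps to (invented)
    PREDICATE_MAP = {"treats": "ex:treats", "causes": "ex:causes"}

    def sentence_to_triple(tokens):
        for tree in parser.parse(tokens):
            subject = tree[0].leaves()[0]      # S -> NP ...
            verb = tree[1][0].leaves()[0]      # VP -> V ...
            obj = tree[1][1].leaves()[0]       # VP -> ... NP
            return (f"ex:{subject}", PREDICATE_MAP[verb], f"ex:{obj}")
        return None

    print(sentence_to_triple(["aspirin", "treats", "headache"]))
    # ('ex:aspirin', 'ex:treats', 'ex:headache')
    ```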

  14. An algorithm to transform natural language into SQL queries for relational databases

    Directory of Open Access Journals (Sweden)

    Garima Singh

    2016-09-01

    Full Text Available An intelligent interface that enables efficient interaction between users and databases is a real need for database applications. Databases must be intelligent enough to make access fast, yet not every user is familiar with Structured Query Language (SQL) queries: they may not know the structure of the database and would thus have to learn SQL. Non-expert users therefore need a system that lets them interact with relational databases in their own natural language, such as English. For this, the Database Management System (DBMS) must have the ability to understand Natural Language (NL). In this research, an intelligent interface is developed using a semantic matching technique that translates a natural language query to SQL using a set of production rules and a data dictionary. The data dictionary consists of semantic sets for relations and attributes. A series of steps, such as lower-case conversion, tokenization, part-of-speech tagging, and extraction of database and SQL elements, is used to convert a Natural Language Query (NLQ) to an SQL query. The transformed query is executed and the results are returned to the user.
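
    A rule-based miniature of the described pipeline, with an invented schema and synonym sets: lower-case and tokenize the question, look tokens up in a data dictionary of relation and attribute synonyms, and assemble a SELECT statement.

    ```python
    DATA_DICT = {
        "tables": {"employees": {"employee", "employees", "staff", "workers"}},
        "columns": {"name": {"name", "names", "called"},
                    "salary": {"salary", "pay", "wage"}},
    }

    def nlq_to_sql(question):
        tokens = set(question.lower().replace("?", "").split())  # normalize
        table = next((t for t, synonyms in DATA_DICT["tables"].items()
                      if synonyms & tokens), None)
        if table is None:
            return None   # no relation recognized in the query
        columns = [c for c, synonyms in DATA_DICT["columns"].items()
                   if synonyms & tokens]
        return f"SELECT {', '.join(columns) or '*'} FROM {table};"

    print(nlq_to_sql("Show the name and pay of all staff?"))
    # SELECT name, salary FROM employees;
    ```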

  15. Selecting the Best Mobile Information Service with Natural Language User Input

    Science.gov (United States)

    Feng, Qiangze; Qi, Hongwei; Fukushima, Toshikazu

    Information services accessed via mobile phones provide information directly relevant to subscribers’ daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China, and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and navigate the user to the required service. Since it is difficult for existing methods to achieve both high accuracy and high coverage, or to anticipate which other services a user might want to query, the NL3S is built on a Multi-service Ontology (MO) and a Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S achieves accuracy of 75-95% and satisfaction ratings of 85-95% when processing various styles of natural language queries, and a trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.

  16. Application of the Virtual Reality Modeling Language for Design of Automated Workplaces

    OpenAIRE

    Jozef Novak-Marcincin

    2007-01-01

    Virtual Reality Modelling Language (VRML) is a description language belonging to the Window-on-World class of virtual reality systems. A file in VRML format can be interpreted by a VRML browser as a three-dimensional scene. VRML was created with the aim of making it easier to represent virtual reality on the Internet. The development of 3D graphics is closely connected with Silicon Graphics Corporation. VRML 2.0 is the file format for describing interactive 3D scenes and objects. It can be used in collaboration with www...

  17. THE NATURE OF LEARNER LANGUAGE: A CASE STUDY OF INDONESIAN LEARNERS LEARNING ENGLISH AS A FOREIGN LANGUAGE

    Directory of Open Access Journals (Sweden)

    Endang Fauziati

    2017-04-01

    Full Text Available This study deals with learner language, known as interlanguage; in particular, it investigates its nature. For this purpose, an empirical study was conducted with Indonesian senior high school learners of English as the research subjects. The study used error analysis as its methodological framework. The data were interlanguage errors collected from the learners' free compositions before and after an error treatment, and they were analyzed qualitatively. The research indicates that error treatment made a significant contribution to the destabilization process; that is, it helped change the nature of the learners' interlanguage errors: at a certain period of learning, some particular errors appear as an inevitable part of the learning process, and as a result of error treatment they change their nature. It was observed that this change of state of interlanguage errors was stimulated by several classroom aspects, namely input, feedback, explicit grammar explanation, and practice. The conclusion is that learner language is dynamic in nature.

  18. Natural Language Processing and Fuzzy Tools for Business Processes in a Geolocation Context

    Directory of Open Access Journals (Sweden)

    Isis Truck

    2017-01-01

    Full Text Available In the geolocation field, where high-level programs and low-level devices coexist, it is often difficult to find a friendly user interface for configuring all the parameters. The challenge addressed in this paper is to propose intuitive and simple, and thus natural language, interfaces for interacting with low-level devices. Such interfaces combine natural language processing (NLP) with fuzzy representations of words, which facilitate the elicitation of business-level objectives in our context. A complete methodology is proposed, from lexicon construction to a dialogue software agent that includes a fuzzy linguistic representation based on synonymy.
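
    As an illustration of the fuzzy side of such an interface, the sketch below attaches triangular fuzzy sets to words a user might type; the vocabulary, the parameter (a reporting period in seconds), and all numeric ranges are our own assumptions, not the paper's lexicon.

```python
def triangular(a, b, c):
    """Return a triangular fuzzy membership function over a numeric scale."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical fuzzy vocabulary for one low-level device parameter,
# e.g. a position-reporting period in seconds.
TERMS = {
    "often":     triangular(1, 10, 60),
    "regularly": triangular(30, 120, 300),
    "rarely":    triangular(300, 600, 1200),
}

def degree(word, seconds):
    """Degree (0..1) to which a candidate setting matches the user's word."""
    return TERMS[word](seconds)

print(degree("often", 15))  # 0.9: reporting every 15 s counts as "often"
```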

  19. MODUS PONENS AND MODUS TOLLENS: THEIR VALIDITY/INVALIDITY IN NATURAL LANGUAGE ARGUMENTS

    Directory of Open Access Journals (Sweden)

    Ri Yong-Sok

    2017-06-01

    Full Text Available Previous studies of the validity of Modus ponens and Modus tollens have mostly concerned a major type of conditional in which the conditional clause is a sufficient condition for the main clause. But in natural language arguments we sometimes find other types of conditionals, in which the conditional clause is a necessary, or a necessary and sufficient, condition for the main clause. In this paper I reappraise, on the basis of new definitions of Modus ponens and Modus tollens, their validity/invalidity in natural language arguments, taking all types of conditionals into consideration.
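
    Schematically (our notation, not the paper's: S is the conditional clause, M the main clause), the three readings behave as follows under the surface argument patterns:

```latex
\[
\begin{array}{lll}
\text{S sufficient for M:} & S \to M & \text{surface MP and MT both valid}\\
\text{S necessary for M:}  & M \to S & \text{surface MP} = \text{affirming the consequent (invalid)}\\
                           &         & \text{surface MT} = \text{denying the antecedent (invalid)}\\
\text{S necessary and sufficient:} & S \leftrightarrow M & \text{both surface patterns valid}
\end{array}
\]
```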

  20. Towards Automated Analysis of Urban Infrastructure after Natural Disasters using Remote Sensing

    Science.gov (United States)

    Axel, Colin

    Natural disasters, such as earthquakes and hurricanes, are an unpreventable component of the complex and changing environment we live in. Continued research and advancement in disaster mitigation through prediction of and preparation for impacts have undoubtedly saved many lives and prevented significant amounts of damage, but it is inevitable that some events will cause destruction and loss of life due to their sheer magnitude and proximity to built-up areas. Consequently, development of effective and efficient disaster response methodologies is a research topic of great interest. A successful emergency response is dependent on a comprehensive understanding of the scenario at hand. It is crucial to assess the state of the infrastructure and transportation network, so that resources can be allocated efficiently. Obstructions to the roadways are one of the biggest inhibitors to effective emergency response. To this end, airborne and satellite remote sensing platforms have been used extensively to collect overhead imagery and other types of data in the event of a natural disaster. The ability of these platforms to rapidly probe large areas is ideal in a situation where a timely response could result in saving lives. Typically, imagery is delivered to emergency management officials who then visually inspect it to determine where roads are obstructed and buildings have collapsed. Manual interpretation of imagery is a slow process and is limited by the quality of the imagery and what the human eye can perceive. In order to overcome the time and resource limitations of manual interpretation, this dissertation investigated the feasibility of performing fully automated post-disaster analysis of roadways and buildings using airborne remote sensing data. First, a novel algorithm for detecting roadway debris piles from airborne light detection and ranging (lidar) point clouds and estimating their volumes is presented. Next, a method for detecting roadway flooding in aerial
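
    As a flavor of the kind of computation involved, the sketch below gives a minimal grid-based volume estimate for a detected debris pile from its lidar returns (an illustrative toy, not the dissertation's algorithm): bin the points into ground cells, keep the maximum height per cell, and integrate height times cell area.

```python
def pile_volume(points, cell=0.25):
    """Estimate debris-pile volume from (x, y, z) lidar returns.

    Assumes z is height above bare ground and the pile's points have
    already been segmented out; cell is the grid spacing in meters.
    """
    heights = {}
    for x, y, z in points:
        key = (int(x // cell), int(y // cell))
        heights[key] = max(heights.get(key, 0.0), z)  # max height per cell
    return sum(heights.values()) * cell * cell        # sum of cell volumes

print(pile_volume([(0.1, 0.1, 1.0), (0.2, 0.1, 1.4), (0.6, 0.3, 0.8)]))
```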

  1. An Automated Method to Generate e-Learning Quizzes from Online Language Learner Writing

    Science.gov (United States)

    Flanagan, Brendan; Yin, Chengjiu; Hirokawa, Sachio; Hashimoto, Kiyota; Tabata, Yoshiyuki

    2013-01-01

    In this paper, the entries of Lang-8, a Social Networking Site (SNS) for learning and practicing foreign languages, were analyzed and found to contain similar rates of errors for most error categories reported in previous research. These similarly rated errors were then processed using an algorithm to determine corrections suggested…

  2. Utilising scripting language for unmanned and automated guided vehicles operating within row crops

    DEFF Research Database (Denmark)

    Jørgensen, R. N.; Nørremark, M.; Sørensen, C.G.

    2008-01-01

    This paper defines the requirements and scope of a process- and behaviour-based scripting language needed to control the weeding AGV in an agricultural row crop. The goal is to traverse and cover the whole field with no human auxiliary input during the field operation. The basis is the transparent and tactical real-time control...

  3. Evaluation of natural language processing from emergency department computerized medical records for intra-hospital syndromic surveillance

    Directory of Open Access Journals (Sweden)

    Pagliaroli Véronique

    2011-07-01

    Full Text Available Abstract Background The identification of patients who pose an epidemic hazard when they are admitted to a health facility plays a role in preventing the risk of hospital-acquired infection. An automated clinical decision support system to detect suspected cases, based on the principle of syndromic surveillance, is being developed at the University of Lyon's Hôpital de la Croix-Rousse. This tool will analyse structured data and narrative reports from computerized emergency department (ED) medical records. The first step consists of developing an application (UrgIndex) which automatically extracts and encodes information found in narrative reports. The purpose of the present article is to describe and evaluate this natural language processing system. Methods Narrative reports have to be pre-processed before utilizing the French-language medical multi-terminology indexer (ECMT) for standardized encoding. UrgIndex identifies and excludes syntagmas containing a negation and replaces non-standard terms (abbreviations, acronyms, spelling errors, etc.). Then, the phrases are sent to the ECMT through an Internet connection. The indexer's reply, based on Extensible Markup Language, returns codes and literals corresponding to the concepts found in the phrases. UrgIndex filters codes corresponding to suspected infections. Recall is defined as the number of relevant processed medical concepts divided by the number of concepts evaluated (coded manually by the medical epidemiologist). Precision is defined as the number of relevant processed concepts divided by the number of concepts proposed by UrgIndex. Recall and precision were assessed for respiratory and cutaneous syndromes. Results Evaluation of 1,674 processed medical concepts contained in 100 ED medical records (50 for respiratory syndromes and 50 for cutaneous syndromes) showed an overall recall of 85.8% (95% CI: 84.1-87.3). Recall varied from 84.5% for respiratory syndromes to 87.0% for cutaneous syndromes. The
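
    In formula form, the two evaluation measures defined above are:

```latex
\[
\text{recall} =
  \frac{\#\ \text{relevant processed concepts}}{\#\ \text{concepts coded manually}},
\qquad
\text{precision} =
  \frac{\#\ \text{relevant processed concepts}}{\#\ \text{concepts proposed by UrgIndex}}.
\]
```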

  4. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings.

    Science.gov (United States)

    Pham, Anne-Dominique; Névéol, Aurélie; Lavergne, Thomas; Yasunaga, Daisuke; Clément, Olivier; Meyer, Guy; Morello, Rémy; Burgun, Anita

    2014-08-07

    Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnoses and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnoses and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. The best model achieved an F-measure of 0.98 for pulmonary embolism identification, 1.00 for deep-vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification which could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.

  5. A Natural Language for AdS/CFT Correlators

    Energy Technology Data Exchange (ETDEWEB)

    Fitzpatrick, A.Liam; /Boston U.; Kaplan, Jared; /SLAC; Penedones, Joao; /Perimeter Inst. Theor. Phys.; Raju, Suvrat; /Harish-Chandra Res. Inst.; van Rees, Balt C.; /YITP, Stony Brook

    2012-02-14

    We provide dramatic evidence that 'Mellin space' is the natural home for correlation functions in CFTs with weakly coupled bulk duals. In Mellin space, CFT correlators have poles corresponding to an OPE decomposition into 'left' and 'right' sub-correlators, in direct analogy with the factorization channels of scattering amplitudes. In the regime where these correlators can be computed by tree level Witten diagrams in AdS, we derive an explicit formula for the residues of Mellin amplitudes at the corresponding factorization poles, and we use the conformal Casimir to show that these amplitudes obey algebraic finite difference equations. By analyzing the recursive structure of our factorization formula we obtain simple diagrammatic rules for the construction of Mellin amplitudes corresponding to tree-level Witten diagrams in any bulk scalar theory. We prove the diagrammatic rules using our finite difference equations. Finally, we show that our factorization formula and our diagrammatic rules morph into the flat space S-Matrix of the bulk theory, reproducing the usual Feynman rules, when we take the flat space limit of AdS/CFT. Throughout we emphasize a deep analogy with the properties of flat space scattering amplitudes in momentum space, which suggests that the Mellin amplitude may provide a holographic definition of the flat space S-Matrix.
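
    For orientation, the Mellin representation referred to here is, in the standard notation of the AdS/CFT literature (our gloss, not a formula taken from this abstract): the correlator of scalar primaries is written as

```latex
\[
\langle \mathcal{O}_1(x_1)\cdots\mathcal{O}_n(x_n)\rangle
  = \int [d\delta_{ij}]\, M(\delta_{ij})
    \prod_{i<j} \Gamma(\delta_{ij})\,(x_{ij}^2)^{-\delta_{ij}},
\qquad
\sum_{j\neq i}\delta_{ij} = \Delta_i ,
\]
```

    where the Mellin amplitude M(δ_ij) plays the role described above: its poles sit at the factorization channels, in analogy with a flat space S-matrix.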

  6. The Rape of Mother Nature? Women in the Language of Environmental Discourse.

    Science.gov (United States)

    Berman, Tzeporah

    1994-01-01

    Argues that the structure of language reflects and reproduces the dominant model, and reinforces many of the dualistic assumptions which underlie the separation of male and female, nature and culture, mind from body, emotion from reason, and intuition from fact. (LZ)

  7. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    Science.gov (United States)

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  8. Speech perception and reading: two parallel modes of understanding language and implications for acquiring literacy naturally.

    Science.gov (United States)

    Massaro, Dominic W

    2012-01-01

    I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally.

  9. Visualization of health information with predications extracted using natural language processing and filtered using the UMLS.

    Science.gov (United States)

    Miller, Trudi; Leroy, Gondy

    2008-11-06

    Increased availability of and reliance on written health information can tax the abilities of unskilled readers. We are developing a system that uses natural language processing to extract phrases, identify medical terms using the UMLS, and visualize the propositions. This system substantially reduces the amount of information a consumer must read, while providing an alternative to traditional prose based text.

  10. Using natural language processing to improve biomedical concept normalization and relation mining

    NARCIS (Netherlands)

    N. Kang (Ning)

    2013-01-01

    This thesis concerns the use of natural language processing for improving biomedical concept normalization and relation mining. We begin by introducing the background of biomedical text mining, and subsequently describe a typical text mining pipeline, some key

  11. Modelling the phonotactic structure of natural language words with simple recurrent networks

    NARCIS (Netherlands)

    Stoianov; Nerbonne, J; Bouma, H; Coppen, PA; vanHalteren, H; Teunissen, L

    1998-01-01

    Simple Recurrent Networks (SRN) are Neural Network (connectionist) models able to process natural language. Phonotactics concerns the order of symbols in words. We continued an earlier unsuccessful trial to model the phonotactics of Dutch words with SRNs. In order to overcome the previously reported

  12. Construct Validity in TOEFL iBT Speaking Tasks: Insights from Natural Language Processing

    Science.gov (United States)

    Kyle, Kristopher; Crossley, Scott A.; McNamara, Danielle S.

    2016-01-01

    This study explores the construct validity of speaking tasks included in the TOEFL iBT (e.g., integrated and independent speaking tasks). Specifically, advanced natural language processing (NLP) tools, MANOVA difference statistics, and discriminant function analyses (DFA) are used to assess the degree to which and in what ways responses to these…

  13. Dimensional Reduction in Vector Space Methods for Natural Language Processing: Products and Projections

    Science.gov (United States)

    Aerts, Sven

    2011-12-01

    We introduce vector space based approaches to natural language processing and some of their similarities with quantum theory when applied to information retrieval. We explain how dimensional reduction is called for from both a practical and theoretical point of view and how this can be achieved through choice of product or through projectors onto subspaces.

  14. Drawing Dynamic Geometry Figures Online with Natural Language for Junior High School Geometry

    Science.gov (United States)

    Wong, Wing-Kwong; Yin, Sheng-Kai; Yang, Chang-Zhe

    2012-01-01

    This paper presents a tool for drawing dynamic geometric figures by understanding the texts of geometry problems. With the tool, teachers and students can construct dynamic geometric figures on a web page by inputting a geometry problem in natural language. First we need to build the knowledge base for understanding geometry problems. With the…

  15. You Are Your Words: Modeling Students' Vocabulary Knowledge with Natural Language Processing Tools

    Science.gov (United States)

    Allen, Laura K.; McNamara, Danielle S.

    2015-01-01

    The current study investigates the degree to which the lexical properties of students' essays can inform stealth assessments of their vocabulary knowledge. In particular, we used indices calculated with the natural language processing tool, TAALES, to predict students' performance on a measure of vocabulary knowledge. To this end, two corpora were…

  16. Reconceptualizing the Nature of Goals and Outcomes in Language/s Education

    Science.gov (United States)

    Leung, Constant; Scarino, Angela

    2016-01-01

    Transformations associated with the increasing speed, scale, and complexity of mobilities, together with the information technology revolution, have changed the demography of most countries of the world and brought about accompanying social, cultural, and economic shifts (Heugh, 2013). This complex diversity has changed the very nature of…

  17. Automatic Determination of the Need for Intravenous Contrast in Musculoskeletal MRI Examinations Using IBM Watson's Natural Language Processing Algorithm.

    Science.gov (United States)

    Trivedi, Hari; Mesterhazy, Joseph; Laguna, Benjamin; Vu, Thienkhai; Sohn, Jae Ho

    2018-04-01

    Magnetic resonance imaging (MRI) protocoling can be time- and resource-intensive, and protocols can often be suboptimal dependent upon the expertise or preferences of the protocoling radiologist. Providing a best-practice recommendation for an MRI protocol has the potential to improve efficiency and decrease the likelihood of a suboptimal or erroneous study. The goal of this study was to develop and validate a machine learning-based natural language classifier that can automatically assign the use of intravenous contrast for musculoskeletal MRI protocols based upon the free-text clinical indication of the study, thereby improving efficiency of the protocoling radiologist and potentially decreasing errors. We utilized a deep learning-based natural language classification system from IBM Watson, a question-answering supercomputer that gained fame after challenging the best human players on Jeopardy! in 2011. We compared this solution to a series of traditional machine learning-based natural language processing techniques that utilize a term-document frequency matrix. Each classifier was trained with 1240 MRI protocols plus their respective clinical indications and validated with a test set of 280. Ground truth of contrast assignment was obtained from the clinical record. For evaluation of inter-reader agreement, a blinded second reader radiologist analyzed all cases and determined contrast assignment based on only the free-text clinical indication. In the test set, Watson demonstrated overall accuracy of 83.2% when compared to the original protocol. This was similar to the overall accuracy of 80.2% achieved by an ensemble of eight traditional machine learning algorithms based on a term-document matrix. When compared to the second reader's contrast assignment, Watson achieved 88.6% agreement. When evaluating only the subset of cases where the original protocol and second reader were concordant (n = 251), agreement climbed further to 90.0%. The classifier was
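
    For concreteness, a minimal sketch of the traditional baseline described above, a term-document matrix over free-text clinical indications feeding a classifier that predicts contrast use, might look as follows; the toy data, labels, and single logistic-regression classifier are illustrative stand-ins, not the study's eight-algorithm ensemble.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: free-text indications with contrast labels
# taken from the clinical record (1 = contrast, 0 = no contrast).
indications = ["rule out osteomyelitis", "acl tear evaluation",
               "possible septic arthritis", "meniscus injury follow-up"]
needs_contrast = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(indications, needs_contrast)
print(clf.predict(["evaluate for osteomyelitis of the foot"]))
```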

  18. The Exploring Nature of Definitions and Classifications of Language Learning Strategies (LLSs) in the Current Studies of Second/Foreign Language Learning

    Science.gov (United States)

    Fazeli, Seyed Hossein

    2011-01-01

    This study aims to explore the nature of definitions and classifications of Language Learning Strategies (LLSs) in the current studies of second/foreign language learning in order to show the current problems regarding such definitions and classifications. The present study shows that there is no universally agreed definition and…

  19. Mathematics and the Laws of Nature Developing the Language of Science (Revised Edition)

    CERN Document Server

    Tabak, John

    2011-01-01

    Mathematics and the Laws of Nature, Revised Edition describes the evolution of the idea that nature can be described in the language of mathematics. Colorful chapters explore the earliest attempts to apply deductive methods to the study of the natural world. This revised resource goes on to examine the development of classical conservation laws, including the conservation of momentum, the conservation of mass, and the conservation of energy. Chapters have been updated and revised to reflect recent information, including the mathematical pioneers who introduced new ideas about what it meant to

  20. Analyzing the Language of Citation across Discipline and Experience Levels: An Automated Dictionary Approach

    Directory of Open Access Journals (Sweden)

    David Kaufer

    2016-02-01

    Full Text Available Citation practices have been and continue to be a concentrated area of research activity among writing researchers, spanning many disciplines. This research presents a re-analysis of a common data set contributed by Karatsolis (this issue), which focused on the citation practices of 8 PhD advisors and 8 PhD advisees across four disciplines. Our purpose in this paper is to show what automated dictionary methods can uncover in the same data, based on a text analysis and visualization environment we have been developing over many years. The results of our analysis suggest that, although automatic dictionary methods cannot reproduce the fine granularity of interpretative coding schemes designed for human coders, they can find significant non-adjacent patterns distributed across a text or corpus that will likely elude an analyst relying solely on serial reading. We report on the discovery of several of these patterns that we believe complement Karatsolis’ original analysis and extend the citation literature at large. We conclude the paper by reviewing some of the advantages and limits of dictionary approaches to textual analysis, as well as debunking some common misconceptions about them.
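
    The core mechanic of a dictionary approach is easy to sketch: texts are scored by counting matches against curated word lists, and the category profiles are then compared across writers. The categories and word lists below are illustrative toys, not the authors' actual dictionaries.

```python
# Toy dictionary-based tally over curated categories.
CATEGORIES = {
    "citation_verbs": {"argues", "claims", "shows", "demonstrates"},
    "hedges": {"may", "might", "perhaps", "possibly"},
}

def tally(text):
    tokens = [t.strip(".,;:()") for t in text.lower().split()]
    return {cat: sum(t in words for t in tokens)
            for cat, words in CATEGORIES.items()}

print(tally("Smith (2010) argues that the effect may be smaller."))
# {'citation_verbs': 1, 'hedges': 1}
```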

  1. TAPS - An automated tool for identification of skills, knowledges, and abilities using natural language task description

    International Nuclear Information System (INIS)

    Jorgensen, C.C.; Carter, R.J.

    1986-01-01

    A prototype, computer-based tool (TAPS) has been developed to aid training system developers in identifying skills, knowledges, and abilities (SKAs) during task analysis. TAPS uses concepts of flexible pattern matching to evaluate English descriptions of job behaviors and to recode them as SKA lists. This paper addresses the rationale for TAPS and describes its design including SKA definitions and task analysis logic. It also presents examples of TAPS's application

  2. Automated Assessment of the Quality of Peer Reviews Using Natural Language Processing Techniques

    Science.gov (United States)

    Ramachandran, Lakshmi; Gehringer, Edward F.; Yadav, Ravi K.

    2017-01-01

    A "review" is textual feedback provided by a reviewer to the author of a submitted version. Peer reviews are used in academic publishing and in education to assess student work. While reviews are important to e-commerce sites like Amazon and e-bay, which use them to assess the quality of products and services, our work focuses on…

  3. Natural language indicators of differential gene regulation in the human immune system.

    Science.gov (United States)

    Mehl, Matthias R; Raison, Charles L; Pace, Thaddeus W W; Arevalo, Jesusa M G; Cole, Steve W

    2017-11-21

    Adverse social conditions have been linked to a conserved transcriptional response to adversity (CTRA) in circulating leukocytes that may contribute to social gradients in disease. However, the CNS mechanisms involved remain obscure, in part because CTRA gene-expression profiles often track external social-environmental variables more closely than they do self-reported internal affective states such as stress, depression, or anxiety. This study examined the possibility that variations in patterns of natural language use might provide more sensitive indicators of the automatic threat-detection and -response systems that proximally regulate autonomic induction of the CTRA. In 22,627 audio samples of natural speech sampled from the daily interactions of 143 healthy adults, both total language output and patterns of function-word use covaried with CTRA gene expression. These language features predicted CTRA gene expression substantially better than did conventional self-report measures of stress, depression, and anxiety and did so independently of demographic and behavioral factors (age, sex, race, smoking, body mass index) and leukocyte subset distributions. This predictive relationship held when language and gene expression were sampled more than a week apart, suggesting that associations reflect stable individual differences or chronic life circumstances. Given the observed relationship between personal expression and gene expression, patterns of natural language use may provide a useful behavioral indicator of nonconsciously evaluated well-being (implicit safety vs. threat) that is distinct from conscious affective experience and more closely tracks the neurobiological processes involved in peripheral gene regulation. Copyright © 2017 the Author(s). Published by PNAS.

  4. Naturalism and Ideological Work: How Is Family Language Policy Renegotiated as Both Parents and Children Learn a Threatened Minority Language?

    Science.gov (United States)

    Armstrong, Timothy Currie

    2014-01-01

    Parents who enroll their children to be educated through a threatened minority language frequently do not speak that language themselves and classes in the language are sometimes offered to parents in the expectation that this will help them to support their children's education and to use the minority language in the home. Providing…

  5. The Nature of the Language Faculty and Its Implications for Evolution of Language (Reply to Fitch, Hauser, and Chomsky)

    Science.gov (United States)

    Jackendoff, Ray; Pinker, Steven

    2005-01-01

    In a continuation of the conversation with Fitch, Chomsky, and Hauser on the evolution of language, we examine their defense of the claim that the uniquely human, language-specific part of the language faculty (the ''narrow language faculty'') consists only of recursion, and that this part cannot be considered an adaptation to communication. We…

  6. Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

    Science.gov (United States)

    Khalifa, Abdulrahman; Meystre, Stéphane

    2015-12-01

    The 2014 i2b2 natural language processing shared task focused on identifying cardiovascular risk factors such as high blood pressure, high cholesterol levels, obesity and smoking status among other factors found in health records of diabetic patients. In addition, the task involved detecting medications, and time information associated with the extracted data. This paper presents the development and evaluation of a natural language processing (NLP) application conceived for this i2b2 shared task. For increased efficiency, the application main components were adapted from two existing NLP tools implemented in the Apache UIMA framework: Textractor (for dictionary-based lookup) and cTAKES (for preprocessing and smoking status detection). The application achieved a final (micro-averaged) F1-measure of 87.5% on the final evaluation test set. Our attempt was mostly based on existing tools adapted with minimal changes and allowed for satisfying performance with limited development efforts. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in this paper. Seven different term weighting methods were applied. As a dimensionality reduction method, feature selection based on a self-adaptive genetic algorithm (GA) is considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are to study text classification for natural language call routing with different term weighting methods and classification algorithms, and to investigate the feature selection method based on the self-adaptive GA. The numerical results showed that the most effective term weighting is TRR and the most effective classification algorithm is ANN. Feature selection with the self-adaptive GA improves classification effectiveness and provides significant dimensionality reduction with all term weighting methods and all classification algorithms.
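
    A bare-bones GA for feature selection looks like the sketch below (a plain GA for illustration; the paper's self-adaptive variant additionally tunes its own operator rates during the run, and its fitness would wrap an actual classifier such as k-NN evaluated on the term-weighted features).

```python
import random

def ga_select(fitness, n_features, pop=20, gens=30, p_mut=0.02, seed=0):
    """Evolve boolean feature masks; fitness maps a mask to a score."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]                            # one-point crossover
            child = [g ^ (rng.random() < p_mut) for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Stand-in fitness that merely prefers small feature subsets; a real run
# would score classification accuracy with the selected features.
best_mask = ga_select(lambda mask: -sum(mask), n_features=50)
```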

  8. Laboratory process control using natural language commands from a personal computer

    Science.gov (United States)

    Will, Herbert A.; Mackin, Michael A.

    1989-01-01

    PC software is described which provides flexible natural language process control capability with an IBM PC or compatible machine. Hardware requirements include the PC, and suitable hardware interfaces to all controlled devices. Software required includes the Microsoft Disk Operating System (MS-DOS) operating system, a PC-based FORTRAN-77 compiler, and user-written device drivers. Instructions for use of the software are given as well as a description of an application of the system.

  9. Natural language processing-based COTS software and related technologies survey.

    Energy Technology Data Exchange (ETDEWEB)

    Stickland, Michael G.; Conrad, Gregory N.; Eaton, Shelley M.

    2003-09-01

    Natural language processing-based knowledge management software, traditionally developed for security organizations, is now becoming commercially available. An informal survey was conducted to discover and examine current NLP and related technologies and potential applications for information retrieval, information extraction, summarization, categorization, terminology management, link analysis, and visualization for possible implementation at Sandia National Laboratories. This report documents our current understanding of the technologies, lists software vendors and their products, and identifies potential applications of these technologies.

  10. Human Computer Collaboration at the Edge: Enhancing Collective Situation Understanding with Controlled Natural Language

    Science.gov (United States)

    2016-09-06

    …has conceptually noted limitations of COPs [26]; our research empirically illustrates the tradeoffs with a COP even if all users have a shared goal... in group size and dynamics. To further assess the effects of a COP on information quality and quantity, we plan to run a conceptual replication of the...

  11. Effects of speech- and text-based interaction modes in natural language human-computer dialogue.

    Science.gov (United States)

    Le Bigot, Ludovic; Rouet, Jean-François; Jamet, Eric

    2007-12-01

    This study examined the effects of user production (speaking and typing) and user reception (listening and reading) modes on natural language human-computer dialogue. Text-based dialogue is often more efficient than speech-based dialogue, but the latter is more dynamic and more suitable for mobile environments and hands-busy situations. The respective contributions of user production and reception modes have not previously been assessed. Eighteen participants performed several information search tasks using a natural language information system in four experimental conditions: phone (speaking and listening), Web (typing and reading), and mixed (speaking and reading or typing and listening). Mental workload was greater and participants' repetitions of commands were more frequent when speech (speaking or listening) was used for both the user production and reception modes rather than text (typing or reading). Completion times were longer for listening than for reading. Satisfaction was lower, utterances were longer, and the interaction error rate was higher for speaking than typing. The production and reception modes both contribute to dialogue and mental workload. They have distinct contributions to performance, satisfaction, and the form of the discourse. The most efficient configuration for interacting in natural language would appear to be speech for production and system prompts in text, as this combination decreases the time on task while improving dialogue involvement.

  12. Using Natural Language And Voice To Control High Level Tasks In A Robotics Environment

    Science.gov (United States)

    Hackenberg, Robert G.

    1987-03-01

    RCA's Advanced Technology Laboratories (ATL) has implemented an integrated system which permits control of high level tasks in a robotics environment through voice input in the form of natural language syntax. The paper to be presented will outline the architecture used to integrate voice recognition and synthesis hardware and natural language and intelligent reasoning software with a supervisory processor that controls robotic and vision operations in the robotic testbed. The application is intended to give the human operator of a Puma 782 industrial robot the ability to combine joystick teleoperation with voice input in order to provide a flexible man-machine interface in a hands-busy environment. The system is designed to give the operator a speech interface which is unobtrusive and undemanding in terms of predetermined syntax requirements. The voice recognizer accepts continuous speech and the natural language processor accepts full and partial sentence fragments and can perform a fair amount of disambiguation and context analysis. Output to the operator comes via the parallel channel of speech synthesis so that the operator does not have to consult the computer's CRT for messages. The messages are generated from the software and offer warnings about unacceptable situations, confirmations of actions completed, and feedback of system data.

  13. Detecting Novel and Emerging Drug Terms Using Natural Language Processing: A Social Media Corpus Study

    Science.gov (United States)

    Simpson, Sean S; Brugman, Claudia M; Conners, Thomas J

    2018-01-01

    Background With the rapid development of new psychoactive substances (NPS) and changes in the use of more traditional drugs, it is increasingly difficult for researchers and public health practitioners to keep up with emerging drugs and drug terms. Substance use surveys and diagnostic tools need to be able to ask about substances using the terms that drug users themselves are likely to be using. Analyses of social media may offer new ways for researchers to uncover and track changes in drug terms in near real time. This study describes the initial results from an innovative collaboration between substance use epidemiologists and linguistic scientists employing techniques from the field of natural language processing to examine drug-related terms in a sample of tweets from the United States. Objective The objective of this study was to assess the feasibility of using distributed word-vector embeddings trained on social media data to uncover previously unknown (to researchers) drug terms. Methods In this pilot study, we trained a continuous bag of words (CBOW) model of distributed word-vector embeddings on a Twitter dataset collected during July 2016 (roughly 884.2 million tokens). We queried the trained word embeddings for terms with high cosine similarity (a proxy for semantic relatedness) to well-known slang terms for marijuana to produce a list of candidate terms likely to function as slang terms for this substance. This candidate list was then compared with an expert-generated list of marijuana terms to assess the accuracy and efficacy of using word-vector embeddings to search for novel drug terminology. Results The method described here produced a list of 200 candidate terms for the target substance (marijuana). Of these 200 candidates, 115 were determined to in fact relate to marijuana (65 terms for the substance itself, 50 terms related to paraphernalia). This included 30 terms which were used to refer to the target substance in the corpus yet did not appear
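
    The querying step is compact to express with an off-the-shelf toolkit such as gensim (shown below on a toy corpus; the real study trained on roughly 884 million tokens of tweets, and the seed terms here are generic examples):

```python
from gensim.models import Word2Vec

# Toy stand-in for the tokenized tweet corpus.
tweets = [["passing", "the", "blunt"], ["rolling", "a", "spliff"],
          ["smoke", "a", "blunt"]]

# sg=0 selects the CBOW training objective used in the study.
model = Word2Vec(sentences=tweets, sg=0, vector_size=100, window=5,
                 min_count=1)

# Terms with the highest cosine similarity to known slang seeds become
# the candidate list handed to experts for review.
candidates = model.wv.most_similar(positive=["blunt", "spliff"], topn=200)
print(candidates[:5])
```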

  14. Language

    DEFF Research Database (Denmark)

    Sanden, Guro Refsum

    2016-01-01

    Purpose: – The purpose of this paper is to analyse the consequences of globalisation in the area of corporate communication, and investigate how language may be managed as a strategic resource. Design/methodology/approach: – A review of previous studies on the effects of globalisation on corporate...... communication and the implications of language management initiatives in international business. Findings: – Efficient language management can turn language into a strategic resource. Language needs analyses, i.e. linguistic auditing/language check-ups, can be used to determine the language situation...

  15. Applications of artificial intelligence to space station and automated software techniques: High level robot command language

    Science.gov (United States)

    Mckee, James W.

    1989-01-01

    The objective is to develop a system that will allow a person not necessarily skilled in the art of programming robots to quickly and naturally create the necessary data and commands to enable a robot to perform a desired task. The system will use a menu-driven graphical user interface. This interface will allow the user to input data to select objects to be moved. There will be an embedded expert system to process the knowledge about the objects and the robot to determine how they are to be moved. There will be automatic path planning to avoid obstacles in the work space and to create a near-optimum path. The system will contain the software to generate the required robot instructions.

  16. Testing an AAC system that transforms pictograms into natural language with persons with cerebral palsy.

    Science.gov (United States)

    Pahisa-Solé, Joan; Herrera-Joancomartí, Jordi

    2017-10-18

    In this article, we describe a compansion system that transforms the telegraphic language that comes from the use of pictogram-based augmentative and alternative communication (AAC) into natural language. The system was tested with four participants with severe cerebral palsy and ranging degrees of linguistic competence and intellectual disabilities. Participants had used pictogram-based AAC at least for the past 30 years each and presented a stable linguistic profile. During tests, which consisted of a total of 40 sessions, participants were able to learn new linguistic skills, such as the use of basic verb tenses, while using the compansion system, which proved a source of motivation. The system can be adapted to the linguistic competence of each person and required no learning curve during tests when none of its special features, like gender, number, verb tense, or sentence type modifiers, were used. Furthermore, qualitative and quantitative results showed a mean communication rate increase of 41.59%, compared to the same communication device without the compansion system, and an overall improvement in the communication experience when the output is in natural language. Tests were conducted in Catalan and Spanish.

  17. Resolution of ambiguities in cartoons as an illustration of the role of pragmatics in natural language understanding by computers

    Energy Technology Data Exchange (ETDEWEB)

    Mazlack, L.J.; Paz, N.M.

    1983-01-01

    Newspaper cartoons can graphically display the result of ambiguity in human speech; the result can be unexpected and funny. Likewise, computer analysis of natural language statements also needs to successfully resolve ambiguous situations. Computer techniques already developed use restricted world knowledge in resolving ambiguous language use. This paper illustrates how these techniques can be used in resolving ambiguous situations arising in cartoons. 8 references.

  18. Language Revitalization.

    Science.gov (United States)

    Hinton, Leanne

    2003-01-01

    Surveys developments in language revitalization and language death. Focusing on indigenous languages, discusses the role and nature of appropriate linguistic documentation, possibilities for bilingual education, and methods of promoting oral fluency and intergenerational transmission in affected languages. (Author/VWL)

  19. Ulisse Aldrovandi's Color Sensibility: Natural History, Language and the Lay Color Practices of Renaissance Virtuosi.

    Science.gov (United States)

    Pugliano, Valentina

    2015-01-01

    Famed for his collection of drawings of naturalia and his thoughts on the relationship between painting and natural knowledge, it now appears that the Bolognese naturalist Ulisse Aldrovandi (1522-1605) also pondered specifically color and pigments, compiling not only lists and diagrams of color terms but also a full-length unpublished manuscript entitled De coloribus or Trattato dei colori. Introducing these writings for the first time, this article portrays a scholar not so much interested in the materiality of pigment production, as in the cultural history of hues. It argues that these writings constituted an effort to build a language of color, in the sense both of a standard nomenclature of hues and of a lexicon, a dictionary of their denotations and connotations as documented in the literature of ancients and moderns. This language would serve the naturalist in his artistic patronage and his natural historical studies, where color was considered one of the most reliable signs for the correct identification of specimens, and a guarantee of accuracy in their illustration. Far from being an exception, Aldrovandi's 'color sensibility' spoke of that of his university-educated nature-loving peers.

  1. Natural Language Processing Approach for Searching the Quran: Quick and Intuitive

    Directory of Open Access Journals (Sweden)

    Zainal Abidah

    2017-01-01

    Full Text Available The Quran is the scripture that serves as the main reference for people whose religion is Islam. It covers topics ranging from politics to science, with a vast amount of information that requires effort to uncover the knowledge behind it. Today, the emergence of smartphones has led to the development of a wide range of applications for enhancing knowledge-seeking activities. This project proposes a mobile application that takes a natural language approach to searching topics in the Quran based on keyword searching. The benefit of the application is two-fold: it is intuitive and it saves time.

  2. Semi-supervised learning and domain adaptation in natural language processing

    CERN Document Server

    Søgaard, Anders

    2013-01-01

    This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the marginal distribution of large amounts of unlabeled data. One reason for that is data sparsity, i.e., the limited amounts of data we have available in NLP. However, in most real-world NLP applications our labeled data is also heavily biased. This book introduces extensions of supervised learning algorithms to cope with data sparsity and different kinds of sampling bias.This book is intended to be both
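
    Self-training is perhaps the simplest member of this family and makes the idea concrete: fit on the labeled data, pseudo-label the unlabeled points the model is most confident about, and refit. The sketch below is a generic illustration (the names and the 0.95 threshold are our own, and logistic regression is just a convenient stand-in).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is pseudo-labeled confidently enough
        pseudo = clf.classes_[proba.argmax(axis=1)[confident]]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
    return clf

X_lab, y_lab = np.array([[0.0], [1.0]]), np.array([0, 1])
model = self_train(X_lab, y_lab, np.array([[0.1], [0.9], [0.5]]))
```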

  3. Knowledge acquisition from natural language for expert systems based on classification problem-solving methods

    Science.gov (United States)

    Gomez, Fernando

    1989-01-01

    It is shown how certain kinds of domain independent expert systems based on classification problem-solving methods can be constructed directly from natural language descriptions by a human expert. The expert knowledge is not translated into production rules. Rather, it is mapped into conceptual structures which are integrated into long-term memory (LTM). The resulting system is one in which problem-solving, retrieval and memory organization are integrated processes. In other words, the same algorithm and knowledge representation structures are shared by these processes. As a result of this, the system can answer questions, solve problems or reorganize LTM.

  4. Detecting inpatient falls by using natural language processing of electronic medical records

    Directory of Open Access Journals (Sweden)

    Toyabe Shin-ichi

    2012-12-01

    Full Text Available Abstract Background Incident reporting is the most common method for detecting adverse events in a hospital. However, under-reporting or non-reporting and delays in report submission are problems that prevent the early detection of serious adverse events. The aim of this study was to determine whether it is possible to promptly detect serious injuries after inpatient falls by using a natural language processing method, and to determine which data source is the most suitable for this purpose. Methods We tried to detect adverse events from the narrative text data of electronic medical records by using a natural language processing method. We made syntactic category decision rules to detect inpatient falls from text data in electronic medical records. We compared how often true fall events were recorded in various data sources, including progress notes, discharge summaries, image order entries and incident reports. We applied the rules to these data sources and compared the F-measures for detecting falls between these data sources with reference to the results of a manual chart review. The lag time between event occurrence and data submission and the degree of injury were compared. Results We made 170 syntactic rules to detect inpatient falls by using a natural language processing method. Information on true fall events was most frequently recorded in progress notes (100%), incident reports (65.0%) and image order entries (12.5%). However, the F-measure for detecting falls using the rules was poor when using progress notes (0.12) and discharge summaries (0.24) compared with incident reports (1.00) and image order entries (0.91). Since the results suggested that incident reports and image order entries were possible data sources for the prompt detection of serious falls, we focused on a comparison of falls found by incident reports and image order entries. Injury caused by falls found by image order entries was significantly more severe than falls detected by
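
    The flavor of such pattern rules is easy to convey with a toy detector (the patterns below are simplified English stand-ins for the 170 syntactic category decision rules the study built, including the handling of negated mentions):

```python
import re

FALL_PATTERNS = [
    re.compile(r"\b(fell|fallen|slipped)\b.*\b(floor|bed|chair)\b"),
    re.compile(r"\bfound\b.*\blying on the floor\b"),
]
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*\bfall")

def mentions_fall(note):
    """True if a note asserts (and does not negate) an inpatient fall."""
    note = note.lower()
    if NEGATION.search(note):
        return False
    return any(p.search(note) for p in FALL_PATTERNS)

print(mentions_fall("Patient slipped while transferring and hit the floor."))
print(mentions_fall("Patient denies any fall this week."))
```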

  5. Visualizing Patient Journals by Combining Vital Signs Monitoring and Natural Language Processing

    DEFF Research Database (Denmark)

    Vilic, Adnan; Petersen, John Asger; Hoppe, Karsten

    2016-01-01

    This paper presents a data-driven approach to graphically presenting text-based patient journals while still maintaining all textual information. The system first creates a timeline representation of a patient's physiological condition during an admission, which is assessed by electronically monitoring vital signs and then combining these into Early Warning Scores (EWS). Hereafter, techniques from Natural Language Processing (NLP) are applied to the existing patient journal to extract all entries. Finally, the two methods are combined into an interactive timeline featuring the ability to see drastic changes in the patient's health, thereby enabling staff to see where in the journal critical events have taken place.
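
    The vital-sign half of the pipeline reduces to a scoring table applied at every monitoring timestamp. The sketch below shows the shape of such a computation with two parameters; the thresholds are illustrative only, and real EWS charts (including the paper's) differ.

```python
def band_score(value, bands):
    """Return the points of the first matching band, else the maximum 3."""
    for lo, hi, pts in bands:
        if lo <= value <= hi:
            return pts
    return 3

RESP = [(12, 20, 0), (9, 11, 1), (21, 24, 2)]                    # breaths/min
PULSE = [(51, 90, 0), (41, 50, 1), (91, 110, 1), (111, 130, 2)]  # beats/min

def ews(resp_rate, pulse):
    return band_score(resp_rate, RESP) + band_score(pulse, PULSE)

# One score per timestamp yields the timeline the journal entries are
# aligned against.
print(ews(resp_rate=22, pulse=95))  # 3
```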

  6. Systemic functional grammar in natural language generation linguistic description and computational representation

    CERN Document Server

    Teich, Elke

    1999-01-01

    This volume deals with the computational application of systemic functional grammar (SFG) for natural language generation. In particular, it describes the implementation of a fragment of the grammar of German in the computational framework of KOMET-PENMAN for multilingual generation. The text also presents a specification of explicit well-formedness constraints on syntagmatic structure which are defined in the form of typed feature structures. It thus achieves a model of systemic functional grammar that unites both the strengths of systemics, such as stratification, functional diversification

  7. Facilitating pharmacogenetic studies using electronic health records and natural-language processing: a case study of warfarin.

    Science.gov (United States)

    Xu, Hua; Jiang, Min; Oetjens, Matt; Bowton, Erica A; Ramirez, Andrea H; Jeff, Janina M; Basford, Melissa A; Pulley, Jill M; Cowan, James D; Wang, Xiaoming; Ritchie, Marylyn D; Masys, Daniel R; Roden, Dan M; Crawford, Dana C; Denny, Joshua C

    2011-01-01

    DNA biobanks linked to comprehensive electronic health records systems are potentially powerful resources for pharmacogenetic studies. This study sought to develop natural-language-processing algorithms to extract drug-dose information from clinical text, and to assess the capabilities of such tools to automate the data-extraction process for pharmacogenetic studies. A manually validated warfarin pharmacogenetic study identified a cohort of 1125 patients with a stable warfarin dose, in which 776 patients were managed by Coumadin Clinic physicians, and the remaining 349 patients were managed by their providers. The authors developed two algorithms to extract weekly warfarin doses from both data sets: a regular expression-based program for semistructured Coumadin Clinic notes; and an advanced weekly dose calculator based on an existing medication information extraction system (MedEx) for narrative providers' notes. The authors then conducted an association analysis between an automatically extracted stable weekly dose of warfarin and four genetic variants of VKORC1 and CYP2C9 genes. The performance of the weekly dose-extraction program was evaluated by comparing it with a gold standard containing manually curated weekly doses. Precision, recall, F-measure, and overall accuracy were reported. Associations between known variants in VKORC1 and CYP2C9 and warfarin stable weekly dose were performed with linear regression adjusted for age, gender, and body mass index. The authors' evaluation showed that the MedEx-based system could determine patients' warfarin weekly doses with 99.7% recall, 90.8% precision, and 93.8% accuracy. Using the automatically extracted weekly doses of warfarin, the authors successfully replicated the previous known associations between warfarin stable dose and genetic variants in VKORC1 and CYP2C9.
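
    The semistructured half of such extraction can be as simple as a dose pattern plus a frequency-to-week multiplier; the toy below illustrates the idea (the study's regular-expression program and MedEx-based weekly dose calculator handle far more variation).

```python
import re

DOSE = re.compile(
    r"(?P<dose>\d+(\.\d+)?)\s*mg\s+(?P<freq>daily|every other day|weekly)",
    re.IGNORECASE)
PER_WEEK = {"daily": 7, "every other day": 3.5, "weekly": 1}

def weekly_dose(text):
    """Return weekly mg from a note line like 'Coumadin 5 mg daily'."""
    m = DOSE.search(text)
    if not m:
        return None
    return float(m.group("dose")) * PER_WEEK[m.group("freq").lower()]

print(weekly_dose("Coumadin 5 mg daily"))  # 35.0 mg per week
```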

  9. Formularity: Software for Automated Formula Assignment of Natural and Other Organic Matter from Ultrahigh-Resolution Mass Spectra

    Energy Technology Data Exchange (ETDEWEB)

    Tolic, Nikola; Liu, Yina; Liyu, Andrey V.; Shen, Yufeng; Tfaily, Malak M.; Kujawinski, Elizabeth B.; Longnecker, Krista; Kuo, Li-Jung; Robinson, Errol W.; Pasa Tolic, Ljiljana; Hess, Nancy J.

    2017-11-13

    Ultrahigh-resolution mass spectrometry, such as Fourier transform ion-cyclotron resonance mass spectrometry (FT-ICR MS), can resolve thousands of molecular ions in complex organic matrices. A Compound Identification Algorithm (CIA) was previously developed for automated elemental formula assignment for natural organic matter (NOM). In this work we describe a user-friendly interface for CIA, titled Formularity, which adds the ability to search for formulas based on an Isotopic Pattern Algorithm (IPA). While CIA assigns elemental formulas for compounds containing C, H, O, N, S, and P, IPA is capable of assigning formulas for compounds containing other elements. We used halogenated organic compounds (HOC), a chemical class that is ubiquitous in natural as well as anthropogenic systems, as an example to demonstrate the capability of Formularity with IPA. A HOC standard mix was used to evaluate the identification confidence of IPA. HOC spikes in NOM and tap water were used to assess HOC identification in natural and anthropogenic matrices. Strategies for reconciliation of CIA and IPA assignments are discussed. Software and sample databases with documentation are freely available from the PNNL OMICS software repository https://omics.pnl.gov/software/formularity.
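
    Elemental formula assignment from exact mass can be illustrated by brute force: enumerate candidate compositions and keep those within a ppm tolerance of the measured neutral mass. This is only a toy sketch of the general idea, not the CIA or IPA algorithm; the element ranges and the glucose test mass are assumptions.

    ```python
    from itertools import product

    # Monoisotopic masses in u (standard values; abbreviated to C/H/N/O here).
    MASS = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}

    def assign_formulas(neutral_mass, ppm=1.0, max_atoms=(40, 80, 5, 20)):
        """Enumerate C/H/N/O formulas whose mass lies within a ppm tolerance.
        A toy stand-in for a constrained formula search, not the PNNL code."""
        tol = neutral_mass * ppm / 1e6
        cmax, hmax, nmax, omax = max_atoms
        hits = []
        for c, n, o in product(range(1, cmax), range(nmax + 1), range(omax + 1)):
            base = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
            h = round((neutral_mass - base) / MASS["H"])  # best H count for this C/N/O
            if 0 <= h <= hmax and abs(base + h * MASS["H"] - neutral_mass) <= tol:
                hits.append(f"C{c}H{h}N{n}O{o}")
        return hits

    # Glucose, C6H12O6, has a monoisotopic neutral mass of ~180.06339 u.
    print(assign_formulas(180.06339))  # -> ['C6H12N0O6'], i.e. C6H12O6
    ```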

  10. Language and Interactional Discourse: Deconstructing the Talk-Generating Machinery in Natural Conversation

    Directory of Open Access Journals (Sweden)

    Amaechi Uneke Enyi

    2015-08-01

    The study entitled “Language and Interactional Discourse: Deconstructing the Talk-Generating Machinery in Natural Conversation” is an analysis of spontaneous and informal conversation. The study, carried out in the theoretical and methodological tradition of Ethnomethodology, was aimed at explicating how ordinary talk is organized and produced, how people coordinate their talk-in-interaction, how meanings are determined, and the role of talk in the wider social processes. The study followed the basic assumption of conversation analysis, which is that talk is not just a product of two ‘speaker-hearers’ who attempt to exchange information or convey messages to each other. Rather, participants in conversation are seen to be mutually orienting to, and collaborating in order to achieve, orderly and meaningful communication. The analytic objective is therefore to make clear the procedures on which speakers rely to produce utterances and by which they make sense of other speakers’ talk. The data used for this study were a recorded informal conversation between two (and later three) middle-class civil servants who are friends. The recording was done in such a way that the participants were not aware that they were being recorded. The recording was later transcribed in a way that we believe is faithful to the spontaneity and informality of the talk. Our findings showed that conversation has its own features and is an ordered and structured day-to-day social event. Specifically, utterances are designed and informed by organized procedures, methods and resources which are tied to the contexts in which they are produced, and which participants are privy to by virtue of their membership of a culture or a natural language community. Keywords: Language, Discourse and Conversation

  11. Natural Conversation Reconstruction Tasks: The Language Classroom as a Meeting Place

    Directory of Open Access Journals (Sweden)

    Jun Ohashi

    2009-03-01

    This paper, drawing on Pratt’s notion of ‘transculturation’ and Bhabha’s ‘third space’, presents an example of language learning tasks that empower learners’ agency and promote their cross-cultural awareness and sensitivity to a different set of cultural expectations, using naturally occurring Japanese thanking episodes. The paper discusses the merits of Natural Conversation Reconstruction Tasks (NCRTs) as a practical method for helping L2 learners develop this ‘intercultural competence’. It is based on a qualitative study of the results of one NCRT created for use in the context of teaching Japanese as an L2 in a multicultural society. It suggests the NCRT encourages the learners to explore the intersection where language use, speaker intention and L1 and L2 cultural norms meet. Such a process helps the learners become aware of socially expected patterns of communication in L1 and L2 in terms of the choices of speech act, formulaic expressions, sequential organization and politeness orientation. The learners’ comments suggest that the NCRT helps learners transcend their cultural boundaries by overcoming their narrow understanding of ‘thanking’ as ‘expressions of gratitude and appreciation’ and by cross-culturally widening their views of what counts as thanking. The NCRT, with rich contextual information, promotes the learners’ intercultural awareness, sensitivity to context and intercultural exploration in the space between L1 and L2, where they have authority and freedom of making sense of conversations, and pragmatics is fully integrated into language pedagogy.

  12. Formal ontology for natural language processing and the integration of biomedical databases.

    Science.gov (United States)

    Simon, Jonathan; Dos Santos, Mariana; Fielding, James; Smith, Barry

    2006-01-01

    The central hypothesis underlying this communication is that the methodology and conceptual rigor of a philosophically inspired formal ontology can bring significant benefits in the development and maintenance of application ontologies [A. Flett, M. Dos Santos, W. Ceusters, Some Ontology Engineering Procedures and their Supporting Technologies, EKAW2002, 2003]. This hypothesis has been tested in the collaboration between Language and Computing (L&C), a company specializing in software for supporting natural language processing especially in the medical field, and the Institute for Formal Ontology and Medical Information Science (IFOMIS), an academic research institution concerned with the theoretical foundations of ontology. In the course of this collaboration L&C's ontology, LinKBase, which is designed to integrate and support reasoning across a plurality of external databases, has been subjected to a thorough auditing on the basis of the principles underlying IFOMIS's Basic Formal Ontology (BFO) [B. Smith, Basic Formal Ontology, 2002. http://ontology.buffalo.edu/bfo]. The goal is to transform a large terminology-based ontology into one with the ability to support reasoning applications. Our general procedure has been the implementation of a meta-ontological definition space in which the definitions of all the concepts and relations in LinKBase are standardized in the framework of first-order logic. In this paper we describe how this principles-based standardization has led to a greater degree of internal coherence of the LinKBase structure, and how it has facilitated the construction of mappings between external databases using LinKBase as translation hub. We argue that the collaboration here described represents a new phase in the quest to solve the so-called "Tower of Babel" problem of ontology integration [F. Montayne, J. Flanagan, Formal Ontology: The Foundation for Natural Language Processing, 2003. http://www.landcglobal.com/].

  13. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

    Science.gov (United States)

    Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis

    2017-09-01

    We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

    Science.gov (United States)

    Luo, Yuan; Szolovits, Peter

    2016-01-01

    In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations under position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, propose query reformulations, and introduce augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.
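
    The flavor of the problem can be shown with a naive stabbing query over stand-off annotations. Treat this as a baseline sketch with invented offsets: the paper's contribution is answering such queries, and Allen-relation queries, in logarithmic rather than linear time.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Annotation:
        start: int   # character offset where the annotation begins
        end: int     # character offset where it ends (exclusive)
        label: str

    # Toy stand-off annotations over a clinical note; offsets are invented.
    anns = [
        Annotation(0, 7, "problem"),
        Annotation(15, 29, "medication"),
        Annotation(15, 45, "sentence"),
    ]

    def stab(annotations, pos):
        """Naive O(n) stabbing query: all annotations covering offset pos.
        The reviewed algorithms answer this in O(log n) with interval trees."""
        return [a for a in annotations if a.start <= pos < a.end]

    def during(a, b):
        """Allen's 'during' relation: interval a strictly inside interval b."""
        return b.start < a.start and a.end < b.end

    print([a.label for a in stab(anns, 20)])  # ['medication', 'sentence']
    print(during(anns[1], anns[2]))           # False: a shared start is 'starts', not 'during'
    ```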

  15. PERSISTENCE AND ACADEMIC ACHIEVEMENT IN FOREIGN LANGUAGE IN NATURAL SCIENCES STUDENTS

    Directory of Open Access Journals (Sweden)

    Alexandr I Krupnov

    2017-12-01

    The article discusses the results of an empirical study of the association between variables of persistence and academic achievement in foreign languages. The sample includes students of the Faculty of Physics, Mathematics and Natural Science at the RUDN University (n = 115), divided into 5 subsamples, two of which are featured in the present study (the most and the least successful student subsamples). Persistence as a personality trait is studied within A.I. Krupnov’s system-functional approach. A.I. Krupnov’s paper-and-pencil test was used to measure persistence variables. Academic achievement was measured according to four parameters: Phonetics, Grammar, Speaking and Political Vocabulary, based on the grades students received during the academic year. The analysis revealed that persistence displays different associations with academic achievement variables in the more and the less successful student subsamples; the general prominence of this trait is more important for unsuccessful students. Phonetics is the academic achievement variable most associated with persistence, which fits its nature: it is a skill acquired through hard work and practice, the very definition of persistence. Grammar as an academic achievement variable is not associated with persistence and probably relates to other factors. Unsuccessful students may have difficulties in separating various aspects of language acquisition from each other, which should be taken into consideration by teachers.

  16. Natural-Language-Based Text to Speech in an English Tenses Learning Application (Text to Speech Berbasis Natural Language pada Aplikasi Pembelajaran Tenses Bahasa Inggris)

    Directory of Open Access Journals (Sweden)

    Amak Yunus

    2014-09-01

    Language is a systematic means of communication using sounds or symbols that carry meaning, uttered through the mouth. Language is also written following applicable rules. One of the most widely used languages around the world is English. However, there are obstacles when learning from a teacher or instructor: the time a teacher can give is limited to school or tutoring hours, and once students come home from school or tutoring, they have to study English on their own. From this problem arose the idea for research into building an application that gives students the knowledge to learn English independently, including transforming positive sentences into negative and interrogative sentences. In addition, the application can also teach how to pronounce sentences in English. In essence, the contribution of this research is that stakeholders from junior high school through senior/vocational high school can use a text-to-speech application based on natural language processing to learn English tenses. The application can read English sentences aloud and can construct interrogative and negative sentences from their positive forms in several English tenses. Keywords: Natural language processing, Text to speech

  17. Natural Language Processing (NLP) for Identifying the Recitation Rules of the Qur'an (NATURAL LANGUAGE PROCESSING (NLP) UNTUK MENGETAHUI HUKUM BACAAN AL-QUR’AN)

    Directory of Open Access Journals (Sweden)

    Heriyanto Heriyanto

    2015-04-01

    Natural Language Processing (NLP) for identifying Qur'anic recitation (tajwid) rules can analyze text input in the form of everyday sentences. The process begins by recognizing the syntax and the existing production rules: the input is scanned and tokens are identified, the tokens are parsed, and the result is matched against the production rules. If the input does not satisfy the production rules, an error message is produced; if it conforms, the system presents the recitation rule corresponding to the sentence. Knowing the correct recitation of the Qur'an, in accordance with tartil, is a communal obligation (fardhu kifayah), so studying and identifying recitation rules with Natural Language Processing (NLP) can support learning to read the Qur'an in accordance with the science of tajwid. The NLP application recognizes the recitation rules requested by the user for Surah Al-Fatihah and Al-Baqarah Juz 1. The software displays the resulting recitation rules and can analyze a requested rule from text input, reading the text according to the production rules so that the applicable recitation rule can be identified.

  18. Teaching the tacit knowledge of programming to noviceswith natural language tutoring

    Science.gov (United States)

    Lane, H. Chad; Vanlehn, Kurt

    2005-09-01

    For beginning programmers, inadequate problem solving and planning skills are among the most salient of their weaknesses. In this paper, we test the efficacy of natural language tutoring to teach and scaffold acquisition of these skills. We describe ProPL (Pro-PELL), a dialogue-based intelligent tutoring system that elicits goal decompositions and program plans from students in natural language. The system uses a variety of tutoring tactics that leverage students' intuitive understandings of the problem, how it might be solved, and the underlying concepts of programming. We report the results of a small-scale evaluation comparing students who used ProPL with a control group who read the same content. Our primary findings are that students who received tutoring from ProPL seem to have developed an improved ability to solve the composition problem and displayed behaviors that suggest they were able to think at greater levels of abstraction than students in the read-only group.

  19. Knowledge-based machine indexing from natural language text: Knowledge base design, development, and maintenance

    Science.gov (United States)

    Genuardi, Michael T.

    1993-01-01

    One strategy for machine-aided indexing (MAI) is to provide a concept-level analysis of the textual elements of documents or document abstracts. In such systems, natural-language phrases are analyzed in order to identify and classify concepts related to a particular subject domain. The overall performance of these MAI systems is largely dependent on the quality and comprehensiveness of their knowledge bases. These knowledge bases function to (1) define the relations between a controlled indexing vocabulary and natural language expressions; (2) provide a simple mechanism for disambiguation and the determination of relevancy; and (3) allow the extension of concept-hierarchical structure to all elements of the knowledge file. After a brief description of the NASA Machine-Aided Indexing system, concerns related to the development and maintenance of MAI knowledge bases are discussed. Particular emphasis is given to statistically-based text analysis tools designed to aid the knowledge base developer. One such tool, the Knowledge Base Building (KBB) program, presents the domain expert with a well-filtered list of synonyms and conceptually-related phrases for each thesaurus concept. Another tool, the Knowledge Base Maintenance (KBM) program, functions to identify areas of the knowledge base affected by changes in the conceptual domain (for example, the addition of a new thesaurus term). An alternate use of the KBM as an aid in thesaurus construction is also discussed.
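
    The first knowledge-base function, relating a controlled indexing vocabulary to natural-language expressions, amounts to a synonym-to-concept lookup. A minimal sketch with invented entries (not the NASA thesaurus):

    ```python
    # Toy knowledge base mapping controlled concepts to natural-language
    # synonyms; entries are invented for illustration.
    KNOWLEDGE_BASE = {
        "remote sensing": ["earth observation", "satellite imaging"],
        "propulsion": ["rocket engines", "thrusters"],
    }

    # Invert to a phrase -> concept lookup, including each concept's own name.
    LOOKUP = {p: c for c, syns in KNOWLEDGE_BASE.items() for p in syns + [c]}

    def index_phrases(phrases):
        """Return the controlled concepts for whatever phrases the KB recognizes."""
        return sorted({LOOKUP[p] for p in phrases if p in LOOKUP})

    print(index_phrases(["satellite imaging", "thrusters", "budget"]))
    # -> ['propulsion', 'remote sensing']
    ```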

  20. Automated Image Sampling and Classification Can Be Used to Explore Perceived Naturalness of Urban Spaces.

    Directory of Open Access Journals (Sweden)

    Roger Hyam

    The psychological restorative effects of exposure to nature are well established and extend to the mere viewing of images of nature. A previous study has shown that the Perceived Naturalness (PN) of images correlates with their restorative value. This study tests whether it is possible to detect the degree of PN of images using an image classifier. It takes images that have been scored by humans for PN (including a subset that has been assessed for restorative value) and passes them through the Google Vision API image classification service. The resulting labels are assigned to broad semantic classes to create a Calculated Semantic Naturalness (CSN) metric for each image. It was found that CSN correlates with PN. CSN was then calculated for a geospatial sampling of Google Street View images across the city of Edinburgh. CSN was found to correlate with PN in this sample also, indicating the technique may be useful in large-scale studies. Because CSN correlates with PN, which correlates with restorativeness, it is suggested that CSN or a similar measure may be useful in automatically detecting restorative images and locations. In an exploratory aside, CSN was not found to correlate with an indicator of socioeconomic deprivation.
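
    One plausible reading of the CSN metric is the share of classifier label weight that falls in nature-related semantic classes, which can then be correlated with human PN ratings. Everything in this sketch (the classes, labels, and ratings) is invented for illustration; it is not the study's published mapping.

    ```python
    from statistics import correlation  # Pearson's r, Python 3.10+

    # Invented nature-related semantic classes.
    NATURE_CLASSES = {"tree", "vegetation", "water", "sky", "grass"}

    def csn(labels):
        """Calculated Semantic Naturalness as the share of label confidence
        falling in nature-related classes (one plausible reading)."""
        total = sum(conf for _, conf in labels)
        natural = sum(conf for name, conf in labels if name in NATURE_CLASSES)
        return natural / total if total else 0.0

    # (label, confidence) pairs as an image-classification API might return
    # them; the images and the human PN ratings below are invented.
    images = [
        [("tree", 0.9), ("grass", 0.8), ("car", 0.2)],
        [("building", 0.9), ("road", 0.7), ("sky", 0.3)],
    ]
    scores = [csn(img) for img in images]
    perceived_naturalness = [4.2, 1.8]
    print(scores, correlation(scores, perceived_naturalness))
    ```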

  1. Selected Topics on Systems Modeling and Natural Language Processing: Editorial Introduction to the Issue 7 of CSIMQ

    Directory of Open Access Journals (Sweden)

    Witold Andrzejewski

    2016-07-01

    The seventh issue of Complex Systems Informatics and Modeling Quarterly presents five papers devoted to two distinct research topics: systems modeling and natural language processing (NLP). Both of these subjects are very important in computer science. Through modeling we can simplify the studied problem by concentrating on only one aspect at a time. Moreover, a properly constructed model allows the modeler to work on higher levels of abstraction without having to concentrate on details. Since the size and complexity of information systems grow rapidly, creating good models of such systems is crucial. The analysis of natural language is slowly becoming a widely used tool in commerce and day-to-day life. Opinion mining allows recommender systems to provide accurate recommendations based on user-generated reviews. Speech recognition and NLP are the basis for such widely used personal assistants as Apple’s Siri, Microsoft’s Cortana, and Google Now. While a lot of work has already been done on natural language processing, the research usually concerns widely used languages, such as English. Consequently, natural language processing in languages other than English is a very relevant subject and is addressed in this issue.

  2. Overview of German-language research activities in the Home Automation area; Ueberblick deutschsprachiger Forschungsaktivitaeten im Bereich Home Automation. Forschungsinstitute, Themen, Ergebnisse - Schlussbericht

    Energy Technology Data Exchange (ETDEWEB)

    Staub, R.

    2010-02-15

    This final report for the Swiss Federal Office of Energy (SFOE) reviews research carried out in Germany and Austria on 'smart homes'. The aim of the project was to determine which work has already been done in those countries so that work in Switzerland can concentrate on questions they have not yet addressed. The relevant research institutions are listed, the main Home Automation project categories are identified, and concrete projects in Germany and Austria are briefly described and examined for their relevance to Swiss efforts.

  3. A study of the very high order natural user language (with AI capabilities) for the NASA space station common module

    Science.gov (United States)

    Gill, E. N.

    1986-01-01

    The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, realtime processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy the above requirements are recommended.

  4. Natural Environment Modeling and Fault-Diagnosis for Automated Agricultural Vehicle

    DEFF Research Database (Denmark)

    Blas, Morten Rufus; Blanke, Mogens

    2008-01-01

    This paper presents results for an automatic navigation system for agricultural vehicles. The system uses stereo-vision, inertial sensors and GPS. Special emphasis has been placed on modeling the natural environment in conjunction with a fault-tolerant navigation system. The results are exemplified by an agricultural vehicle following cut grass (swath). It is demonstrated how faults in the system can be detected and diagnosed using state-of-the-art techniques from the fault-tolerance literature. Results in performing fault diagnosis and fault accommodation are presented using real data.

  5. Automation of a gamma spectrometric analysis method for naturally occurring radionuclides in different materials (NORM)

    International Nuclear Information System (INIS)

    Marzocchi, Olaf

    2009-06-01

    This work presents an improvement over the standard analysis routine used in the Physikalisches Messlabor to detect gamma peaks in spectra from naturally occurring radioactive materials (NORM). The new routine introduces custom libraries of known gamma peaks, easing the work of the software, which can therefore detect more peaks. As a result, the user performing the analysis is less likely to make errors and can also analyse more spectra in the same amount of time. A new software tool, with an optimised interface able to further enhance the productivity of the user, is developed and validated. (orig.)
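
    The core of a custom-library peak search is matching detected peak energies against known gamma lines within a tolerance. A minimal sketch, not the Physikalisches Messlabor routine: the line energies come from standard nuclide tables, but the fixed tolerance and the detected peaks are illustrative assumptions (a real routine would likely use calibrated, energy-dependent tolerances).

    ```python
    # Toy library of known NORM gamma lines (keV).
    LIBRARY = {
        "K-40": [1460.8],
        "Tl-208": [583.2, 2614.5],
        "Pb-214": [295.2, 351.9],
    }

    def match_peaks(detected, tol_kev=1.0):
        """Match detected peak energies against a custom library of gamma
        lines within a fixed energy tolerance."""
        matches = []
        for e in detected:
            for nuclide, lines in LIBRARY.items():
                for line in lines:
                    if abs(e - line) <= tol_kev:
                        matches.append((e, nuclide, line))
        return matches

    print(match_peaks([351.7, 1460.9, 999.0]))
    # -> [(351.7, 'Pb-214', 351.9), (1460.9, 'K-40', 1460.8)]
    ```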

  6. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.

    Science.gov (United States)

    Soysal, Ergin; Wang, Jingqi; Jiang, Min; Wu, Yonghui; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua

    2017-11-24

    Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. Towards symbiosis in knowledge representation and natural language processing for structuring clinical practice guidelines.

    Science.gov (United States)

    Weng, Chunhua; Payne, Philip R O; Velez, Mark; Johnson, Stephen B; Bakken, Suzanne

    2014-01-01

    The successful adoption by clinicians of evidence-based clinical practice guidelines (CPGs) contained in clinical information systems requires efficient translation of free-text guidelines into computable formats. Natural language processing (NLP) has the potential to improve the efficiency of such translation. However, it is laborious to develop NLP to structure free-text CPGs using existing formal knowledge representations (KR). In response to this challenge, this vision paper discusses the value and feasibility of supporting symbiosis in text-based knowledge acquisition (KA) and KR. We compare two ontologies: (1) an ontology manually created by domain experts for CPG eligibility criteria and (2) an upper-level ontology derived from a semantic pattern-based approach for automatic KA from CPG eligibility criteria text. Then we discuss the strengths and limitations of interweaving KA and NLP for KR purposes and important considerations for achieving the symbiosis of KR and NLP for structuring CPGs to achieve evidence-based clinical practice.

  8. On application of image analysis and natural language processing for music search

    Science.gov (United States)

    Gwardys, Grzegorz

    2013-10-01

    In this paper, I investigate the problem of finding the most similar music tracks using techniques popular in Natural Language Processing, such as TF-IDF and LDA. I define a document as a music track. Each music track is transformed into a spectrogram; thanks to that, I can use well-known techniques to get words from images. I used the SURF operator to detect characteristic points and a novel approach for their description. Standard k-means was used for clustering. Clustering here is identical to dictionary making, so afterwards I can transform spectrograms into text documents and perform TF-IDF and LDA. Finally, I can make a query in the obtained vector space. The research was done on 16 music tracks for training and 336 for testing, split into four categories: Hiphop, Jazz, Metal and Pop. Although the technique used is completely unsupervised, the results are satisfactory and encourage further research.
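
    Once spectrogram patches are quantized into "visual words", standard text retrieval applies directly. Below is a minimal TF-IDF plus cosine-similarity sketch over invented word sequences; this is one common TF-IDF variant, as the paper does not specify its exact weighting.

    ```python
    import math
    from collections import Counter

    # Each "document" is a music track rendered as visual words (k-means
    # cluster ids of SURF descriptors of its spectrogram) -- invented data.
    tracks = {
        "jazz_01":  ["w3", "w3", "w7", "w1"],
        "metal_01": ["w2", "w2", "w2", "w9"],
        "jazz_02":  ["w3", "w7", "w7", "w5"],
    }

    def tfidf(docs):
        """Plain TF-IDF: term frequency times log(N / document frequency)."""
        n = len(docs)
        df = Counter(w for words in docs.values() for w in set(words))
        return {name: {w: (c / len(words)) * math.log(n / df[w])
                       for w, c in Counter(words).items()}
                for name, words in docs.items()}

    def cosine(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    vecs = tfidf(tracks)
    print(cosine(vecs["jazz_01"], vecs["jazz_02"]))   # higher: similar tracks
    print(cosine(vecs["jazz_01"], vecs["metal_01"]))  # 0.0: no shared words
    ```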

  9. Workshop on using natural language processing applications for enhancing clinical decision making: an executive summary.

    Science.gov (United States)

    Pai, Vinay M; Rodgers, Mary; Conroy, Richard; Luo, James; Zhou, Ruixia; Seto, Belinda

    2014-02-01

    In April 2012, the National Institutes of Health organized a two-day workshop entitled 'Natural Language Processing: State of the Art, Future Directions and Applications for Enhancing Clinical Decision-Making' (NLP-CDS). This report is a summary of the discussions during the second day of the workshop. Collectively, the workshop presenters and participants emphasized the need for unstructured clinical notes to be included in the decision making workflow and the need for individualized longitudinal data tracking. The workshop also discussed the need to: (1) combine evidence-based literature and patient records with machine-learning and prediction models; (2) provide trusted and reproducible clinical advice; (3) prioritize evidence and test results; and (4) engage healthcare professionals, caregivers, and patients. The overall consensus of the NLP-CDS workshop was that there are promising opportunities for NLP and CDS to deliver cognitive support for healthcare professionals, caregivers, and patients.

  10. Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing.

    Science.gov (United States)

    Redman, Joseph S; Natarajan, Yamini; Hou, Jason K; Wang, Jingqi; Hanif, Muzammil; Feng, Hua; Kramer, Jennifer R; Desiderio, Roxanne; Xu, Hua; El-Serag, Hashem B; Kanwal, Fasiha

    2017-10-01

    Natural language processing is a powerful machine-learning technique capable of maximizing data extraction from complex electronic medical records. We utilized this technique to develop algorithms capable of "reading" full-text radiology reports to accurately identify the presence of fatty liver disease. Abdominal ultrasound, computerized tomography, and magnetic resonance imaging reports were retrieved from the Veterans Affairs Corporate Data Warehouse from a random national sample of 652 patients. Radiographic fatty liver disease was determined by manual review by two physicians and verified with an expert radiologist. A split validation method was utilized for algorithm development. For all three imaging modalities, the algorithms could identify fatty liver disease with >90% recall and precision, with F-measures >90%. These algorithms could be used to rapidly screen patient records to establish a large cohort to facilitate epidemiological and clinical studies and examine the clinical course and outcomes of patients with radiographic hepatic steatosis.
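
    For reference, the F-measure reported above is the harmonic mean of precision P and recall R:

    ```latex
    F = \frac{2PR}{P + R}
    ```

    so, for example, P = 0.92 and R = 0.95 give F of about 0.93, consistent with all three figures exceeding 90%.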

  11. Natural Language Processing in Serious Games: A state of the art.

    Directory of Open Access Journals (Sweden)

    Davide Picca

    2015-09-01

    In the last decades, Natural Language Processing (NLP) has obtained a high level of success. Interactions between NLP and Serious Games have started, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.

  12. Knowledge Extraction from MEDLINE by Combining Clustering with Natural Language Processing.

    Science.gov (United States)

    Miñarro-Giménez, Jose A; Kreuzthaler, Markus; Schulz, Stefan

    2015-01-01

    The identification of relevant predicates between co-occurring concepts in scientific literature databases like MEDLINE is crucial for using these sources for knowledge extraction, in order to obtain meaningful biomedical predications as subject-predicate-object triples. We consider the manually assigned MeSH indexing terms (main headings and subheadings) in MEDLINE records as a rich resource for extracting a broad range of domain knowledge. In this paper, we explore the combination of a clustering method for co-occurring concepts based on their related MeSH subheadings in MEDLINE with the use of SemRep, a natural language processing engine, which extracts predications from free text documents. As a result, we generated sets of clusters of co-occurring concepts and identified the most significant predicates for each cluster. The association of such predicates with the co-occurrences of the resulting clusters produces the list of predications, which were checked for relevance.

  13. Harmonization and development of resources and tools for Italian natural language processing within the PARLI project

    CERN Document Server

    Bosco, Cristina; Delmonte, Rodolfo; Moschitti, Alessandro; Simi, Maria

    2015-01-01

    The papers collected in this volume are selected as a sample of the progress in Natural Language Processing (NLP) performed within the Italian NLP community and especially attested by the PARLI project. PARLI (Portale per l’Accesso alle Risorse in Lingua Italiana) is a project partially funded by the Ministero Italiano per l’Università e la Ricerca (PRIN 2008) from 2008 to 2012 for monitoring and fostering the harmonic growth and coordination of the activities of Italian NLP. It was proposed by various teams of researchers working in Italian universities and research institutions. According to the spirit of the PARLI project, most of the resources and tools created within the project and described here are freely distributed, and they did not terminate their life at the end of the project itself, in the hope that they will be a key factor in the future development of computational linguistics.

  14. Integrating natural language processing and web GIS for interactive knowledge domain visualization

    Science.gov (United States)

    Du, Fangming

    Recent years have seen a powerful shift towards data-rich environments throughout society. This has extended to a change in how the artifacts and products of scientific knowledge production can be analyzed and understood. Bottom-up approaches are on the rise that combine access to huge amounts of academic publications with advanced computer graphics and data processing tools, including natural language processing. Knowledge domain visualization is one of those multi-technology approaches, with its aim of turning domain-specific human knowledge into highly visual representations in order to better understand the structure and evolution of domain knowledge. For example, network visualizations built from co-author relations contained in academic publications can provide insight on how scholars collaborate with each other in one or multiple domains, and visualizations built from the text content of articles can help us understand the topical structure of knowledge domains. These knowledge domain visualizations need to support interactive viewing and exploration by users. Such spatialization efforts are increasingly looking to geography and GIS as a source of metaphors and practical technology solutions, even when non-georeferenced information is managed, analyzed, and visualized. When it comes to deploying spatialized representations online, web mapping and web GIS can provide practical technology solutions for interactive viewing of knowledge domain visualizations, from panning and zooming to the overlay of additional information. This thesis presents a novel combination of advanced natural language processing - in the form of topic modeling - with dimensionality reduction through self-organizing maps and the deployment of web mapping/GIS technology towards intuitive, GIS-like, exploration of a knowledge domain visualization. A complete workflow is proposed and implemented that processes any corpus of input text documents into a map form and leverages a web

  15. A Natural Language Intelligent Tutoring System for Training Pathologists - Implementation and Evaluation

    Science.gov (United States)

    El Saadawi, Gilan M.; Tseytlin, Eugene; Legowski, Elizabeth; Jukic, Drazen; Castine, Melissa; Fine, Jeffrey; Gormley, Robert; Crowley, Rebecca S.

    2009-01-01

    Introduction We developed and evaluated a Natural Language Interface (NLI) for an Intelligent Tutoring System (ITS) in Diagnostic Pathology. The system teaches residents to examine pathologic slides and write accurate pathology reports while providing immediate feedback on errors they make in their slide review and diagnostic reports. Residents can ask for help at any point in the case, and will receive context-specific feedback. Research Questions We evaluated (1) the performance of our natural language system, (2) the effect of the system on learning, (3) the effect of feedback timing on learning gains, and (4) the effect of ReportTutor on performance to self-assessment correlations. Methods The study uses a crossover 2×2 factorial design. We recruited 20 subjects from 4 academic programs. Subjects were randomly assigned to one of the four conditions - two conditions for the immediate interface, and two for the delayed interface. An expert dermatopathologist created a reference standard, and 2 board-certified AP/CP pathology fellows manually coded the residents' assessment reports. Subjects were given the opportunity to self-grade their performance, and we used a survey to determine student response to both interfaces. Results Our results show a highly significant improvement in report writing after one tutoring session, with a 4-fold increase in the learning gains with both interfaces but no effect of feedback timing on performance gains. Residents who used the immediate feedback interface first experienced a feature learning gain that is correlated with the number of cases they viewed. There was no correlation between performance and self-assessment in either condition. PMID:17934789

  16. Automated Detection of Buildings from Heterogeneous VHR Satellite Images for Rapid Response to Natural Disasters

    Directory of Open Access Journals (Sweden)

    Shaodan Li

    2017-11-01

    In this paper, we present a novel approach for automatically detecting buildings from multiple heterogeneous and uncalibrated very high-resolution (VHR) satellite images for a rapid response to natural disasters. In the proposed method, a simple and efficient visual attention method is first used to extract built-up area candidates (BACs) from each multispectral (MS) satellite image. After this, morphological building indices (MBIs) are extracted from all the masked panchromatic (PAN) and MS images with BACs to characterize the structural features of buildings. Finally, buildings are automatically detected in a hierarchical probabilistic model by fusing the MBI and masked PAN images. The experimental results show that the proposed method is comparable to supervised classification methods in terms of recall, precision and F-value.

  17. Wikipedia and medicine: quantifying readership, editors, and the significance of natural language.

    Science.gov (United States)

    Heilman, James M; West, Andrew G

    2015-03-04

    Wikipedia is a collaboratively edited encyclopedia. One of the most popular websites on the Internet, it is known to be a frequently used source of health care information by both professionals and the lay public. This paper quantifies the production and consumption of Wikipedia's medical content along 4 dimensions. First, we measured the amount of medical content in both articles and bytes and, second, the citations that supported that content. Third, we analyzed the medical readership against that of other health care websites, compared readership across Wikipedia's natural language editions, and examined its relationship with disease prevalence. Fourth, we surveyed the quantity/characteristics of Wikipedia's medical contributors, including year-over-year participation trends and editor demographics. Using a well-defined categorization infrastructure, we identified medically pertinent English-language Wikipedia articles and links to their foreign language equivalents. With these, Wikipedia can be queried to produce metadata and full texts for entire article histories. Wikipedia also makes available hourly reports that aggregate reader traffic at per-article granularity. An online survey was used to determine the background of contributors. Standard mining and visualization techniques (eg, aggregation queries, cumulative distribution functions, and/or correlation metrics) were applied to each of these datasets. Analysis focused on year-end 2013, but historical data permitted some longitudinal analysis. Wikipedia's medical content (at the end of 2013) was made up of more than 155,000 articles and 1 billion bytes of text across more than 255 languages. This content was supported by more than 950,000 references. Content was viewed more than 4.88 billion times in 2013, making it one of, if not the, most viewed medical resources globally. The core editor community numbered less than 300 and declined over the past 5 years. The members of this community were half health care providers and 85

  18. Natural language processing for semantic-ontological automatic indexing; Processamento de linguagem natural para indexação automática semântico-ontológica

    OpenAIRE

    Câmara Júnior, Auto Tavares da

    2013-01-01

    The research proposes an architecture for automatic document indexing using natural language processing mechanisms at the semantic level. By arranging existing tools and resources, combined with software developed for their integration, an automatic indexing system is built that uses knowledge modeled in an ontology for semantic analysis. The application of the architecture is exemplified and put to the test on a set of forensic reports of crimes ...

  19. The role of automation and artificial intelligence

    Science.gov (United States)

    Schappell, R. T.

    1983-07-01

    Consideration is given to emerging technologies that are not currently in common use, yet will be mature enough for implementation in a space station. Artificial intelligence (AI) will permit more autonomous operation and improve the man-machine interfaces. Technology goals include the development of expert systems, a natural language query system, automated planning systems, and AI image understanding systems. Intelligent robots and teleoperators will be needed, together with improved sensory systems for robotics, housekeeping, vehicle control, and spacecraft housekeeping systems. Finally, NASA is developing the ROBSIM computer program to evaluate levels of automation, perform parametric studies and error analyses, optimize trajectories and control systems, and assess AI technology.

  20. A UMLS-based spell checker for natural language processing in vaccine safety.

    Science.gov (United States)

    Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C

    2007-02-12

    The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing
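
    The four-step pipeline described above (error detection, word-list generation, disambiguation, correction) can be sketched Norvig-style, with a toy dictionary standing in for the UMLS Specialist Lexicon and WordNet sources; the vocabulary and frequencies below are invented.

    ```python
    # Toy dictionary: word -> corpus frequency (invented stand-in for the
    # UMLS Specialist Lexicon / WordNet sources used in the paper).
    DICTIONARY = {"fever": 120, "injection": 80, "site": 90, "swelling": 60}

    LETTERS = "abcdefghijklmnopqrstuvwxyz"

    def edits1(word):
        """All strings one edit away (delete, transpose, replace, insert)."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
        inserts = [a + c + b for a, b in splits for c in LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def correct(word):
        if word in DICTIONARY:                         # step 1: error detection
            return word
        candidates = edits1(word) & DICTIONARY.keys()  # step 2: word list generation
        if not candidates:
            return word                                # leave unknown words untouched
        # steps 3-4: disambiguate by corpus frequency, then correct
        return max(candidates, key=DICTIONARY.get)

    print([correct(w) for w in ["fevre", "injecton", "swelling"]])
    # -> ['fever', 'injection', 'swelling']
    ```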

  1. A UMLS-based spell checker for natural language processing in vaccine safety

    Directory of Open Access Journals (Sweden)

    Liu Fang

    2007-02-01

    Abstract Background The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. Methods We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. Results We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively. Conclusion We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available

  2. Creation of a simple natural language processing tool to support an imaging utilization quality dashboard.

    Science.gov (United States)

    Swartz, Jordan; Koziatek, Christian; Theobald, Jason; Smith, Silas; Iturrate, Eduardo

    2017-05-01

    Testing for venous thromboembolism (VTE) is associated with cost and risk to patients (e.g. radiation). To assess the appropriateness of imaging utilization at the provider level, it is important to know that provider's diagnostic yield (percentage of tests positive for the diagnostic entity of interest). However, determining diagnostic yield typically requires either time-consuming, manual review of radiology reports or the use of complex and/or proprietary natural language processing software. The objectives of this study were twofold: 1) to develop and implement a simple, user-configurable, and open-source natural language processing tool to classify radiology reports with high accuracy and 2) to use the results of the tool to design a provider-specific VTE imaging dashboard, consisting of both utilization rate and diagnostic yield. Two physicians reviewed a training set of 400 lower extremity ultrasound (UTZ) and computed tomography pulmonary angiogram (CTPA) reports to understand the language used in VTE-positive and VTE-negative reports. The insights from this review informed the arguments to the five modifiable parameters of the NLP tool. A validation set of 2,000 studies was then independently classified by the reviewers and by the tool; the classifications were compared and the performance of the tool was calculated. The tool was highly accurate in classifying the presence and absence of VTE for both the UTZ (sensitivity 95.7%; 95% CI 91.5-99.8, specificity 100%; 95% CI 100-100) and CTPA reports (sensitivity 97.1%; 95% CI 94.3-99.9, specificity 98.6%; 95% CI 97.8-99.4). The diagnostic yield was then calculated at the individual provider level and the imaging dashboard was created. We have created a novel NLP tool designed for users without a background in computer programming, which has been used to classify venous thromboembolism reports with a high degree of accuracy. The tool is open-source and available for download at http
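
    The abstract does not publish the tool's five parameters, but a user-configurable keyword-plus-negation classifier in the same spirit might look like the following. The phrase lists, the negation scope, and the example reports are all invented for illustration; they are not the authors' configuration.

    ```python
    # Hypothetical configuration; a real tool would expose these (and more)
    # as its user-modifiable parameters.
    CONFIG = {
        "positive_phrases": ["acute thrombus", "pulmonary embolism", "dvt present"],
        "negation_cues": ["no ", "without ", "negative for "],
        "scope_chars": 60,   # how far back a negation cue reaches, in characters
    }

    def classify(report):
        """True if the report asserts VTE: a positive phrase appears and no
        negation cue occurs in the preceding window. (Checks only the first
        occurrence of each phrase -- a deliberate simplification.)"""
        text = report.lower()
        for phrase in CONFIG["positive_phrases"]:
            i = text.find(phrase)
            if i == -1:
                continue
            window = text[max(0, i - CONFIG["scope_chars"]):i]
            if not any(cue in window for cue in CONFIG["negation_cues"]):
                return True
        return False

    print(classify("IMPRESSION: Acute thrombus in the left popliteal vein."))  # True
    print(classify("IMPRESSION: Negative for pulmonary embolism."))            # False
    ```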

  3. Automated Guidance for Thermodynamics Essays: Critiquing versus Revisiting

    Science.gov (United States)

    Donnelly, Dermot F.; Vitale, Jonathan M.; Linn, Marcia C.

    2015-01-01

    Middle school students struggle to explain thermodynamics concepts. In this study, to help students succeed, we use a natural language processing program to analyze their essays explaining the aspects of thermodynamics and provide guidance based on the automated score. The 346 sixth-grade students were assigned to either the critique condition…

  4. Effects of Automated Tier 2 Storybook Intervention on Vocabulary and Comprehension Learning in Preschool Children with Limited Oral Language Skills

    Science.gov (United States)

    Kelley, Elizabeth Spencer; Goldstein, Howard; Spencer, Trina D.; Sherman, Amber

    2015-01-01

    This early efficacy study examined the effects of an automated storybook intervention designed to promote school readiness among at-risk prekindergarten children. Story Friends is a small-group intervention in which vocabulary and question-answering lessons are embedded in a series of storybooks. A randomized group design with an embedded…

  5. Automated cleaning of fan coil units with a natural detergent-disinfectant product

    Directory of Open Access Journals (Sweden)

    Di Onofrio Valeria

    2010-10-01

    Abstract Background Air conditioning systems represent one important source of microbial pollutants for indoor air. In the past few years, numerous strategies have been conceived to reduce the contamination of air conditioners, mainly in hospital settings. The biocidal detergent BATT2 is a natural product obtained through extraction from brown seaweeds, which has been tested previously on multidrug-resistant microorganisms. Methods BATT2 was utilized for the disinfection of fan coil units from four air conditioning systems located in hospital environments with a medium degree of risk. Samples were collected from the air supplied by the conditioning systems and from the surfaces of fan coil units, before and after sanitization procedures. Total microbial counts at 37°C and 22°C and mycotic counts at 32°C were evaluated. Staphylococci and Pseudomonas aeruginosa were also detected in surface samples. Results The biodetergent was able to reduce the microbial pollution of fan coil unit surfaces and of the air supplied by the air conditioners by up to 50%. Conclusions BATT2 could be considered for cleaning/disinfection of air conditioning systems, which should be performed on the basis of accurate and verifiable sanitization protocols.

  6. Automation in College Libraries.

    Science.gov (United States)

    Werking, Richard Hume

    1991-01-01

    Reports the results of a survey of the "Bowdoin List" group of liberal arts colleges. The survey obtained information about (1) automation modules in place and when they had been installed; (2) financing of automation and its impact on library budgets; and (3) library directors' views on library automation and the nature of the…

  7. Image statistics of American Sign Language: comparison with faces and natural scenes

    Science.gov (United States)

    Bosworth, Rain G.; Bartlett, Marian Stewart; Dobkins, Karen R.

    2006-09-01

    Several lines of evidence suggest that the image statistics of the environment shape visual abilities. To date, the image statistics of natural scenes and faces have been well characterized using Fourier analysis. We employed Fourier analysis to characterize images of signs in American Sign Language (ASL). These images are highly relevant to signers who rely on ASL for communication, and thus the image statistics of ASL might influence signers' visual abilities. Fourier analysis was conducted on 105 static images of signs, and these images were compared with analyses of 100 natural scene images and 100 face images. We obtained two metrics from our Fourier analysis: mean amplitude and entropy of the amplitude across the image set (which is a measure from information theory) as a function of spatial frequency and orientation. The results of our analyses revealed interesting differences in image statistics across the three different image sets, setting up the possibility that ASL experience may alter visual perception in predictable ways. In addition, for all image sets, the mean amplitude results were markedly different from the entropy results, which raises the interesting question of which aspect of an image set (mean amplitude or entropy of the amplitude) is better able to account for known visual abilities.
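
    One plausible formalization of the two metrics, with A_i(f, theta) the Fourier amplitude of image i at spatial frequency f and orientation theta, and N the number of images in a set (the paper's exact normalization is not given in the abstract):

    ```latex
    \bar{A}(f,\theta) = \frac{1}{N}\sum_{i=1}^{N} A_i(f,\theta), \qquad
    H(f,\theta) = -\sum_{i=1}^{N} p_i(f,\theta)\,\log_2 p_i(f,\theta),
    \quad\text{where } p_i(f,\theta) = \frac{A_i(f,\theta)}{\sum_{j=1}^{N} A_j(f,\theta)}.
    ```

    Under this reading, the mean amplitude captures how much energy a typical image in the set carries at each frequency and orientation, while the entropy captures how uniformly that energy is distributed across the images in the set.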

  8. Statistical Learning in a Natural Language by 8-Month-Old Infants

    Science.gov (United States)

    Pelucchi, Bruna; Hay, Jessica F.; Saffran, Jenny R.

    2009-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real…

  9. Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus.

    Science.gov (United States)

    Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John

    2014-01-01

    BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.

  10. AIED 2009 Workshops Proceedings Volume 10: Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity

    NARCIS (Netherlands)

    Dessus, Philippe; Trausan-Matu, Stefan; Van Rosmalen, Peter; Wild, Fridolin

    2009-01-01

    Dessus, P., Trausan-Matu, S., Van Rosmalen, P., & Wild, F. (Eds.) (2009). AIED 2009 Workshops Proceedings Volume 10 Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity. In S. D. Craig & D. Dicheva (Eds.), AIED 2009: 14th International Conference in Artificial

  11. Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing

    Science.gov (United States)

    Sermet, M. Y.; Demir, I.; Krajewski, W. F.

    2015-12-01

    The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The IFIS is designed for use by the general public, often people with no domain knowledge and limited general science background. To improve communication with such an audience, we have introduced a voice-enabled knowledge engine on flood-related issues in IFIS. Instead of requiring navigation through the many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, and analysis and visualization tools to answer natural language questions. Our goal is the systematization of data and modeling results on flood-related issues in Iowa, and to provide an interface for definitive answers to factual queries. The goal of the knowledge engine is to make all flood-related knowledge in Iowa easily accessible to everyone, and to support voice-enabled natural language input. We aim to integrate and curate all flood-related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models as algorithms, and curates all flood-related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes the answer by deriving it from its computational knowledge base. The knowledge engine processes the statement, accesses the data warehouse, runs complex database queries on the server side, and returns outputs in various formats. This presentation provides an overview of the IFIS Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses future plans.

  12. Towards an Automated Requirements-driven Development of Smart Cyber-Physical Systems

    Directory of Open Access Journals (Sweden)

    Jiri Vinarek

    2016-03-01

    Full Text Available The Invariant Refinement Method for Self Adaptation (IRM-SA) is a design method targeting development of smart Cyber-Physical Systems (sCPS). It allows for a systematic translation of the system requirements into the system architecture expressed as an ensemble-based component system (EBCS). However, since the requirements are captured using natural language, there exists the danger of their misinterpretation due to natural language requirements' ambiguity, which could eventually lead to design errors. Thus, automation and validation of the design process is desirable. In this paper, we (i) analyze the translation process of natural language requirements into the IRM-SA model, (ii) identify individual steps that can be automated and/or validated using natural language processing techniques, and (iii) propose suitable methods.

  13. Automated illustration of patients instructions.

    Science.gov (United States)

    Bui, Duy; Nakamura, Carlos; Bray, Bruce E; Zeng-Treitler, Qing

    2012-01-01

    A picture can be a powerful communication tool. However, creating pictures to illustrate patient instructions can be a costly and time-consuming task. Building on our prior research in this area, we developed a computer application that automatically converts text to pictures using natural language processing and computer graphics techniques. After iterative testing, the automated illustration system was evaluated using 49 previously unseen cardiology discharge instructions. The completeness of the system-generated illustrations was assessed by three raters using a three-level scale. The average inter-rater agreement for text correctly represented in the pictograph was about 66 percent. Since illustration in this context is intended to enhance rather than replace text, these results support the feasibility of conducting automated illustration.

  14. Surmounting the Tower of Babel: Monolingual and bilingual 2-year-olds' understanding of the nature of foreign language words.

    Science.gov (United States)

    Byers-Heinlein, Krista; Chen, Ke Heng; Xu, Fei

    2014-03-01

    Languages function as independent and distinct conventional systems, and so each language uses different words to label the same objects. This study investigated whether 2-year-old children recognize that speakers of their native language and speakers of a foreign language do not share the same knowledge. Two groups of children unfamiliar with Mandarin were tested: monolingual English-learning children (n=24) and bilingual children learning English and another language (n=24). An English speaker taught children the novel label fep. On English mutual exclusivity trials, the speaker asked for the referent of a novel label (wug) in the presence of the fep and a novel object. Both monolingual and bilingual children disambiguated the reference of the novel word using a mutual exclusivity strategy, choosing the novel object rather than the fep. On similar trials with a Mandarin speaker, children were asked to find the referent of a novel Mandarin label kuò. Monolinguals again chose the novel object rather than the object with the English label fep, even though the Mandarin speaker had no access to conventional English words. Bilinguals did not respond systematically to the Mandarin speaker, suggesting that they had enhanced understanding of the Mandarin speaker's ignorance of English words. The results indicate that monolingual children initially expect words to be conventionally shared across all speakers, native and foreign. Early bilingual experience facilitates children's discovery of the nature of foreign language words. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Water Relationships in the U.S. Southwest: Characterizing Water Management Networks Using Natural Language Processing

    Directory of Open Access Journals (Sweden)

    John T. Murphy

    2014-06-01

    Full Text Available Natural language processing (NLP) and named entity recognition (NER) techniques are applied to collections of newspaper articles from four cities in the U.S. Southwest. The results are used to generate a network of water management institutions that reflect public perceptions of water management and the structure of water management in these areas. This structure can be highly centralized or fragmented; in the latter case, multiple peer institutions exist that may cooperate or be in conflict. This is reflected in the public discourse of the water consumers in these areas and can, we contend, impact the potential responses of management agencies to challenges of water supply and quality and, in some cases, limit their effectiveness. Flagstaff, AZ, Tucson, AZ, Las Vegas, NV, and the Grand Valley, CO, are examined, including more than 110,000 articles from 2004–2012. Documents are scored by association with water topics, and phrases likely to be institutions are extracted via custom NLP and NER algorithms; those institutions associated with water-related documents are used to form networks via document co-location. The Grand Valley is shown to have a markedly different structure, which we contend reflects the different historical trajectory of its development and its current state, which includes multiple institutions of roughly equal scope and size. These results demonstrate the utility of using NLP and NER methods to understand the structure and variation of water management systems.
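
    A toy version of the document co-location step, assuming NetworkX; the institution names and water-topic scores are invented placeholders for the output of the custom NLP/NER algorithms:

        import itertools
        import networkx as nx

        # Each document: institutions found by NER plus a water-topic score.
        docs = [
            {"institutions": ["City Water Dept", "Regional Authority"], "water_score": 0.9},
            {"institutions": ["Regional Authority", "Irrigation District"], "water_score": 0.8},
            {"institutions": ["School Board"], "water_score": 0.1},
        ]

        G = nx.Graph()
        for doc in docs:
            if doc["water_score"] < 0.5:   # keep only water-associated documents
                continue
            for a, b in itertools.combinations(doc["institutions"], 2):
                weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
                G.add_edge(a, b, weight=weight + 1)   # co-location count

        print(list(G.edges(data=True)))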

  16. ReportTutor – An Intelligent Tutoring System that Uses a Natural Language Interface

    Science.gov (United States)

    Crowley, Rebecca S.; Tseytlin, Eugene; Jukic, Drazen

    2005-01-01

    ReportTutor is an extension to our work on Intelligent Tutoring Systems for visual diagnosis. ReportTutor combines a virtual microscope and a natural language interface to allow students to visually inspect a virtual slide as they type a diagnostic report on the case. The system monitors both actions in the virtual microscope interface as well as text created by the student in the reporting interface. It provides feedback about the correctness, completeness, and style of the report. ReportTutor uses MMTx with a custom data-source created with the NCI Metathesaurus. A separate ontology of cancer specific concepts is used to structure the domain knowledge needed for evaluation of the student’s input including co-reference resolution. As part of the early evaluation of the system, we collected data from 4 pathology residents who typed in their reports without the tutoring aspects of the system, and compared responses to an expert dermatopathologist. We analyzed the resulting reports to (1) identify the error rates and distribution among student reports, (2) determine the performance of the system in identifying features within student reports, and (3) measure the accuracy of the system in distinguishing between correct and incorrect report elements. PMID:16779024

  17. Bringing Chatbots into education: Towards Natural Language Negotiation of Open Learner Models

    Science.gov (United States)

    Kerly, Alice; Hall, Phil; Bull, Susan

    There is an extensive body of work on Intelligent Tutoring Systems: computer environments for education, teaching and training that adapt to the needs of the individual learner. Work on personalisation and adaptivity has included research into allowing the student user to enhance the system's adaptivity by improving the accuracy of the underlying learner model. Open Learner Modelling, where the system's model of the user's knowledge is revealed to the user, has been proposed to support student reflection on their learning. Increased accuracy of the learner model can be obtained by the student and system jointly negotiating the learner model. We present the initial investigations into a system to allow people to negotiate the model of their understanding of a topic in natural language. This paper discusses the development and capabilities of both conversational agents (or chatbots) and Intelligent Tutoring Systems, in particular Open Learner Modelling. We describe a Wizard-of-Oz experiment to investigate the feasibility of using a chatbot to support negotiation, and conclude that a fusion of the two fields can lead to developing negotiation techniques for chatbots and the enhancement of the Open Learner Model. This technology, if successful, could have widespread application in schools, universities and other training scenarios.

  18. Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera

    Directory of Open Access Journals (Sweden)

    Jiatong Bao

    2016-12-01

    Full Text Available Controlling robots by natural language (NL) is increasingly attracting attention for its versatility, convenience, and the fact that it requires no extensive training of users. Grounding is a crucial challenge of this problem: enabling robots to understand NL instructions from humans. This paper mainly explores the object grounding problem and concretely studies how to detect target objects specified by NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, the object attributes and relations between objects are further extracted. The NL instructions that incorporate multiple cues for object specifications are parsed into domain-specific annotations. The annotations from NL and extracted information from the RGB-D camera are matched in a computational state estimation framework to search all possible object grounding states. The final grounding is accomplished by selecting the states with the maximum probabilities. An RGB-D scene dataset associated with different groups of NL instructions, based on different cognition levels of the robot, is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. The experiments of NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicability in robotic applications.
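
    A toy rendering of the final selection step, with invented object attributes standing in for the RGB-D measurements and parsed NL annotations; the real system scores full grounding states probabilistically rather than by simple attribute matching:

        # Candidate objects as segmented from the RGB-D scene (attributes invented).
        candidates = [
            {"name": "obj1", "color": "red", "shape": "cube"},
            {"name": "obj2", "color": "blue", "shape": "cube"},
        ]
        annotation = {"color": "red", "shape": "cube"}  # parsed from "the red cube"

        def match_score(obj, ann):
            """Fraction of requested attributes the object satisfies."""
            return sum(obj.get(k) == v for k, v in ann.items()) / len(ann)

        best = max(candidates, key=lambda obj: match_score(obj, annotation))
        print(best["name"])  # obj1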

  19. Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective

    Directory of Open Access Journals (Sweden)

    Nikolaos Aletras

    2016-10-01

    Full Text Available Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the Convention on Human Rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics. Our models can predict the court's decisions with strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis.
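
    A minimal sketch of the classification setup, assuming scikit-learn; the case texts and labels are invented, and only the N-gram features are shown (the paper also used topic features):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        texts = ["facts describe prolonged detention without judicial review",
                 "facts describe a routine administrative dispute"]
        labels = [1, 0]  # 1 = violation found, 0 = no violation

        # Contiguous word sequences (1- to 3-grams) feeding a linear classifier.
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
        clf.fit(texts, labels)
        print(clf.predict(["detention without review"]))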

  20. Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

    Directory of Open Access Journals (Sweden)

    Łukasz Dębowski

    2018-01-01

    Full Text Available As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely. Replacing probabilistic facts with an algorithmically random sequence of bits, called algorithmic facts, we adapt this property back to ergodic processes. Subsequently, we call a process perigraphic if the number of algorithmic facts which can be inferred from a finite text sampled from the process grows like a power of the text length. We present a simple example of such a process. Moreover, we demonstrate an assertion which we call the theorem about facts and words. This proposition states that the number of probabilistic or algorithmic facts which can be inferred from a text drawn from a process must be roughly smaller than the number of distinct word-like strings detected in this text by means of the Prediction by Partial Matching (PPM) compression algorithm. We also observe that the number of word-like strings for a sample of plays by Shakespeare follows an empirical stepwise power law, in stark contrast to Markov processes. Hence, we suppose that natural language considered as a process is not only non-Markov but also perigraphic.
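
    The PPM-based detection of word-like strings is not reproduced here, but the power-law vocabulary growth the paper contrasts with Markov behavior can be illustrated with Zipf-distributed toy "words" (a Heaps-law style estimate, assuming NumPy):

        import numpy as np

        rng = np.random.default_rng(1)
        tokens = rng.zipf(2.0, size=20000)   # Zipf-distributed integer "words"

        lengths = [100, 1000, 10000, 20000]
        vocab = [len(set(tokens[:n].tolist())) for n in lengths]
        for n, v in zip(lengths, vocab):
            print(n, v)                      # vocabulary grows with text length

        # Log-log slope between the endpoints estimates the power-law exponent.
        slope = (np.log(vocab[-1]) - np.log(vocab[0])) / \
                (np.log(lengths[-1]) - np.log(lengths[0]))
        print(round(float(slope), 2))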

  1. Rethinking information delivery: using a natural language processing application for point-of-care data discovery.

    Science.gov (United States)

    Workman, T Elizabeth; Stoddart, Joan M

    2012-04-01

    This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians. Semantic MEDLINE summarizes text in PubMed citations, transforming it into compact declarations that are filtered according to a user's information need and displayed in a graphic interface. Integration of the Combo algorithm enables Semantic MEDLINE to deliver information salient to many diverse needs. The authors selected three disease topics and crafted PubMed search queries to retrieve citations addressing the prevention of these diseases. They then processed the citations with Semantic MEDLINE, with the Combo algorithm enhancement. To evaluate the results, they constructed a reference standard for each disease topic consisting of preventive interventions recommended by a commercial decision support tool. Semantic MEDLINE with Combo produced an average recall of 79% in primary and secondary analyses, an average precision of 45%, and a final average F-score of 0.57. This new approach to point-of-care information delivery holds promise as a decision support tool for clinicians. Health sciences libraries could implement such technologies to deliver tailored information to their users.
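
    The reported F-score follows directly from the averages quoted above, since F is the harmonic mean of precision and recall:

        # F = 2PR / (P + R), with the paper's average precision and recall.
        precision, recall = 0.45, 0.79
        f_score = 2 * precision * recall / (precision + recall)
        print(round(f_score, 2))  # 0.57, matching the reported average F-score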

  2. Natural language processing to extract medical problems from electronic clinical documents: performance evaluation.

    Science.gov (United States)

    Meystre, Stéphane; Haug, Peter J

    2006-12-01

    In this study, we evaluate the performance of a Natural Language Processing (NLP) application designed to extract medical problems from narrative text clinical documents. The documents come from a patient's electronic medical record, and medical problems are proposed for inclusion in the patient's electronic problem list. This application has been developed to help maintain the problem list and make it more accurate, complete, and up-to-date. The NLP part of this system (analyzed in this study) uses the UMLS MetaMap Transfer (MMTx) application and a negation detection algorithm called NegEx to extract 80 different medical problems selected for their frequency of use in our institution. When using MMTx with its default data set, we measured a recall of 0.74 and a precision of 0.756. A custom data subset for MMTx was created, making it faster and significantly improving the recall to 0.896 with a non-significant reduction in precision.
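
    A toy NegEx-style check, assuming only the Python standard library; the real NegEx uses a curated trigger list and scope rules, so the trigger set and token window here are simplifications of the idea:

        import re

        TRIGGERS = r"\b(no|denies|without|negative for)\b"

        def is_negated(sentence, concept, window=5):
            """True if the concept follows a negation trigger within a few tokens."""
            tokens = sentence.lower().split()
            if concept.lower() not in tokens:
                return False
            concept_idx = tokens.index(concept.lower())
            for m in re.finditer(TRIGGERS, sentence.lower()):
                trigger_idx = len(sentence[:m.start()].split())
                if 0 < concept_idx - trigger_idx <= window:
                    return True
            return False

        print(is_negated("patient denies chest pain", "pain"))   # True
        print(is_negated("patient reports chest pain", "pain"))  # False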

  3. Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language.

    Science.gov (United States)

    Falomir, Zoe; Kluth, Thomas

    2017-06-24

    The challenge of describing 3D real scenes is tackled in this paper using qualitative spatial descriptors. A key point to study is which qualitative descriptors to use and how these qualitative descriptors must be organized to produce a suitable cognitive explanation. In order to find answers, a survey test was carried out with human participants who openly described a scene containing some pieces of furniture. The data obtained in this survey were analysed and, taking them into account, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Object features are computed on these 3D data to identify objects in indoor scenes. The object orientation is computed, and qualitative spatial relations between the objects are extracted. These qualitative spatial relations are the input to a grammar which applies saliency rules obtained from the survey study and generates cognitive natural language descriptions of scenes. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study is carried out to test whether the descriptions provided by the QSn3D approach are human readable. The obtained results show that their acceptability is higher than 82%.

  4. EVALUATION OF SEMANTIC SIMILARITY FOR SENTENCES IN NATURAL LANGUAGE BY MATHEMATICAL STATISTICS METHODS

    Directory of Open Access Journals (Sweden)

    A. E. Pismak

    2016-03-01

    Full Text Available Subject of Research. The paper is focused on the structural organization of Wiktionary articles in the aspect of their usage as the base for a semantic network. Wiktionary community references, article templates and article markup features are analyzed. The problem of numerical estimation of semantic similarity between structural elements of Wiktionary articles is considered. Analysis of existing software for semantic similarity estimation of such elements is carried out; the algorithms of their functioning are studied; their advantages and disadvantages are shown. Methods. Mathematical statistics methods were used to analyze Wiktionary article markup features. A method of semantic similarity computation based on statistics of the compared structural elements was proposed. Main Results. We have concluded that direct use of Wiktionary articles as the source for a semantic network is not possible. We have proposed to find hidden similarity between article elements, and for that purpose we have developed an algorithm that calculates confidence coefficients indicating that a pair of sentences is semantically close. Quantitative and qualitative evaluation of the developed algorithm has shown a major performance advantage over other existing solutions, at the cost of an insignificantly higher error rate. Practical Relevance. The resulting algorithm may be useful in developing tools for automatic parsing of Wiktionary articles. The developed method could be used for computing semantic similarity of short text fragments in natural language when performance requirements are higher than accuracy requirements.
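
    The article's confidence coefficients are not reproduced here, but a generic sketch shows the kind of pairwise statistic such methods compute for two sentences (cosine similarity over term counts, standard library only):

        import math
        from collections import Counter

        def cosine_similarity(s1, s2):
            """Cosine of the angle between term-count vectors of two sentences."""
            a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
            dot = sum(a[t] * b[t] for t in a)
            norm = math.sqrt(sum(v * v for v in a.values())) * \
                   math.sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        print(round(cosine_similarity("a word in a dictionary",
                                      "a dictionary word"), 2))  # 0.87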

  5. Natural language processing using online analytic processing for assessing recommendations in radiology reports.

    Science.gov (United States)

    Dang, Pragya A; Kalra, Mannudeep K; Blake, Michael A; Schultz, Thomas J; Stout, Markus; Lemay, Paul R; Freshman, David J; Halpern, Elkan F; Dreyer, Keith J

    2008-03-01

    The study purpose was to describe the use of natural language processing (NLP) and online analytic processing (OLAP) for assessing patterns in recommendations in unstructured radiology reports on the basis of patient and imaging characteristics, such as age, gender, referring physicians, radiology subspecialty, modality, indications, diseases, and patient status (inpatient vs outpatient). A database of 4,279,179 radiology reports from a single tertiary health care center during a 10-year period (1995-2004) was created. The database includes reports of computed tomography, magnetic resonance imaging, fluoroscopy, nuclear medicine, ultrasound, radiography, mammography, angiography, special procedures, and unclassified imaging tests with patient demographics. A clinical data mining and analysis NLP program (Leximer, Nuance Inc, Burlington, Massachusetts) in conjunction with OLAP was used for classifying reports into those with recommendations (I(REC)) and without recommendations (N(REC)) for imaging and determining I(REC) rates for different patient age groups, gender, imaging modalities, indications, diseases, subspecialties, and referring physicians. In addition, temporal trends for I(REC) were also determined. There was a significant difference in the I(REC) rates across age groups, varying between 4.8% (10-19 years) and 9.5% (>70 years). OLAP revealed considerable differences between recommendation trends for different imaging modalities and other patient and imaging characteristics.

  6. THE MEANING OF “TO BRING” IN CIACIA LANGUAGE: NATURAL SEMANTICS METALANGUAGE

    Directory of Open Access Journals (Sweden)

    La Yani

    2014-11-01

    Full Text Available This paper aims at investigating the meaning of “to bring” in the Ciacia language based on the Natural Semantic Metalanguage (NSM) theory, an approach to investigate various forms, structures, and meanings as a whole with the principle of “one form for one meaning and one meaning for one form”. The data were collected through interview and note-taking techniques. The result of this study shows that the meaning of “to bring” in Ciacia can be expressed by a number of lexicons, each with a distinctive meaning. First is suu, meaning ‘to bring something by putting it on the head’; second are tongku and lemba, meaning ‘to bring something by putting it on the shoulder’; third is temba, meaning ‘to bring something by putting it on the chest’; fourth are solo/rongo, meaning ‘to bring something by putting it on the back’; fifth are bimbi and sele, meaning ‘to bring something by putting it on the waist’; and last are ntai, kopo, and tape, meaning ‘to bring something by putting it on a finger’.

  7. Reliability of an Automated High-Resolution Manometry Analysis Program across Expert Users, Novice Users, and Speech-Language Pathologists

    Science.gov (United States)

    Jones, Corinne A.; Hoffman, Matthew R.; Geng, Zhixian; Abdelhalim, Suzan M.; Jiang, Jack J.; McCulloch, Timothy M.

    2014-01-01

    Purpose: The purpose of this study was to investigate inter- and intrarater reliability among expert users, novice users, and speech-language pathologists with a semiautomated high-resolution manometry analysis program. We hypothesized that all users would have high intrarater reliability and high interrater reliability. Method: Three expert…

  8. Automated dispersive liquid-liquid microextraction coupled to high performance liquid chromatography - cold vapour atomic fluorescence spectroscopy for the determination of mercury species in natural water samples.

    Science.gov (United States)

    Liu, Yao-Min; Zhang, Feng-Ping; Jiao, Bao-Yu; Rao, Jin-Yu; Leng, Geng

    2017-04-14

    An automated, home-constructed, and low-cost dispersive liquid-liquid microextraction (DLLME) device directly coupled to a high performance liquid chromatography (HPLC) - cold vapour atomic fluorescence spectroscopy (CVAFS) system was designed and developed for the determination of trace concentrations of methylmercury (MeHg+), ethylmercury (EtHg+) and inorganic mercury (Hg2+) in natural waters. With a simple, miniaturized and efficient automated DLLME system, nanogram amounts of these mercury species were extracted from natural water samples and injected into a hyphenated HPLC-CVAFS for quantification. The complete analytical procedure, including chelation, extraction, phase separation, collection and injection of the extracts, as well as HPLC-CVAFS quantification, was automated. Key parameters, such as the type and volume of the chelation, extraction and dispersive solvents, aspiration speed, sample pH, salt effect and matrix effect, were thoroughly investigated. Under the optimum conditions, the linear range was 10-1200 ng L-1 for EtHg+ and 5-450 ng L-1 for MeHg+ and Hg2+. Limits of detection were 3.0 ng L-1 for EtHg+ and 1.5 ng L-1 for MeHg+ and Hg2+. Reproducibility and recoveries were assessed by spiking three natural water samples with different Hg concentrations, giving recoveries of 88.4-96.1% and relative standard deviations <5.1%. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Toward a Theory-Based Natural Language Capability in Robots and Other Embodied Agents: Evaluating Hausser's SLIM Theory and Database Semantics

    Science.gov (United States)

    Burk, Robin K.

    2010-01-01

    Computational natural language understanding and generation have been a goal of artificial intelligence since McCarthy, Minsky, Rochester and Shannon first proposed to spend the summer of 1956 studying this and related problems. Although statistical approaches dominate current natural language applications, two current research trends bring…

  10. LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    Yu. S. Hetsevich

    2017-01-01

    Full Text Available The article focuses on problems existing in text-to-speech synthesis. Different morphological, lexical and syntactical elements were localized with the help of the Belarusian module of the NooJ program. The types of errors that occur in Belarusian texts were analyzed and corrected. A language model and a part-of-speech tagging model were built. The natural language processing of the Belarusian corpus was then carried out with the developed machine learning algorithm. The precision of the developed machine learning models was 80-90%. The dictionary was enriched with new words for further use in Belarusian speech synthesis systems.

  11. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

    Science.gov (United States)

    Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C

    2017-12-01

    The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from the Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237) - and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of the classifiers and their portability across the two datasets. The medical subdomain classifier trained as a convolutional recurrent neural network with neural word embeddings yielded the best performance on the iDASH and MGH datasets, with areas under the receiver operating characteristic curve (AUC) of 0.975 and 0.991 and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, a linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf) weighting, outperformed other shallow learning classifiers on the iDASH and MGH datasets, with AUC of 0.957 and 0.964 and F1 scores of 0.932 and 0.934, respectively. When we trained classifiers on one dataset and applied them to the other dataset, classifiers for half of the medical subdomains we studied reached an F1 score of at least 0.7. Our study shows that a supervised
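
    A minimal sketch of the interpretable configuration named above (tf-idf weighted bag-of-words feeding a linear SVM), assuming scikit-learn; the notes and subdomain labels are invented, and the clinically relevant UMLS concept features are omitted:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        notes = ["ejection fraction reduced on echocardiogram",
                 "seizure activity noted on EEG",
                 "troponin elevated, cath lab activated",
                 "MRI brain shows demyelinating lesions"]
        subdomains = ["cardiology", "neurology", "cardiology", "neurology"]

        # tf-idf weighted bag-of-words features with a linear SVM classifier.
        model = make_pipeline(TfidfVectorizer(), LinearSVC())
        model.fit(notes, subdomains)
        print(model.predict(["EEG shows epileptiform discharges"]))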

  12. The application of computerized content analysis of natural language in psychotherapy research now and in the future.

    Science.gov (United States)

    Gottschalk, L A

    2000-01-01

    For many years the author and his colleagues have been involved in studying the roots and processes of the conveyance of semantic messages via spoken language and verbal texts. After establishing that reliable and valid measurements of highly relevant neuropsychiatric categories, such as anxiety, depression, and cognitive impairment, can be made by identifying and counting the occurrence per grammatical clause of language content and form categories typifying specific content-analysis scales, the research focus has turned towards computerizing this process of content analysis. This report summarizes the achievements and applications of the current empirical status of this method of computerized content analysis of natural language to psychotherapy research, and it speculates on possible future applications in the new millennium.

  13. The written language of signals as a means of natural literacy of deaf children

    Directory of Open Access Journals (Sweden)

    Giovana Fracari Hautrive

    2010-10-01

    Full Text Available Taking up the theme of the literacy of deaf children currently directs attention to a teaching practice whose demands extend beyond the school. Questions arising from daily practice became a challenge, requiring an investigative attitude. This article aims to problematize the literacy process of deaf children. The proposed reflection emerges from daily practice, woven together with the theoretical studies of Vigotskii (1989, 1994, 1996, 1998), Stumpf (2005), Quadros (1997), Bolzan (1998, 2002) and Skliar (1997a, 1997b, 1998), from which it problematizes the processes involved in the construction of written language. As a result, it highlights the importance of establishing sign language as the first language in the education of the deaf, and of learning the writing of sign language. Being literate in their mother tongue is an important condition for deaf students. The article points out the need for a redirection of the literacy of deaf children, so that important aspects of language, its role in the structuring of thought, and its communicative function are respected and considered in this process. It thus emphasizes learning the writing of sign language as fundamental: it should occupy a central role in classroom teaching, encouraging the contradictions that place the student in a situation of cognitive conflict, while respecting the diversity inherent in every human being. The production of sign language writing is considered an appropriate tool for deaf students to record their visual language.

  14. Using natural language processing and machine learning to identify gout flares from electronic clinical notes.

    Science.gov (United States)

    Zheng, Chengyi; Rashid, Nazia; Wu, Yi-Lin; Koblick, River; Lin, Antony T; Levy, Gerald D; Cheetham, T Craig

    2014-11-01

    Gout flares are not well documented by diagnosis codes, making it difficult to conduct accurate database studies. We implemented a computer-based method to automatically identify gout flares using natural language processing (NLP) and machine learning (ML) from electronic clinical notes. Of 16,519 patients, 1,264 and 1,192 clinical notes from 2 separate sets of 100 patients were selected as the training and evaluation data sets, respectively, which were reviewed by rheumatologists. We created separate NLP searches to capture different aspects of gout flares. For each note, the NLP search outputs became the ML system inputs, which provided the final classification decisions. The note-level classifications were grouped into patient-level gout flares. Our NLP+ML results were validated using a gold standard data set and compared with the claims-based method used in the prior literature. For 16,519 patients with a diagnosis of gout and a prescription for a urate-lowering therapy, we identified 18,869 clinical notes as gout flare positive (sensitivity 82.1%, specificity 91.5%): 1,402 patients with ≥3 flares (sensitivity 93.5%, specificity 84.6%), 5,954 with 1 or 2 flares, and 9,163 with no flare (sensitivity 98.5%, specificity 96.4%). Our method identified more flare cases (18,869 versus 7,861) and patients with ≥3 flares (1,402 versus 516) when compared to the claims-based method. We developed a computer-based method (NLP and ML) to identify gout flares from the clinical notes. Our method was validated as an accurate tool for identifying gout flares with higher sensitivity and specificity compared to previous studies. Copyright © 2014 by the American College of Rheumatology.
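
    The note-level validation figures quoted above can be recomputed from a confusion matrix; the counts below are invented so that the standard formulas reproduce the reported 82.1%/91.5%:

        def sensitivity_specificity(tp, fn, tn, fp):
            """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
            return tp / (tp + fn), tn / (tn + fp)

        sens, spec = sensitivity_specificity(tp=821, fn=179, tn=915, fp=85)
        print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
        # sensitivity=82.1%, specificity=91.5%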

  15. Validation of natural language processing to extract breast cancer pathology procedures and results

    Directory of Open Access Journals (Sweden)

    Arika E Wieneke

    2015-01-01

    Full Text Available Background: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. Methods: We used an iterative approach of machine learning algorithms and constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely using NLP and which reports needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard. Results: NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% from the evaluation set due to irrelevancy (i.e., not breast-related). Common procedures and results were identified correctly (e.g., invasive ductal) with 95.5% precision and 94.0% sensitivity, but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text. Conclusions: The NLP system we developed did not perform sufficiently well for abstracting entire breast pathology reports. The all-or-nothing approach resulted in too broad a scope of work and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and wide variation in pathology text. Focusing on individual, common elements and improving pathology text report standardization may improve performance.

  16. Natural Language Processing for Cohort Discovery in a Discharge Prediction Model for the Neonatal ICU.

    Science.gov (United States)

    Temple, Michael W; Lehmann, Christoph U; Fabbri, Daniel

    2016-01-01

    Discharging patients from the Neonatal Intensive Care Unit (NICU) can be delayed for non-medical reasons including the procurement of home medical equipment, parental education, and the need for children's services. We previously created a model to identify patients that will be medically ready for discharge in the subsequent 2-10 days. In this study we use Natural Language Processing to improve upon that model and discern why the model performed poorly on certain patients. We retrospectively examined the text of the Assessment and Plan section from daily progress notes of 4,693 patients (103,206 patient-days) from the NICU of a large, academic children's hospital. A matrix was constructed using words from NICU notes (single words and bigrams) to train a supervised machine learning algorithm to determine the most important words differentiating poorly performing patients compared to well performing patients in our original discharge prediction model. NLP using a bag of words (BOW) analysis revealed several cohorts that performed poorly in our original model. These included patients with surgical diagnoses, pulmonary hypertension, retinopathy of prematurity, and psychosocial issues. The BOW approach aided in cohort discovery and will allow further refinement of our original discharge model prediction. Adequately identifying patients discharged home on g-tube feeds alone could improve the AUC of our original model by 0.02. Additionally, this approach identified social issues as a major cause for delayed discharge. A BOW analysis provides a method to improve and refine our NICU discharge prediction model and could potentially avoid over 900 (0.9%) hospital days.

  17. Using natural language processing to provide personalized learning opportunities from trainee clinical notes.

    Science.gov (United States)

    Denny, Joshua C; Spickard, Anderson; Speltz, Peter J; Porier, Renee; Rosenstiel, Donna E; Powers, James S

    2015-08-01

    Assessment of medical trainee learning through pre-defined competencies is now commonplace in schools of medicine. We describe a novel electronic advisor system using natural language processing (NLP) to identify two geriatric medicine competencies from medical student clinical notes in the electronic medical record: advance directives (AD) and altered mental status (AMS). Clinical notes from third year medical students were processed using a general-purpose NLP system to identify biomedical concepts and their section context. The system analyzed these notes for relevance to AD or AMS and generated custom email alerts to students with embedded supplemental learning material customized to their notes. Recall and precision of the two advisors were evaluated by physician review. Students were given pre and post multiple choice question tests broadly covering geriatrics. Of 102 students approached, 66 students consented and enrolled. The system sent 393 email alerts to 54 students (82%), including 270 for AD and 123 for AMS. Precision was 100% for AD and 93% for AMS. Recall was 69% for AD and 100% for AMS. Students mentioned ADs for 43 patients, with all mentions occurring after first having received an AD reminder. Students accessed educational links 34 times from the 393 email alerts. There was no difference in pre (mean 62%) and post (mean 60%) test scores. The system effectively identified two educational opportunities using NLP applied to clinical notes and demonstrated a small change in student behavior. Use of electronic advisors such as these may provide a scalable model to assess specific competency elements and deliver educational opportunities. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes.

    Science.gov (United States)

    Elkin, Peter L; Froehling, David A; Wahner-Roedler, Dietlind L; Brown, Steven H; Bailey, Kent R

    2012-01-03

    An effective national biosurveillance system expedites outbreak recognition and facilitates response coordination at the federal, state, and local levels. The BioSense system, used at the Centers for Disease Control and Prevention, incorporates chief complaints but not data from the whole encounter note into its surveillance algorithms. To evaluate whether biosurveillance using data from the whole encounter note is superior to that using data from the chief complaint field alone, a 6-year retrospective case-control cohort study was conducted at the Mayo Clinic, Rochester, Minnesota, covering 17,243 persons tested for influenza A or B virus between 1 January 2000 and 31 December 2006. The measurements were the accuracy of a model based on signs and symptoms to predict influenza virus infection in patients with upper respiratory tract symptoms, and the ability of a natural language processing technique to identify definitional clinical features from free-text encounter notes. Surveillance based on the whole encounter note was superior to the chief complaint field alone. For the case definition used, the normalized partial area under the receiver-operating characteristic curve (specificity, 0.1 to 0.4) was 92.9% for surveillance using the whole encounter note versus 70.3% for surveillance with the chief complaint field (difference, 22.6%). Real-time biosurveillance monitoring was not studied. A biosurveillance model for influenza using the whole encounter note is more accurate than a model that uses only the chief complaint field. Because case-defining signs and symptoms of influenza are commonly available in health records, the investigators believe that the national strategy for biosurveillance should be changed to incorporate data from the whole health record. Centers for Disease Control and Prevention.
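
    A sketch of a normalized partial AUC like the one reported above, assuming scikit-learn and NumPy; the labels and scores are synthetic, and the specificity band 0.1-0.4 is expressed as a false-positive-rate band 0.6-0.9 (FPR = 1 - specificity):

        import numpy as np
        from sklearn.metrics import roc_curve

        y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
        scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7])

        fpr, tpr, _ = roc_curve(y_true, scores)
        lo, hi = 0.6, 0.9                 # FPR band <-> specificity 0.1-0.4
        grid = np.linspace(lo, hi, 50)
        tpr_band = np.interp(grid, fpr, tpr)
        # Trapezoidal area over the band, normalized by the band width.
        pauc = ((tpr_band[1:] + tpr_band[:-1]) / 2 * np.diff(grid)).sum() / (hi - lo)
        print(round(float(pauc), 3))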

  19. Using Natural Language Processing to Extract Abnormal Results From Cancer Screening Reports.

    Science.gov (United States)

    Moore, Carlton R; Farrag, Ashraf; Ashkin, Evan

    2017-09-01

    Numerous studies show that follow-up of abnormal cancer screening results, such as mammography and Papanicolaou (Pap) smears, is frequently not performed in a timely manner. A contributing factor is that abnormal results may go unrecognized because they are buried in free-text documents in electronic medical records (EMRs), and, as a result, patients are lost to follow-up. By identifying abnormal results from free-text reports in EMRs and generating alerts to clinicians, natural language processing (NLP) technology has the potential for improving patient care. The goal of the current study was to evaluate the performance of NLP software for extracting abnormal results from free-text mammography and Pap smear reports stored in an EMR. A sample of 421 and 500 free-text mammography and Pap reports, respectively, were manually reviewed by a physician, and the results were categorized for each report. We tested the performance of NLP to extract results from the reports. The 2 assessments (criterion standard versus NLP) were compared to determine the precision, recall, and accuracy of NLP. When NLP was compared with manual review for mammography reports, the results were as follows: precision, 98% (96%-99%); recall, 100% (98%-100%); and accuracy, 98% (96%-99%). For Pap smear reports, the precision, recall, and accuracy of NLP were all 100%. Our study developed NLP models that accurately extract abnormal results from mammography and Pap smear reports. Plans include using NLP technology to generate real-time alerts and reminders for providers to facilitate timely follow-up of abnormal results.

  20. Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

    Science.gov (United States)

    Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

    2016-01-01

    Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, which treats serendipity as part of the refinement. Conclusions: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation. PMID:26769334

  1. Assessing the feasibility of large-scale natural language processing in a corpus of ordinary medical records: a lexical analysis.

    Science.gov (United States)

    Hersh, W R; Campbell, E M; Malveau, S E

    1997-01-01

    Identify the lexical content of a large corpus of ordinary medical records to assess the feasibility of large-scale natural language processing. A corpus of 560 megabytes of medical record text from an academic medical center was broken into individual words and compared with the words in six medical vocabularies, a common word list, and a database of patient names. Unrecognized words were assessed for algorithmic and contextual approaches to identifying more words, while the remainder were analyzed for spelling correctness. About 60% of the words occurred in the medical vocabularies, common word list, or names database. Of the remainder, one-third were recognizable by other means. Of the remaining unrecognizable words, over three-fourths represented correctly spelled real words and the rest were misspellings. Large-scale generalized natural language processing methods for the medical record will require expansion of existing vocabularies, spelling error correction, and other algorithmic approaches to map words into those from clinical vocabularies.
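
    The core coverage computation is simple set arithmetic; the tiny word lists below are placeholders for the six medical vocabularies, common-word list, and names database described above:

        corpus_words = {"patient", "metoprolol", "teh", "stenosis", "smith"}
        vocabularies = {"patient", "stenosis", "metoprolol"}   # medical terms
        common_words = {"the", "and", "with"}
        names = {"smith"}

        known = vocabularies | common_words | names
        coverage = len(corpus_words & known) / len(corpus_words)
        print(f"{coverage:.0%} recognized; unrecognized: {sorted(corpus_words - known)}")
        # 80% recognized; unrecognized: ['teh']  (a misspelling of "the")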

  2. Causal knowledge extraction by natural language processing in material science: a case study in chemical vapor deposition

    Directory of Open Access Journals (Sweden)

    Yuya Kajikawa

    2006-11-01

    Full Text Available Scientific publications written in natural language still play a central role as our knowledge source. However, due to the flood of publications, the literature survey process has become highly time-consuming and tangled, especially for novices in a discipline. Therefore, tools supporting the literature survey process may help individual scientists to explore new useful domains. Natural language processing (NLP) is expected to be one of the promising techniques to retrieve, abstract, and extract knowledge. In this contribution, NLP is first applied to the literature of chemical vapor deposition (CVD), which is a sub-discipline of materials science and a complex and interdisciplinary field of research involving chemists, physicists, engineers, and materials scientists. Causal knowledge extraction from the literature is demonstrated using NLP.

  3. Crowdsourcing a normative natural language dataset: a comparison of Amazon Mechanical Turk and in-lab data collection.

    Science.gov (United States)

    Saunders, Daniel R; Bex, Peter J; Woods, Russell L

    2013-05-20

    Crowdsourcing has become a valuable method for collecting medical research data. This approach, recruiting through open calls on the Web, is particularly useful for assembling large normative datasets. However, it is not known how natural language datasets collected over the Web differ from those collected under controlled laboratory conditions. To compare the natural language responses obtained from a crowdsourced sample of participants with responses collected in a conventional laboratory setting from participants recruited according to specific age and gender criteria, we collected natural language descriptions of 200 half-minute movie clips from Amazon Mechanical Turk workers (crowdsourced) and 60 participants recruited from the community (lab-sourced). Crowdsourced participants responded to as many clips as they wanted and typed their responses, whereas lab-sourced participants gave spoken responses to 40 clips, and their responses were transcribed. The content of the responses was evaluated using a take-one-out procedure, which compared responses to other responses to the same clip and to other clips, with a comparison of the average number of shared words. In contrast to the 13 months of recruiting that was required to collect normative data from 60 lab-sourced participants (with specific demographic characteristics), only 34 days were needed to collect normative data from 99 crowdsourced participants (contributing a median of 22 responses). The majority of crowdsourced workers were female, and the median age was 35 years, lower than the lab-sourced median of 62 years but similar to the median age of the US population. The responses contributed by the crowdsourced participants were longer on average, that is, 33 words compared to 28 words (P<.001), and older crowdsourced participants had more shared words (P=.004 and .01, respectively), whereas younger participants had higher numbers of shared words in the lab-sourced population (P=.01). Crowdsourcing is an effective approach
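
    A toy version of the take-one-out scoring: each response is compared against the other responses to the same clip by counting shared word types; the responses below are invented examples:

        def mean_shared_words(response, others):
            """Average number of word types shared with each other response."""
            words = set(response.lower().split())
            return sum(len(words & set(o.lower().split())) for o in others) / len(others)

        responses = ["a man walks a dog in the park",
                     "a man and his dog walk through a park",
                     "someone strolls with a dog"]
        for i, resp in enumerate(responses):
            others = responses[:i] + responses[i + 1:]
            print(round(mean_shared_words(resp, others), 1))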

  4. Steering the conversation: A linguistic exploration of natural language interactions with a digital assistant during simulated driving.

    Science.gov (United States)

    Large, David R; Clark, Leigh; Quandt, Annie; Burnett, Gary; Skrypchuk, Lee

    2017-09-01

    Given the proliferation of 'intelligent' and 'socially-aware' digital assistants embodying everyday mobile technology - and the undeniable logic that utilising voice-activated controls and interfaces in cars reduces the visual and manual distraction of interacting with in-vehicle devices - it appears inevitable that next generation vehicles will be embodied by digital assistants and utilise spoken language as a method of interaction. From a design perspective, defining the language and interaction style that a digital driving assistant should adopt is contingent on the role that they play within the social fabric and context in which they are situated. We therefore conducted a qualitative, Wizard-of-Oz study to explore how drivers might interact linguistically with a natural language digital driving assistant. Twenty-five participants drove for 10 min in a medium-fidelity driving simulator while interacting with a state-of-the-art, high-functioning, conversational digital driving assistant. All exchanges were transcribed and analysed using recognised linguistic techniques, such as discourse and conversation analysis, normally reserved for interpersonal investigation. Language usage patterns demonstrate that interactions with the digital assistant were fundamentally social in nature, with participants affording the assistant equal social status and high-level cognitive processing capability. For example, participants were polite, actively controlled turn-taking during the conversation, and used back-channelling, fillers and hesitation, as they might in human communication. Furthermore, participants expected the digital assistant to understand and process complex requests mitigated with hedging words and expressions, and peppered with vague language and deictic references requiring shared contextual information and mutual understanding. Findings are presented in six themes which emerged during the analysis - formulating responses; turn-taking; back-channelling…

  5. Starting over: international adoption as a natural experiment in language development.

    Science.gov (United States)

    Snedeker, Jesse; Geren, Joy; Shafto, Carissa L

    2007-01-01

    Language development is characterized by predictable shifts in the words children produce and the complexity of their utterances. Because acquisition typically occurs simultaneously with maturation and cognitive development, it is difficult to determine the causes of these shifts. We explored how acquisition proceeds in the absence of possible cognitive or maturational roadblocks, by examining the acquisition of English in internationally adopted preschoolers. Like infants, and unlike other second-language learners, these children acquire language from child-directed speech, without access to bilingual informants. Parental reports and speech samples were collected from 27 preschoolers, 3 to 18 months after they were adopted from China. These children showed the same developmental patterns in language production as monolingual infants (matched for vocabulary size). Early on, their vocabularies were dominated by nouns, their utterances were short, and grammatical morphemes were generally omitted. Children at later stages had more diverse vocabularies and produced longer utterances with more grammatical morphemes.

  6. A Discussion about Upgrading the Quick Script Platform to Create Natural Language based IoT Systems

    DEFF Research Database (Denmark)

    Khanna, Anirudh; Das, Bhagwan; Pandey, Bishwajeet

    2016-01-01

    With the advent of AI and IoT, the idea of incorporating smart things/appliances in our day-to-day life is turning into a reality. The paper discusses the possibilities and potential of designing IoT systems which can be controlled via natural language, with the help of Quick Script as a development platform…, and where all the necessary changes/additions are to be made. The benefits of this will include sharing the power of controlling and even programming (up to some extent) to the user end, as well as providing a simple intermediary to make communication between man and his machines a little more natural…

  7. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    OpenAIRE

    Khachidze, Manana; Tsintsadze, Magda; Archuadze, Maia

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system has to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the entire history of medical services provided. The present work introduces an instrument for classification of medical records based on the Georgian language. It is the first attempt at such classification of Georgian-language medical records. On the w...

  8. Dynamical Languages

    Science.gov (United States)

    Xie, Huimin

    The following sections are included: * Definition of Dynamical Languages * Distinct Excluded Blocks * Definition and Properties * L and L″ in Chomsky Hierarchy * A Natural Equivalence Relation * Symbolic Flows * Symbolic Flows and Dynamical Languages * Subshifts of Finite Type * Sofic Systems * Graphs and Dynamical Languages * Graphs and Shannon-Graphs * Transitive Languages * Topological Entropy

  9. Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on "Dependency distance: A new perspective on syntactic patterns in natural language" by Haitao Liu et al.

    Science.gov (United States)

    Jiang, Jingyang; Ouyang, Jinghui

    2017-07-01

    Liu et al. [1] offers a clear and informative account of the use of dependency distance in studying natural languages, with a focus on the viewpoint that dependency distance minimization (DDM) can be regarded as a linguistic universal. We would like to add the perspective of employing dependency distance in the study of second language acquisition (SLA), particularly studies of syntactic development.

  10. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing.

    Science.gov (United States)

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy - observational coding - has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (the harmonic mean of precision and recall) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
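
    The two headline metrics above are straightforward to compute. A minimal Python sketch, using hypothetical per-provider scores (the study's data are not reproduced here) and thresholding at the human median to form the high/low classes:

        # Evaluation of automatic empathy ratings against human codes:
        # Pearson correlation for scores, F-score for high/low classes.
        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.metrics import f1_score

        human_scores = np.array([3.2, 4.1, 2.5, 5.0, 3.8, 1.9])  # hypothetical
        model_scores = np.array([3.0, 4.4, 2.9, 4.6, 3.5, 2.2])  # hypothetical

        r, p = pearsonr(human_scores, model_scores)

        # High vs. low empathy, thresholded at the human median.
        threshold = np.median(human_scores)
        f1 = f1_score(human_scores >= threshold, model_scores >= threshold)

        print(f"Pearson r = {r:.2f} (p = {p:.3f}), F-score = {f1:.2f}")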

  11. Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search.

    Science.gov (United States)

    Jay, Caroline; Harper, Simon; Dunlop, Ian; Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

    2016-01-14

    … (F1,19=18.0, P<.001). There was also a main effect of task (F2,38=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, which treats serendipity as part of the refinement. The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.

  12. Natural language morphology integration in off-line Arabic optical text recognition.

    Science.gov (United States)

    Kanoun, Slim; Alimi, Adel M; Lecourtier, Yves

    2011-04-01

    In this paper, we propose a new linguistic-based approach called the affixal approach for Arabic word and text image recognition. Most of the existing works in the field integrate the knowledge of the Arabic language in the recognition process in two ways: either in post-recognition, using a language dictionary (a dictionary of words) to validate the word hypotheses suggested by the OCR, or in the course of the recognition process (recognition directed by a lexicon), using a statistical model of the language (Hidden Markov Model or N-gram). The proposed approach uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. The principal contribution of the proposed approach is its ability to categorize the word hypotheses into words that are either derived or not derived from roots, and to characterize each word hypothesis morphologically in order to prepare the text hypotheses for later analyses (for example, syntactic analysis to filter the sentence hypotheses).

  13. In silico Evolutionary Developmental Neurobiology and the Origin of Natural Language

    Science.gov (United States)

    Szathmáry, Eörs; Szathmáry, Zoltán; Ittzés, Péter; Orbán, Gergő; Zachár, István; Huszár, Ferenc; Fedor, Anna; Varga, Máté; Számadó, Szabolcs

    It is justified to assume that part of our genetic endowment contributes to our language skills, yet it is impossible to tell at this moment exactly how genes affect the language faculty. We complement experimental biological studies by an in silico approach in that we simulate the evolution of neuronal networks under selection for language-related skills. At the heart of this project is the Evolutionary Neurogenetic Algorithm (ENGA) that is deliberately biomimetic. The design of the system was inspired by important biological phenomena such as brain ontogenesis, neuron morphologies, and indirect genetic encoding. Neuronal networks were selected and were allowed to reproduce as a function of their performance in the given task. The selected neuronal networks in all scenarios were able to solve the communication problem they had to face. The most striking feature of the model is that it works with highly indirect genetic encoding--just as brains do.

  14. Mirror neurons and the social nature of language: the neural exploitation hypothesis.

    Science.gov (United States)

    Gallese, Vittorio

    2008-01-01

    This paper discusses the relevance of the discovery of mirror neurons in monkeys and of the mirror neuron system in humans to a neuroscientific account of primates' social cognition and its evolution. It is proposed that mirror neurons and the functional mechanism they underpin, embodied simulation, can ground within a unitary neurophysiological explanatory framework important aspects of human social cognition. In particular, the main focus is on language, here conceived according to a neurophenomenological perspective, grounding meaning on the social experience of action. A neurophysiological hypothesis--the "neural exploitation hypothesis"--is introduced to explain how key aspects of human social cognition are underpinned by brain mechanisms originally evolved for sensorimotor integration. It is proposed that these mechanisms were later on adapted as new neurofunctional architecture for thought and language, while retaining their original functions as well. By neural exploitation, social cognition and language can be linked to the experiential domain of action.

  15. Genetic and Environmental Links between Natural Language Use and Cognitive Ability in Toddlers

    Science.gov (United States)

    Canfield, Caitlin F.; Edelson, Lisa R.; Saudino, Kimberly J.

    2017-01-01

    Although the phenotypic correlation between language and nonverbal cognitive ability is well-documented, studies examining the etiology of the covariance between these abilities are scant, particularly in very young children. The goal of this study was to address this gap in the literature by examining the genetic and environmental links between…

  16. Implementation of Danish in the Natural Language Generator of Angus2

    DEFF Research Database (Denmark)

    Larsen, Søren Støvelbæk; Fihl, Preben; Moeslund, Thomas B.

    The purpose of this technical report is to cover the implementation of the Danish language and grammar in the Angus2 software. This includes a brief description of the Angus2 software, the Danish grammar as it relates to the implementation in Angus2, and a detailed description of how...

  17. The substantive nature of psycholexical personality factors : A comparison across languages

    NARCIS (Netherlands)

    Peabody, D; De Raad, B.

    2002-01-01

    The psycholexical approach to personality structure in American English has led to the Big Five factors. The present study considers whether this result is similar or different in other languages. Instead of placing the usual emphasis on quantitative indices, this study examines the substantive

  18. INTEGRATING CORPUS-BASED RESOURCES AND NATURAL LANGUAGE PROCESSING TOOLS INTO CALL

    Directory of Open Access Journals (Sweden)

    Pascual Cantos Gomez

    2002-06-01

    This paper aims at presenting a survey of computational linguistic tools presently available whose potential has been neither fully considered nor exploited to the full in modern CALL. It starts with a discussion of the rationale of DDL in language learning, presenting typical DDL activities, DDL software, and potential extensions of non-typical DDL software (electronic dictionaries and electronic dictionary facilities) to DDL. An extended section is devoted to describing NLP technology and how it can be integrated into CALL, within already existing software or as stand-alone resources. A range of NLP tools is presented (MT programs, taggers, lemmatizers, parsers and speech technologies), with special emphasis on tagged concordancing. The paper finishes with a number of reflections and ideas on how language technologies can be used efficiently within the language learning context and how extensive exploration and integration of these technologies might change and extend both modern CALL and the present language learning paradigm.

  19. Home automation with Intel Galileo

    CERN Document Server

    Dundar, Onur

    2015-01-01

    This book is for anyone who wants to learn Intel Galileo for home automation and cross-platform software development. No knowledge of programming with Intel Galileo is assumed, but knowledge of the C programming language is essential.

  20. On the relation between dependency distance, crossing dependencies, and parsing. Comment on "Dependency distance: a new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    Science.gov (United States)

    Gómez-Rodríguez, Carlos

    2017-07-01

    Liu et al. [1] provide a comprehensive account of research on dependency distance in human languages. While the article is a very rich and useful report on this complex subject, here I will expand on a few specific issues where research in computational linguistics (specifically natural language processing) can inform DDM research, and vice versa. These aspects have not been explored much in [1] or elsewhere, probably due to the little overlap between both research communities, but they may provide interesting insights for improving our understanding of the evolution of human languages, the mechanisms by which the brain processes and understands language, and the construction of effective computer systems to achieve this goal.

  1. Une Analyse automatique en syntaxe textuelle (An Automated Analysis of Textual Syntax). Publication K-5.

    Science.gov (United States)

    Ladouceur, Jacques

    This study reports the use of automated textual analysis on a French novel. An introductory section chronicles the history of artificial intelligence, focusing on its use with natural languages, and discusses its application to textual syntax. The first chapter examines computational linguistics in greater detail, looking at its relationship to…

  2. Context Analysis of Customer Requests using a Hybrid Adaptive Neuro Fuzzy Inference System and Hidden Markov Models in the Natural Language Call Routing Problem

    Science.gov (United States)

    Rustamov, Samir; Mustafayev, Elshan; Clements, Mark A.

    2018-04-01

    The context analysis of customer requests in a natural language call routing problem is investigated in the paper. One of the most significant problems in natural language call routing is comprehension of the client's request. With the aim of finding a solution to this issue, hybrid HMM and ANFIS models are examined. Combining different types of models (ANFIS and HMM) can prevent the system from misidentifying user intention in a dialogue system. Based on these models, the hybrid system may be employed in various language and call routing domains, since the classification process does not rely on lexical or syntactic analysis.

  3. A natural language query system for Hubble Space Telescope proposal selection

    Science.gov (United States)

    Hornick, Thomas; Cohen, William; Miller, Glenn

    1987-01-01

    The proposal selection process for the Hubble Space Telescope is assisted by a robust and easy to use query program (TACOS). The system parses an English subset language sentence regardless of the order of the keyword phrases, allowing the user greater flexibility than a standard command query language. Capabilities for macro and procedure definition are also integrated. The system was designed for flexibility in both use and maintenance. In addition, TACOS can be applied to any knowledge domain that can be expressed in terms of a single relation. The system was implemented mostly in Common LISP. The TACOS design is described in detail, with particular attention given to the implementation methods of sentence processing.

  4. The use of natural language processing on pediatric diagnostic radiology reports in the electronic health record to identify deep venous thrombosis in children.

    Science.gov (United States)

    Gálvez, Jorge A; Pappas, Janine M; Ahumada, Luis; Martin, John N; Simpao, Allan F; Rehman, Mohamed A; Witmer, Char

    2017-10-01

    Venous thromboembolism (VTE) is a potentially life-threatening condition that includes both deep vein thrombosis (DVT) and pulmonary embolism. We sought to improve detection and reporting of children with a new diagnosis of VTE by applying natural language processing (NLP) tools to radiologists' reports. We validated the performance of an NLP tool, Reveal NLP (Health Fidelity Inc, San Mateo, CA), and an inference rules engine in identifying reports with deep venous thrombosis, using a curated set of ultrasound reports. We then configured the NLP tool to scan all available radiology reports on a daily basis for studies that met criteria for VTE between July 1, 2015, and March 31, 2016. The NLP tool and inference rules engine correctly identified 140 out of 144 reports with positive DVT findings and 98 out of 106 negative reports in the validation set. The tool's sensitivity was 97.2% (95% CI 93-99.2%) and its specificity was 92.5% (95% CI 85.7-96.7%). Subsequently, the NLP tool and inference rules engine processed 6373 radiology reports from 3371 hospital encounters, identifying 178 positive reports and 3193 negative reports with a sensitivity of 82.9% (95% CI 74.8-89.2) and specificity of 97.5% (95% CI 96.9-98). The system functions well as a safety net to screen patients for hospital-acquired VTE (HA-VTE) on a daily basis and offers value as an automated, redundant system. To our knowledge, this is the first pediatric study to apply NLP technology in a prospective manner for HA-VTE identification.
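
    The sensitivity and specificity above follow directly from the reported confusion counts (140 of 144 positives detected, 98 of 106 negatives correctly rejected). A short sketch of the arithmetic, using the Wilson score interval as one plausible way to obtain the 95% confidence bounds (the abstract does not state which interval method was actually used):

        import math

        def wilson_ci(successes, n, z=1.96):
            """95% Wilson score interval for a binomial proportion."""
            p = successes / n
            denom = 1 + z**2 / n
            centre = (p + z**2 / (2 * n)) / denom
            half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
            return centre - half, centre + half

        # Validation-set counts from the abstract.
        tp, fn = 140, 144 - 140
        tn, fp = 98, 106 - 98

        sensitivity = tp / (tp + fn)   # 0.972
        specificity = tn / (tn + fp)   # 0.925

        lo, hi = wilson_ci(tp, tp + fn)
        print(f"sensitivity = {sensitivity:.3f} (95% CI {lo:.3f}-{hi:.3f})")
        lo, hi = wilson_ci(tn, tn + fp)
        print(f"specificity = {specificity:.3f} (95% CI {lo:.3f}-{hi:.3f})")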

  5. Research and Development in Natural Language Understanding as Part of the Strategic Computing Program.

    Science.gov (United States)

    1987-04-01

  6. Descriptive Metaphysics, Natural Language Metaphysics, Sapir-Whorf, and All That Stuff: Evidence from the Mass-Count Distinction

    Directory of Open Access Journals (Sweden)

    Francis Jeffry Pelletier

    2010-12-01

    Strawson (1959) described 'descriptive metaphysics', Bach (1986a) described 'natural language metaphysics', Sapir (1929) and Whorf (1940a,b, 1941) describe, well, Sapir-Whorfianism. And there are other views concerning the relation between correct semantic analysis of linguistic phenomena and the "reality" that is supposed to be thereby described. I think some considerations from the analyses of the mass-count distinction can shed some light on that very dark topic.

    References:
    Bach, Emmon. 1986a. 'Natural Language Metaphysics'. In Ruth Barcan Marcus, G.J.W. Dorn & Paul Weingartner (eds.), Logic, Methodology, and Philosophy of Science, VII, 573-595. Amsterdam: North Holland.
    Bach, Emmon. 1986b. 'The Algebra of Events'. Linguistics and Philosophy 9: 5-16.
    Berger, Peter & Luckmann, Thomas. 1966. The Social Construction of Reality: A Treatise in the Sociology of Knowledge. New York: Doubleday.
    Boroditsky, Lera, Schmidt, Lauren & Phillips, Webb. 2003. 'Sex, Syntax, and Semantics'. In Dedre Gentner & Susan Goldin-Meadow (eds.), Language in Mind: Advances in the Study of Language and Cognition, 59-80. Cambridge, MA: MIT Press.
    Cheng, L. & Sybesma, R. 1999. 'Bare and Not-So-Bare Nouns and the Structure of NP'. Linguistic Inquiry 30: 509-542. http://dx.doi.org/10.1162/002438999554192
    Chierchia, Gennaro. 1998a. 'Reference to Kinds across Languages'. Natural Language Semantics 6: 339-405. http://dx.doi.org/10.1023/A:1008324218506
    Chierchia, Gennaro. 1998b. 'Plurality of Mass Nouns and the Notion of "Semantic Parameter"'. In S. Rothstein (ed.), Events and Grammar, 53-103. Dordrecht: Kluwer.
    Chierchia, Gennaro. 2010. 'Mass Nouns, Vagueness and Semantic Variation'. Synthese 174: 99-149. http://dx.doi.org/10.1007/s11229-009-9686-6
    Doetjes, Jenny. 1997. Quantifiers and Selection: On the Distribution of Quantifying Expressions in French, Dutch and English. Ph.D. thesis, University of Leiden, Holland.

  7. Treating conduct disorder: An effectiveness and natural language analysis study of a new family-centred intervention program.

    Science.gov (United States)

    Stevens, Kimberly A; Ronan, Prof Kevin; Davies, Gene

    2017-05-01

    This paper reports on a new family-centred, feedback-informed intervention focused on evaluating therapeutic outcomes and language changes across treatment for conduct disorder (CD). The study included 26 youth and families from a larger randomised, controlled trial (Ronan et al., in preparation). Outcome measures reflected family functioning/youth compliance, delinquency, and family goal attainment. First- and last-treatment session audio files were transcribed into more than 286,000 words and evaluated through the Linguistic Inquiry and Word Count analysis program (Pennebaker et al., 2007). Significant outcomes across family functioning/youth compliance, delinquency, goal attainment and word usage reflected moderate to strong effect sizes. Benchmarking findings also revealed reduced treatment delivery time compared to a gold-standard approach. Linguistic analysis revealed specific language changes across treatment: for caregivers, increased first-person, action-oriented, present-tense, and assent-type words and decreased sadness words; for youth, a significant reduction in the use of leisure words. This study is the first to use lexical analyses of natural language to assess change across treatment for conduct-disordered youth and families. Some findings provided strong support for program tenets; others, more speculative support.

  8. Natural Language Interface for Safety Certification of Safety-Critical Software

    Science.gov (United States)

    Denney, Ewen; Fischer, Bernd

    2011-01-01

    Model-based design and automated code generation are being used increasingly at NASA. The trend is to move beyond simulation and prototyping to actual flight code, particularly in the guidance, navigation, and control domain. However, there are substantial obstacles to more widespread adoption of code generators in such safety-critical domains. Since code generators are typically not qualified, there is no guarantee that their output is correct, and consequently the generated code still needs to be fully tested and certified. The AutoCert generator plug-in supports the certification of automatically generated code by formally verifying that the generated code is free of different safety violations, by constructing an independently verifiable certificate, and by explaining its analysis in a textual form suitable for code reviews.

  9. Automated speech understanding: the next generation

    Science.gov (United States)

    Picone, J.; Ebel, W. J.; Deshmukh, N.

    1995-04-01

    Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, DSP now hosts systems that rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.

  10. Approaches to automated detection of cyberbullying:A Survey

    OpenAIRE

    Salawu, Semiu; He, Yulan; Lumsden, Joanna

    2017-01-01

    Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches utilise machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified ...

  11. Identification of methicillin-resistant Staphylococcus aureus within the Nation’s Veterans Affairs Medical Centers using natural language processing

    Directory of Open Access Journals (Sweden)

    Jones Makoto

    2012-07-01

    Background: Accurate information is needed to direct healthcare systems' efforts to control methicillin-resistant Staphylococcus aureus (MRSA). Assembling complete and correct microbiology data is vital to understanding and addressing the multiple drug-resistant organisms in our hospitals. Methods: Herein, we describe a system that securely gathers microbiology data from the Department of Veterans Affairs (VA) network of databases. Using natural language processing methods, we applied an information extraction process to extract organisms and susceptibilities from the free-text data. We then validated the extraction against independently derived electronic data and expert annotation. Results: We estimate that the collected microbiology data are 98.5% complete and that methicillin-resistant Staphylococcus aureus was extracted accurately 99.7% of the time. Conclusions: Applying natural language processing methods to microbiology records appears to be a promising way to extract accurate and useful nosocomial pathogen surveillance data. Both scientific inquiry and the data's reliability will be dependent on the surveillance system's capability to compare from multiple sources and circumvent systematic error. The dataset constructed and methods used for this investigation could contribute to a comprehensive infectious disease surveillance system or other pressing needs.
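
    The abstract does not give the extraction rules, but the general shape of such an information extraction step is easy to illustrate. A toy sketch with hypothetical patterns and fabricated snippets (not the VA system's actual logic), flagging S. aureus isolates resistant to oxacillin as MRSA:

        import re

        # Fabricated free-text microbiology snippets, for illustration only.
        reports = [
            "Culture grew STAPHYLOCOCCUS AUREUS. Oxacillin: Resistant. Vancomycin: Susceptible.",
            "No growth after 48 hours.",
        ]

        ORGANISM = re.compile(r"(STAPHYLOCOCCUS AUREUS|ESCHERICHIA COLI)", re.I)
        SUSCEPT = re.compile(r"(\w+):\s*(Resistant|Susceptible|Intermediate)", re.I)

        for text in reports:
            org = ORGANISM.search(text)
            if not org:
                continue
            results = {drug.lower(): call.lower() for drug, call in SUSCEPT.findall(text)}
            # One common operational proxy: S. aureus resistant to oxacillin -> MRSA.
            mrsa = results.get("oxacillin") == "resistant"
            print(org.group(1), results, "MRSA" if mrsa else "MSSA")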

  12. Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions.

    Science.gov (United States)

    Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen

    2017-09-25

    In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
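
    The 'sticky' part of the model has a compact numerical illustration. In a weak-limit (truncated) approximation of the sticky HDP-HMM prior, each state's transition distribution is drawn from a Dirichlet whose base measure is shared across states, with extra mass on the self-transition. The sketch below shows only this prior, not the authors' dual construction or inference procedure:

        import numpy as np

        rng = np.random.default_rng(0)
        K, alpha, kappa = 5, 2.0, 6.0   # truncation level, concentration, stickiness

        # Shared global state weights (beta in HDP notation), drawn once.
        beta = rng.dirichlet(np.full(K, 1.0))

        # Each row k gets extra prior mass kappa on its own self-transition,
        # which encourages state persistence ("sticky").
        P = np.vstack([
            rng.dirichlet(alpha * beta + kappa * np.eye(K)[k])
            for k in range(K)
        ])

        print(np.round(P, 2))                              # diagonal tends to dominate
        print("mean self-transition:", P.diagonal().mean())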

  13. im4Things: An Ontology-Based Natural Language Interface for Controlling Devices in the Internet of Things

    KAUST Repository

    Noguera-Arnaldos, José Ángel

    2017-03-14

    The Internet of Things (IoT) offers opportunities for new applications and services that enable users to access and control their working and home environment from local and remote locations, aiming to perform daily life activities in an easy way. However, the IoT also introduces new challenges, some of which arise from the large range of devices currently available and the heterogeneous interfaces provided for their control. The control and management of this variety of devices and interfaces represent a new challenge for non-expert users, instead of making their life easier. Based on this understanding, in this work we present a natural language interface for the IoT, which takes advantage of Semantic Web technologies to allow non-expert users to control their home environment through an instant messaging application in an easy and intuitive way. We conducted several experiments with a group of end users aiming to evaluate the effectiveness of our approach to control home appliances by means of natural language instructions. The evaluation results showed that, without the need for technicalities, users were able to control the home appliances in an efficient way.

  14. The dynamic nature of motivation in language learning: A classroom perspective

    Directory of Open Access Journals (Sweden)

    Mirosław Pawlak

    2012-10-01

    When we examine the empirical investigations of motivation in second and foreign language learning, even those drawing upon the latest theoretical paradigms, such as the L2 motivational self system (Dörnyei, 2009), it becomes clear that many of them still fail to take account of its dynamic character and temporal variation. This may be surprising in view of the fact that the need to adopt such a process-oriented approach has been emphasized by a number of theorists and researchers (e.g., Dörnyei, 2000, 2001, 2009; Ushioda, 1996; Williams & Burden, 1997), and it lies at the heart of the model of second language motivation proposed by Dörnyei and Ottó (1998). It is also unfortunate that few research projects have addressed the question of how motivation changes during a language lesson as well as a series of lessons, and what factors might be responsible for fluctuations of this kind. The present paper aims to rectify this problem by reporting the findings of a classroom-based study which investigated the changes in the motivation of 28 senior high school students, both in terms of their goals and intentions, and their interest and engagement in classroom activities and tasks, over a period of four weeks. The analysis of the data, collected by means of questionnaires, observations and interviews, showed that although the reasons for learning remain relatively stable, the intensity of motivation is indeed subject to variation on a minute-to-minute basis, and this fact has to be recognized even in large-scale, cross-sectional research in this area.

  15. On the nature and evolution of the neural bases of human language

    Science.gov (United States)

    Lieberman, Philip

    2002-01-01

    The traditional theory equating the brain bases of language with Broca's and Wernicke's neocortical areas is wrong. Neural circuits linking activity in anatomically segregated populations of neurons in subcortical structures and the neocortex throughout the human brain regulate complex behaviors such as walking, talking, and comprehending the meaning of sentences. When we hear or read a word, neural structures involved in the perception or real-world associations of the word are activated as well as posterior cortical regions adjacent to Wernicke's area. Many areas of the neocortex and subcortical structures support the cortical-striatal-cortical circuits that confer complex syntactic ability, speech production, and a large vocabulary. However, many of these structures also form part of the neural circuits regulating other aspects of behavior. For example, the basal ganglia, which regulate motor control, are also crucial elements in the circuits that confer human linguistic ability and abstract reasoning. The cerebellum, traditionally associated with motor control, is active in motor learning. The basal ganglia are also key elements in reward-based learning. Data from studies of Broca's aphasia, Parkinson's disease, hypoxia, focal brain damage, and a genetically transmitted brain anomaly (the putative "language gene," family KE), and from comparative studies of the brains and behavior of other species, demonstrate that the basal ganglia sequence the discrete elements that constitute a complete motor act, syntactic process, or thought process. Imaging studies of intact human subjects and electrophysiologic and tracer studies of the brains and behavior of other species confirm these findings. As Dobzhansky put it, "Nothing in biology makes sense except in the light of evolution" (cited in Mayr, 1982). That applies with as much force to the human brain and the neural bases of language as it does to the human foot or jaw. The converse follows: the mark of evolution on

  16. Benchmarks of programming languages for special purposes in the space station

    Science.gov (United States)

    Knoebel, Arthur

    1986-01-01

    Although Ada is likely to be chosen as the principal programming language for the Space Station, certain needs, such as expert systems and robotics, may be better developed in special languages. The languages, LISP and Prolog, are studied and some benchmarks derived. The mathematical foundations for these languages are reviewed. Likely areas of the space station are sought out where automation and robotics might be applicable. Benchmarks are designed which are functional, mathematical, relational, and expert in nature. The coding will depend on the particular versions of the languages which become available for testing.

  17. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    Directory of Open Access Journals (Sweden)

    Manana Khachidze

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system has to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the entire history of medical services provided. The present work introduces an instrument for classification of medical records based on the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In all, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with SVM slightly ahead. In the process of classification a "shrink" method, based on feature selection, was introduced and applied. At the first stage of classification the results of the "shrink" case were better; however, at the second stage of classification into subclasses, 23% of all documents could not be linked to a single definite subclass (liver or biliary system) due to common features characterizing these subclasses. The overall results of the study were successful.
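
    Both classifiers mentioned above are standard in text categorization, and the overall pipeline is easy to sketch. A minimal scikit-learn illustration with English stand-in records (the study used Georgian text, and its actual features and "shrink" selection step are not reproduced here):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Toy stand-ins for examination records and their main-group labels.
        docs = ["liver ultrasound shows ...", "gastric endoscopy reveals ...",
                "chest x-ray demonstrates ...", "abdominal ultrasound notes ..."]
        labels = ["ultrasonography", "endoscopy", "x-ray", "ultrasonography"]

        # Compare the two methods used in the study: SVM and KNN over TF-IDF features.
        for clf in (LinearSVC(), KNeighborsClassifier(n_neighbors=1)):
            model = make_pipeline(TfidfVectorizer(), clf)
            model.fit(docs, labels)
            print(type(clf).__name__, model.predict(["pelvic ultrasound ..."]))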

  18. Language of sociology: the problem of artificial and natural (everyday) concepts

    Directory of Open Access Journals (Sweden)

    M. V. Naumenko

    2014-03-01

    The author analyzes the possible negative consequences of the widespread use of artificial concepts in sociology and the advantages and disadvantages of using natural (everyday) concepts in sociology, and proposes a way to resolve the problem of naive readings of everyday concepts.

  19. A Requirements-Based Exploration of Open-Source Software Development Projects--Towards a Natural Language Processing Software Analysis Framework

    Science.gov (United States)

    Vlas, Radu Eduard

    2012-01-01

    Open source projects do have requirements; they are, however, mostly informal, text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects,…

  20. Understanding patient satisfaction with received healthcare services: A natural language processing approach

    Science.gov (United States)

    Doing-Harris, Kristina; Mowery, Danielle L.; Daniels, Chrissy; Chapman, Wendy W.; Conway, Mike

    2016-01-01

    Important information is encoded in free-text patient comments. We determine the most common topics in patient comments, design automatic topic classifiers, identify comments' sentiment, and find new topics in negative comments. Our annotation scheme consisted of 28 topics, with positive and negative sentiment. Within those 28 topics, the seven most frequent accounted for 63% of annotations. For automated topic classification, we developed vocabulary-based and Naive Bayes classifiers. For sentiment analysis, another Naive Bayes classifier was used. Finally, we used topic modeling to search for unexpected topics within negative comments. The seven most common topics were appointment access, appointment wait, empathy, explanation, friendliness, practice environment, and overall experience. The best F-measures from our classifiers were 0.52 (NB), 0.57 (NB), 0.36 (Vocab), 0.74 (NB), 0.40 (NB), and 0.44 (Vocab), respectively. F-scores ranged from 0.16 to 0.74. The sentiment classification F-score was 0.84. Negative comment topic modeling revealed complaints about appointment access, appointment wait, and time spent with physician. PMID:28269848
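
    Of the two classifier families mentioned, the vocabulary-based one is the simplest to illustrate: a comment is assigned every topic whose keyword lexicon it overlaps. A toy sketch with hypothetical lexicons (the study's actual vocabularies are not reproduced here):

        # Hypothetical keyword lexicons for three of the topics named above.
        TOPIC_VOCAB = {
            "appointment access": {"appointment", "schedule", "booking"},
            "appointment wait":   {"wait", "waiting", "late"},
            "friendliness":       {"friendly", "kind", "rude"},
        }

        def classify(comment):
            tokens = set(comment.lower().split())
            # A comment may touch several topics; return all with at least one hit.
            return [t for t, vocab in TOPIC_VOCAB.items() if tokens & vocab]

        print(classify("Very friendly staff but a long wait for my appointment"))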

  1. PREDICATE OF ‘MANGAN’ IN SASAK LANGUAGE: A STUDY OF NATURAL SEMANTIC METALANGUAGE

    Directory of Open Access Journals (Sweden)

    Sarwadi

    2016-11-01

    The aim of this study was to establish the semantic meaning of the predicates Ngajengan, Daharan, Ngelor, Mangan, Ngrodok, Kaken, Suap, Bejijit, Bekeruak, Ngerasak and Nyangklok (all meaning 'eating'), and, besides that, the lexical meaning of each word and the function of the words in sentences, particularly the meaning of 'eating' in the Sasak language. The lexical meaning of each of these predicates is 'doing something to eat', but the words differ in their usage in sentences. Word choice depends on the subject and the object, and some predicates require an instrument to express eating meals or food.

  2. Automation of Feynman diagram evaluations

    International Nuclear Information System (INIS)

    Tentyukov, M.N.

    1998-01-01

    A C-program DIANA (DIagram ANAlyser) for the automation of Feynman diagram evaluations is presented. It consists of two parts: the analyzer of diagrams and the interpreter of a special text-manipulating language. This language can be used to create source code for analytical or numerical evaluations and to retain overall control of the process.

  3. The embodied nature of medical concepts: image schemas and language for PAIN.

    Science.gov (United States)

    Prieto Velasco, Juan Antonio; Tercedor Sánchez, Maribel

    2014-08-01

    Cognitive linguistics assumes that knowledge is both embodied and situated as far as it is acquired through our bodily interaction with the world in a specific environment (e.g. Barsalou in Lang Cogn Process 18:513-562, 2003; Connell et al. in PLoS One 7:3, 2012). Therefore, embodiment provides an explanation to the mental representation and linguistic expression of concepts. Among the first, we find multimodal conceptual structures, like image schemas, which are schematic representations of embodied experiences resulting from our conceptualization of the surrounding environment (Tercedor Sánchez et al. in J Spec Transl 18:187-205, 2012). Furthermore, the way we interact with the environment and its objects is dynamic and configures how we refer to concepts both by means of images and lexicalizations. In this article, we investigate how image schemas underlie verbal and visual representations. They both evoke concepts based on exteroception, interoception and proprioception which can be lexicalized through language. More specifically, we study (1) a multimodal corpus of medical texts to examine how image schemas lexicalize in the language of medicine to represent specialized concepts and (2) medical pictures to explore the depiction of image-schematic concepts, in order to account for the verbal and visual representation of embodied concepts. We explore the concept PAIN, a sensory and emotional experience associated with actual or potential tissue damage, using corpus analysis tools (Sketch Engine) to extract information about the lexicalization of underlying image schemas in definitions and defining contexts. Then, we use the image schemas behind medical concepts to consistently select images which depict our experience of pain and the way we understand it. Finally, such lexicalizations and visualizations will help us assess how we refer to PAIN both verbally and visually.

  4. Modelling language

    CERN Document Server

    Cardey, Sylviane

    2013-01-01

    In response to the need for reliable results from natural language processing, this book presents an original way of decomposing a language(s) in a microscopic manner by means of intra/inter‑language norms and divergences, going progressively from languages as systems to the linguistic, mathematical and computational models, which being based on a constructive approach are inherently traceable. Languages are described with their elements aggregating or repelling each other to form viable interrelated micro‑systems. The abstract model, which contrary to the current state of the art works in int

  5. Natural Language Processing (NLP), Machine Learning (ML), and Semantics in Polar Science

    Science.gov (United States)

    Duerr, R.; Ramdeen, S.

    2017-12-01

    One of the interesting features of Polar Science is that it historically has been extremely interdisciplinary, encompassing all of the physical and social sciences. Given the ubiquity of specialized terminology in each field, enabling researchers to find, understand, and use all of the heterogeneous data needed for polar research continues to be a bottleneck. Within the informatics community, semantics has been broadly accepted as a solution to these problems, yet progress in developing reusable semantic resources has been slow. The NSF-funded ClearEarth project has been adapting the methods and tools from other communities such as Biomedicine to the Earth sciences with the goal of enhancing progress and the rate at which the needed semantic resources can be created. One of the outcomes of the project has been a better understanding of the differences in the way linguists and physical scientists understand disciplinary text. One example of these differences is the tendency for each discipline, and often disciplinary subfields, to expend effort in creating discipline-specific glossaries where individual terms often comprise more than one word (e.g., first-year sea ice). Often each term in a glossary is imbued with substantial contextual or physical meaning - meanings which are rarely explicitly called out within disciplinary texts, which are therefore not immediately accessible to those outside that discipline or subfield, and which can often be represented semantically. Here we show how recognition of these differences and the use of glossaries can speed up the annotation processes endemic to NLP and enable inter-community recognition and possible reconciliation of terminology differences. A number of processes and tools will be described, as will progress towards semi-automated generation of ontology structures.

  6. AUTOMATING THE DATA SECURITY PROCESS

    Directory of Open Access Journals (Sweden)

    Florin Ogigau-Neamtiu

    2017-11-01

    Contemporary organizations face big data security challenges in the cyber environment due to modern threats and to a business working model which relies heavily on collaboration, data sharing, tool integration, increased mobility, etc. Today's data classification and data obfuscation selection processes (encryption, masking or tokenization) suffer because of the human involvement in the process. Organizations need to manage the data security domain by classifying information based on its importance, conducting risk assessments, and using the most cost-effective data obfuscation technique. The paper proposes a new model for data protection that uses automated machine decision-making procedures to classify data and to select the appropriate data obfuscation technique. The proposed system uses natural language processing capabilities to analyze input data and to select the best course of action. The system has capabilities to learn from previous experiences, thus improving itself and reducing the risk of wrong data classification.

  7. Writing in science: Exploring teachers' and students' views of the nature of science in language enriched environments

    Science.gov (United States)

    Decoito, Isha

    Writing in science can be used to address some of the issues relevant to contemporary scientific literacy, such as the nature of science, which describes the scientific enterprise for science education. This has implications for the kinds of writing tasks students should attempt in the classroom, and for how students should understand the rationale and claims of these tasks. While scientific writing may train the mind to think scientifically in a disciplined and structured way thus encouraging students to gain access to the public domain of scientific knowledge, the counter-argument is that students need to be able to express their thoughts freely in their own language. Writing activities must aim to promote philosophical and epistemological views of science that accurately portray contemporary science. This mixed-methods case study explored language-enriched environments, in this case, secondary science classrooms with a focus on teacher-developed activities, involving diversified writing styles, that were directly linked to the science curriculum. The research foci included: teachers' implementation of these activities in their classrooms; how the activities reflected the teachers' nature of science views; common attributes between students' views of science and how they represented science in their writings; and if, and how the activities influenced students' nature of science views. Teachers' and students' views of writing and the nature of science are illustrated through pre-and post-questionnaire responses; interviews; student work; and classroom observations. Results indicated that diversified writing activities have the potential to accurately portray science to students, personalize learning in science, improve students' overall attitude towards science, and enhance scientific literacy through learning science, learning about science, and doing science. Further research is necessary to develop an understanding of whether the choice of genre has an

  8. An Expedient Study on Back-Propagation (BPN) Neural Networks for Modeling Automated Evaluation of the Answers and Progress of Deaf Students' That Possess Basic Knowledge of the English Language and Computer Skills

    Science.gov (United States)

    Vrettaros, John; Vouros, George; Drigas, Athanasios S.

    This article studies the expediency of using neural network technology and the development of back-propagation network (BPN) models for modeling the automated evaluation of the answers and progress of deaf students who possess basic knowledge of the English language and computer skills, within a virtual e-learning environment. The performance of the developed neural models is evaluated via the correlation factor between the neural networks' response values and the real-value data, as well as the percentage measurement of the error between the neural networks' estimates and the real-value data, during the training process and afterwards with unknown data that weren't used in the training process.
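
    The evaluation criterion described - correlating a back-propagation network's responses with the real values on data not used in training - is simple to sketch. A minimal illustration with synthetic stand-in data (the study's dataset and network topology are not reproduced here):

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(1)

        # Synthetic stand-ins: feature vectors for student answers and scores.
        X = rng.normal(size=(200, 8))
        y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

        # A small feed-forward network trained with back-propagation.
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        net.fit(X[:150], y[:150])

        # Correlation between the network's responses and the real values
        # on held-out data, as in the evaluation described above.
        r = np.corrcoef(net.predict(X[150:]), y[150:])[0, 1]
        print(f"correlation on unseen data: {r:.3f}")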

  9. Computer simulation as an important approach to explore language universal. Comment on "Dependency distance: a new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    Science.gov (United States)

    Lu, Qian

    2017-07-01

    Exploring language universals is one of the major goals of linguistic research, which is largely devoted to answering the "Platonic questions" in linguistics, that is, what language knowledge is, and how this knowledge is acquired and used. However, if solely guided by linguistic intuition, it is very difficult for syntactic studies to answer these questions, or to achieve abstractions in the scientific sense. This suggests that linguistic analyses based on probability theory may provide effective ways to investigate language universals in terms of biological motivations or cognitive psychological mechanisms. With the view that "language is a human-driven system", Liu, Xu & Liang's review [1] pointed out that dependency distance minimization (DDM), which has been corroborated by big-data analysis of corpora, may be a language universal shaped in language evolution, a universal that has a profound effect on syntactic patterns.
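
    Dependency distance itself is a very simple quantity: the linear distance between a dependent and its head in the sentence. A minimal sketch, representing a parse as a list of 1-based head indices (0 marking the root):

        # Mean dependency distance, following the usual
        # |position(head) - position(dependent)| measure.
        def mean_dependency_distance(heads):
            dists = [abs(h - i) for i, h in enumerate(heads, start=1) if h != 0]
            return sum(dists) / len(dists)

        # "The cat chased a mouse": heads of tokens 1..5 (token 3, "chased", is root).
        heads = [2, 3, 0, 5, 3]
        print(mean_dependency_distance(heads))   # (1 + 1 + 1 + 2) / 4 = 1.25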

  10. Performance analysis of CRF-based learning for processing WoT application requests expressed in natural language.

    Science.gov (United States)

    Yoon, Young

    2016-01-01

    In this paper, we investigate the effectiveness of a CRF-based learning method for identifying the Web of Things (WoT) application components that would satisfy a user's request issued in natural language. For instance, a user request such as "archive all sports breaking news" can be satisfied by composing a WoT application that consists of the ESPN breaking news service and Dropbox as a storage service. We built an engine that can identify the necessary application components by recognizing a main act (MA) or named entities (NEs) in a given request. We trained this engine with the descriptions of WoT applications (called recipes) that were collected from the IFTTT WoT platform. IFTTT hosts over 300 WoT entities that offer thousands of functions referred to as triggers and actions. There are more than 270,000 publicly-available recipes composed with those functions by real users, so the set of these recipes is well-qualified for training our MA and NE recognition engine. We share our experience of generating the training and test sets from these recipe descriptions and assess the performance of the CRF-based learning method. Based on the performance evaluation, we introduce further research directions.
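
    Sequence labeling of this kind is commonly implemented with a linear-chain CRF over per-token feature dictionaries. A toy sketch using the sklearn-crfsuite package, with a hypothetical two-sentence training set and a deliberately tiny feature template (not the paper's actual features or tag set):

        import sklearn_crfsuite

        def token_features(sent, i):
            """A deliberately tiny feature template for token i."""
            return {
                "lower": sent[i].lower(),
                "is_first": i == 0,
                "prev": sent[i - 1].lower() if i > 0 else "<s>",
            }

        # Hypothetical annotations: MA = main act, NE = named entity, O = other.
        sents = [["archive", "all", "sports", "breaking", "news"],
                 ["save", "photos", "to", "Dropbox"]]
        tags = [["MA", "O", "NE", "NE", "NE"],
                ["MA", "O", "O", "NE"]]

        X = [[token_features(s, i) for i in range(len(s))] for s in sents]
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, tags)

        test = ["archive", "breaking", "news"]
        print(crf.predict([[token_features(test, i) for i in range(len(test))]]))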

  11. Automated model building

    CERN Document Server

    Caferra, Ricardo; Peltier, Nicholas

    2004-01-01

    This is the first book on automated model building, a discipline of automated deduction that is of growing importance. Although models and their construction are important per se, automated model building has appeared as a natural enrichment of automated deduction, especially in the attempt to capture the human way of reasoning. The book provides a historical overview of the field of automated deduction and presents the foundations of different existing approaches to model construction, in particular those developed by the authors. Finite and infinite model building techniques are presented. The main emphasis is on calculi-based methods, and relevant practical results are provided. The book is of interest to researchers and graduate students in computer science, computational logic and artificial intelligence. It can also be used as a textbook in advanced undergraduate courses.

  12. Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE

    Science.gov (United States)

    Cohen, K. Bretonnel; Xia, Jingbo; Roeder, Christophe; Hunter, Lawrence E.

    2018-01-01

    There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; that the advertised functionality was difficult, but not impossible, to reproduce; and that reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility led to the production of novel user-centered documentation that has been accessed 260 times since its publication - an average of once a day per library.

  13. Computer-Aided TRIZ Ideality and Level of Invention Estimation Using Natural Language Processing and Machine Learning

    Science.gov (United States)

    Adams, Christopher; Tate, Derrick

    Patent textual descriptions provide a wealth of information that can be used to understand the underlying design approaches that result in the generation of novel and innovative technology. This article will discuss a new approach for estimating Degree of Ideality and Level of Invention metrics from the theory of inventive problem solving (TRIZ) using patent textual information. Patent text includes information that can be used to model both the functions performed by a design and the associated costs and problems that affect a design’s value. The motivation of this research is to use patent data with calculation of TRIZ metrics to help designers understand which combinations of system components and functions result in creative and innovative design solutions. This article will discuss in detail methods to estimate these TRIZ metrics using natural language processing and machine learning with the use of neural networks.

  14. Analyzing discourse and text complexity for learning and collaborating a cognitive approach based on natural language processing

    CERN Document Server

    Dascălu, Mihai

    2014-01-01

    With the advent and increasing popularity of Computer Supported Collaborative Learning (CSCL) and e-learning technologies, the need of automatic assessment and of teacher/tutor support for the two tightly intertwined activities of comprehension of reading materials and of collaboration among peers has grown significantly. In this context, a polyphonic model of discourse derived from Bakhtin’s work as a paradigm is used for analyzing both general texts and CSCL conversations in a unique framework focused on different facets of textual cohesion. As specificity of our analysis, the individual learning perspective is focused on the identification of reading strategies and on providing a multi-dimensional textual complexity model, whereas the collaborative learning dimension is centered on the evaluation of participants’ involvement, as well as on collaboration assessment. Our approach based on advanced Natural Language Processing techniques provides a qualitative estimation of the learning process and enhance...

  15. A Sibling-Mediated Intervention for Children with Autism Spectrum Disorder: Using the Natural Language Paradigm (NLP).

    Science.gov (United States)

    Spector, Vicki; Charlop, Marjorie H

    2017-11-23

    We taught three typically developing siblings to occasion speech by implementing the Natural Language Paradigm (NLP) with their brothers with autism spectrum disorder (ASD). A non-concurrent multiple baseline design across children with ASD and sibling dyads was used. Ancillary behaviors of happiness, play, and joint attention for the children with ASD were recorded. Generalization of speech for the children with ASD across setting and peers was also measured. During baseline, the children with ASD displayed few target speech behaviors and the siblings inconsistently occasioned speech from their brothers. After sibling training, however, they successfully delivered NLP, and in turn, for two of the brothers with ASD, speech reached criterion. Implications of this research suggest the inclusion of siblings in interventions.

  16. Arbitrary symbolism in natural language revisited: when word forms carry meaning.

    Directory of Open Access Journals (Sweden)

    Jamie Reilly

    Cognitive science has a rich history of interest in the ways that languages represent abstract and concrete concepts (e.g., idea vs. dog). Until recently, this focus has centered largely on aspects of word meaning and semantic representation. However, recent corpora analyses have demonstrated that abstract and concrete words are also marked by phonological, orthographic, and morphological differences. These regularities in sound-meaning correspondence potentially allow listeners to infer certain aspects of semantics directly from word form. We investigated this relationship between form and meaning in a series of four experiments. In Experiments 1-2 we examined the role of metalinguistic knowledge in semantic decision by asking participants to make semantic judgments for aurally presented nonwords selectively varied by specific acoustic and phonetic parameters. Participants consistently associated increased word length and diminished wordlikeness with abstract concepts. In Experiment 3, participants completed a semantic decision task (i.e., abstract or concrete) for real words varied by length and concreteness. Participants were more likely to misclassify longer, inflected words (e.g., "apartment") as abstract and shorter uninflected abstract words (e.g., "fate") as concrete. In Experiment 4, we used a multiple regression to predict trial-level naming data from a large corpus of nouns, which revealed significant interaction effects between concreteness and word form. Together these results provide converging evidence for the hypothesis that listeners map sound to meaning through a non-arbitrary process using prior knowledge about statistical regularities in the surface forms of words.
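
    The reported tendency - longer, suffixed forms skewing abstract - can be caricatured with a classifier trained on surface features alone. A toy sketch with fabricated labels chosen to be roughly separable (real corpora show this only as a statistical tendency, not a clean boundary):

        from sklearn.linear_model import LogisticRegression

        # Surface-form features only: word length and a crude derivational-suffix flag.
        SUFFIXES = ("ment", "ness", "tion", "ity")

        def form_features(word):
            return [len(word), int(word.endswith(SUFFIXES))]

        words = ["dog", "cup", "rock", "chair",
                 "happiness", "devotion", "curiosity", "eternity"]
        labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = concrete, 1 = abstract (toy labels)

        clf = LogisticRegression().fit([form_features(w) for w in words], labels)
        for w in ["table", "imagination"]:
            label = clf.predict([form_features(w)])[0]
            print(w, "abstract" if label else "concrete")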

  17. Neurolinguistic approach to natural language processing with applications to medical text analysis.

    Science.gov (United States)

    Duch, Włodzisław; Matykiewicz, Paweł; Pestian, John

    2008-12-01

    Understanding written or spoken language presumably involves spreading neural activation in the brain. This process may be approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. The approximation of this process is of great practical and theoretical interest. Although activations of the neural circuits involved in the representation of words change rapidly in time, snapshots of these activations spreading through associative networks may be captured in a vector model. Concepts of a similar type activate larger clusters of neurons, priming areas in the left and right hemispheres. Analysis of recent brain imaging experiments shows the importance of right-hemisphere non-verbal clusterization. Medical ontologies enable the development of a large-scale practical algorithm to re-create pathways of spreading neural activations. First, concepts of a specific semantic type are identified in the text; then all related concepts of the same type are added to the text, providing expanded representations. To avoid rapid growth of the extended feature space, after each step only the most useful features, those that increase document clusterization, are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Expanded texts show significantly improved clustering and may be classified with much higher accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts, and may be used in large-scale applications.
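
    The expansion step described above reduces to a small loop over a typed ontology. The toy ontology below is an invented stand-in for the medical ontologies the paper uses, and the clustering-based feature-selection step is omitted.

```python
# Toy sketch of concept expansion: concepts found in a text are expanded
# with related concepts of the same semantic type drawn from an ontology.
# The ontology and typing here are invented stand-ins.
ONTOLOGY = {  # concept -> (semantic type, related concepts of that type)
    "aspirin":   ("drug",    ["nsaid", "analgesic"]),
    "ibuprofen": ("drug",    ["nsaid", "analgesic"]),
    "angina":    ("finding", ["chest_pain", "ischemia"]),
}

def expand(tokens, max_new=5):
    """Add related same-type concepts for every known concept in the text."""
    expanded = list(tokens)
    for tok in tokens:
        if tok in ONTOLOGY:
            _, related = ONTOLOGY[tok]
            expanded.extend(related[:max_new])
    return expanded

print(expand(["patient", "took", "aspirin", "for", "angina"]))
# ['patient', 'took', 'aspirin', 'for', 'angina',
#  'nsaid', 'analgesic', 'chest_pain', 'ischemia']
```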

  18. From telegraphic to natural language: an expansion system in a pictogrambased AAC application

    OpenAIRE

    Pahisa Solé, Joan

    2017-01-01

    In this doctoral thesis, we present a compansion system that transforms telegraphic language (sentences made up of uninflected content words), as produced in pictogram-based augmentative and alternative communication (AAC), into natural language in Catalan and Spanish. The system was designed to improve the communication of AAC users, who frequently have severe speech impairments as well as motor impairments and who use communication methods based...

  19. Automated nuclear materials accounting

    International Nuclear Information System (INIS)

    Pacak, P.; Moravec, J.

    1982-01-01

    An automated state system for accounting of nuclear materials data was established in Czechoslovakia in 1979. A file of 12 programs in the PL/1 language was compiled. The file is divided into four groups according to logical associations, namely programs for data input and checking, programs for handling the basic data file, programs for report outputs in the form of worksheets and magnetic tape records, and programs for book inventory listing, document inventory handling, and materials balance listing. A similar automated system of nuclear fuel inventory for a light water reactor was introduced for internal purposes in the Institute of Nuclear Research (UJV). (H.S.)

  20. Chemical language and warfare of bacterial natural products in bacteria-nematode-insect interactions.

    Science.gov (United States)

    Shi, Yi-Ming; Bode, Helge B

    2018-01-23

    Covering: up to November 2017. Organismic interaction is one of the fundamental principles for survival in any ecosystem. Today, numerous examples show interactions between microorganisms like bacteria and higher eukaryotes that can range from mutualistic to parasitic/pathogenic symbioses. There is also increasing evidence that microorganisms are used by higher eukaryotes not only for the supply of essential factors like vitamins but also as biological weapons to protect themselves or to kill other organisms. Excellent examples of such systems are entomopathogenic nematodes of the genera Heterorhabditis and Steinernema, which live in mutualistic symbiosis with bacteria of the genera Photorhabdus and Xenorhabdus, respectively. Although these systems have been used successfully in organic farming on an industrial scale, it was only shown during the last 15 years that several different natural products (NPs) produced by the bacteria play key roles in the complex life cycle of the bacterial symbionts, the nematode host, and the insect prey that is killed by, and provides nutrients for, the nematode-bacteria pair. Since the bacteria can switch from a mutualistic to a pathogenic lifestyle, interacting with two different types of higher eukaryotes, and since the full system with all players can be established in the lab, they are promising model systems for elucidating the natural function of microbial NPs. This review summarizes the current knowledge as well as open questions for NPs from Photorhabdus and Xenorhabdus and tries to assign their roles in the tritrophic relationship.

  1. Intelligent system for control and automation of natural gas distribution operation; Sistema inteligente de controle e automacao da operacao de distribuicao de gas natural

    Energy Technology Data Exchange (ETDEWEB)

    Scucuglia, Jose W.; Souza, Celso C. [Universidade para o Desenvolvimento do Estado e da Regiao do Pantanal (UNIDERP), Campo Grande, MS (Brazil). Curso de Engenharia Eletrica; Patricio, Cristian M.M.M.; Cruz, Lauro C.; Reis, Antonio M.; Cortez, Marco A.A.; Maldonado, Waldemar; Rosa, Willian A. [Universidade para o Desenvolvimento do Estado e da Regiao do Pantanal (UNIDERP), Campo Grande, MS (Brazil). Nucleo de Energia, Automacao e Controle; Teixeira, Marcelo C.M. [UNESP, Ilha Solteira, SP (Brazil). Faculdade de Engenharia Eletrica; Carrasco, Benjamim [PETROBRAS, Rio de Janeiro, RJ (Brazil)

    2004-07-01

    The present work presents the development of an intelligent system dedicated to the operation of natural gas distribution. The system combines tools for the design, simulation, supervision, and control of natural gas flow in distribution networks, and is composed of hardware and intelligent software. The developed software has a friendly graphical interface: as the operator visually composes the distribution network, the system automatically assembles a mathematical model formed by a set of differential equations, which is solved by the Newton-Raphson method. This simulation tool makes it possible to obtain, as a function of the network topology, the gas flow conditions at each point of the network. Microcontrolled hardware was developed for real-time data acquisition and valve control. The hardware has flexible communication (radio frequency, Ethernet, and optical fiber), decision-making intelligence, and self-testing of its own functioning, in order to guarantee operational security. A neural system embedded in the software monitors the operating characteristics and detects leak conditions with loss of load, identifying the location of the leak along the duct. The result is a low-cost system built with national technology and high added technological value. (author)
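
    The solver at the core of such a simulation tool can be sketched generically: Newton-Raphson iterates on the residuals of the network balance equations. The two-equation system below is invented for illustration and stands in for the equations assembled from an actual network topology.

```python
# Generic Newton-Raphson sketch: given residual equations F(x) = 0
# assembled from the network, iterate x <- x - J^{-1} F(x) using a
# numerical Jacobian. The toy system is invented for illustration.
import numpy as np

def newton_raphson(F, x0, tol=1e-8, max_iter=50, h=1e-6):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = F(x)
        if np.linalg.norm(fx) < tol:
            break
        # Numerical Jacobian by forward differences.
        J = np.empty((len(fx), len(x)))
        for j in range(len(x)):
            xh = x.copy()
            xh[j] += h
            J[:, j] = (F(xh) - fx) / h
        x = x - np.linalg.solve(J, fx)
    return x

# Toy nodal balance: quadratic pressure-drop/flow relations at two nodes.
F = lambda x: np.array([x[0]**2 + x[1] - 4.0,
                        x[0] + x[1]**2 - 6.0])
print(newton_raphson(F, [1.0, 1.0]))
```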

  2. Dependency distance in language evolution. Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    Science.gov (United States)

    Liu, Bingli; Chen, Xinying

    2017-07-01

    In the target article [1], Liu et al. provide an informative introduction to dependency distance studies and proclaim that the syntactic patterns of language that relate to dependency distance are associated with human cognitive mechanisms, such as limited working memory and syntax processing. Therefore, such syntactic patterns are probably 'human-driven' language universals. Sufficient evidence based on big data analysis is also given in the article to support this idea. The hypotheses generally seem very convincing, yet they still need further testing from various perspectives. Diachronic linguistic study based on authentic language data, in our opinion, can be one of those 'further tests'.

  3. Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques

    Science.gov (United States)

    Alexopoulou, Theodora; Michel, Marije; Murakami, Akira; Meurers, Detmar

    2017-01-01

    Large-scale learner corpora collected from online language learning platforms, such as the EF-Cambridge Open Language Database (EFCAMDAT), provide opportunities to analyze learner data at an unprecedented scale. However, interpreting the learner language in such corpora requires a precise understanding of tasks: How does the prompt and input of a…

  4. Language and human nature: Kurt Goldstein's neurolinguistic foundation of a holistic philosophy.

    Science.gov (United States)

    Ludwig, David

    2012-01-01

    Holism in interwar Germany provides an excellent example for social and political influences on scientific developments. Deeply impressed by the ubiquitous invocation of a cultural crisis, biologists, physicians, and psychologists presented holistic accounts as an alternative to the "mechanistic worldview" of the nineteenth century. Although the ideological background of these accounts is often blatantly obvious, many holistic scientists did not content themselves with a general opposition to a mechanistic worldview but aimed at a rational foundation of their holistic projects. This article will discuss the work of Kurt Goldstein, who is known for both his groundbreaking contributions to neuropsychology and his holistic philosophy of human nature. By focusing on Goldstein's neurolinguistic research, I want to reconstruct the empirical foundations of his holistic program without ignoring its cultural background. In this sense, Goldstein's work provides a case study for the formation of a scientific theory through the complex interplay between specific empirical evidence and the general cultural developments of the Weimar Republic. © 2012 Wiley Periodicals, Inc.

  5. Automatic generation of the entity-relationship diagram and its representation in SQL from a controlled language (UN-LENCEP)

    Directory of Open Access Journals (Sweden)

    Carlos Mario Zapata Jaramillo

    2011-01-01

    Full Text Available The entity-relationship diagram is one of the diagrams used in developing models to represent the information of a domain. In order to speed up and improve the software development process, different proposals have emerged to help obtain the entity-relationship diagram automatically or semi-automatically. Several of these proposals use natural language or controlled language as a starting point, while others use intermediate representations. The stakeholders in a software application usually cannot understand several of the representations used without prior training, which restricts their active participation in all stages of development. In order to solve these problems, this article proposes a set of heuristic rules for automatically obtaining the entity-relationship diagram and its representation in SQL. The starting point is the controlled language UN-Lencep, which is already employed for generating other artefacts in the development of software applications.
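
    The flavor of such heuristic rules can be sketched by mapping controlled-language triples to SQL. The triples, rules, and schema below are invented and far simpler than UN-Lencep's actual rule set.

```python
# Toy illustration: controlled-language statements reduced to
# (subject, verb, object) triples become entity tables and
# relationship tables in SQL. Rules and names are invented.
triples = [("professor", "teaches", "course"),
           ("student", "enrolls_in", "course")]

entities = sorted({e for s, _, o in triples for e in (s, o)})
ddl = [f"CREATE TABLE {e} (\n  {e}_id INTEGER PRIMARY KEY\n);" for e in entities]
for s, verb, o in triples:  # each verb becomes a many-to-many link table
    ddl.append(
        f"CREATE TABLE {verb} (\n"
        f"  {s}_id INTEGER REFERENCES {s}({s}_id),\n"
        f"  {o}_id INTEGER REFERENCES {o}({o}_id)\n);"
    )
print("\n".join(ddl))
```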

  6. Planned experiments and corpus based research play a complementary role. Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    Science.gov (United States)

    Vasishth, Shravan

    2017-07-01

    This interesting and informative review by Liu and colleagues [17] in this issue covers the full spectrum of research on the idea that in natural language, dependency distance tends to be small. The authors discuss two distinct research threads: experimental work from psycholinguistics on online processes in comprehension and production, and text-corpus studies of dependency length distributions.

  7. Development of a user-friendly interface for the searching of a data base in natural language while using concepts and means of artificial intelligence

    International Nuclear Information System (INIS)

    Pujo, Pascal

    1989-01-01

    This research thesis aimed at the development of a natural-language-based user-friendly interface for searching relational data bases. The author first addresses how to store data which will be accessible through an interface in natural language: this organisation must result in as few constraints as possible in query formulation. He briefly presents techniques related to the automatic processing of natural language, and highlights the need for a more user-friendly interface. Then, he presents the developed interface and outlines the user-friendliness and ergonomics of the implemented procedures. He shows how the interface has been designed to deliver information and explanations on its processing, which allows the user to assess the relevance of the answer. He also presents a classification of the mistakes and errors which may be present in queries in natural language. He finally gives an overview of possible evolutions of the interface and briefly presents deductive functionalities which could expand data management. The handling of complex objects is also addressed [fr]

  8. Natural language query system design for interactive information storage and retrieval systems. Presentation visuals. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    Science.gov (United States)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    This Working Paper Series entry represents a collection of presentation visuals associated with the companion report entitled Natural Language Query System Design for Interactive Information Storage and Retrieval Systems, USL/DBMS NASA/RECON Working Paper Series report number DBMS.NASA/RECON-17.

  9. Language Policy, Language Choice and Language Use in the ...

    African Journals Online (AJOL)

    The paper examines the pros and cons of the checkered nature of language use in the Tanzanian Parliament. It focuses on language policy, language choice and the practicality of language use in parliamentary discourse. Right from the eve of independence, the medium of communication in the Tanzanian parliament has ...

  10. Natural remanent magnetization and rock magnetic parameters from the North-East Atlantic continental margin : Insights from a new, automated cryogenic magnetometer at the Geological Survey of Norway

    Science.gov (United States)

    Klug, Martin; Fabian, Karl; Knies, Jochen; Sauer, Simone

    2017-04-01

    Natural remanent magnetization (NRM) and rock magnetic parameters from two locations, the West Barents Sea (71.6°N, 16.2°E) and Vestnesa Ridge, NW Svalbard (79.0°N, 6.9°E), were acquired using a new, automatically operating cryogenic magnetometer system at the Geological Survey of Norway. The magnetometer setup comprises automated robotic sample feeding, dynamic operation and measurement monitoring, and customised output-to-database data handling. The setup is designed to dynamically enable a variety of parallel measurements with several coupled devices (e.g. balance, MS2B) to make effective use of dead time between the otherwise time-consuming measurements with the cryogenic magnetometer. Web-based access allows remote quality control and interaction 24/7 and enables high sample throughput. The magnetic properties are combined with geophysical and geochemical measurements and optical imaging, both radiographic and colour images, from high-resolution core logging. The multidisciplinary approach enables determination and interpretation of the content and formation of the magnetic fraction, and of its development during diagenetic processes. Besides palaeomagnetic age determination, the results offer the opportunity to study sediment transformation processes that have implications for the burial and degradation of organic matter. The results also help to understand long- and short-term variability of sediment accumulation. Chemical sediment stability is directly linked to environmental and climate variability in the polar marine environment during the recent past.

  11. Linguistics in Language Education

    Science.gov (United States)

    Kumar, Rajesh; Yunus, Reva

    2014-01-01

    This article looks at the contribution of insights from theoretical linguistics to an understanding of language acquisition and the nature of language in terms of their potential benefit to language education. We examine the ideas of innateness and universal language faculty, as well as multilingualism and the language-society relationship. Modern…

  12. Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing.

    Science.gov (United States)

    Névéol, A; Zweigenbaum, P

    2017-08-01

    Objectives: To summarize recent research and present a selection of the best papers published in 2016 in the field of clinical Natural Language Processing (NLP). Method: A survey of the literature was performed by the two section editors of the IMIA Yearbook NLP section. Bibliographic databases were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Papers were automatically ranked and then manually reviewed based on titles and abstracts. A shortlist of candidate best papers was first selected by the section editors before being peer-reviewed by independent external reviewers. Results: The five clinical NLP best papers provide contributions that range from emerging original foundational methods to transitioning solid established research results to a practical clinical setting. They offer a framework for abbreviation disambiguation and coreference resolution, a classification method to identify clinically useful sentences, an analysis of counseling conversations to improve support to patients with mental disorders, and grounding of gradable adjectives. Conclusions: Clinical NLP continued to thrive in 2016, with an increasing number of contributions towards applications compared to fundamental methods. Fundamental work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results translate into freely available tools, mainly for English. Georg Thieme Verlag KG Stuttgart.

  13. Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events.

    Science.gov (United States)

    Fong, Allan; Harriott, Nicole; Walters, Donna M; Foley, Hanan; Morrissey, Richard; Ratwani, Raj R

    2017-08-01

    Many healthcare providers have implemented patient safety event reporting systems to better understand and improve patient safety. Reviewing and analyzing these reports is often time-consuming and resource-intensive because of both the quantity of reports and the length of the free-text descriptions in the reports. Natural language processing (NLP) experts collaborated with clinical experts on a patient safety committee to assist in the identification and analysis of medication-related patient safety events. Different NLP algorithmic approaches were developed to identify four types of medication-related patient safety events, and the models were compared. Well-performing NLP models were generated to categorize medication-related events into pharmacy delivery delays, dispensing errors, Pyxis discrepancies, and prescriber errors, with receiver operating characteristic areas under the curve of 0.96, 0.87, 0.96, and 0.81, respectively. We also found that modeling the brief without the resolution text generally improved model performance. These models were integrated into a dashboard visualization to support the patient safety committee review process. We demonstrate the capabilities of various NLP models and the use of two text inclusion strategies at categorizing medication-related patient safety events. The NLP models and visualization could be used to improve the efficiency of patient safety event data review and analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
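
    One plausible shape for a per-category model, not the authors' actual pipeline, is a bag-of-words classifier scored by ROC AUC. The texts and labels below are invented.

```python
# Hedged sketch: one binary classifier per event type over free-text
# event descriptions, scored by area under the ROC curve.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["pharmacy delivery delayed by two hours",
         "wrong dose dispensed to floor",
         "pyxis count discrepancy on narcotics drawer",
         "order entered for wrong patient",
         "medication arrived late from pharmacy",
         "dispensed 10 mg instead of 1 mg"]
is_delay = [1, 0, 0, 0, 1, 0]  # binary: pharmacy delivery delay or not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
auc = cross_val_score(clf, texts, is_delay, cv=2, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.2f}")
```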

  14. An Introduction to Natural Language Processing: How You Can Get More From Those Electronic Notes You Are Generating.

    Science.gov (United States)

    Kimia, Amir A; Savova, Guergana; Landschaft, Assaf; Harper, Marvin B

    2015-07-01

    Electronically stored clinical documents may contain both structured data and unstructured data. The use of structured clinical data varies by facility, but clinicians are familiar with coded data such as International Classification of Diseases, Ninth Revision (ICD-9) and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) codes, and commonly other data including patient chief complaints or laboratory results. Most electronic health records store much more clinical information as unstructured data; for example, clinical narratives such as the history of present illness, procedure notes, and clinical decision making are stored as unstructured data. Despite the importance of this information, electronic capture or retrieval of unstructured clinical data has been challenging. The field of natural language processing (NLP) is undergoing rapid development, and existing tools can be successfully used for quality improvement, research, healthcare coding, and even billing compliance. In this brief review, we provide examples of successful uses of NLP using emergency medicine physician visit notes for various projects, discuss the challenges of retrieving specific data, and finally present practical methods that can run on a standard personal computer as well as high-end state-of-the-art funded processes run by leading NLP informatics researchers.

  15. Rethinking information delivery: using a natural language processing application for point-of-care data discovery*†

    Science.gov (United States)

    Workman, T. Elizabeth; Stoddart, Joan M

    2012-01-01

    Objective: This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians. Semantic MEDLINE summarizes text in PubMed citations, transforming it into compact declarations that are filtered according to a user's information need and can be displayed in a graphic interface. Integration of the Combo algorithm enables Semantic MEDLINE to deliver information salient to many diverse needs. Methods: The authors selected three disease topics and crafted PubMed search queries to retrieve citations addressing the prevention of these diseases. They then processed the citations with Semantic MEDLINE, with the Combo algorithm enhancement. To evaluate the results, they constructed a reference standard for each disease topic consisting of preventive interventions recommended by a commercial decision support tool. Results: Semantic MEDLINE with Combo produced an average recall of 79% in primary and secondary analyses, an average precision of 45%, and a final average F-score of 0.57. Conclusion: This new approach to point-of-care information delivery holds promise as a decision support tool for clinicians. Health sciences libraries could implement such technologies to deliver tailored information to their users. PMID:22514507
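
    As a quick arithmetic check, the final average F-score follows directly from the reported average precision and recall:

```python
# Recomputing the final average F-score from the reported averages:
# F = 2PR / (P + R) with recall 0.79 and precision 0.45.
recall, precision = 0.79, 0.45
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))  # 0.57
```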

  16. Per-service supervised learning for identifying desired WoT apps from user requests in natural language.

    Science.gov (United States)

    Yoon, Young

    2017-01-01

    Web of Things (WoT) platforms are growing fast, and so are the needs for composing WoT apps more easily and efficiently. We have recently commenced a campaign to develop an interface where users can issue requests for WoT apps entirely in natural language. This requires an effort to build a system that can learn to identify the relevant WoT functions that fulfill a user's request. In our preceding work, we trained a supervised learning system based on conditional random fields (CRF) with thousands of publicly available IFTTT app recipes. However, the sub-par accuracy and excessive training time motivated us to devise a better approach. In this paper, we present a novel solution that creates a separate learning engine for each trigger service. With this approach, parallel and incremental learning becomes possible. For inference, our system first identifies the most relevant trigger service for a given user request by using an information retrieval technique. Then, the learning engine associated with that trigger service predicts the most likely pair of trigger and action functions. We expect that such a two-phase inference method, backed by parallel learning engines, improves the accuracy of identifying the relevant WoT functions. We verify our new solution through an empirical evaluation with training and test sets sampled from a pool of refined IFTTT app recipes. We also meticulously analyze the characteristics of the recipes to find future research directions.
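
    The two-phase inference can be sketched as retrieval followed by a per-service lookup. Everything below (services, recipes, request) is invented, and each per-service "engine" is reduced to a nearest-recipe match rather than a trained model.

```python
# Hedged sketch of two-phase inference: phase 1 retrieves the most
# relevant trigger service; phase 2 defers to that service's engine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

service_docs = {   # phase 1 corpus: one description per trigger service
    "weather": "rain forecast temperature sunrise humidity",
    "email":   "new message received inbox attachment sender",
}
recipes = {        # phase 2: per-service (recipe text -> trigger/action pair)
    "weather": {"notify me when it rains": ("weather.rain", "phone.notify")},
    "email":   {"save attachments to dropbox": ("email.attachment", "dropbox.save")},
}

def infer(request):
    names = list(service_docs)
    vec = TfidfVectorizer().fit(service_docs.values())
    sims = cosine_similarity(vec.transform([request]),
                             vec.transform(service_docs.values()))[0]
    service = names[sims.argmax()]          # phase 1: pick trigger service
    texts = list(recipes[service])          # phase 2: per-service engine
    vec2 = TfidfVectorizer().fit(texts)
    sims2 = cosine_similarity(vec2.transform([request]), vec2.transform(texts))[0]
    return recipes[service][texts[sims2.argmax()]]

print(infer("alert my phone when rain is in the forecast"))
# ('weather.rain', 'phone.notify')
```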

  17. Interpreting the Fuzzy Semantics of Natural-Language Spatial Relation Terms with the Fuzzy Random Forest Algorithm

    Directory of Open Access Journals (Sweden)

    Xiaonan Wang

    2018-02-01

    Full Text Available Naïve Geography, intelligent geographical information systems (GIS), and spatial data mining, especially from social media, all rely on natural-language spatial relation (NLSR) terms to incorporate commonsense spatial knowledge into conventional GIS and to enhance the semantic interoperability of spatial information in social media data. Yet, the inherent fuzziness of NLSR terms makes them challenging to interpret. This study proposes to interpret the fuzzy semantics of NLSR terms using the fuzzy random forest (FRF) algorithm. Based on a large number of fuzzy samples acquired by transforming a set of crisp samples with the random forest algorithm, two FRF models with different membership assembling strategies are trained to obtain the fuzzy interpretation of three line-region geometric representations using 69 NLSR terms. Experimental results demonstrate that the two FRF models achieve good accuracy in interpreting line-region geometric representations using fuzzy NLSR terms. In addition, the fuzzy classification of FRF can interpret the fuzzy semantics of NLSR terms more fully than their crisp counterparts.
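
    One way to approximate the fuzzy-output idea with standard tools, not the paper's actual FRF, is to treat each tree's leaf class distribution as a membership degree and assemble the degrees by averaging. The line-region features and labels below are invented.

```python
# Hedged stand-in for fuzzy memberships: average per-tree class
# probabilities so each configuration gets a graded, not crisp, label.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy line-region features: [crossings, overlap_ratio, endpoint_inside]
X = np.array([[0, 0.0, 0], [2, 0.4, 0], [0, 1.0, 1],
              [1, 0.2, 1], [2, 0.6, 0], [0, 0.9, 1]])
y = np.array(["disjoint", "crosses", "within", "enters", "crosses", "within"])

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Per-tree memberships, assembled by averaging across the ensemble.
memberships = np.mean([t.predict_proba(X[:1]) for t in forest.estimators_], axis=0)
print(dict(zip(forest.classes_, memberships[0].round(2))))
```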

  18. Informatics in radiology: RADTF: a semantic search-enabled, natural language processor-generated radiology teaching file.

    Science.gov (United States)

    Do, Bao H; Wu, Andrew; Biswal, Sandip; Kamaya, Aya; Rubin, Daniel L

    2010-11-01

    Storing and retrieving radiology cases is an important activity for education and clinical research, but this process can be time-consuming. In the process of structuring reports and images into organized teaching files, incidental pathologic conditions not pertinent to the primary teaching point can be omitted, as when a user saves images of an aortic dissection case but disregards the incidental osteoid osteoma. An alternate strategy for identifying teaching cases is text search of reports in radiology information systems (RIS), but retrieved reports are unstructured, teaching-related content is not highlighted, and patient identifying information is not removed. Furthermore, searching unstructured reports requires sophisticated retrieval methods to achieve useful results. An open-source, RadLex®-compatible teaching file solution called RADTF, which uses natural language processing (NLP) methods to process radiology reports, was developed to create a searchable teaching resource from the RIS and the picture archiving and communication system (PACS). The NLP system extracts and de-identifies teaching-relevant statements from full reports to generate a stand-alone database, thus converting existing RIS archives into an on-demand source of teaching material. Using RADTF, the authors generated a semantic search-enabled, Web-based radiology archive containing over 700,000 cases with millions of images. RADTF combines a compact representation of the teaching-relevant content in radiology reports and a versatile search engine with the scale of the entire RIS-PACS collection of case material. ©RSNA, 2010

  19. Design Space Toolbox V2: Automated Software Enabling a Novel Phenotype-Centric Modeling Strategy for Natural and Synthetic Biological Systems.

    Science.gov (United States)

    Lomnitz, Jason G; Savageau, Michael A

    2016-01-01

    , and a negative channel that decreases the count. This example shows the power of these new automated methods to rapidly identify behaviors of interest and efficiently predict parameter values for their realization. These tools may be applied to understand complex natural circuitry and to aid in the rational design of synthetic circuits.

  1. Design Space Toolbox V2: Automated Software Enabling a Novel Phenotype-Centric Modeling Strategy for Natural and Synthetic Biological Systems

    Directory of Open Access Journals (Sweden)

    Jason Gunther Lomnitz

    2016-07-01

    the count, and a negative channel that decreases the count. This example shows the power of these new automated methods to rapidly identify behaviors of interest and efficiently predict parameter values for their realization. These tools may be applied to understand complex natural circuitry and to aid in the rational design of synthetic circuits.

  2. Cross-language diversity, head-direction and grammars. Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    Science.gov (United States)

    Hudson, Richard

    2017-07-01

    This paper [4] - referred to below as 'LXL' - is an excellent example of cross-disciplinary work which brings together three very different disciplines, each with its different methods: quantitative computational linguistics (exploring big data), psycholinguistics (using experiments with human subjects) and theoretical linguistics (building models based on language descriptions). The measured unit is the dependency between two words, as defined by theoretical linguistics, and the question is how the length of this dependency affects the choices made by writers, as revealed in big data from a wide range of languages.

  3. Automated Generation of OCL Constraints: NL based Approach vs Pattern Based Approach

    Directory of Open Access Journals (Sweden)

    IMRAN SARWAR BAJWA

    2017-04-01

    Full Text Available This paper presents an approach for the automated generation of software constraints. In this model, an SBVR (Semantics of Business Vocabulary and Rules) based semi-formal representation is obtained from the syntactic and semantic analysis of an NL (Natural Language) sentence (such as an English sentence). An SBVR representation is easy to translate to other formal languages, as SBVR is based on higher-order logic like other formal languages such as OCL (Object Constraint Language). The proposed model provides a systematic and powerful way of incorporating NL knowledge into formal languages. A prototype was constructed in Java (an Eclipse plug-in) as a proof of the concept. The performance was tested on a few sample texts taken from existing research thesis reports and books.
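
    The final step from an SBVR-like rule to OCL can be caricatured in a few lines; the rule structure and mapping below are invented and far simpler than the paper's pipeline.

```python
# Toy rendering of the NL -> SBVR -> OCL idea: a structural rule already
# reduced to SBVR-like parts is printed as an OCL invariant.
rule = {  # "It is necessary that each order has at least one item"
    "modality": "necessary",
    "subject":  "Order",
    "property": "items",
    "operator": ">=",
    "value":    1,
}

def to_ocl(r):
    assert r["modality"] == "necessary"  # necessity maps to an invariant
    return (f"context {r['subject']}\n"
            f"inv: self.{r['property']}->size() {r['operator']} {r['value']}")

print(to_ocl(rule))
# context Order
# inv: self.items->size() >= 1
```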

  4. The Importance of Natural Change in Planning School-Based Intervention for Children with Developmental Language Impairment (DLI)

    Science.gov (United States)

    Botting, Nicola; Gaynor, Marguerite; Tucker, Katie; Orchard-Lisle, Ginnie

    2016-01-01

    Some reports suggest that there is an increase in the number of children identified as having developmental language impairment (Bercow, 2008), yet resource issues have meant that many speech and language therapy services have compromised provision in some way. Thus, efficient ways of identifying need and prioritizing intervention are required.…

  5. Understanding Language in Education and Grade 4 Reading Performance Using a "Natural Experiment" of Botswana and South Africa

    Science.gov (United States)

    Shepherd, Debra Lynne

    2018-01-01

    The regional and cultural closeness of Botswana and South Africa, as well as differences in their political histories and language policy stances, offers a unique opportunity to evaluate the role of language in reading outcomes. This study aims to empirically test the effect of exposure to mother tongue and English instruction on the reading…

  6. Specialized languages

    DEFF Research Database (Denmark)

    Mousten, Birthe; Laursen, Anne Lise

    2016-01-01

    -disciplinarily, because they work with both derivative and contributory approaches. Derivative, because specialized language retrieves its philosophy of science as well as methods from both the natural sciences, social sciences and humanistic sciences. Contributory because language results support the communication...... science fields communicate their findings. With this article, we want to create awareness of the work in this special area of language studies and of the inherent cross-disciplinarity that makes LSP special compared to common-core language. An acknowledgement of the importance of this field both in terms...

  7. ESTIMATING DBH OF TREES EMPLOYING MULTIPLE LINEAR REGRESSION OF THE BEST LIDAR-DERIVED PARAMETER COMBINATION AUTOMATED IN PYTHON IN A NATURAL BROADLEAF FOREST IN THE PHILIPPINES

    Directory of Open Access Journals (Sweden)

    C. A. G. Ibanez

    2016-06-01

    Full Text Available Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass, and carbon stock. LiDAR technology provides a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna, for a natural growth forest. Coordinates, height, and canopy cover were measured, and the species were identified for comparison with the LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and evaluated automatically using Python scripts with regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The best equation uses 11 parameters at a 10 m grid size, with an r-squared of 0.604, an AIC of 154.04, and a BIC of 175.08. The combination of parameters may differ among forest classes in further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be added to help determine the correlation among parameters.
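
    The exhaustive search the paper automates can be condensed to a few lines: fit an OLS model for every parameter combination and keep the best-scoring one. The data below are random stand-ins for the LiDAR metrics.

```python
# Hedged sketch of best-subset regression ranked by AIC; the paper
# also inspects r-squared and BIC. Data are invented stand-ins.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                                 # 5 candidate LiDAR metrics
dbh = 30 + X[:, 0] * 4 + X[:, 2] * 2 + rng.normal(size=50)   # field DBH, cm

best = None
for k in range(1, X.shape[1] + 1):
    for combo in itertools.combinations(range(X.shape[1]), k):
        fit = sm.OLS(dbh, sm.add_constant(X[:, list(combo)])).fit()
        if best is None or fit.aic < best[0]:
            best = (fit.aic, combo, fit.rsquared, fit.bic)

aic, combo, r2, bic = best
print(f"best combo {combo}: r2={r2:.3f}, AIC={aic:.1f}, BIC={bic:.1f}")
```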

  8. Library Automation

    OpenAIRE

    Dhakne, B. N.; Giri, V. V; Waghmode, S. S.

    2010-01-01

    New technologies provide libraries with several new materials, media, and modes of storing and communicating information. Library automation reduces the drudgery of repeated manual efforts in library routines and supports the collection, storage, administration, processing, preservation, and communication of information.

  9. The Common Alerting Protocol (CAP) and Emergency Data Exchange Language (EDXL) - Application in Early Warning Systems for Natural Hazard

    Science.gov (United States)

    Lendholt, Matthias; Hammitzsch, Martin; Wächter, Joachim

    2010-05-01

    The Common Alerting Protocol (CAP) [1] is an XML-based data format for exchanging public warnings and emergencies between alerting technologies. In conjunction with the Emergency Data Exchange Language (EDXL) Distribution Element (-DE) [2], these data formats can be used for warning message dissemination in early warning systems for natural hazards. Application took place in the DEWS (Distance Early Warning System) [3] project, where CAP serves as the central message format containing both human-readable warnings and structured data for automatic processing by message receivers. The spatial reference capabilities, in particular, are of paramount importance in both CAP and EDXL. Affected areas are addressable via geo codes like HASC (Hierarchical Administrative Subdivision Codes) [4] or UN/LOCODE [5], but also with arbitrary polygons that can be generated directly out of GML [6]. For each affected area, standardized criticality values (urgency, severity, and certainty) have to be set, but application-specific key-value pairs like estimated time of arrival or maximum inundation height can also be specified. Together with multilingualism, message aggregation, and message conversion for different dissemination channels, this enables the generation of warning messages tailored to specific users. [1] CAP, http://www.oasis-emergency.org/cap [2] EDXL-DE, http://docs.oasis-open.org/emergency/edxl-de/v1.0/EDXL-DE_Spec_v1.0.pdf [3] DEWS, http://www.dews-online.org [4] HASC, "Administrative Subdivisions of Countries: A Comprehensive World Reference, 1900 Through 1998" ISBN 0-7864-0729-8 [5] UN/LOCODE, http://www.unece.org/cefact/codesfortrade/codes_index.htm [6] GML, http://www.opengeospatial.org/standards/gml
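
    A skeletal CAP 1.2 message can be assembled with the Python standard library, showing the fields highlighted above: an affected area plus the standardized criticality values. All values below are invented; the OASIS specification governs the required elements and their order.

```python
# Hedged sketch of a minimal CAP 1.2 alert; values are invented.
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:emergency:cap:1.2"
alert = ET.Element(f"{{{NS}}}alert")
for tag, text in [("identifier", "DEWS-2010-0001"),
                  ("sender", "tsunami@dews-online.org"),
                  ("sent", "2010-05-03T12:00:00+00:00"),
                  ("status", "Exercise"), ("msgType", "Alert"),
                  ("scope", "Public")]:
    ET.SubElement(alert, f"{{{NS}}}{tag}").text = text

info = ET.SubElement(alert, f"{{{NS}}}info")
for tag, text in [("category", "Geo"), ("event", "Tsunami"),
                  ("urgency", "Immediate"), ("severity", "Severe"),
                  ("certainty", "Observed")]:
    ET.SubElement(info, f"{{{NS}}}{tag}").text = text

# Affected area: a description plus a closed lat,lon polygon.
area = ET.SubElement(info, f"{{{NS}}}area")
ET.SubElement(area, f"{{{NS}}}areaDesc").text = "West Sumatra coastline"
ET.SubElement(area, f"{{{NS}}}polygon").text = (
    "-0.5,99.0 -1.5,100.0 -0.5,100.5 -0.5,99.0")

print(ET.tostring(alert, encoding="unicode"))
```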

  10. Extensible and Efficient Automation Through Reflective Tactics

    DEFF Research Database (Denmark)

    Malecha, Gregory; Bengtson, Jesper

    2016-01-01

    automation, where proofs are witnessed by verified decision procedures rather than verbose proof objects. Our techniques center around a verified domain specific language for proving, Rtac, written in Gallina, Coq’s logic. The design of tactics makes it easy to combine them into higher-level automation...

  11. Development of a user friendly interface for database querying in natural language by using concepts and means related to artificial intelligence

    International Nuclear Information System (INIS)

    Pujo, Pascal

    1989-01-01

    This research thesis reports the development of a user-friendly interface in natural language for querying a relational database. The developed system differs from usual approaches in its integrated architecture, as the relational model management is totally controlled by the interface. The author first addresses how to store data so as to make them accessible through an interface in natural language, and more precisely how to organise storage so that it places as few constraints as possible on query formulation. The author then briefly presents techniques related to automatic natural language processing, and discusses their implications for better user-friendliness and for error processing. The next part reports the study of the developed interface: selection of data processing tools, interface development, data management at the interface level, and information input by the user. The last chapter proposes an overview of possible evolutions for the interface: use of deductive functionalities, use of an extensional base and an intentional base to deduce facts from knowledge stored in the extensional base, and handling of complex objects [fr]

  12. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

    Science.gov (United States)

    2012-01-01

    Background We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. Conclusions The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications. PMID:22901054

  13. Process automation

    International Nuclear Information System (INIS)

    Moser, D.R.

    1986-01-01

    Process automation technology has been pursued in the chemical processing industries and to a very limited extent in nuclear fuel reprocessing. Its effective use has been restricted in the past by the lack of diverse and reliable process instrumentation and the unavailability of sophisticated software designed for process control. The Integrated Equipment Test (IET) facility was developed by the Consolidated Fuel Reprocessing Program (CFRP) in part to demonstrate new concepts for control of advanced nuclear fuel reprocessing plants. A demonstration of fuel reprocessing equipment automation using advanced instrumentation and a modern, microprocessor-based control system is nearing completion in the facility. This facility provides for the synergistic testing of all chemical process features of a prototypical fuel reprocessing plant that can be attained with unirradiated uranium-bearing feed materials. The unique equipment and mission of the IET facility make it an ideal test bed for automation studies. This effort will provide for the demonstration of the plant automation concept and for the development of techniques for similar applications in a full-scale plant. A set of preliminary recommendations for implementing process automation has been compiled. Some of these concepts are not generally recognized or accepted. The automation work now under way in the IET facility should be useful to others in helping avoid costly mistakes because of the underutilization or misapplication of process automation. 6 figs

  14. Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing.

    Science.gov (United States)

    Zhai, Haijun; Lingren, Todd; Deleger, Louise; Li, Qi; Kaiser, Megan; Stoutenborough, Laura; Solti, Imre

    2013-04-02

    A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally-developed corpora. Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work and tested the statistical significance (P < .05) of the difference between the crowdsourced and traditionally-developed annotations. The agreement between the crowd's annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names

  15. Automated Scoring of L2 Spoken English with Random Forests

    Science.gov (United States)

    Kobayashi, Yuichiro; Abe, Mariko

    2016-01-01

    The purpose of the present study is to assess second language (L2) spoken English using automated scoring techniques. Automated scoring aims to classify a large set of learners' oral performance data into a small number of discrete oral proficiency levels. In automated scoring, objectively measurable features such as the frequencies of lexical and…

  16. Development and Validation of a Natural Language Processing Tool to Identify Patients Treated for Pneumonia across VA Emergency Departments.

    Science.gov (United States)

    Jones, B E; South, B R; Shao, Y; Lu, C C; Leng, J; Sauer, B C; Gundlapalli, A V; Samore, M H; Zeng, Q

    2018-01-01

    Identifying pneumonia using diagnosis codes alone may be insufficient for research on clinical decision making. Natural language processing (NLP) may enable the inclusion of cases missed by diagnosis codes. This article (1) develops an NLP tool that identifies the clinical assertion of pneumonia from physician emergency department (ED) notes, and (2) compares classification methods using diagnosis codes versus NLP against a gold standard of manual chart review to identify patients initially treated for pneumonia. Among a national population of ED visits occurring between 2006 and 2012 across the Veterans Affairs health system, we extracted 811 physician documents containing search terms for pneumonia for training, and 100 random documents for validation. Two reviewers annotated span- and document-level classifications of the clinical assertion of pneumonia. An NLP tool using a support vector machine was trained on the enriched documents. We extracted diagnosis codes assigned in the ED and upon hospital discharge and calculated performance characteristics for diagnosis codes, NLP, and NLP plus diagnosis codes against manual review in the training and validation sets. Among the training documents, 51% contained clinical assertions of pneumonia; in the validation set, 9% were classified with pneumonia, of which 100% contained pneumonia search terms. After enriching with search terms, the NLP system alone demonstrated a recall/sensitivity of 0.72 (training) and 0.55 (validation), and a precision/positive predictive value (PPV) of 0.89 (training) and 0.71 (validation). ED-assigned diagnostic codes demonstrated lower recall/sensitivity (0.48 and 0.44) but higher precision/PPV (0.95 in training, 1.0 in validation); the NLP system identified more "possible-treated" cases than diagnostic coding. An approach combining NLP and ED-assigned diagnostic coding classification achieved the best performance (sensitivity 0.89 and PPV 0.80). System-wide application of NLP to
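
    The combined strategy that performed best can be sketched as a simple disjunction of the two classifiers, scored against chart review. All labels below are invented.

```python
# Hedged sketch: flag a visit if either the NLP classifier or an
# ED diagnosis code asserts pneumonia, then score both components
# and the combination against a manual-review gold standard.
def sens_ppv(pred, truth):
    tp = sum(p and t for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    return tp / (tp + fn), tp / (tp + fp)

truth     = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # gold standard: chart review
nlp_pred  = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # NLP tool output
code_pred = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]  # ED diagnosis codes

combined = [n or c for n, c in zip(nlp_pred, code_pred)]
for name, pred in [("NLP", nlp_pred), ("codes", code_pred), ("combined", combined)]:
    s, p = sens_ppv(pred, truth)
    print(f"{name:9s} sensitivity={s:.2f} PPV={p:.2f}")
```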

  17. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance.

    Science.gov (United States)

    Ferraro, Jeffrey P; Ye, Ye; Gesteland, Per H; Haug, Peter J; Tsui, Fuchiang Rich; Cooper, Gregory F; Van Bree, Rudy; Ginter, Thomas; Nowalk, Andrew J; Wagner, Michael

    2017-05-31

    This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated by how well NLP supports downstream influenza case-detection for disease surveillance. We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC), using local clinical notes from emergency department (ED) encounters for influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from the NLP-processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis. On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and of the UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and of the IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using the IH NLP parser) and 0.93 (using the UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using the UPMC NLP parser) and 0.83 (using the IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases, performance was 0.70 (using the IH NLP parser) and 0.76 (using the UPMC NLP parser); on University of Pittsburgh Medical Center cases, 0.76 (using the UPMC NLP parser) and 0.65 (using the IH NLP parser). In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection, although the performances of non-local parsers were reasonable.
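
    The case-detection step can be illustrated with a much simpler stand-in: a naive Bayes classifier over binary findings extracted by an NLP parser, scored by AUROC. The findings and labels below are invented, and the paper's models are full Bayesian networks over 70 findings, not naive Bayes.

```python
# Toy stand-in for Bayesian case detection over NLP-extracted findings.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import roc_auc_score

# Rows: ED encounters; columns: presence of [cough, fever, myalgia, rash]
X = np.array([[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1],
              [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
y = np.array([1, 1, 0, 1, 0, 0])  # 1 = influenza, 0 = other

model = BernoulliNB().fit(X, y)
auroc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"training AUROC: {auroc:.2f}")
```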

  18. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects

  19. Building Languages

    Science.gov (United States)

    ... Oral — Natural Gestures, Listening, Speech (Lip) Reading, Speech; Auditory-Verbal — Listening, Speech; Bilingual — American Sign Language, Finger Spelling, Natural Gestures; Cued Speech — Cueing, Speech (Lip) Reading; Total Communication — Conceptually Accurate Signed English (CASE), Finger Spelling, Listening, ...

  20. natural

    Directory of Open Access Journals (Sweden)

    Elías Gómez Macías

    2006-01-01

    Full Text Available Starting from commercial magnesium oxide, an aqueous suspension was prepared, then dried and calcined to confer thermal stability. The material, both fresh and used, was characterized by XRD, BET surface area, and SEM-EPMA. The catalyst showed a periclase-type MgO matrix with CaO on the surface. Catalytic activity tests were carried out in a fixed bed packed with particles obtained by pressing, crushing, and sieving the material. The reactant flow consisted of natural gas-air mixtures below the lower flammability limit. For different flows and inlet temperatures of the reactive mixture, the concentrations of CH4, CO2, and CO in the combustion gases were measured with a non-dispersive infrared (NDIR) gas analyzer. Achieving total methane conversion required raising the bed inlet temperature as the flow of reactant gases increased. The results obtained support the development of a low-cost catalytic combustion system based on a thermally stable material, which promotes high efficiency in natural gas combustion and eliminates the stability, safety, and negative environmental impact problems inherent to conventional thermal combustion processes.

  1. Introducing a gender-neutral pronoun in a natural gender language: the influence of time on attitudes and behavior.

    Science.gov (United States)

    Gustafsson Sendén, Marie; Bäck, Emma A; Lindqvist, Anna

    2015-01-01

    The implementation of gender-fair language is often associated with negative reactions and hostile attacks on people who propose a change. This was also the case in Sweden in 2012, when a third gender-neutral pronoun hen was proposed as an addition to the already existing Swedish pronouns for she (hon) and he (han). The pronoun hen can be used both generically, when gender is unknown or irrelevant, and as a transgender pronoun for people who categorize themselves outside the gender dichotomy. In this article we review the process from 2012 to 2015. No other language has so far added a third gender-neutral pronoun, existing in parallel with two gendered pronouns, that has actually reached the broader population of language users. This makes the situation in Sweden unique. We present data on attitudes toward hen during the past 4 years and analyze how time is associated with the attitudes in the process of introducing hen to the Swedish language. In 2012 the majority of the Swedish population was negative to the word, but already in 2014 there was a significant shift to more positive attitudes. Time was one of the strongest predictors of attitudes, also when other relevant factors were controlled for. The actual use of the word also increased, although to a lesser extent than the attitudes shifted. We conclude that new words challenging the binary gender system evoke hostile and negative reactions, but also that attitudes can normalize rather quickly. We see this finding as very positive and hope it can motivate language amendments and initiatives for gender-fair language, although the first responses may be negative.

  3. Effect of Phonetic Association on Lexis Learning in Natural Language Context: A Comparative Study of English, French and Turkish Words

    Science.gov (United States)

    Ebubekir, Bozavli

    2017-01-01

    Mother tongue acquisition starts with words and grammar acquired spontaneously by means of communication, while at school foreign language learning takes place based on grammar. Vocabulary learning is very often neglected, or rather turns into an individual activity. The present study, which is considered to be unique in its own right, aims to reveal…

  4. Introducing a gender-neutral pronoun in a natural gender language: The influence of time on attitudes and behavior

    Directory of Open Access Journals (Sweden)

    Marie eGustafsson Sendén

    2015-07-01

    Full Text Available The implementation of gender fair language is often associated with negative reactions and hostile attacks on people who propose a change. This was also the case in Sweden in 2012 when a third gender-neutral pronoun hen was proposed as an addition to the already existing Swedish pronouns for she and he. The pronoun hen can be used both generically, when gender is unknown or irrelevant, and as a transgender pronoun for people who categorize themselves outside the gender dichotomy. In this article we review the process from 2012 to 2015, when hen was introduced into the Swedish Dictionary. No other language has so far added a third gender-neutral pronoun that has actually reached the broader population of language users, which makes the situation in Sweden unique. We present data on attitudes toward hen during the past four years and study how time is associated with the attitudes. In 2012 the majority of the Swedish population was negative toward the word, but already in 2014 there was a significant shift to more positive attitudes. Time was one of the strongest predictors of attitudes, also when other relevant factors were controlled for. Even though to a lesser extent than the attitudes, the actual use of the word has also increased. We conclude that new words challenging the binary gender system evoke hostile and negative reactions, but also that attitudes can normalize rather quickly. This is very positive because it should motivate language amendments and initiatives for gender-fair language, although the first responses may be negative.

  5. Natural Language Processing-Enabled and Conventional Data Capture Methods for Input to Electronic Health Records: A Comparative Usability Study.

    Science.gov (United States)

    Kaufman, David R; Sheehan, Barbara; Stetson, Peter; Bhatt, Ashish R; Field, Adele I; Patel, Chirag; Maisel, James Mark

    2016-10-28

    The process of documentation in electronic health records (EHRs) is known to be time consuming, inefficient, and cumbersome. The use of dictation coupled with manual transcription has become an increasingly common practice. In recent years, natural language processing (NLP)-enabled data capture has become a viable alternative for data entry. It enables the clinician to maintain control of the process and potentially reduce the documentation burden. The question remains how this NLP-enabled workflow will impact EHR usability and whether it can meet the structured data and other EHR requirements while enhancing the user's experience. The objective of this study is to evaluate the comparative effectiveness of an NLP-enabled data capture method using dictation and data extraction from transcribed documents (NLP Entry) in terms of documentation time, documentation quality, and usability versus standard EHR keyboard-and-mouse data entry. This formative study investigated the results of using 4 combinations of NLP Entry and Standard Entry methods ("protocols") of EHR data capture. We compared a novel dictation-based protocol using MediSapien NLP (NLP-NLP) for structured data capture against a standard structured data capture protocol (Standard-Standard) as well as 2 novel hybrid protocols (NLP-Standard and Standard-NLP). The 31 participants included neurologists, cardiologists, and nephrologists. Participants generated 4 consultation or admission notes using 4 documentation protocols. We recorded the time on task, documentation quality (using the Physician Documentation Quality Instrument, PDQI-9), and usability of the documentation processes. A total of 118 notes were documented across the 3 subject areas. The NLP-NLP protocol required a median of 5.2 minutes per cardiology note, 7.3 minutes per nephrology note, and 8.5 minutes per neurology note compared with 16.9, 20.7, and 21.2 minutes, respectively, using the Standard-Standard protocol and 13.8, 21.3, and 18.7 minutes

  6. Language of motivation and emotion in an internet support group for smoking cessation: explorative use of automated content analysis to measure regulatory focus.

    Science.gov (United States)

    Johnsen, Jan-Are K; Vambheim, Sara M; Wynn, Rolf; Wangberg, Silje C

    2014-01-01

    The present study describes a novel approach to the identification of the motivational processes in text data extracted from an Internet support group (ISG) for smoking cessation. Based on the previous findings that a "prevention" focus might be more relevant for maintaining behavior change, it was hypothesized that 1) language use (ie, the use of emotional words) signaling a "promotion" focus would be dominant in the initiating stages of the ISG, and 2) that the proportion of words signaling a prevention focus would increase over time. The data were collected from the ISG site, spanning 4 years of forum activity. The data were analyzed using the Linguistic Inquiry and Word Count application. The first hypothesis - of promotion focus dominance in the initiating stages - was not supported during year 1. However, for all the other years measured, the data showed that a prevention failure was more dominant compared with a promotion failure. The results indicate that content analysis could be used to investigate motivational and language-driven processes in ISGs. Understanding the interplay between self-regulation, lifestyle change, and modern communication channels could be of vital importance in providing the public with better health care services and interventions.
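
    As an illustration of the dictionary-based content analysis the study relies on, the sketch below counts the share of "promotion" and "prevention" words in forum posts. The category word lists are invented placeholders, not the validated LIWC dictionaries the authors used.

```python
# Sketch of LIWC-style category counting; the lexicons are hypothetical.
import re
from collections import Counter

PROMOTION = {"hope", "wish", "gain", "achieve", "ideal"}   # assumed words
PREVENTION = {"ought", "must", "avoid", "risk", "safe"}    # assumed words

def category_proportions(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1
    return {
        "promotion": sum(counts[w] for w in PROMOTION) / total,
        "prevention": sum(counts[w] for w in PREVENTION) / total,
    }

posts_by_year = {
    1: ["I hope to quit and gain my health back"],       # toy forum posts
    4: ["I must avoid relapse if I want to stay safe"],
}
for year, posts in posts_by_year.items():
    print(year, category_proportions(" ".join(posts)))
```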

  7. Language of motivation and emotion in an Internet support group for smoking cessation: explorative use of automated content analysis to measure regulatory focus

    Directory of Open Access Journals (Sweden)

    Johnsen JAK

    2014-01-01

    Full Text Available Jan-Are K Johnsen (1), Sara M Vambheim (2), Rolf Wynn (3,4), Silje C Wangberg (3,5); 1 Department of Clinical Dentistry, University of Tromsø; 2 Department of Psychology, University of Tromsø; 3 Division of Addiction and Specialized Psychiatry, University Hospital of North-Norway; 4 Department of Clinical Medicine, University of Tromsø, Tromsø; 5 Narvik University College, Narvik, Norway. Abstract: The present study describes a novel approach to the identification of the motivational processes in text data extracted from an Internet support group (ISG) for smoking cessation. Based on the previous findings that a “prevention” focus might be more relevant for maintaining behavior change, it was hypothesized that 1) language use (ie, the use of emotional words) signaling a “promotion” focus would be dominant in the initiating stages of the ISG, and 2) that the proportion of words signaling a prevention focus would increase over time. The data were collected from the ISG site, spanning 4 years of forum activity. The data were analyzed using the Linguistic Inquiry and Word Count application. The first hypothesis – of promotion focus dominance in the initiating stages – was not supported during year 1. However, for all the other years measured, the data showed that a prevention failure was more dominant compared with a promotion failure. The results indicate that content analysis could be used to investigate motivational and language-driven processes in ISGs. Understanding the interplay between self-regulation, lifestyle change, and modern communication channels could be of vital importance in providing the public with better health care services and interventions. Keywords: self-regulation, behavior change, emotion, prevention

  8. Individual biases, cultural evolution, and the statistical nature of language universals: the case of colour naming systems.

    Science.gov (United States)

    Baronchelli, Andrea; Loreto, Vittorio; Puglisi, Andrea

    2015-01-01

    Language universals have long been attributed to an innate Universal Grammar. An alternative explanation states that linguistic universals emerged independently in every language in response to shared cognitive or perceptual biases. A computational model has recently shown how this could be the case, focusing on the paradigmatic example of the universal properties of colour naming patterns, and producing results in quantitative agreement with the experimental data. Here we investigate the role of an individual perceptual bias in the framework of the model. We study how, and to what extent, the structure of the bias influences the corresponding linguistic universal patterns. We show that the cultural history of a group of speakers introduces population-specific constraints that act against the pressure for uniformity arising from the individual bias, and we clarify the interplay between these two forces.

  9. Individual biases, cultural evolution, and the statistical nature of language universals: the case of colour naming systems.

    Directory of Open Access Journals (Sweden)

    Andrea Baronchelli

    Full Text Available Language universals have long been attributed to an innate Universal Grammar. An alternative explanation states that linguistic universals emerged independently in every language in response to shared cognitive or perceptual biases. A computational model has recently shown how this could be the case, focusing on the paradigmatic example of the universal properties of colour naming patterns, and producing results in quantitative agreement with the experimental data. Here we investigate the role of an individual perceptual bias in the framework of the model. We study how, and to what extent, the structure of the bias influences the corresponding linguistic universal patterns. We show that the cultural history of a group of speakers introduces population-specific constraints that act against the pressure for uniformity arising from the individual bias, and we clarify the interplay between these two forces.

  10. Automated External Defibrillator

    Science.gov (United States)

    Consumer health information page on automated external defibrillators (AEDs): what an AED is, its role in survival, and training to use an AED.

  11. Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children.

    Science.gov (United States)

    Gabani, Keyur; Solorio, Thamar; Liu, Yang; Hassanali, Khairun-Nisa; Dollaghan, Christine A

    2011-11-01

    This paper explores the use of an automated method for analyzing narratives of monolingual English-speaking children to accurately predict the presence or absence of a language impairment. The goal is to exploit corpus-based approaches inspired by the fields of natural language processing and machine learning. We extract a large variety of features from language samples and use them to train language models and well-known machine learning algorithms as the underlying predictors. The methods are evaluated on two different datasets and three language tasks. One dataset contains samples of two spontaneous narrative tasks performed by 118 children with an average age of 13 years, and a second dataset contains play sessions from over 600 younger children with an average age of 6 years. We compare results against a cut-off baseline method and show that our results are far superior, reaching F-measures of over 85% in two of the three language tasks, and 48% in the third one. The different experiments we present here show that corpus-based approaches can yield good prediction results in the problem of language impairment detection. These findings warrant further exploration of natural language processing techniques in the field of communication disorders. Moreover, the proposed framework can be easily adapted to analyze samples in languages other than English since most of the features are language independent or can be customized with little effort. Copyright © 2011 Elsevier B.V. All rights reserved.
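
    A minimal sketch of the corpus-based pipeline described here: extract a few largely language-independent features from narrative samples, train a standard classifier, and report an F-measure. The features, samples, and labels below are toy stand-ins, not the authors' feature set.

```python
# Hedged sketch: simple narrative features feeding a standard classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def features(sample):
    words = sample.split()
    utterances = max(sample.count(".") + sample.count("?"), 1)
    mlu = len(words) / utterances               # mean length of utterance
    ttr = len(set(words)) / max(len(words), 1)  # type-token ratio
    return [mlu, ttr, len(words)]

narratives = [
    "the boy ran to the park . he fell and cried .",   # toy samples
    "boy run . fall .",
    "a dog chased the ball . it bounced far away .",
    "dog ball . go .",
]
labels = np.array([0, 1, 0, 1])  # 1 = language impairment (toy labels)

X = np.array([features(n) for n in narratives])
clf = LogisticRegression()
print("F1:", cross_val_score(clf, X, labels, cv=2, scoring="f1").mean())
```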

  12. First Language Acquisition and Teaching

    Science.gov (United States)

    Cruz-Ferreira, Madalena

    2011-01-01

    "First language acquisition" commonly means the acquisition of a single language in childhood, regardless of the number of languages in a child's natural environment. Language acquisition is variously viewed as predetermined, wondrous, a source of concern, and as developing through formal processes. "First language teaching" concerns schooling in…

  13. An automation model of Effluent Treatment Plant

    Directory of Open Access Journals (Sweden)

    Luiz Alberto Oliveira Lima Roque

    2012-07-01

    Full Text Available Population growth and the intensification of industrial activities have increased the deterioration of natural resources. Industrial, hospital and residential wastes are dumped directly into landfills without processing, polluting soils. This has consequences later, because the liquid resulting from the putrefaction of organic material seeps into the soil and reaches water bodies. Cities arise without planning, and industrial and household wastes are discharged into rivers, lakes and oceans without proper treatment, affecting water resources. It is well known that in the next century there will be fierce competition for fresh water on the planet, probably due to its scarcity. Demographic expansion has occurred without proper health planning, degrading oceans, lakes and rivers. Thus, a large percentage of the world population suffers from diseases related to water pollution. Accordingly, it can be concluded that sewage treatment is essential to human survival and to preserving rivers, lakes and oceans. An Effluent Treatment Plant (ETP) treats wastewater to reduce its pollution to acceptable levels before sending it to the oceans or rivers. To automate the operation of an ETP, motors, sensors, logic blocks, timers and counters are needed. These functions are achieved with programmable logic controllers (PLCs) and supervisory systems. The Ladder language is used to program the controllers and is a pillar of automation and control engineering. The supervisory systems allow process information to be monitored, while the PLCs are responsible for control and data acquisition. Today, process automation is used on an increasing scale in order to provide higher quality, raise productivity and improve the activities involved. Therefore, an automated ETP will improve the performance and efficiency of handling large volumes of sewage. Considering the growing importance of environmental awareness with special emphasis

  14. Library Automation.

    Science.gov (United States)

    Husby, Ole

    1990-01-01

    The challenges and potential benefits of automating university libraries are reviewed, with special attention given to cooperative systems. Aspects discussed include database size, the role of the university computer center, storage modes, multi-institutional systems, resource sharing, cooperative system management, networking, and intelligent…

  15. Simultaneous natural speech and AAC interventions for children with childhood apraxia of speech: lessons from a speech-language pathologist focus group.

    Science.gov (United States)

    Oommen, Elizabeth R; McCarthy, John W

    2015-03-01

    In childhood apraxia of speech (CAS), children exhibit varying levels of speech intelligibility depending on the nature of errors in articulation and prosody. Augmentative and alternative communication (AAC) strategies are beneficial, and commonly adopted with children with CAS. This study focused on the decision-making process and strategies adopted by speech-language pathologists (SLPs) when simultaneously implementing interventions that focused on natural speech and AAC. Eight SLPs, with significant clinical experience in CAS and AAC interventions, participated in an online focus group. Thematic analysis revealed eight themes: key decision-making factors; treatment history and rationale; benefits; challenges; therapy strategies and activities; collaboration with team members; recommendations; and other comments. Results are discussed along with clinical implications and directions for future research.

  16. LAVA: a conceptual framework for automated risk assessment

    International Nuclear Information System (INIS)

    Smith, S.T.; Brown, D.C.; Erkkila, T.H.; FitzGerald, P.D.; Lim, J.J.; Massagli, L.; Phillips, J.R.; Tisinger, R.M.

    1986-01-01

    At the Los Alamos National Laboratory we are developing the framework for generating knowledge-based systems that perform automated risk analyses on an organization's assets. An organization's assets can be subdivided into tangible and intangible assets. Tangible assets include facilities, materiel, personnel, and time, while intangible assets include such factors as reputation, employee morale, and technical knowledge. The potential loss exposure of an asset is dependent upon the threats (both static and dynamic), the vulnerabilities in the mechanisms protecting the assets from the threats, and the consequences of the threats successfully exploiting the protective systems' vulnerabilities. The methodology is based upon decision analysis, fuzzy set theory, natural-language processing, and event-tree structures. The Los Alamos Vulnerability and Risk Assessment (LAVA) methodology has been applied to computer security. LAVA is modeled using an interactive questionnaire in natural language and is fully automated on a personal computer. The program generates both summary reports for use by management personnel and detailed reports for use by operations staff. LAVA has been in use by the Nuclear Regulatory Commission and the National Bureau of Standards for nearly two years and is presently under evaluation by other governmental agencies. 7 refs
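
    A toy sketch of the kind of scoring such a methodology combines: qualitative threat, vulnerability, and consequence levels merged with fuzzy-set-style operators. The membership values and combination rule below are assumptions for illustration, not LAVA's actual algorithm.

```python
# Illustrative fuzzy-style risk scoring; all values are assumed.
LEVELS = {"low": 0.2, "medium": 0.5, "high": 0.9}  # fuzzy memberships

def risk_score(threat, vulnerability, consequence):
    # A threat must exploit a vulnerability before consequences matter,
    # so combine those with min() (a common fuzzy AND), then weight by
    # the consequence level.
    exploit = min(LEVELS[threat], LEVELS[vulnerability])
    return exploit * LEVELS[consequence]

print(risk_score("high", "medium", "high"))  # 0.45 on a 0..1 scale
```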

  17. Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis.

    Science.gov (United States)

    Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra

    2014-01-01

    Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). In this study, the descriptions by 144 of the subjects studied in Toledo 18 were used, which included 200 healthy Brazilians of both genders. A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
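
    The reported best-performing setup, an SVM with an RBF kernel for binary classification, can be sketched as below; the features and groups are synthetic stand-ins for the Coh-Metrix-Port and AIC features used in the study.

```python
# Minimal SVM-with-RBF-kernel sketch on synthetic two-group data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(50, 5))  # e.g., fewer years of education
group_b = rng.normal(1.0, 1.0, size=(50, 5))  # e.g., more years of education
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```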

  18. Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

    Directory of Open Access Journals (Sweden)

    Cíntia Matsuda Toledo

    Full Text Available Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). In this study, the descriptions by 144 of the subjects studied in Toledo18 were used, which included 200 healthy Brazilians of both genders. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.

  19. Re-examining text difficulty through automated textual analysis tools and readers’ beliefs: the case of the Greek State Certificate of English Language Proficiency exam

    Directory of Open Access Journals (Sweden)

    Jenny Liontou

    2012-02-01

    Full Text Available This article reports on an exploratory study that aimed at describing and comparing a range of linguistic features that characterize the reading texts used at the B2 and C1 level of the Greek State Certificate of English Language Proficiency exam (KPG). Its ultimate purpose was to explore the contribution of such features to perceived text difficulty while at the same time examining the relationship between strategy use and test-takers' perceived level of reading comprehension difficulty reported in 7,250 questionnaires. Text analysis revealed significant differences between B2 and C1 reading texts for a specific number of text features such as word, paragraph and text length, readability indices, levels of word frequency and presence of words with rich conceptual content. A significant correlation was also found between B2 test-takers' perception of reading module difficulty and specific text features, i.e. lexical diversity, abstract words, positive additive connectives and anaphoric references between adjacent sentences. With regard to C1 test-takers, data analysis showed that two specific text variables, i.e. positive logical connectives and argument overlap, correlated significantly with readers' perception of reading module difficulty. Finally, problem-solving reading strategies such as rereading the text, guessing the meaning of unknown words and translating into the mother tongue were found to correlate significantly with perceived text difficulty, whereas support-type reading strategies such as underlining or selectively reading parts of the text were less often employed regardless of KPG test-takers' perception of text difficulty. The findings of this study could help both EFL teachers and test designers gain valuable knowledge regarding EFL learners' reading habits and also become more alert to the difficulty specific text features impose on the latter.

  20. Natural Language Processing and Machine Learning (NLP/ML): Applying Advances in Biomedicine to the Earth Sciences

    Science.gov (United States)

    Duerr, R.; Myers, S.; Palmer, M.; Jenkins, C. J.; Thessen, A.; Martin, J.

    2015-12-01

    Semantics underlie many of the tools and services available from and on the web. From improving search results to enabling data mashups and other forms of interoperability, semantic technologies have proven themselves. But creating semantic resources, especially re-usable semantic resources, is extremely time consuming and labor intensive. Why? Because it is not just a matter of technology but also of obtaining rough consensus if not full agreement amongst community members on the meaning and order of things. One way to develop these resources in a more automated way would be to use NLP/ML techniques to extract the required resources from large corpora of subject-specific text such as peer-reviewed papers where presumably a rough consensus has been achieved at least about the basics of the particular discipline involved. While not generally applied to Earth Sciences, considerable resources have been spent in other fields such as medicine on these types of techniques with some success. The NSF-funded ClearEarth project is applying the techniques developed for biomedicine to the cryosphere, geology, and biology in order to spur faster development of the semantic resources needed in these fields. The first area being addressed by the project is the cryosphere, specifically sea ice nomenclature where an existing set of sea ice ontologies are being used as the "Gold Standard" against which to test and validate the NLP/ML techniques. The processes being used, lessons learned and early results will be described.
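
    One simple way to begin bootstrapping such resources from text is a frequency-ratio "termhood" heuristic: rank words by how much more often they occur in domain text than in general text. This is a generic illustration, not the ClearEarth pipeline; both corpora below are invented.

```python
# Rank candidate domain terms by domain-vs-general frequency ratio.
import re
from collections import Counter

def rel_freqs(text):
    words = re.findall(r"[a-z-]+", text.lower())
    total = len(words) or 1
    return {w: c / total for w, c in Counter(words).items()}

domain = "first-year ice forms from young ice and thickens into multiyear ice"
general = "the meeting ran long and the coffee was cold and the ice melted"

d, g = rel_freqs(domain), rel_freqs(general)
termhood = {w: d[w] / (g.get(w, 0.0) + 1e-6) for w in d}
print(sorted(termhood, key=termhood.get, reverse=True)[:3])  # top candidates
```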

  1. Electronic Design Automation Using Object Oriented Electronics

    OpenAIRE

    Walid M. Aly; Mohamed S. Abuelnasr

    2010-01-01

    Problem statement: Electronic design automation is the usage of computer technology and software tools for designing integrated electronic systems and creating electrical schematics. Approach: An approach is presented for modeling various electronic and electric devices using object-oriented design, aiming at building a library of devices (classes) which can be used for electronic design automation. Results: The presented library was implemented using the Java programming language to form an El...

  2. The language of social software

    NARCIS (Netherlands)

    D.J.N. van Eijck (Jan)

    2010-01-01

    Computer software is written in languages like C, Java or Haskell. In many cases social software is expressed in natural language. The paper explores connections between the areas of natural language analysis and analysis of social protocols, and proposes an extended program for natural

  3. Towards Compatible and Interderivable Semantic Specifications for the Scheme Programming Language, Part I: Denotational Semantics, Natural Semantics, and Abstract Machines

    DEFF Research Database (Denmark)

    Danvy, Olivier

    2009-01-01

    We derive two big-step abstract machines, a natural semantics, and the valuation function of a denotational semantics based on the small-step abstract machine for Core Scheme presented by Clinger at PLDI'98. Starting from a functional implementation of this small-step abstract machine, (1) we fuse its transition function with its driver loop, obtaining the functional implementation of a big-step abstract machine; (2) we adjust this big-step abstract machine so that it is in defunctionalized form, obtaining the functional implementation of a second big-step abstract machine; (3) we refunctionalize this adjusted abstract machine, obtaining the functional implementation of a natural semantics in continuation-passing style; and (4) we closure-unconvert this natural semantics, obtaining a compositional continuation-passing evaluation function which we identify as the functional implementation...

  4. Towards Compatible and Interderivable Semantic Specifications for the Scheme Programming Language, Part I: Denotational Semantics, Natural Semantics, and Abstract Machines

    DEFF Research Database (Denmark)

    Danvy, Olivier

    2008-01-01

    We derive two big-step abstract machines, a natural semantics, and the valuation function of a denotational semantics based on the small-step abstract machine for Core Scheme presented by Clinger at PLDI'98. Starting from a functional implementation of this small-step abstract machine, (1) we fuse its transition function with its driver loop, obtaining the functional implementation of a big-step abstract machine; (2) we adjust this big-step abstract machine so that it is in defunctionalized form, obtaining the functional implementation of a second big-step abstract machine; (3) we refunctionalize this adjusted abstract machine, obtaining the functional implementation of a natural semantics in continuation style; and (4) we closure-unconvert this natural semantics, obtaining a compositional continuation-passing evaluation function which we identify as the functional implementation...

  5. LK Scripting Language

    Energy Technology Data Exchange (ETDEWEB)

    2016-01-27

    The LK scripting language is a simple and fast computer programming language designed for easy integration with existing software to enable automation of tasks. The LK language is used by NREL’s System Advisor Model (SAM), the SAM Software Development Kit (SDK), and SolTrace products. LK is easily extensible and adaptable to new software due to its small footprint, and is designed to be statically linked into other software. It is written in standard C++, is cross-platform (Windows, Linux, and OSX), and includes optional portions that enable direct integration with graphical user interfaces written in the open source C++ wxWidgets Version 3.0+ toolkit.

  6. Natural Propositions

    DEFF Research Database (Denmark)

    Stjernfelt, Frederik

    Preface -- Introduction -- The generality of signs -- Dicisigns -- Some consequences of the dicisign doctrine -- Dicisigns and cognition -- Natural propositions--the evolution of semiotic self-control -- Dicisigns beyond language -- Operational and optimal iconicity in Peirce's diagrammatology...

  7. Observing Coaching and Reflecting: A Multi-modal Natural Language-based Dialogue System in a Learning Context

    NARCIS (Netherlands)

    Van Helvert, Joy; Van Rosmalen, Peter; Börner, Dirk; Petukhova, Volha; Alexandersson, Jan

    2016-01-01

    The Metalogue project aims to develop a multi-modal, multi-party dialogue system with metacognitive abilities that will advance our understanding of natural conversational human-machine interaction and dialogue interfaces. This paper introduces the vision for the system and discusses its application

  8. Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine.

    Science.gov (United States)

    Friedman, Carol; Rindflesch, Thomas C; Corn, Milton

    2013-10-01

    Natural language processing (NLP) is crucial for advancing healthcare because it is needed to transform relevant information locked in text into structured data that can be used by computer processes aimed at improving patient care and advancing medicine. In light of the importance of NLP to health, the National Library of Medicine (NLM) recently sponsored a workshop to review the state of the art in NLP focusing on text in English, both in biomedicine and in the general language domain. Specific goals of the NLM-sponsored workshop were to identify the current state of the art, grand challenges and specific roadblocks, and to identify effective use and best practices. This paper reports on the main outcomes of the workshop, including an overview of the state of the art, strategies for advancing the field, and obstacles that need to be addressed, resulting in recommendations for a research agenda intended to advance the field. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Sequoyah Foreign Language Translation System - Business Case Analysis

    National Research Council Canada - National Science Library

    Ong, Wing S. S

    2007-01-01

    Sequoyah, which is the Department of Defense (DoD)'s Program of Record for automated foreign language translation, aims to identify current and developing technologies to meet warfighter requirements for foreign language support...

  10. Propaedeutics of Mathematical Language of Schemes and Structures in School Teaching of the Natural Sciences Profile

    Directory of Open Access Journals (Sweden)

    V. P. Kotchnev

    2012-01-01

    Full Text Available The paper looks at the teaching process at schools of the natural sciences profile. The subject of the research is devoted to the correlations between the students’ progress and the degree of their involvement in creative activities of problem solving in the natural sciences context. The research aims to demonstrate the reinforcement of students’ creative learning by teaching mathematical schemes and structures. The comparative characteristics of the task, problem and model approaches to mathematical problem solving are given; the experimental data on the efficiency of mathematical training based on the above approaches are discussed, as well as the specifics of modeling the tasks for problem solving. The author examines the ways for stimulating the students’ creative activity and motivating the knowledge acquisition, and the search for new mathematical conformities related to the natural science content. The significance of the Olympiad and other non-standard tasks, broadening the students’ horizons and stimulating creative thinking and abilities, is emphasized. The proposed method confirms the appropriateness of introducing the Olympiad and non-standard problem solving into the preparatory training curricula for the Unified State Examinations.

  11. Automated electronic filter design

    CERN Document Server

    Banerjee, Amal

    2017-01-01

    This book describes a novel, efficient and powerful scheme for designing and evaluating the performance characteristics of any electronic filter designed with predefined specifications. The author explains techniques that enable readers to eliminate complicated manual, and thus error-prone and time-consuming, steps of traditional design techniques. The presentation includes demonstration of efficient automation, using an ANSI C language program, which accepts any filter design specification (e.g. Chebyshev low-pass filter, cut-off frequency, pass-band ripple etc.) as input and generates as output a SPICE (Simulation Program with Integrated Circuit Emphasis) format netlist. Readers then can use this netlist to run simulations with any version of the popular SPICE simulator, increasing accuracy of the final results, without violating any of the key principles of the traditional design scheme.
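
    A much-simplified sketch of the spec-in, netlist-out idea: accept a cutoff-frequency specification and emit a SPICE-format netlist. The book's ANSI C program handles full Chebyshev and other designs; this toy emits only a first-order RC low-pass.

```python
# Emit a SPICE netlist for a first-order RC low-pass from a cutoff spec.
import math

def rc_lowpass_netlist(cutoff_hz, r_ohms=1000.0):
    # f_c = 1 / (2 * pi * R * C)  =>  C = 1 / (2 * pi * R * f_c)
    c_farads = 1.0 / (2.0 * math.pi * r_ohms * cutoff_hz)
    return "\n".join([
        "* first-order RC low-pass, cutoff %.1f Hz" % cutoff_hz,
        "V1 in 0 AC 1",
        "R1 in out %.1f" % r_ohms,
        "C1 out 0 %.3e" % c_farads,
        ".ac dec 20 1 1e6",
        ".end",
    ])

print(rc_lowpass_netlist(1000.0))
```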

  12. Automation from pictures

    International Nuclear Information System (INIS)

    Kozubal, A.J.

    1992-01-01

    The state transition diagram (STD) model has been helpful in the design of real time software, especially with the emergence of graphical computer aided software engineering (CASE) tools. Nevertheless, the translation of the STD to real time code has in the past been primarily a manual task. At Los Alamos we have automated this process. The designer constructs the STD in a CASE tool (Cadre Teamwork), using a special notation for events and actions. A translator converts the STD into an intermediate state notation language (SNL), and this SNL is compiled directly into C code (a state program). Execution of the state program is driven by external events, allowing multiple state programs to effectively share the resources of the host processor. Since the design and the code are tightly integrated through the CASE tool, the design and code never diverge, and we avoid design obsolescence. Furthermore, the CASE tool automates the production of formal technical documents from the graphic description encapsulated by the CASE tool. (author)
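
    A sketch of the shape such a generated state program takes: a transition table keyed by (state, event) pairs and driven by external events, with an action attached to each transition. The states, events, and actions below are invented; the actual translator emits C from SNL.

```python
# Event-driven state machine in the style of a generated state program.
TRANSITIONS = {
    ("idle", "start"):    ("running", lambda: print("action: power on")),
    ("running", "fault"): ("halted",  lambda: print("action: raise alarm")),
    ("running", "stop"):  ("idle",    lambda: print("action: power off")),
}

def run(events, state="idle"):
    for event in events:
        state, action = TRANSITIONS.get((state, event), (state, None))
        if action:
            action()  # side effect attached to this transition
    return state

print(run(["start", "fault"]))  # -> halted
```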

  13. Maneuver Automation Software

    Science.gov (United States)

    Uffelman, Hal; Goodson, Troy; Pellegrin, Michael; Stavert, Lynn; Burk, Thomas; Beach, David; Signorelli, Joel; Jones, Jeremy; Hahn, Yungsun; Attiyah, Ahlam

    2009-01-01

    The Maneuver Automation Software (MAS) automates the process of generating commands for maneuvers to keep the spacecraft of the Cassini-Huygens mission on a predetermined prime mission trajectory. Before MAS became available, a team of approximately 10 members had to work about two weeks to design, test, and implement each maneuver in a process that involved running many maneuver-related application programs and then serially handing off data products to other parts of the team. MAS enables a three-member team to design, test, and implement a maneuver in about one-half hour after Navigation has processed tracking data. MAS accepts more than 60 parameters and 22 files as input directly from users. MAS consists of Practical Extraction and Reporting Language (PERL) scripts that link, sequence, and execute the maneuver-related application programs: "Pushing a single button" on a graphical user interface causes MAS to run navigation programs that design a maneuver; programs that create sequences of commands to execute the maneuver on the spacecraft; and a program that generates predictions about maneuver performance and generates reports and other files that enable users to quickly review and verify the maneuver design. MAS can also generate presentation materials, initiate electronic command request forms, and archive all data products for future reference.

  14. Autonomous Systems: Habitat Automation

    Data.gov (United States)

    National Aeronautics and Space Administration — The Habitat Automation Project Element within the Autonomous Systems Project is developing software to automate the operations of habitats and other spacecraft. This...

  15. An Automation Planning Primer.

    Science.gov (United States)

    Paynter, Marion

    1988-01-01

    This brief planning guide for library automation incorporates needs assessment and evaluation of options to meet those needs. A bibliography of materials on automation planning and software reviews, library software directories, and library automation journals is included. (CLB)

  16. Our ways of learning language | Lepota | Journal for Language ...

    African Journals Online (AJOL)

    Its results are discussed within the contours of five categories: learners' motivations for learning language; their ideas about language learning aptitude; their opinions of the difficulty of learning English; their second language learning and communication strategies; and, finally, their views on the nature of language learning.

  17. Whole Language Strategies for ESL Students. Language and Literacy Series.

    Science.gov (United States)

    Heald-Taylor, Gail

    This handbook outlines learning strategies in language arts for children in kindergarten to third grade learning English as a second language (ESL). They are designed for the Whole Language or Natural Approach. Although reading and writing are the key language components emphasized, listening, speaking, drama, and visual arts activities have been…

  18. The Linguistic Interpretation for Language Union – Language Family

    Directory of Open Access Journals (Sweden)

    E.A. Balalykina

    2016-10-01

    Full Text Available The paper is dedicated to the problem of determining the essence of the language union and the language family in modern linguistics, which is considered important because these terms are often used as absolute synonyms. The research is relevant due to the need to distinguish the features of languages that are inherited during their functioning within either a language union or a language family when these languages are compared. The research has been carried out in order to present the historical background of the problem and to justify the need for differentiation of language facts that allow relating languages to a particular language union or language family. In order to fulfill the goal of this work, descriptive, comparative, and historical methods have been used. A range of examples has been provided to prove that some languages, mainly Slavonic and Baltic languages, form a language family rather than a language union, because a whole number of features in their systems are the heritage of their common Indo-European past. Firstly, it is necessary to take into account changes having either a common or a different nature in the system of particular languages; secondly, one must have a precise idea of what features in the phonetic and morphological systems of the compared languages allow one to relate them to a language union or a language family; thirdly, it must be determined whether the changes in the compared languages are regular or of any other type. On the basis of the obtained results, the following conclusions have been drawn: the language union and the language family are two different types of relations between modern languages; they allow identifying both the degree of similarity of these languages and the causes of the differences between them. It is most important that one should distinguish and describe the specific features of the two basic groups of languages forming a language family or a language union. The results obtained during the analysis are very important for linguistics

  19. Automated Budget System -

    Data.gov (United States)

    Department of Transportation — The Automated Budget System (ABS) automates management and planning of the Mike Monroney Aeronautical Center (MMAC) budget by providing enhanced capability to plan,...

  20. Cultural Perspectives Toward Language Learning

    Science.gov (United States)

    Lin, Li-Li

    2008-01-01

    Cultural conflicts may be derived from using inappropriate language. Appropriate linguistic-pragmatic competence may also be produced by providing various and multicultural backgrounds. Culture and language are linked together naturally, unconsciously, and closely in daily social lives. Culture affects language and language affects culture through…

  1. Implementing Office Automation in Postsecondary Educational Institutions.

    Science.gov (United States)

    Creutz, Alan

    1984-01-01

    Three implementation strategies for office automation and decision support systems within postsecondary educational institutions--"natural evolution," "the total solution," and "coordinate evolution"--are identified. The components of an effective implementation plan are discussed. (Author/MLW)

  2. Automation 2017

    CERN Document Server

    Zieliński, Cezary; Kaliczyńska, Małgorzata

    2017-01-01

    This book consists of papers presented at Automation 2017, an international conference held in Warsaw from March 15 to 17, 2017. It discusses research findings associated with the concepts behind INDUSTRY 4.0, with a focus on offering a better understanding of and promoting participation in the Fourth Industrial Revolution. Each chapter presents a detailed analysis of a specific technical problem, in most cases followed by a numerical analysis, simulation and description of the results of implementing the solution in a real-world context. The theoretical results, practical solutions and guidelines presented are valuable for both researchers working in the area of engineering sciences and practitioners looking for solutions to industrial problems.

  3. Marketing automation

    Directory of Open Access Journals (Sweden)

    TODOR Raluca Dania

    2017-01-01

    Full Text Available The automation of the marketing process nowadays seems to be the only way to face the major changes brought by the fast evolution of technology and the continuous increase in supply and demand. In order to achieve the desired marketing results, businesses have to employ digital marketing and communication services. These services are efficient and measurable thanks to the marketing technology used to track, score and implement each campaign. Due to technical progress, marketing fragmentation and the demand for customized products and services on one side, and the need for constructive dialogue with customers, immediate and flexible responses and the necessity to measure investments and results on the other side, the classical marketing approach has changed and continues to improve substantially.

  4. Social Network Development, Language Use, and Language Acquisition during Study Abroad: Arabic Language Learners' Perspectives

    Science.gov (United States)

    Dewey, Dan P.; Belnap, R. Kirk; Hillstrom, Rebecca

    2013-01-01

    Language learners and educators have subscribed to the belief that those who go abroad will have many opportunities to use the target language and will naturally become proficient. They also assume that language learners will develop relationships with native speakers allowing them to use the language and become more fluent, an assumption…

  5. Automated Speech Rate Measurement in Dysarthria

    Science.gov (United States)

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  6. How will XML impact industrial automation?

    CERN Multimedia

    Pinceti, P

    2002-01-01

    A working group of the World Wide Web Consortium (W3C) has overcome the limits of both HTML and SGML with the definition of the extensible markup language - XML. This article looks at how XML will affect industrial automation (2 pages).

  7. USSR Report, Cybernetics Computers and Automation Technology

    Science.gov (United States)

    1985-09-05

    Materials from foreign-language sources are translated; those from English-language sources are transcribed or reprinted, with the original phrasing and... "Informational Robot-Manipulator Systems." Moscow, Mashinostroyeniye, 1977, 272 pages. 2. Pratt U. "Digital Processing of Images." Translated from English. ...adjectives. It provides for automatic formation of lexical-grammatical information necessary for natural language processing. The semantic-syntactic...

  8. Language policy, translation and language development in Zimbabwe

    African Journals Online (AJOL)

    depth investigations on the issues that are highlighted in this article like the nature of languages involved, the directionality of translation, and the types of texts translated. Southern African Linguistics and Applied Language Studies 2011, 29(3): ...

  9. Integration of language and sensor information

    Science.gov (United States)

    Perlovsky, Leonid I.; Weijers, Bertus

    2003-04-01

    The talk describes the development of basic technologies of intelligent systems fusing data from multiple domains and leading to automated computational techniques for understanding data contents. Understanding involves inferring appropriate decisions and recommending proper actions, which in turn requires fusion of data and knowledge about objects, situations, and actions. Data might include sensory data, verbal reports, intelligence intercepts, or public records, whereas knowledge ought to encompass the whole range of objects, situations, people and their behavior, and knowledge of languages. In the past, a fundamental difficulty in combining knowledge with data was the combinatorial complexity of computations: too many combinations of data and knowledge pieces had to be evaluated. Recent progress in understanding of natural intelligent systems, including the human mind, leads to the development of neurophysiologically motivated architectures for solving these challenging problems, in particular the role of emotional neural signals in overcoming the combinatorial complexity of old logic-based approaches. Whereas past approaches based on logic tended to identify logic with language and thinking, recent studies in cognitive linguistics have led to an appreciation of the more complicated nature of linguistic models. Little is known about the details of the brain mechanisms integrating language and thinking. Understanding and fusion of linguistic information with sensory data represent a novel challenging aspect of the development of integrated fusion systems. The presentation will describe a non-combinatorial approach to this problem and outline techniques that can be used for fusing diverse and uncertain knowledge with sensory and linguistic data.

  10. Both Automation and Paper.

    Science.gov (United States)

    Purcell, Royal

    1988-01-01

    Discusses the concept of a paperless society and the current situation in library automation. Various applications of automation and telecommunications are addressed, and future library automation is considered. Automation at the Monroe County Public Library in Bloomington, Indiana, is described as an example. (MES)

  11. Unmet needs in automated cytogenetics

    International Nuclear Information System (INIS)

    Bender, M.A.

    1976-01-01

    Though some, at least, of the goals of automation systems for analysis of clinical cytogenetic material seem either at hand, like automatic metaphase finding, or at least likely to be met in the near future, like operator-assisted semi-automatic analysis of banded metaphase spreads, important areas of cytogenetic analysis, most importantly the determination of chromosomal aberration frequencies in populations of cells or in samples of cells from people exposed to environmental mutagens, await practical methods of automation. Important as are the clinical diagnostic applications, it is apparent that increasing concern over the clastogenic effects of the multitude of potentially clastogenic chemical and physical agents to which human populations are being increasingly exposed, and the resulting emergence of extensive cytogenetic testing protocols, makes the development of automation not only economically feasible but almost mandatory. The nature of the problems involved, and actual or possible approaches to their solution, are discussed

  12. Automating a 96-well microtiter plate assay for identification of AGEs inhibitors or inducers: application to the screening of a small natural compounds library.

    Science.gov (United States)

    Derbré, Séverine; Gatto, Julia; Pelleray, Aude; Coulon, Laurie; Séraphin, Denis; Richomme, Pascal

    2010-10-01

    Advanced glycation end-products (AGEs) are involved in the pathogenesis of numerous affections such as diabetes and neurological diseases. AGEs are also implicated in various changes in tissues and organs. Therefore, compounds able to break them or inhibit their formation may be considered as potential drugs, dietary supplements, or bioactive additives. In this study, we have developed a rapid and reliable (Z' factor calculation) anti-AGEs activity screening based on the overall fluorescence of AGEs. This method was successfully evaluated on known AGEs inhibitors and on a small library of natural compounds, yielding coherent results when compared with literature data.
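
    For reference, the Z' factor is the standard plate-assay quality statistic, Z' = 1 - 3(σpos + σneg)/|μpos - μneg|, and values above roughly 0.5 are conventionally taken to indicate an excellent screening assay. The sketch below computes it from invented positive- and negative-control fluorescence readings.

```python
# Compute the Z' factor from positive- and negative-control wells.
import statistics as st

def z_prime(pos, neg):
    mu_p, mu_n = st.mean(pos), st.mean(neg)
    sd_p, sd_n = st.stdev(pos), st.stdev(neg)
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)

pos = [980, 1010, 995, 1005]  # e.g., uninhibited AGEs fluorescence (invented)
neg = [105, 98, 110, 102]     # e.g., fully inhibited wells (invented)
print(round(z_prime(pos, neg), 3))  # ~0.94; >0.5 suggests a robust assay
```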

  13. CORRELATION BETWEEN METACOGNITIVE STRATEGY, FOREIGN LANGUAGE APTITUDE AND MOTIVATIONS IN LANGUAGE LEARNING

    OpenAIRE

    Novia Tri Febriani

    2017-01-01

    Language learning beliefs and language learning strategies are two essential predictors that have a significant effect on students’ language proficiency. Learners’ beliefs concern what comes from inside the learner in learning the language, such as foreign language aptitude; the difficulty of language learning; the nature of language learning; learning and communication strategies; and motivation. Meanwhile, language learning strategies are learners’ plans for achieving certain goals or master...

  14. Automation, communication and cybernetics in science and engineering 2013/2014

    CERN Document Server

    Isenhardt, Ingrid; Hees, Frank; Henning, Klaus

    2014-01-01

    This book continues the tradition of its predecessors “Automation, Communication and Cybernetics in Science and Engineering 2009/2010 and 2011/2012” and includes a representative selection of scientific publications from researchers at the institute cluster IMA/ZLW & IfU:
 IMA - Institute of Information Management in Mechanical Engineering
 ZLW - Center for Learning and Knowledge Management
 IfU - Associated Institute for Management Cybernetics e.V.
 Faculty of Mechanical Engineering, RWTH Aachen University
    The book presents a range of innovative fields of application, including: cognitive systems, cyber-physical production systems, robotics, automation technology, machine learning, natural language processing, data mining, predictive data analytics, visual analytics, innovation and diversity management, demographic models, virtual and remote laboratories, virtual and augmented realities, multimedia learning environments, organizational development and management cybernetics. The contributio...

  15. Novel Use of Natural Language Processing (NLP) to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid

    Science.gov (United States)

    Progovac, Ana M.; Chen, Pei; Mullin, Brian; Hou, Sherry

    2016-01-01

    Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, “how do you feel today?” We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 of structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 in structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible. PMID:27752278
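
    For reference, the PPV, sensitivity, and specificity figures above derive from a 2x2 confusion matrix, as in the sketch below; the counts are invented to roughly echo the NLP model's magnitudes, not the study's actual data.

```python
# Screening metrics from confusion-matrix counts (invented numbers).
def screening_metrics(tp, fp, fn, tn):
    return {
        "PPV": tp / (tp + fp),          # precision of positive calls
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

print(screening_metrics(tp=61, fp=39, fn=48, tn=57))
```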

  16. Novel Use of Natural Language Processing (NLP to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid

    Directory of Open Access Journals (Sweden)

    Benjamin L. Cook

    2016-01-01

    Full Text Available Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, “how do you feel today?” We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 of structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 in structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible.

  17. Novel Use of Natural Language Processing (NLP) to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid.

    Science.gov (United States)

    Cook, Benjamin L; Progovac, Ana M; Chen, Pei; Mullin, Brian; Hou, Sherry; Baca-Garcia, Enrique

    2016-01-01

    Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, "how do you feel today?" We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 for structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 for structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible.

  18. Automated Testing of OPC Servers

    CERN Document Server

    Farnham, B

    2011-01-01

    CERN relies on OPC Server implementations from third-party device vendors to provide a software interface to their respective hardware. Each time a vendor releases a new OPC Server version, it is regression tested internally to verify that existing functionality has not been inadvertently broken while new features were added. In addition, bugs and problems must be communicated to the vendors in a reliable and portable way. This presentation covers the automated test approach used at CERN to address both needs: scripts are written in a domain-specific language created specifically for describing OPC tests and are executed by a custom software engine that drives the OPC Server implementation.
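
    The record does not reproduce CERN's actual test DSL, so the following Python sketch only illustrates the general pattern it describes: a tiny, hypothetical script language (WRITE/EXPECT commands) interpreted line by line by an engine driving a stand-in for a vendor OPC Server. The command names, stub class, and script are all invented.

```python
# Hypothetical miniature of a script-driven OPC test engine, in the spirit
# of the approach described above. The real DSL and engine are not public
# in this abstract; everything here is invented for illustration.

class StubOpcServer:
    """Stands in for a vendor OPC Server implementation under test."""
    def __init__(self):
        self.items = {}

    def write(self, item, value):
        self.items[item] = value

    def read(self, item):
        return self.items.get(item)

def run_test_script(script, server):
    """Execute one command per line: WRITE <item> <value> | EXPECT <item> <value>."""
    for lineno, line in enumerate(script.strip().splitlines(), start=1):
        cmd, item, value = line.split()
        if cmd == "WRITE":
            server.write(item, value)
        elif cmd == "EXPECT":
            actual = server.read(item)
            assert actual == value, (
                f"line {lineno}: {item} was {actual!r}, expected {value!r}")
        else:
            raise ValueError(f"line {lineno}: unknown command {cmd!r}")

# A regression script checks that a written value reads back unchanged.
script = """
WRITE Pump1.Setpoint 42
EXPECT Pump1.Setpoint 42
"""
run_test_script(script, StubOpcServer())
print("regression script passed")
```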

  19. What are Languages? A Biolinguistic Perspective

    OpenAIRE

    Mendívil-Giró José-Luis

    2014-01-01

    The goal of the present contribution is to explore what kinds of objects languages are from a biolinguistic point of view. I define the biolinguistic point of view as the naturalistic study of languages, and I show that, from this point of view, languages are human language organs; that is, they are natural objects. However, languages change over time; therefore, they are also historically modified objects. Considering that natural organisms are historically modified natural objects, ...

  20. Text Mining approaches for automated literature knowledge extraction and representation.

    Science.gov (United States)

    Nuzzo, Angelo; Mulas, Francesca; Gabetta, Matteo; Arbustini, Eloisa; Zupan, Blaz; Larizza, Cristiana; Bellazzi, Riccardo

    2010-01-01

    Due to the overwhelming volume of published scientific papers, information tools for automated literature analysis are essential to support current biomedical research. We have developed a knowledge extraction tool to help researchers discover useful information that can support their reasoning process. The tool is composed of a search engine based on Text Mining and Natural Language Processing techniques, and an analysis module that processes the search results to build annotation similarity networks. We tested our approach on the available knowledge about the genetic mechanisms of cardiac diseases, where the goal is to find both known and hypothetical relations between specific candidate genes and the trait of interest. We show that the system (i) effectively retrieves medical concepts and genes and (ii) plays a relevant role in assisting researchers in the formulation and evaluation of novel literature-based hypotheses.
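
    As a rough illustration of the annotation similarity networks the abstract describes, the sketch below links genes whose sets of literature-derived concept annotations overlap, scored by Jaccard similarity. The gene symbols, annotation sets, and threshold are invented; the tool's actual similarity measure is not specified in this record.

```python
# Illustrative annotation similarity network: genes become nodes, and an
# edge is added when their concept-annotation sets overlap enough. All
# annotations and the 0.2 cutoff are invented for this example.
import networkx as nx

annotations = {
    "MYH7":  {"cardiomyopathy", "sarcomere", "heart failure"},
    "TNNT2": {"cardiomyopathy", "sarcomere", "arrhythmia"},
    "LMNA":  {"arrhythmia", "nuclear envelope"},
}

def jaccard(a, b):
    # Overlap of two annotation sets: |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

graph = nx.Graph()
genes = list(annotations)
for i, g1 in enumerate(genes):
    for g2 in genes[i + 1:]:
        sim = jaccard(annotations[g1], annotations[g2])
        if sim > 0.2:                      # arbitrary similarity cutoff
            graph.add_edge(g1, g2, weight=round(sim, 2))

for u, v, data in graph.edges(data=True):
    print(f"{u} -- {v} (similarity {data['weight']})")
```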