When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows provides a prompt for the human analyst to reassess any preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them on their importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).
Saber A Akhondi
Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
L. Hollink (Laura); A. Bedjeti (Adriatik); M. van Harmelen; D. Elliott (Desmond)
In recent years, several datasets have been released that include images and text, giving impulse to new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K
The chosen source texts deal with a variety of topics such as the environment, globalization, psychology, history, politics, drama, etc. Their Arabic translations were taken from The World of Knowledge series published by the National Council for Culture, Arts and Letters (NCCAL) in Kuwait. Keywords: parallel corpus ...
Madaan, Nishtha; Mehta, Sameep; Saxena, Mayank; Aggarwal, Aditi; Agrawaal, Taneea S; Malhotra, Vrinda
In the past few years, several datasets have been released for text and images. We present an approach to create a dataset for use in detecting and removing gender bias from text. We also include a set of challenges we faced while creating this corpus. In this work, we have worked with movie data from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube which were released from 1...
He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents annotated with 39,511 entities, their assertions, and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules was constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
In August 1996, the 38 Million Words Corpus was available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with ...
Guerra Valdes, R.
In the present work the WONP-NURT corpus is taken as a knowledge base for text mining in the INIS database. The main components of the information processing system, as well as computational methods for content analysis of INIS database record files, are described. Results of the content analysis of the WONP-NURT corpus are reported. Furthermore, results of two comparative text mining studies in the INIS database are also shown. The first one explores 10 research areas in the more familiar nearest range of the WONP-NURT corpus, while the second one surveys 15 regions in the more exotic far range. The results provide new elements to assess the significance of the WONP-NURT corpus in the context of the current state of nuclear science and technology research areas. (Author)
Vijay Krishna Menon
Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly in modelling natural languages. Usage of and research on these psycholinguistic formalisms have been erratic in the past decade, due to their demanding construction and difficulty of parsing. However, they represent a promising future for formalism-based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that gives ambiguous deep structures and a par dependency structure known as TAG derivation. Furthermore, we focus on a model for training this TAG for each language using a large corpus of text, through a MapReduce frequency count model in Spark followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing on the trained grammar.
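The frequency-count step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' Spark job: it runs the same map and reduce pattern in plain Python, counts whitespace-separated tokens as a stand-in for grammar trees, and all function names and the two-sentence corpus are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(sentence):
    """Map step: emit a partial count of the tokens in one sentence."""
    return Counter(sentence.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts by summing per-token totals."""
    return left + right


def frequency_count(corpus):
    """Run the map step over every sentence and fold the partial counts."""
    return reduce(reduce_phase, map(map_phase, corpus), Counter())


def relative_frequencies(counts):
    """Turn raw counts into relative-frequency probability estimates."""
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


# Hypothetical toy corpus; tokens stand in for the grammar trees being counted.
corpus = ["the dog sees the cat", "the cat sleeps"]
counts = frequency_count(corpus)
probs = relative_frequencies(counts)
```

Because the reduce step is associative, the partial counts can be merged in any order, which is the property that lets a framework such as Spark distribute the work across machines.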
Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia
Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single
speech annotations are described in detail in accordance with baseline work. The stories were recorded in two speaking styles, neutral and storytelling. The first Malay language storytelling corpus is not only necessary for the development of a storytelling text-to-speech (TTS) synthesis. It is also ...
Verspoor, Karin; Cohen, Kevin Bretonnel; Lanfranchi, Arrick; Warner, Colin; Johnson, Helen L; Roeder, Christophe; Choi, Jinho D; Funk, Christopher; Malenkiy, Yuriy; Eckert, Miriam; Xue, Nianwen; Baumgartner, William A; Bada, Michael; Palmer, Martha; Hunter, Lawrence E
We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications.
Osborne, John D; Neu, Matthew B; Danila, Maria I; Solorio, Thamar; Bethard, Steven J
Traditionally text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts") but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships that we term "compositional concepts" to evaluate their use in clinical text. We annotated 5397 disorder mentions from the ShARe corpus to SNOMED CT that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of the Unified Medical Language System (UMLS) semantic types and allow normalization to occur to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized using this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical agreement that measures the overlap of shared ancestral nodes. Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge we provide the first freely available corpus of compositional concept annotation in clinical text.
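The two ends of the agreement range reported in this abstract can be pictured with a small sketch. The exact-match score follows the usual definition; the hierarchical score is only one plausible reading of "overlap of shared ancestral nodes" (a Jaccard overlap of ancestor sets) and is an assumption, as are the toy concept hierarchy, the function names, and the simplification to one concept per mention.

```python
def ancestors(concept, parent_of):
    """Return the concept together with all of its ancestors in the hierarchy."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        concept = parent_of.get(concept)
    return seen


def exact_agreement(labels_a, labels_b):
    """Fraction of mentions on which both annotators chose the same concept."""
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


def hierarchical_agreement(labels_a, labels_b, parent_of):
    """Mean Jaccard overlap of the ancestor sets of the two chosen concepts."""
    scores = []
    for a, b in zip(labels_a, labels_b):
        anc_a, anc_b = ancestors(a, parent_of), ancestors(b, parent_of)
        scores.append(len(anc_a & anc_b) / len(anc_a | anc_b))
    return sum(scores) / len(scores)


# Hypothetical toy hierarchy (child -> parent); not taken from SNOMED CT.
PARENT_OF = {
    "disorder": None,
    "heart disorder": "disorder",
    "heart failure": "heart disorder",
    "arrhythmia": "heart disorder",
}
annotator_1 = ["heart failure", "arrhythmia"]
annotator_2 = ["heart failure", "heart failure"]

strict = exact_agreement(annotator_1, annotator_2)
relaxed = hierarchical_agreement(annotator_1, annotator_2, PARENT_OF)
```

In this toy case the annotators disagree on the second mention, yet the two concepts share a parent, so the hierarchical score (0.75) is higher than the exact-match score (0.5).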
Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E
Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not
Explicitation is the process of rendering information which is only implicit in the source text explicit in the target text, and is believed to be one of the universals of translation (Blum-Kulka 1986, Olohan and Baker 2000, Øverås 1998, Séguinot 1988, Vanderauwera 1985). The present study uses corpus technology to attempt to shed some light on the complex relationship between translation, text length and explicitation. An awareness of what makes translations longer (or shorter) and more expl...
Van Auken, Kimberly; Schaeffer, Mary L; McQuilton, Peter; Laulederkind, Stanley J F; Li, Donghui; Wang, Shur-Jen; Hayman, G Thomas; Tweedie, Susan; Arighi, Cecilia N; Done, James; Müller, Hans-Michael; Sternberg, Paul W; Mao, Yuqing; Wei, Chih-Hsuan; Lu, Zhiyong
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼ 10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/. Published by Oxford University Press 2014. This work is written by US
Bajlak Ch. Oorzhak
The article examines the progress of semantic markup of the Electronic corpus of texts in Tuvan language (ECTTL), which is another stage of adding Tuvan texts to the database and marking up the corpus. ECTTL is a collaborative project by researchers from Tuvan State University (Research and Education Center of Turkic Studies and Department of Information Technologies). Semantic markup of Tuvan lexis will serve as a search engine and reference system which will help users find text snippets containing words with desired meanings in ECTTL. The first stage of this process is setting up databases of basic lexemes of the Tuvan language. All meaningful lexemes were classified into the following semantic groups: humans, animals, objects, natural objects and phenomena, and abstract concepts. All Tuvan object nouns, as well as both descriptive and relative adjectives, were assigned to one of these lexico-semantic classes. Each class, sub-class and descriptor is tagged in Tuvan, Russian and English; these tags, in turn, will help automate searching. The databases of meaningful lexemes of the Tuvan language will also outline their lexical combinations. The automated system will contain information on semantic combinations of adjectives with nouns, adverbs with verbs, nouns with verbs, as well as on the combinations which are semantically incompatible.
To support research in library and information science terminology and dictionary construction in the Slovene language, a specialized text corpus has been designed and constructed. The corpus has reached 3.6 million words extracted from 625 Slovene technical and scientific texts in the field. It supports a variety of specialized search methods, display of search results, and their statistical computation. The web-based application is openly accessible to the public.
Abstract: Bilingual lexicographers, translation specialists and English teachers in the Arab world do not have access to computerized corpora of parallel texts for the English–Arabic language pair. This project has been carried out to meet this requirement by establishing the first general parallel corpus of English texts and their Arabic translations. The first phase of the project involved the selection of general source texts having appropriate lexical and stylistic features. The chosen source texts deal with a variety of topics such as the environment, globalization, psychology, history, politics, drama, etc. Their Arabic translations were taken from The World of Knowledge series published by the National Council for Culture, Arts and Letters (NCCAL) in Kuwait.
Keywords: PARALLEL CORPUS, LEXICOGRAPHY, TRANSLATION, BILINGUAL DICTIONARY, COLLOCATIONS, ALIGNMENT, SYNONYMS, DERIVATIVES, ANTONYMS, GLOSSARY, FREQUENCY
Summary: A new English–Arabic parallel text corpus for lexicographic applications. Bilingual lexicographers, translation specialists and English teachers in the Arab world do not have access to computerized corpora of parallel texts for the English–Arabic language pair. This project was undertaken to meet this need by establishing the first general parallel corpus of English texts and their Arabic translations. The first phase of the project involved the selection of general source texts with suitable lexical and stylistic features. The chosen source texts deal with a variety of topics such as the environment, globalization, psychology, history, politics, drama, etc. Their Arabic translations were taken from The World of Knowledge series published by the National Council for Culture, Arts and Letters (NCCAL) in Kuwait.
Keywords: PARALLEL CORPUS, LEXICOGRAPHY, TRANSLATION, BILINGUAL DICTIONARY, COLLOCATIONS, ALIGNMENT, SYNONYMS, DERIVATIVES, ANTONYMS
The paper deals with the field of Czech corpus linguistics and represents one of various current studies analysing text coherence through language interactions. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech. It focuses on examining the interaction of these two language phenomena and observes where they meet to participate in text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT) containing 3,165 Czech texts. The results of the analysis are helpful in automatic text annotation: the paper presents how (or to what extent) the annotation of grammatical coreference may be used in automatic (pre-)annotation of sentence information structure in Czech. It demonstrates how accurately we may (automatically) assume the value of contextual boundness for the antecedent and anaphor (as the two participants of a grammatical coreference relation). The results of the paper demonstrate that the anaphor of grammatical coreference is automatically predictable: it is a non-contrastive contextually bound sentence item in 99.18% of cases. On the other hand, the value of contextual boundness of the antecedent is not so easy to estimate (according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% and contrastive contextually bound in 13% of cases).
Basal encephalocele is a rare craniofacial anomaly. In the present paper we report a 10-year-old boy who presented with cleft palate, congenital nystagmus, and hypertelorism. During preoperative evaluation for cleft palate repair, a pulsatile mass was detected in the pharynx. Magnetic resonance imaging showed sphenoethmoidal type of basal encephalocele and agenesis of corpus callosum. Neurosurgical consultation was performed for further evaluation and management. Iran J Med Sci 2010; 35(2): 154-156.
Najafi, Elham; Darooneh, Amir H.
Text can be regarded as a complex system. There are some methods in statistical physics which can be used to study this system. In this work, by means of statistical physics methods, we reveal new universal behaviors of texts associated with the fractality values of words in a text. The fractality measure indicates the importance of words in a text by considering the distribution pattern of words throughout the text. We observed a power law relation between fractality of text and vocabulary size for texts and corpora. We also observed this behavior in studying biological data.
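A power law relation of the kind reported here, y = c · x^α, is commonly checked by fitting a straight line in log-log space. The sketch below shows only that fitting step. It does not compute the paper's fractality measure, and the data points, the exponent 0.5, and the prefactor 2.0 are synthetic values chosen for illustration, not results from the study.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + alpha * log(x); returns (c, alpha)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    alpha = s_xy / s_xx                       # slope in log-log space
    c = math.exp(mean_y - alpha * mean_x)     # intercept mapped back to linear scale
    return c, alpha


# Synthetic example: vocabulary sizes and a made-up quantity that follows
# an exact power law with prefactor 2.0 and exponent 0.5.
vocabulary_sizes = [100, 1000, 10000, 100000]
measurements = [2.0 * v ** 0.5 for v in vocabulary_sizes]

c, alpha = fit_power_law(vocabulary_sizes, measurements)
```

On real data the points scatter around the line, so the fitted exponent should be reported together with a goodness-of-fit check rather than on its own.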
Islamaj Dogan, Rezarta; Kim, Sun; Chatr-Aryamontri, Andrew; Chang, Christie S; Oughtred, Rose; Rust, Jennifer; Wilbur, W John; Comeau, Donald C; Dolinski, Kara; Tyers, Mike
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future
Bingel, Joachim; Haider, Thomas
We describe a systematic and application-oriented approach to training and evaluating named entity recognition and classification (NERC) systems, the purpose of which is to identify an optimal system and to train an optimal model for named entity tagging DeReKo, a very large general-purpose corpus...... when evaluated on more uniform and less diverse data. We create and manually annotate such a representative sample as evaluation data for three different NERC systems, for each of which various models are learnt on multiple training data. The proposed sampling method can be viewed as a generally...
Information institutions use text-based information retrieval systems to store, index and retrieve metadata, full-text, or both metadata and full-text (hybrid) contents. The aim of this research was to evaluate the impact of these content types on information retrieval performance. For this purpose, metadata (MIR), full-text (FIR) and hybrid (HIR) content information retrieval systems were developed with the default Lucene information retrieval model for a small-scale Turkish corpus. In order to evaluate the performance of these three systems, “precision - recall” and “normalized recall” tests were conducted. Experimental findings showed that there were no significant differences between MIR and FIR in mean average precision (MAP) performance. On the other hand, MAP performance of HIR was significantly higher in comparison to MIR and FIR. When information retrieval performance was evaluated as user-centered, the “normalized recall” performances of MIR and HIR were significantly higher than FIR. Additionally, there were no significant differences between the systems in retrieved relevant document means. Processing different types of contents such as metadata and full-text had some advantages and disadvantages for information retrieval systems in terms of term management. These advantages were brought together in hybrid content processing (HIR), and information retrieval performance improved.
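Mean average precision (MAP), the measure used above to compare the three systems, can be sketched as follows. This is the standard textbook definition rather than the study's evaluation code, and the document identifiers and relevance judgments are hypothetical.

```python
def average_precision(ranked_docs, relevant_docs):
    """Average of the precision values at each rank where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    # Divide by the number of relevant documents, so missed documents lower the score.
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision over all (ranking, relevant set) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical results for two queries: (ranked result list, set of relevant documents).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # relevant documents at ranks 1 and 3
    (["d5", "d6", "d7"], {"d6"}),              # relevant document at rank 2
]
map_score = mean_average_precision(runs)
```

Computing this score separately for the metadata, full-text, and hybrid indexes over the same query set gives the per-system values that a significance test would then compare.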
Zhao, Lianhua; Ma, Qiang; Wang, Qiushi; Zeng, Ying; Luo, Qingya; Xiao, Hualiang
Primary diffuse large B cell lymphoma (DLBCL) of the uterus is rare, and primary DLBCL arising from a uterine leiomyoma (collision tumor) has not been reported in the literature. We describe the clinical, histological, immunohistochemical, and molecular features of primary DLBCL arising from a leiomyoma in the uterine corpus. A 73-year-old female patient had a uterine mass for 23 years. An ultrasound scan revealed marked enlargement of the uterus, measuring 18.2 × 13 × 16.3 cm, with a 17.6 × 10.9 × 11.6 cm hypoechoic mass in the uterine corpus. The tumors consisted of medium- to large-sized cells exhibiting a diffuse pattern of growth with a well-circumscribed leiomyoma. The neoplastic cells strongly expressed CD79α, CD20 and PAX5. Molecular analyses indicated clonal B-cell receptor gene rearrangement. To the best of our knowledge, no previous cases of primary DLBCL arising from a leiomyoma have been reported. It is necessary to differentiate a diagnosis of primary DLBCL arising from a leiomyoma from that of leiomyoma with florid reactive lymphocytic infiltration (lymphoma-like lesion). Careful analysis of clinical, histological, immunophenotypic, and genetic features is required to establish the correct diagnosis.
known as H1N1, H1N2, H3N1, H3N2, and H2N3. Swine influenza virus is common throughout pig populations worldwide. Transmission of the virus from pigs to...large region, for instance a continent, or even worldwide. Swine Flu Swine influenza (also called pig influenza, swine flu, hog flu and pig flu) is an...infection by any one of several types of swine influenza virus. Swine influenza virus (SIV) or S-OIV (swine-origin influenza virus) is any strain
procedure described in Part I can be brought to bear on the task of making Kaj Munk’s works available electronically to the general public. I do so by describing how I have implemented a “Munk Browser” desktop application. Chapter 13 discusses ways in which the EMdF model and the MQL query language can...... language can be extended to support the requirements of the problem of storing and retrieving annotated text even better. Finally, Chapter 15 concludes the dissertation. Appendix A gives the grammar for the subset of the MQL query language which closely resembles Doedens’s QL. Seven already-published...
Frandsen, Tove Faber; Nicolaisen, Jeppe
Using statistical methods to analyse digital material for patterns makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim...
Potts, Amanda; Kjær, Anne Lise
that legal language can be subjective and emotive. The semantic field of ‘crime’ is an expected key, but concordance analysis shows ideological skew in discursive construction of crimes/victims. For instance, ‘rape’/‘sexual assault’ co-occurs with female victims, whereas ‘torture’/‘outrages upon personal......Legal language is an integral and foundational part of our social reality, but it is underrepresented in interdisciplinary, critical linguistic analyses. This is perhaps because language is more objective and formulaic than media texts, which can be more subjective and emotive (Kjær and Palsbro......, 2008). In this paper, I demonstrate how a corpus-based critical discourse analysis of legal language can expose hidden traces of the underlying ideologies of text creators, while demonstrating how identity can be performed in legal texts. Research is based on a half-million-word corpus of annual...
Altszyler, Edgar; Ribeiro, Sidarta; Sigman, Mariano; Fernández Slezak, Diego
Computer-based dream content analysis relies on word frequencies within predefined categories in order to identify different elements in text. As a complementary approach, we explored the capabilities and limitations of word-embedding techniques to identify word usage patterns among dream reports. These tools allow us to quantify word associations in text and to identify the meaning of target words. Word embeddings have been extensively studied in large datasets, but only a few studies analyze semantic representations in small corpora. To fill this gap, we compared the capabilities of Skip-gram and Latent Semantic Analysis (LSA) to extract semantic associations from dream reports. LSA showed better performance than Skip-gram in small corpora in two tests. Furthermore, LSA captured relevant word associations in the dream collection, even in cases with low-frequency words or small numbers of dreams. Word associations in dream reports can thus be quantified by LSA, which opens new avenues for dream interpretation and decoding. Copyright © 2017 Elsevier Inc. All rights reserved.
A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
This paper describes the importance of corpora and text processing. A corpus is a projection of how a language is used by its speakers. Technological support has made corpora easier to maintain, more space-saving, and electronically structured. The last of these gives corpus users much freedom to access and exploit a corpus for language teaching, analysis or other specified tasks. This paper demonstrates how to use open-access corpora on the internet such as the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). Besides how to use a corpus, the paper also describes how to build one. For this, the writer uses UNITEX, a corpus (text-based) processing software. The software is used to demonstrate the steps of corpus building, ranging from text collection, annotation and electronic dictionary application to natural-language-based operations such as pattern matching, concordance and simple extraction. The paper shows how graph technology may outperform regular expressions, a retrieval method exploited by other corpus processors, in terms of writing output.
In the context of the glocalization of business, it is becoming increasingly important to better understand the cross-linguistic persuasive communication conveyed through media such as advertising, which is considered one of the most active forms of modern media. In achieving this goal, the study of intertextuality in the field of pragmatics proves to be helpful. Employing quantitative and qualitative approaches, we compare English and Chinese advertisement texts in Cosmopolitan, the best-selling women's fashion magazine, with the study focused on lexical, thematic and cultural intertextuality. It is found that the glocalization of advertisements for female products and services depends on local culture and language. The analysis of intertextuality between the two sets of texts will contribute to research on advertisements aimed at women and on international marketing strategy.
Phonetic statistics were collected from several Polish corpora. The paper is a summary of the data, which are phoneme n-grams, and of some phenomena in the statistics. Triphone statistics apply to context-dependent speech units, which have an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
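Collecting phoneme n-gram statistics of the kind summarized here can be sketched as follows. The short phoneme sequence is a hypothetical stand-in for a transcribed corpus (the symbols are only illustrative, not a checked SAMPA transcription), and the function names are assumptions made for this example.

```python
from collections import Counter


def ngram_counts(phonemes, n):
    """Count every run of n consecutive phonemes in a transcription."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )


def relative_frequencies(counts):
    """Convert raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


# Hypothetical phoneme sequence; "#" marks a word boundary.
transcription = ["k", "o", "t", "#", "k", "o", "t", "a"]

triphones = ngram_counts(transcription, 3)
for gram, freq in sorted(relative_frequencies(triphones).items()):
    print(" ".join(gram), round(freq, 3))
```

Setting `n` to 1, 2 or 3 gives phoneme, diphone and triphone statistics from the same transcription.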
Kelly, Colin; Devereux, Barry; Korhonen, Anna
Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
Tony Berber Sardinha
A KeyWords analysis (using WordSmith Tools) enables the discovery of lexical items which reveal the main lexical sets in a text or corpus. Such an analysis requires that a reference corpus be compared to the corpus the researcher intends to describe (the study corpus). This paper presents a mathematical method for finding out the influence of reference corpus size on the number of key words extracted by the program. The results reveal that a reference corpus at least five times as large as the study corpus yields a number of key words that is statistically equivalent to that obtained with larger reference corpora, thus suggesting five times (the size of the study corpus) as the minimum order of magnitude for reference corpora.
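To make the study-corpus versus reference-corpus comparison concrete, here is a minimal sketch of a keyness score based on the log-likelihood statistic, which is one statistic commonly used for key word extraction. The abstract does not say which statistic or threshold was used, so the choice of log-likelihood, the function name and the word counts below are all assumptions.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness of one word.

    Compares the word's observed frequency in each corpus with the
    frequency expected if it were equally common in both.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in (
        (freq_study, expected_study),
        (freq_ref, expected_ref),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts: a 1,000-word study corpus and a reference
# corpus five times as large, as the abstract recommends.
print(log_likelihood(50, 50, 1000, 5000))  # over-represented in the study corpus
print(log_likelihood(10, 50, 1000, 5000))  # same relative frequency in both
```

A word is then treated as key when its score exceeds a chosen significance threshold, so the reference corpus size affects how many words pass.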
Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded
Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
Hochberg, J.; Scovel, C.; Thomas, T.; Hall, S.
This paper describes a method for asking statistical questions about a large text corpus. The authors exemplify the method by addressing the question, "What percentage of Federal Register documents are real documents, of possible interest to a text researcher or analyst?" They estimate an answer to this question by evaluating 200 documents selected from a corpus of 45,820 Federal Register documents. Bayesian analysis and stratified sampling are used to reduce the sampling uncertainty of the estimate from over 3,100 documents to fewer than 1,000. A possible application of the method is to establish baseline statistics used to estimate recall rates for information retrieval systems.
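The effect of stratification on the estimate can be sketched with the classical stratified estimator for a population total. This is not the authors' Bayesian procedure. The corpus size (45,820) and sample size (200) come from the abstract, but the strata boundaries and sample outcomes below are hypothetical.

```python
import math


def stratified_total(strata):
    """Estimate a population total and its standard error.

    strata: list of (stratum_size, sample_size, hits_in_sample) tuples,
    where a "hit" is a sampled document judged to be a real document.
    """
    estimate = 0.0
    variance = 0.0
    for size, n, hits in strata:
        p = hits / n
        estimate += size * p
        # Sample variance of a proportion with finite population correction.
        variance += size ** 2 * (1 - n / size) * p * (1 - p) / (n - 1)
    return estimate, math.sqrt(variance)


# Hypothetical split of the 45,820 documents into two strata,
# with 100 of the 200 evaluated documents drawn from each.
strata = [(30000, 100, 90), (15820, 100, 20)]
estimate, std_error = stratified_total(strata)
print(round(estimate), round(std_error))
```

When strata are internally homogeneous (most documents in a stratum are real, or most are not), the per-stratum variances are small, which is why stratification can shrink the uncertainty of the overall estimate.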
We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word
Over recent years, the use of corpora in stylistic analysis has grown in popularity. However, questions still remain over the remit of corpus stylistics, its distinction from corpus linguistics generally and its capacity to explain complex stylistic effects. This article argues in favour of an integrated corpus stylistics; that is, an approach to corpus stylistics that integrates it with other stylistic methods and analytical frameworks. I suggest that this approach is needed for two main reasons: (i) it is analytically necessary in order to fully explain stylistic effects in texts, and (ii) integrating corpus methods with other stylistic tools is what will distinguish corpus stylistics from corpus linguistics. My argument is supported by reference to examples from Mark Haddon’s novel The Curious Incident of the Dog in the Night-time and the HBO TV series Deadwood. Both these examples rely for their explanation on a combination of corpus stylistic analytical techniques and other stylistic methods of analysis.
This article presents the book “Literary Hermeneutic” by Victoria Fonari, Ph.D., State University of Moldova. Hermeneutics, as an object of research, draws on literary, critical, theological, juridical, linguistic, psychological, verbal and sociological knowledge. Literary hermeneutics is one of the most favored disciplines. It is venerated both in the Homeric exegesis of antiquity and in the refinement of the methodology for interpreting canonical works, in which a vain moment is the deciphering of texts, the monuments and authors’ commentary from time immemorial, thus re-establishing a part of human values. Literary interpretation re-establishes the connections between the values of the past and their understanding from the perspective of the present. The demands of the paradigm of literary and artistic interpretation constitute a basic element that is important both for the writing of academic research and for the understanding of literary values. The book directs students to scientific works and facilitates the professional activity of teachers, journalists, jurists and translators.
Between 12 and 15 July, the University of Lancaster hosted the Lancaster Summer Schools in Corpus Linguistics and Other Digital Methods. The summer school was organized by UCREL (University Centre for Computer Corpus Research on Language), the ERC (European Research Council), CASS (ESRC Centre for Corpus Approaches to Social Science) and the ESRC (Economic and Social Research Council), and was divided into six programmes tailored to different fields: Corpus Linguistics for Language Studies, Corpus Linguistics for Social Science, Corpus Linguistics for Humanities, Statistics for Corpus Linguistics, Geographical Information Systems for the Digital Humanities, and Corpus-based Natural Language Processing.
Gelfand, Jessica T; Christie, Robert E; Gelfand, Stanley A
Speech recognition may be analyzed in terms of recognition probabilities for perceptual wholes (e.g., words) and parts (e.g., phonemes), where j or the j-factor reveals the number of independent perceptual units required for recognition of the whole (Boothroyd, 1968b; Boothroyd & Nittrouer, 1988; Nittrouer & Boothroyd, 1990). For consonant-vowel-consonant (CVC) nonsense syllables, j ∼ 3 because all 3 phonemes are needed to identify the syllable, but j ∼ 2.5 for real-word CVCs (revealing ∼2.5 independent perceptual units) because higher level contributions such as lexical knowledge enable word recognition even if less than 3 phonemes are accurately received. These findings were almost exclusively determined with the 120-word corpus of the isophonemic word lists (Boothroyd, 1968a; Boothroyd & Nittrouer, 1988), presented one word at a time. It is therefore possible that its generality or applicability may be limited. This study thus determined j by using a much larger and less restricted corpus of real-word CVCs presented in 3-word groups as well as whether j is influenced by test size. The j-factor for real-word CVCs was derived from the recognition performance of 223 individuals with a broad range of hearing sensitivity by using the Tri-Word Test (Gelfand, 1998), which involves 50 three-word presentations and a corpus of 450 words. The influence of test size was determined from a subsample of 96 participants with separate scores for the first 10, 20, and 25 (and all 50) presentation sets of the full test. The mean value of j was 2.48 with a 95% confidence interval of 2.44-2.53, which is in good agreement with values obtained with isophonemic word lists, although its value varies among individuals. A significant correlation was found between percent-correct scores and j, but it was small and accounted for only 12.4% of the variance in j for phoneme scores ≥60%. Mean j-factors for the 10-, 20-, 25-, and 50-set test sizes were between 2.49 and 2.53 and were not
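The j-factor described in this abstract follows from the relation usually given in this literature, $p_w = p_p^{\,j}$, where $p_w$ is the probability of recognizing the whole word and $p_p$ the probability of recognizing a phoneme, so that $j = \log p_w / \log p_p$. A minimal worked sketch follows; the scores used are hypothetical and are not data from the study.

```python
import math


def j_factor(word_score, phoneme_score):
    """Number of independent perceptual units, j = log(p_w) / log(p_p).

    Both scores are proportions correct and must lie strictly
    between 0 and 1 for the ratio to be defined.
    """
    if not (0 < word_score < 1 and 0 < phoneme_score < 1):
        raise ValueError("scores must be strictly between 0 and 1")
    return math.log(word_score) / math.log(phoneme_score)


# Hypothetical listener with 80% of phonemes correct.
phoneme_score = 0.80
print(j_factor(phoneme_score ** 3, phoneme_score))    # nonsense syllables: all 3 phonemes needed
print(j_factor(phoneme_score ** 2.5, phoneme_score))  # real words: lexical knowledge helps
```

A listener whose word score is higher than the cube of the phoneme score therefore has j below 3, which is the pattern the abstract reports for real-word CVCs.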
Fleck, Regina Caballero
Professionals who work with translation and languages in general have probably encountered “untranslatable” words in texts, such as “namorar” and “date”. The present study seeks to promote the use of corpus-based tools among literary translators. Our research questions are: what translation solutions are found in the corpus? How are these solutions related to extralinguistic factors? The data for this study were taken from Compara, a parallel corpus that is available online and consists of original texts in Portuguese and English aligned with their respective translations. To analyse the examples, our parameters are the definitions given in the Houaiss and Oxford dictionaries. At the end of this study, we observe a unilateral equivalence between “namorar” and “date” and that these terms have evolved differently in the two languages.
Susana M. Lizcano Rejano
We search the Corpus Philostrateum for connections between this literary production and Orphism: its system of beliefs, its peculiar interpretation of traditional Greek mythology, and its proposal for a particular way of life. We also try to determine the relation, as it appears in this corpus, between the ideology and the customs that the Pythagoreans and Orphics upheld.
XML Old Persian corpus. The corpus is based on publicly available data on the Web. Those data can be traced back to the grammar of Old Persian by Kent (1950). The corpus contains those data and is arranged in a way suitable for corpus searches.
Wagler, Amy E.; Lesser, Lawrence M.; González, Ariel I.; Leal, Luis
A corpus of current editions of statistics textbooks was assessed to compare aspects and levels of readability for the topics of "measures of center," "line of fit," "regression analysis," and "regression inference." Analysis with lexical software of these text selections revealed that the large corpus can…
Biber, Douglas; And Others
Examines a representative text corpus to gain insights into language structure and use and to open new areas of linguistic inquiry. Various illustrations are presented that provide a glimpse into the value of corpus-based investigations for increasing one's understanding of language use and imparting insights important for designing effective…
The meaning and interpretation of death, the treatment of the dead, and the ritual value of associated objects have fascinated scholars of different cultures for decades. However, the same emphasis has not been placed on analysing the impact that research can have on the present-day bearers of traditional cultures when human remains are excavated or handled in violation of pre-existing ritual norms. In recent decades, the ethical implications of the treatment of human remains have been widely debated internationally, as have the claims made by various indigenous groups, which have been legally recognized in some countries. In our country, the issue has been considered almost exclusively in the context of specific cases that proved particularly contentious. However, the recent national law 25.517, which requires the consent of indigenous communities for any scientific undertaking concerning those communities or their historical and cultural heritage, highlights the need to develop new ways of working that are agreed upon with the local and/or ethnic communities involved. This paper discusses the state of the question in Argentina, presenting a range of situations of conflict, potential or actual, involving local communities, provincial and/or municipal agencies, and specialists in different provinces. Based on this analysis, it aims to contribute to the discussion of the role the researcher should assume in social and cultural contexts of relatively high conflict.
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and rem...
The present study examined the use and effectiveness of a large corpus--the Corpus del Español (Davies, 2002)--in a 300-level Spanish grammar university course. Students conducted hands-on corpus searches with the goal of finding concordances containing particular types of collocations (combinations of words that tend to co-occur) and tokens (any…
Janne Bondi Johannessen
The paper describes the Nordic Dialect Corpus as of June 2010. The corpus is a tool that combines a number of useful features that together make it a unique and very advanced resource for researchers in many fields of language research. The corpus is web-based and features full audio-visual representation linked to transcriptions and translations.
Abstract: In this paper the writer examines problems the African Languages Lexical (ALLEX) Project (at present the African Languages Research Institute (ALRI)) encountered while tagging the Shona corpus. The problems to be highlighted include general problems which apply to more than one language as well as problems peculiar to Shona. The paper was inspired by the challenges the writer encountered when he took part in building the Shona corpus. An analysis of the problems that most corpus builders face shows that more problems are likely to be encountered when dealing with spoken corpora than with written corpora. The paper demonstrates that tagging is an important component of corpus building as it makes it easier for a researcher to extract relevant data. To utilise the benefits of a tagged corpus, the tagging should be thorough and accurate. Well-informed decisions form an integral part of the tagging process since the utility of a tagged corpus depends largely on the input of the tagging process. This paper shows the need to take the tagging process seriously.
Keywords: ALLEX PROJECT, COMPUTER, CORPUS, ENCODING, FOREIGN WORD, LEMMATIZATION, LEXICOGRAPHY, MONITOR CORPUS, PART OF SPEECH, SCANNING, SHONA, SLANG, TAGGING, TRANSCRIPTION, WORD
Summary: The Shona corpus and the problem of tagging. In this article the author examines problems that the African Languages Lexical (ALLEX) Project (now the African Languages Research Institute (ALRI)) encountered while the Shona corpus was being tagged. The problems discussed include general problems that apply to more than one language, as well as specific problems peculiar to Shona. The article originated in the challenges the author encountered while taking part in building the Shona corpus. An analysis of the problems that most corpus builders face shows that more problems are likely to be encountered when dealing with spoken
Word sets, keywords, and text contents: an investigation of text topic on the computer / Beginning Portuguese corpus linguistics: exploring a corpus to teach Portuguese as a foreign language
Antonio P. BERBER SARDINHA
This study presents a methodology for the identification of coherent word sets. Eight sets were initially identified and further grouped into two main sets: a 'company' set and a 'non-company' set. These two sets shared very few collocates, and therefore they seemed to represent distinct topics. The positions of the words in the 'company' and 'non-company' sets across the text were computed. The results indicated that the 'non-company' sets referred to 'company' implicitly. Finally, the key words were compared to an automatic abridgment of the text, which revealed that nearly all key words were present in the abridgment. This was interpreted as suggesting that the key words may indeed represent the main contents of the text.
Mahale, Rohan; Mehta, Anish; Buddaraju, Kiran; John, Aju Abraham; Javali, Mahendra; Srinivasa, Rangasetty
Infarctions of the corpus callosum are rare vascular events. The corpus callosum is relatively immune to vascular insult because of its rich vascular supply from the anterior and posterior circulations of the brain. We report 3 patients with largely diffuse acute corpus callosum infarction. The 3 patients were studied, and each had a different aetiology: cardioembolism, tuberculous arteritis and Takayasu arteritis. Diffuse corpus callosum infarcts are rare events. This case series describes the three different aetiologies of diffuse acute corpus callosum infarction. Copyright © 2015 Elsevier B.V. All rights reserved.
We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, $P(k) \sim k^{-\gamma}$, with two regimes, which are characterized by the exponents $\gamma_s \approx 1.7$ (at short degree scales) and $\gamma_l \approx 1.3$ (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
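The pipeline described in this abstract (word lengths, integration, visibility graph, degree distribution) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the tokenization rule and the choice to integrate deviations from the mean are assumptions, and the brute-force visibility test is only practical for short series.

```python
import re
from collections import Counter


def word_lengths(text):
    """Length of each alphabetic word, in order of appearance (assumed tokenization)."""
    return [len(w) for w in re.findall(r"[A-Za-z]+", text)]


def integrate(series):
    """Cumulative sum of deviations from the mean (assumed form of integration)."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for value in series:
        total += value - mean
        out.append(total)
    return out


def visibility_degrees(y):
    """Node degrees of the natural visibility graph of the series y.

    Points i and j are linked when every point between them lies
    strictly below the straight line joining (i, y[i]) and (j, y[j]).
    """
    n = len(y)
    degree = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(
                y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            ):
                degree[i] += 1
                degree[j] += 1
    return degree


text = "It was the best of times, it was the worst of times"
degrees = visibility_degrees(integrate(word_lengths(text)))
print(sorted(Counter(degrees).items()))  # empirical degree distribution
```

On a full ebook, the degree counts would be binned and plotted on log-log axes to look for the power-law regimes reported in the abstract.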
Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.; Shead, Timothy M.
This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.
Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo
Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences of exploring such methods mainly in three areas including historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL, have shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and our important lessons learned from the process of preparing and applying text mining to large unstructured system artifacts at JPL aiming to benefit future TM applications in similar problem domains and also in hope for being extended to broader areas of applications.
Юрий Петрович Костиленко
Aim: to study the internal organization of the male and female corpus callosum in mature adults. Materials and methods: whole preparations of the male and female corpus callosum (10 of each sex, aged 45–60 years) were used. Plate sections 2 mm thick were cut from these preparations in two mutually perpendicular planes. The tissue plates were then plastinated in epoxy resin, removed from the unpolymerized epoxy, and placed on a polyethylene film that was covered with a second film of the same size. This layered block was placed between two glass plates of equal size, which were pressed together under a small load. After complete polymerization, the epoxy plates containing corpus callosum tissue were gently ground and carefully polished to expose the tissue structures at the surface, which were then stained with a 1% solution of methylene blue in 1% borax solution. Results: sagittal plastinated sections showed that the transverse ridge-like elevations on the upper surface of the corpus callosum are cord-like taeniae that emerge from within and pass through it. Transverse sections showed that two types of adult corpus callosum can be distinguished by density: dense and dispersed. At high magnification under a binocular loupe (MBS-9 microscope), gaps are visible between adjacent commissural cords, and blood vessels can be detected within these gaps. In transverse sections of the commissural cords, very thin streaks are revealed in their depth, which together consist of alternating dark and light lines that form a layered striation. Among the series of light lines, interlayers are visible that separate the whole depth of
Choi, Wonjun; Kim, Baeksoo; Cho, Hyejin; Lee, Doheon; Lee, Hyunju
Plants are natural products that humans consume in various ways including food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals in a large-scale way if we apply a text mining approach. A cornerstone of achieving this task is a corpus of relationships between plants and chemicals. In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant-chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively. We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus .
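The agreement figures quoted above (simple percent agreement and a kappa score) can be illustrated with a small sketch. The abstract does not say which kappa variant was used for its three annotators, so the code below assumes the two-annotator Cohen's kappa purely for illustration; the function names and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: both annotators pick the same label independently.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up positive/negative relationship labels from two hypothetical annotators.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
annotator_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa comes out lower than raw agreement on the same labels because it discounts the matches that two annotators with these label frequencies would reach by chance.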
Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical
Baldauf, Richard B., Jr.
Focuses on the historical and sociolinguistic studies that illuminate corpus planning processes. These processes are broken down and discussed under two categories: those related to the establishment of norms, referred to as codification, and those related to the extension of the linguistic functions of language, referred to as elaboration. (60…
The empirical law uncovered by Menzerath and formulated by Altmann, known as the Menzerath-Altmann law (henceforth the MA law), reveals the statistical distribution behavior of human language in various organizational levels. Building on previous studies relating organizational regularities in a language, we propose that the distribution of distinct (or different) words in a large text can effectively be described by the MA law. The validity of the proposition is demonstrated by examining two text corpora written in different languages not belonging to the same language family (English and Turkish). The results show not only that distinct word distribution behavior can accurately be predicted by the MA law, but that this result appears to be language-independent. This result is important not only for quantitative linguistic studies, but also may have significance for other naturally occurring organizations that display analogous organizational behavior. We also deliberately demonstrate that the MA law is a special case of the probability function of the generalized gamma distribution.
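For orientation, the relationship stated in the last sentence can be written out as follows. This is a sketch in assumed notation: the symbols $a$, $b$, $c$, $\alpha$, $d$, $p$ are not taken from the paper, sign conventions for the exponents vary between authors, and the paper's own derivation may differ.

```latex
% Menzerath-Altmann law in one commonly cited form (a, b, c are fitted parameters)
y(x) = a \, x^{b} \, e^{-c x}

% Generalized gamma density with scale \alpha > 0 and shape parameters d, p > 0
f(x; \alpha, d, p) = \frac{p}{\alpha^{d} \, \Gamma(d/p)} \; x^{d-1} \, e^{-(x/\alpha)^{p}}, \qquad x > 0

% Setting p = 1 gives f(x) \propto x^{d-1} e^{-x/\alpha},
% which has the MA form above with b = d - 1 and c = 1/\alpha.
```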
This article presents GEstor de COrpus (GECO), an online corpus management application that lets users upload collections of documents and turn them into digital corpora. Within the system, corpora can be processed by other applications, which are implemented as modules integrated into the GECO infrastructure. The article describes its features in detail, as well as the functionality of the concordance generator developed around it.
An Edition of the Corpus areopagiticum slavicum. In the fourteenth century, the monk Isaiah of the holy Mount Athos translated the writings of pseudo-Dionysius the Areopagite (c. end of the 5th century), core texts for Eastern and Western European theological and philosophical thought, from Greek into Church Slavonic. This first Slavic translation of Dionysius’ oeuvre (“De Coelesti Hierarchia,” “De Ecclesiastica Hierarchia,” “De Divinis Nominibus,” “De Mystica Theologia,” the epistles and scholia), which played a significant role in the development of Slavic culture and of Orthodox Slavic socio-political theory and praxis, is still central to the study of Slavia Orthodoxa. A working group of German and Russian scholars has completed an edition of the translator’s Church Slavonic autograph with an en face reconstruction of the Greek text used by the translator and philological commentary. A Church Slavonic-Greek and Greek-Church Slavonic dictionary of this edition, currently in preparation, is intended to make the terminology used in this influential translation accessible to interdisciplinary researchers. For the first time, the Church Slavonic lexica of this corpus, a substantial part of which was coined by the translator, will be registered in an index of words and forms.
Westbury, Chris; Keith, Jeff; Briesemeister, Benny B; Hofmann, Markus J; Jacobs, Arthur M
Ever since Aristotle discussed the issue in Book II of his Rhetoric, humans have attempted to identify a set of "basic emotion labels". In this paper we propose an algorithmic method for evaluating sets of basic emotion labels that relies upon computed co-occurrence distances between words in a 12.7-billion-word corpus of unselected text from USENET discussion groups. Our method uses the relationship between human arousal and valence ratings collected for a large list of words, and the co-occurrence similarity between each word and emotion labels. We assess how well the words in each of 12 emotion label sets (proposed by various researchers over the past 118 years) predict the arousal and valence ratings on a test and validation dataset, each consisting of over 5970 items. We also assess how well these emotion labels predict lexical decision residuals (LDRTs), after co-varying out the effects attributable to basic lexical predictors. We then demonstrate a generalization of our method to determine the most predictive "basic" emotion labels from among all of the putative models of basic emotion that we considered. As well as contributing empirical data towards the development of a more rigorous definition of basic emotions, our method makes it possible to derive principled computational estimates of emotionality (specifically, of arousal and valence) for all words in the language.
Giffoni Silvyo David Araújo
Considering the rarity of frontonasal dysplasia (FD) and the few reports of large case series studied with magnetic resonance imaging (MRI), we describe the results of an angular analysis of the corpus callosum in 18 individuals with FD (7 male, 11 female), using an easily reproducible method. Group I had 12 individuals with the isolated form and Group II had 6 individuals with syndromic FD of unknown etiology. The results are presented for the two groups together. Compared with the control group, patients with FD presented an increased alpha angle and reduced beta and gamma angles (p<0.05). Alpha and gamma angles express the relationship between the anterior portion of the corpus callosum and the floor of the 4th ventricle. Considering embryonic development, these findings would occur secondarily to a failure in the development of the nasal capsule. Thus, angular anomaly of the corpus callosum would be a usual rather than a fortuitous finding in patients with FD.
In recent years, continuing advances in technology have increased the capacity to automate the extraction of a range of linguistic features of texts and thus have provided the impetus for the substantial growth of corpus linguistics. While corpus linguistic tools and methods have been used extensively in second language learning research, they…
Medicine of the fifth and fourth centuries B.C., as attested in the Corpus Hippocraticum, ascribes all diseases to the rheuma, i.e., the flux of humours into the body. This flux produces not only colds, hoarseness, cough, reddening, and dropsy, but also arthritis, sciatica, and gout.
Tomaiuolo, Francesco; Campana, Serena; Collins, D Louis
We examined the effects of visual deprivation at birth on the development of the corpus callosum in a large group of congenitally blind individuals. We acquired high-resolution T1-weighted MRI scans in 28 congenitally blind and 28 normal sighted subjects matched for age and gender....... There was no overall group effect of visual deprivation on the total surface area of the corpus callosum. However, subdividing the corpus callosum into five subdivisions revealed significant regional changes in its three most posterior parts. Compared to the sighted controls, congenitally blind individuals showed a 12......% reduction in the splenium, and a 20% increase in the isthmus and the posterior part of the body. A shape analysis further revealed that the bending angle of the corpus callosum was more convex in congenitally blind compared to the sighted control subjects. The observed morphometric changes in the corpus...
Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E
Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S.
Üstverinin Tam-Metin Bilgi Erişim Performansı Üzerindeki Etkisi: Küçük Ölçekli Türkçe Külliyat Üzerinde Deneysel Bir Araştırma / Impact of Metadata on Full-text Information Retrieval Performance: An Experimental Research on a Small Scale Turkish Corpus
Information institutions use text-based information retrieval systems to store, index and retrieve metadata, full-text, or both metadata and full-text (hybrid) contents. The aim of this research was to evaluate the impact of these contents on information retrieval performance. For this purpose, metadata (MIR), full-text (FIR) and hybrid (HIR) content information retrieval systems were developed with the default Lucene information retrieval model for a small-scale Turkish corpus. In order to evaluate ...
Jacobi, C.; van Atteveldt, W.H.; Welbers, K.
The huge collections of news content which have become available through digital technologies both enable and warrant scientific inquiry, challenging journalism scholars to analyse unprecedented amounts of texts. We propose Latent Dirichlet Allocation (LDA) topic modelling as a tool to face this
Godbehere, Andrew B.
Given the overwhelming quantities of data generated every day, there is a pressing need for tools that can extract valuable and timely information. Vast reams of text data are now published daily, containing information of interest to those in social science, marketing, finance, and public policy, to name a few. Consider the case of the micro-blogging website Twitter, which in May 2013 was estimated to contain 58 million messages per day: in a single day, Twitter generates a greater volume of...
Mallory, Emily K; Zhang, Ce; Ré, Christopher; Altman, Russ B
A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles. We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles. Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app . Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
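The evaluation described above, in which each candidate relation receives a probability and the extractions are compared with a reference database, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the probability cutoff, and the toy scores and gold pairs are all hypothetical.

```python
def precision_recall_f1(scored_candidates, gold_pairs, cutoff):
    """Evaluate extracted pairs against a gold set.

    scored_candidates: dict mapping a (gene_a, gene_b) tuple to a probability.
    gold_pairs: iterable of tuples regarded as true interactions.
    cutoff: keep candidates whose probability is at least this value.
    """
    # Treat pairs as unordered so (A, B) and (B, A) count as one interaction.
    predicted = {
        frozenset(pair) for pair, prob in scored_candidates.items() if prob >= cutoff
    }
    gold = {frozenset(pair) for pair in gold_pairs}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy scores and gold set, invented for illustration only.
candidates = {
    ("TP53", "MDM2"): 0.95,
    ("BRCA1", "BARD1"): 0.90,
    ("MYC", "MAX"): 0.60,
    ("TP53", "GAPDH"): 0.85,
    ("ACTB", "MYC"): 0.30,
}
gold = {("MDM2", "TP53"), ("BRCA1", "BARD1"), ("MYC", "MAX"), ("EGFR", "GRB2")}

precision, recall, f1 = precision_recall_f1(candidates, gold, cutoff=0.8)
```

Lowering the cutoff admits more candidates, which typically raises recall at the cost of precision. Applying the same harmonic mean to the abstract's 76% precision and 49% recall gives an F1 of roughly 0.60; that figure is derived here and is not reported in the abstract.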
Van Niekerk, D
development. Extracted contours are processed and analysed statistically to describe acoustic properties in different tonal contexts. The authors demonstrate how features useful for tone recognition or synthesis can be successfully extracted from a corpus...
Tony Berber Sardinha
This article offers a reexamination of two of Saussure’s insights from the point of view of corpus linguistics, namely freedom of combination and heterogeneity in language in use. Regarding the first insight, an analysis of word combinations in a corpus of newspaper texts written in Brazilian Portuguese was carried out to determine how many of these combinations were actual collocations, that is, were used frequently enough in a very large reference corpus (the Brazilian corpus) to warrant statistical significance. The results suggested that most word combinations are not free; rather, they follow previously established preferences among speakers. Regarding the second notion, that of heterogeneity, the collocations in the newspaper texts were tracked as they were deployed one after the other along each text, and this flow was visually depicted. The inspection of the charts revealed unique patterns of the distribution of collocation, thereby suggesting that the evidence supports the view of heterogeneity. A cluster analysis was later conducted on the number of collocations in each text, revealing three basic collocation bands onto which all the texts can be fitted. This was interpreted as suggesting that heterogeneity, despite being present and noticeable, is constrained rather than limitless. The article concludes that the methods and techniques afforded by present-day corpus linguistics can shed light onto Saussure’s many valuable insights.
Wren Jonathan D
Motivation: The use or study of chemical compounds permeates almost every scientific field and in each of them, the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. Results: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranging from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
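To make the idea of a first-order Markov model for name recognition concrete, here is a minimal sketch. It assumes character-to-character transitions, add-one smoothing, and a score defined as the difference in average log-probability under a chemical model and an ordinary-word model. None of these choices is confirmed by the abstract, and the tiny training lists, `ALPHABET_SIZE`, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET_SIZE = 40  # assumed symbol inventory used for add-one smoothing


def train(words):
    """Count character-to-character transitions; ^ and $ mark word boundaries."""
    counts = defaultdict(Counter)
    for word in words:
        padded = "^" + word.lower() + "$"
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def avg_log_prob(word, counts):
    """Average smoothed log-probability per transition under one model."""
    padded = "^" + word.lower() + "$"
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + ALPHABET_SIZE))
    return total / (len(padded) - 1)


# Tiny illustrative training lists; a real system would train on far more data.
CHEMICAL_MODEL = train([
    "acetaminophen", "benzaldehyde", "cyclohexanol", "methylamine",
    "ethylbenzene", "dichloromethane", "trimethylamine", "phenylalanine",
])
WORD_MODEL = train([
    "information", "retrieval", "database", "literature",
    "recognition", "scientific", "variability", "precision",
])


def chemical_score(word):
    """Positive when the word looks more like the chemical training names."""
    return avg_log_prob(word, CHEMICAL_MODEL) - avg_log_prob(word, WORD_MODEL)


score_chemical = chemical_score("methylbenzene")
score_plain = chemical_score("national")
```

A term would be accepted as a putative chemical name when its score exceeds a chosen cutoff. Moving that cutoff trades precision against recall, which is the kind of dependence on cutoff score the abstract reports.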
Christopher William White
The Yale-Classical Archives Corpus (YCAC) contains harmonic and rhythmic information for a dataset of Western European Classical art music. This corpus is based on data from classicalarchives.com, a repository of thousands of user-generated MIDI representations of pieces from several periods of Western European music history. The YCAC makes available metadata for each MIDI file, as well as a list of pitch simultaneities ("salami slices") in the MIDI file. Metadata include the piece's composer, the composer's country of origin, date of composition, genre (e.g., symphony, piano sonata, nocturne, etc.), instrumentation, meter, and key. The processing step groups the file's pitches into vertical slices each time a pitch is added or subtracted from the texture, recording the slice's offset (measured in the number of quarter notes separating the event from the file's beginning), highest pitch, lowest pitch, prime form, scale-degrees in relation to the global key (as determined by experts), and local key information (as determined by a windowed key-profile analysis). The corpus contains 13,769 MIDI files by 571 composers yielding over 14,051,144 vertical slices. This paper outlines several properties of this corpus, along with a representative study using this dataset.
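The slicing step described above, in which a new vertical slice begins each time a pitch is added to or removed from the texture, can be sketched in a few lines. This is an illustrative reading of the description rather than the YCAC code: the note representation `(start, end, midi_pitch)` with times in quarter notes, the function name, and the decision to skip silent moments are assumptions.

```python
def salami_slices(notes):
    """Group notes into vertical slices at every change of texture.

    notes: list of (start, end, midi_pitch) tuples, times in quarter notes.
    Returns a list of (offset, pitches), where offset is the distance in
    quarter notes from the beginning and pitches is sorted low to high.
    """
    # Every note start or end is a moment where the texture may change.
    change_points = sorted({t for start, end, _ in notes for t in (start, end)})
    slices = []
    previous = None
    for t in change_points:
        sounding = sorted({p for start, end, p in notes if start <= t < end})
        # Record a slice only when the sounding set changes and is not silent.
        if sounding and sounding != previous:
            slices.append((t, sounding))
        previous = sounding
    return slices


# Toy passage: C4 held for two beats under E4 then G4, followed by C5 alone.
notes = [(0, 2, 60), (0, 1, 64), (1, 2, 67), (2, 3, 72)]
slices = salami_slices(notes)
lowest_and_highest = [(pitches[0], pitches[-1]) for _, pitches in slices]
```

Each slice's lowest and highest pitch fall out of the sorted pitch list. Fields such as prime form or local key would need further analysis that is not attempted here.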
This paper outlines how corpus linguistics--and more specifically the corpus-assisted discourse studies approach--can add useful dimensions to studies of language ideology. First, it is argued that the identification of words of high, low, and statistically significant frequency can help in the identification and exploration of language ideologies…
Pedro Patiño García
This paper describes the Corpus of Free Trade Agreements (henceforth FTA), a specialized parallel corpus in English and Spanish from Europe and America and a smaller subcorpus in English-Norwegian and Spanish-Norwegian that was prepared and then aligned with Translation Corpus Aligner 2 (Hofland & Johansson, 1998). The data was taken from Free Trade Agreements. These agreements are specialized texts officially signed and ratified by several countries and blocks of countries in the last twenty years. Thus, FTAs are a rich repository for terminology and phraseology that is used in different fields of business activity throughout the world. The corpus contains around 1.37 million words in the English section and 1.48 million words in its Spanish counterpart, plus 60,000 words each in the Spanish-Norwegian and English-Norwegian subcorpus. The corpus is being used primarily to study the terms and specialized collocations that include these terms in this kind of specialized text. Keywords: specialized collocation, specialized parallel corpus, corpus linguistics, Free Trade Agreement
Scherfig, Erik Christian Høegh
ophthalmology, biopsy, choroid, corpus vitreum, retina, malignant melanoma, biopsy technique, retinoblastoma
Summerville, Adam James; Snodgrass, Sam; Mateas, Michael; Ontañón, Santiago
Levels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, machine learning techniques have been applied to video game level generation with the aim of automatically generating levels that have the properties of the training corpus. Towards that end we have made available a corpus of video game levels in an easy-to-parse format ideal for different machine learning and other game AI researc...
Carlos Alberto dos Santos Dutra
The religious feast of Corpus Christi, which marks the institution of the Eucharist, was celebrated this year on 15 June. In this sacrament, as the Catholic Church understands it, Christ himself is communicated in order to nourish and save humankind. An expression and synthesis of Christianity, it is the identification of the sacrifice of Christ with the sacrifice of man.
Abstract: The Ndebele language corpus described here is that compiled by the ALLEX Project (now ALRI) at the University of Zimbabwe. It is intended to reflect as much as possible the Ndebele language as spoken in Zimbabwe. The Ndebele language corpus was built in order to provide much-needed material for the study of the Ndebele language, with a special focus on dictionary-making and research. Like most corpora, the Ndebele language corpus may in future be used for other purposes not thought of at the time of its inception. It has been designed to meet generally acceptable standards so that it can be adapted to various possible uses by various researchers. The article outlines the building process of the Ndebele language corpus, with special emphasis on the challenges that faced the compilers and on possible solutions. It is assumed that some of these challenges might not be peculiar to Ndebele alone but could also affect related African languages in a more or less similar situation. The main focus of the discussion will be the composition of the Ndebele language corpus, i.e. the type of texts that constitute the corpus. The corpus is composed of published texts, unpublished texts and oral material gathered from Ndebele-speaking districts of Zimbabwe. It will be argued that the use of the corpus and its reliability for research depend, among other factors, on its contents. It will also be shown that the contents of a corpus depend on a number of factors, some of which include sociolinguistic, political and economic considerations. These considerations have implications for both the content and quality of the published and oral texts that constitute the Ndebele language corpus.
Keywords: CORPUS, ORAL MATERIALS, CODE-MIXING, CODE-SWITCHING, MOTHER-TONGUE, NDEBELE
Summary: The Ndebele Language Corpus: An Overview of Some Factors Influencing the Content of the Corpus. The Ndebele language corpus described here is the one compiled by the
Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H
Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
During the nineties, the accessibility of large corpora and the possibility of manipulating enormous quantities of linguistic data gave rise to a renewed interest in statistical and probabilistic evidence, which served to directly question linguistics about its objectives, methods and foundations. This interest grew steadily and is now known as corpus linguistics, a dominant field of research in language science. In this article we show that the designation corpus linguistics covers considerably heterogeneous theoretical positions and research topics. We show how corpus linguistics, originally of British origin, was later endowed with historical and theoretical legitimacy while at the same time intending to establish itself as a new paradigm in language science. Finally we distinguish two attitudes inside the British tradition: one, intending to build the studies on a corpus and in a new paradigm based on a retrospective construction of the critical works of Chomsky during the years 1959 and 1960, which was intended to legitimize the studies; the other attitude involves the continuity of the tradition of British empirical linguistics.
We report an unusual case of an aortic type A dissection with a corpus alienum compressing the right ventricle. The patient successfully underwent an aortic root replacement in deep hypothermia with re-implantation of the coronary arteries using a modified Bentall procedure and resection of the corpus alienum. Intraoperative findings revealed 3 strongly adherent gauze compresses, which were most likely left behind during an operation 34 years earlier.
Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements. In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT), created for Polish-Lithuanian theoretical contrastive studies, for a Polish-Lithuanian electronic dictionary, and as an aid for sworn translators. The semantic annotation being introduced into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.
Cejuela, Juan Miguel; Vinchurkar, Shrikant; Goldberg, Tatyana
trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast...
Alif Fairus Nor Mohamad
English for Specific Purposes (ESP) educators often face a dilemma in deciding what lexical items to teach their students. The field of English for Nursing Purposes (ENP) is no exception. Only an analysis of a nursing corpus made up of essential core textbooks can provide better insights and guidance to both nursing students and educators. This research aims to highlight the 2,000 most frequently used nursing words across the core textbooks of nursing and to profile the types of ‘low frequency’ lexis which comprise the nursing corpus in terms of General Service List (GSL) and Academic Word List (AWL) lexis coverage. Knowing the frequently used nursing words, and using the 2,000-word list, would further reduce students’ reading deficiency.
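The two measurements named in this abstract, a ranked frequency list and the share of corpus tokens covered by a reference word list, can be sketched as follows. This is only an illustration: the sample sentence and the three-word `reference_list` are invented stand-ins, not the nursing corpus or the actual GSL/AWL, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract simple alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_words(tokens, n):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)

def coverage(tokens, word_list):
    """Fraction of running tokens that appear in the given word list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in word_list) / len(tokens)

# Hypothetical stand-ins for the textbook corpus and a general word list.
sample = "The nurse checks the patient. The patient reports pain."
reference_list = {"the", "checks", "reports"}

tokens = tokenize(sample)
frequent = top_words(tokens, 2)
general_coverage = coverage(tokens, reference_list)
```

Tokens outside the reference list (here "nurse", "patient", "pain") are the candidates for a specialised word list.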
Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford
We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and "motherese" (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
Friese, S.A.; Bitzer, M.; Voigt, K.; Kueker, W. [Tuebingen Univ. (Germany). Abt. fuer Neuroradiologie]; Freudenstein, D. [Department of Neurosurgery, Eberhard-Karls-University Tuebingen (Germany)]
MRI has facilitated diagnostic assessment of the corpus callosum. Diagnostic classification of solitary or multiple lesions of the corpus callosum has not attracted much attention, although signal abnormalities are not uncommon. Our aim was to identify characteristic imaging features of lesions frequently encountered in practice. We reviewed the case histories of 59 patients with lesions shown on MRI. The nature of the lesions was based on clinical features and/or long term follow-up (ischaemic 20, Virchow-Robin spaces 3, diffuse axonal injury 7, multiple sclerosis 11, hydrocephalus 5, acute disseminated encephalomyelitis 5, Marchiafava-Bignami disease 4, lymphoma 2, glioblastoma and hamartoma 1 each). The location in the sagittal plane, the relationship to the borders of the corpus callosum and midline and the size were documented. The 20 ischaemic lesions were asymmetrical but adjacent to the midline; the latter was involved in new or large lesions. Diffuse axonal injury commonly resulted in large lesions, which tended to be asymmetrical; the midline and borders of the corpus callosum were always involved. Lesions in MS were small, at the lower border of the corpus callosum next to the septum pellucidum, and crossed the midline asymmetrically. Acute disseminated encephalomyelitis and the other perivenous inflammatory diseases caused relatively large, asymmetrical lesions. Hydrocephalus resulted in lesions of the upper part of the corpus callosum, and mostly in its posterior two thirds; they were found in the midline. Lesions in Marchiafava-Bignami disease were large, often symmetrically in the midline in the splenium and did not reach the edge of the corpus callosum. (orig.)
Nomori, Koji; Kitamura, Koji; Motomura, Yoichi; Nishida, Yoshifumi; Yamanaka, Tatsuhiro; Komatsubara, Akinori
In Japan, childhood injury prevention is an urgent issue. Safety measures built on knowledge drawn from injury data are essential for preventing childhood injuries. Injury prevention through product modification is especially important. Risk assessment is one of the most fundamental methods for designing safe products. Conventional risk assessment has been carried out subjectively because product makers have little data on injuries. This paper deals with evidence-based risk assessment, in which artificial intelligence technologies are strongly needed. This paper describes a new method of foreseeing usage of products, which is the first step of the evidence-based risk assessment, and presents a retrieval system for injury data. The system enables a product designer to foresee how children use a product and which types of injuries occur due to the product in the daily environment. The developed system consists of large-scale injury data, text mining technology and probabilistic modeling technology. Large-scale text data on childhood injuries was collected from medical institutions by an injury surveillance system. Types of behavior toward a product were derived from the injury text data using text mining technology. The relationship among products, types of behavior, types of injuries and characteristics of children was modeled by a Bayesian network. The fundamental functions of the developed system and examples of new findings obtained by the system are reported in this paper.
Deny A. Kwary
This data article presents a corpus (i.e. a large selection of words in electronic form) and a concordancer (i.e. a tool to show a word in its context of use) of academic journal articles. As the title suggests, the data were collected from research articles published in academic journals. The corpus contains 5,686,428 words selected from 895 journal articles published by Elsevier in 2011–2015. The corpus is classified into four subject areas: Health Sciences, Life Sciences, Physical Sciences, and Social Sciences, following the classifications of Scopus, which is the largest abstract and citation database of peer-reviewed scientific journals, books and conference proceedings. To ease the access and utilization of the corpus, a program to produce the key word in context (KWIC) and word frequency was created and placed on the website: corpus.kwary.net. The corpus is a valuable resource for researchers, teachers, and translators working on academic English.
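A key-word-in-context display and a word-frequency count of the kind described here can be sketched in a few lines. This is a generic illustration under stated assumptions, not the program hosted on the website mentioned above: the function names, the naive tokenizer and the sample text are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text standing in for the journal-article corpus.
sample = "The corpus is a valuable resource. The corpus contains journal articles."
tokens = tokenize(sample)
word_freq = Counter(tokens)                      # word-frequency table
concordance = kwic(tokens, "corpus", window=2)   # KWIC lines for "corpus"

for left, key, right in concordance:
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>20} | {key} | {right}")
```

Aligning the keyword in a fixed column is the conventional KWIC layout; a real concordancer would also preserve case and punctuation and work from an index instead of scanning the token list.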
Rommel P. Regadas
PURPOSE: To describe a technique for en bloc harvesting of the corpus cavernosum, cavernous artery and urethra from transplant organ donors and contraction-relaxation experiments with corpus cavernosum smooth muscle. MATERIALS AND METHODS: The corpus cavernosum was dissected to the point of attachment with the crus penis. A 3 cm segment (corpus cavernosum and urethra) was isolated and placed in ice-cold sterile transportation buffer. Under magnification, the cavernous artery was dissected. Thus, 2 cm fragments of cavernous artery and corpus cavernosum were obtained. Strips measuring 3 x 3 x 8 mm³ were then mounted vertically in an isolated organ bath device. Contractions were measured isometrically with a Narco-Biosystems force displacement transducer (model F-60, Narco-Biosystems, Houston, TX, USA) and recorded on a 4-channel Narco-Biosystems desk model polygraph. RESULTS: Phenylephrine (1 µM) was used to induce tonic contractions in the corpus cavernosum (3 - 5 g tension) and cavernous artery (0.5 - 1 g tension) until reaching a plateau. After precontraction, smooth muscle relaxants were used to produce relaxation-response curves (10^-12 M to 10^-4 M). Sodium nitroprusside was used as a relaxation control. CONCLUSION: The harvesting technique and the smooth muscle contraction-relaxation model described in this study were shown to be useful instruments in the search for new drugs for the treatment of human erectile dysfunction.
In this study, the resemblance of the language learning course books used in Turkey to authentic language spoken by native speakers is explored by using a corpus-based approach. For this, the 10-million-word spoken part of the British National Corpus was selected as the reference corpus. After that, all language learning course books used in high schools in Turkey were scanned and transferred to SketchEngine, an online corpus query tool. Lastly, certain grammar points were extracted first from the British National Corpus and then from the course books; similarities and differences were compared. At the end of the study, it was found that the language learning course books have little similarity to authentic language in terms of certain grammatical items and the frequency of their collocations. In this way, the points to be revised and changed were explored. In addition, this study emphasized the role of the corpus approach as a material development and analysis tool, and tested the functionality of course books for writers and for the Ministry of National Education.
Xu, Rong; Wang, QuanQiu
Targeted anticancer drugs such as imatinib, trastuzumab and erlotinib dramatically improved treatment outcomes in cancer patients, however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. The availability of a comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate complex pathways underlying toxicities induced by these innovative drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signaling filtering, and signal prioritization algorithms to extract drug-SE pairs from downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database that we built from a large number of high-profile oncological articles could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
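The three reported figures can be checked against the standard F1 definition, the harmonic mean of precision and recall. Applying it to the reported averages (0.405 and 0.899) gives about 0.558, not the reported 0.465. One possible explanation, stated here only as an assumption because the abstract does not describe the averaging, is that F1 was computed per item (for example per drug) and then averaged. The per-item values below are invented purely to show that the two orders of averaging can differ.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# F1 computed directly from the averages reported in the abstract.
f1_of_reported_averages = f1_score(0.405, 0.899)

# Hypothetical per-item (precision, recall) pairs, for illustration only.
items = [(0.1, 0.9), (0.7, 0.9)]

# Average of per-item F1 scores ("macro" style).
mean_of_f1 = sum(f1_score(p, r) for p, r in items) / len(items)

# F1 of the averaged precision and averaged recall.
mean_p = sum(p for p, _ in items) / len(items)
mean_r = sum(r for _, r in items) / len(items)
f1_of_means = f1_score(mean_p, mean_r)

print(round(f1_of_reported_averages, 3), round(mean_of_f1, 3), round(f1_of_means, 3))
```

In the toy data the mean of per-item F1 scores is lower than the F1 of the mean precision and recall, the same direction as the gap between 0.465 and 0.558.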
http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p177
As language referential data banks, corpora are instrumental in the exploration of translation solutions in bilingual parallel texts or conventional usages of source or target language in monolingual general or specialized texts. These roles are firmly rooted in translation processes, from analysis and interpretation of source text to searching for an acceptable equivalent and integrating it into the production of the target text. Provided the creative and not the conservative way be taken, validation or adaptation of target text in accordance with conventional usages in the target language also benefits from corpora. Translation teaching is not exploiting this way of translating that is common practice in the professional translation markets around the world. Instead of showing what corpus tools can do for translation teaching, we start our analysis with a common issue within translation teaching and show how corpus data can help to resolve it in learning activities in translation courses. We suggest a corpus-driven model for the interpretation of ‘business’ as a term and as an item in complex terms based on source text pattern analysis. This methodology will make it possible for teachers to explain and justify interpretation rules that have been defined theoretically from corpus data. It will also help teachers to conceive and non-subjectively assess practical activities designed for learners of translation. Corpus data selected for the examples of rule-based interpretations provided in this paper have been compiled in a corpus-driven study (Poirier, 2015) on the translation of the noun ‘business’ in the field of specialized translation in business, economics, and finance from English to French. The corpus methodology and rule-based interpretation of senses can be generalized and applied in the definition of interpretation rules for other language pairs and other specialized simple and complex terms. These works will encourage the
Full Text Available This article explores a type of co-occurrence pattern which cannot be adequately described by existing models of collocation, and for which combinatory dictionaries have yet failed to provide sufficient information. The phenomenon of “oblique inter-collocation”, as I propose to call it, is characterised by a concatenation of syntagmatic preferences which partially contravenes the habitual grammatical order of semantic selection. In particular, I will examine some of the effects which the verb cause exerts on the distribution of attributive adjectives in the context of specific noun classes. The procedure for detecting and describing patterns of oblique inter-collocation is illustrated by means of SketchEngine corpus query tools. Based on the data extracted from a large-scale corpus, this paper carries out a critical analysis of the micro-structure in Oxford Collocations Dictionary.
Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I
Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a
Theoretically the Northern Sotho language is made up of almost 30 dialects while practically it is not so, because the standard language was formed from very few of its dialects. As a result, even today the language has no corpus which is balanced or representative owing to the fact that almost all of the available corpora ...
S.A. Akhondi (Saber); A.G. Klenner (Alexander G.); C. Tyrchan (Christian); A.K. Manchala (Anil K.); K. Boppana (Kiran); D. Lowe (Daniel); M. Zimmermann (Marc); S.A.R.P. Jagarlapudi (Sarma A. R. P.); R. Sayle (Roger); J.A. Kors (Jan); C. Muresan (Cornelia)
Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting
Presentation of the first half of a Spanish text corpus consisting of all interviews with Spaniards in the two weekly magazines Cambio16 and Tiempo in 1990. This corpus has since been supplemented with all interviews in the same magazines in 1995. Total size of the corpus: over 1.2 million words...
part of speech, was made accessible via Internet (Kruyt 1995a, b). A 27 Million ..... corpora yet, and that 16 user accounts are reserved for students of the Free ... are from Norway, Denmark, Austria, Slovenia, Latvia, Malaysia and Korea.
Decker, Barbara McElwee; Guitar, Barry; Solomon, Andrew
Compared with developmental stuttering, adult onset acquired stuttering is rare. However, several case reports describe acquired stuttering and an association with callosal pathology. Interestingly, these cases share a neuroanatomical localisation also demonstrated in developmental stuttering. We present a case of adult onset acquired stuttering associated with inflammatory demyelination within the corpus callosum. This patient's disfluency improved after the initiation of immunomodulatory therapy. © BMJ Publishing Group Ltd (unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Plecháč, Petr; Kolár, Robert
Vol. 2, No. 1 (2015), pp. 107-118 ISSN 2346-6901 R&D Projects: GA ČR GAP406/11/1825 Institutional support: RVO:68378068 Keywords: Czech poetry * versification * corpus linguistics * theory of verse Subject RIV: AJ - Letters, Mass-media, Audiovision
... callosum, the structure that connects the two hemispheres (left and right) of the brain. In ACC the corpus callosum is partially or completely absent. It is caused by a disruption of brain cell migration during fetal development. ACC can occur as an isolated condition or ...
John M. Swales
The subtitle of Huddleston (1971) reads A syntactic study based on an analysis of scientific texts; this volume thus represents the first carefully designed and substantial corpus of scientific English. In this paper I re-examine a selection of his findings based on the science and engineering half of Hyland's corpus of 240 research articles. Features selected were variation in the passivization of individual transitive verbs, the paucity of instances of V + V-ing structures like "He continued working", and the meaning of the modal must in research prose. In all three cases, Huddleston's findings were largely confirmed in a database constructed about 35 years later, thus suggesting that English research writing in the sciences is, at least in grammatical terms, fundamentally stable. In the closing section, I contrast this linguistic stability with the rapid technological development of corpus linguistics. I instance a recent co-taught experimental course in which international senior doctoral students from the health and social sciences were able, with relatively little training and guidance, to construct paired corpora of their own research writings and of published articles from their own specialities and then conduct precisely the kinds of analysis that only a highly professional linguist could, with considerably more labour, conduct nearly forty years ago.
Castro Ferreira, Thiago; Wubben, Sander; Krahmer, Emiel
We introduce a corpus for the study of proper name generation. The corpus consists of proper name references to people in webpages, extracted from the Wikilinks corpus. In our analyses, we aim to identify the different ways, in terms of length and form, in which proper names are produced
Barker-Plummer, Dave; Dale, Robert; Cox, Richard; Romanczuk, Alex
We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorized as being in error. We want to understand the nature of…
Tony Berber Sardinha
In this paper, I look at four different aspects of metaphor research from a corpus linguistic perspective, namely: (1) the lexicogrammar of metaphors, which refers to the patterning of linguistic metaphor revealed by corpus analysis; (2) metaphor probabilities, which is a facet of metaphor that emerges from frequency-based studies of metaphor; (3) dimensions of metaphor variation, or the search for systematic parameters of variation in metaphor use across different registers; and (4) automated metaphor retrieval, which relates to the development of software to help identify metaphors in corpora. I argue that these four aspects are interrelated, and that advances in one of them can drive changes in the others.
Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g. could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context reported on in this paper, there is no direct equivalent in the students' L1. In other words, they are a good example of a structure for which successful acquisition depends very much on the quality of the input and instruction students receive. This paper reports on analysis of a 230,000-word corpus of Malaysian English textbooks, in which it was found that the relative frequency of the modals did not match that found in native speaker corpora such as the BNC. We compared the textbook corpus with a learner corpus of Malaysian form 4 learners and found no direct relationship between frequency of presentation of target forms in the textbooks and their use by students in their writing. We also found a very large percentage of errors in students' writing. We suggest a number of possible reasons for these findings and discuss the implications for materials developers and teachers.
Analysis of regional corpus callosum fiber composition reveals that callosal regions connecting primary and secondary sensory areas tend to have higher proportions of coarse-diameter, highly myelinated fibers than callosal regions connecting so-called higher-order areas. This suggests that in primary/secondary sensory areas there are strong timing constraints for interhemispheric communication, which may be related to the process of midline fusion of the two sensory hemifields across the hemispheres. We postulate that the evolutionary origin of the corpus callosum in placental mammals is related to the mechanism of midline fusion in the sensory cortices, which only in mammals receive a topographically organized representation of the sensory surfaces. The early corpus callosum may have also served as a substrate for growth of fibers connecting higher-order areas, which possibly participated in the propagation of neuronal ensembles of synchronized activity between the hemispheres. However, as brains became much larger, the increasingly longer interhemispheric distance may have worked as a constraint for efficient callosal transmission. Callosal fiber composition tends to be quite uniform across species with different brain sizes, suggesting that the delay in callosal transmission is longer in bigger brains. There is only a small subset of large-diameter callosal fibers whose size increases with increasing interhemispheric distance. These limitations in interhemispheric connectivity may have favored the development of brain lateralization in some species like humans. "...if the currently received statements are correct, the appearance of the corpus callosum in the placental mammals is the greatest and most sudden modification exhibited by the brain in the whole series of vertebrated animals..." T.H. Huxley (1).
Hudson (1990) proposes that each conjunct in a coordinate phrase forms dependency relations with heads or dependents outside the coordinate phrase (the "multi-head" view). This proposal is tested through corpus analysis of Wall Street Journal text. For right-branching constituents (such as direct-object NPs), a short-long preference for conjunct…
The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop...
Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2). (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA, USA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA. Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. Keywords: nanotechnology, informatics, natural language processing, text mining, corpora
This article examines the governing initial norms, namely explicitness and euphemism in English source texts and Ndebele translations, focusing on how these norms influenced the strategies chosen by the Ndebele translators in the translation of taboo terms. In the article, a corpus-based approach is used to identify head ...
Mengi V. Ondar
Contemporary information technologies and mathematical modelling have made creating corpora of natural languages significantly easier. A corpus is an information and reference system based on a collection of digitally processed texts. A corpus includes various written and oral texts in the given language, a set of dictionaries, and markup, that is, information on the properties of the text. It is the presence of the markup which distinguishes a corpus from an electronic library. At the moment, national corpora are being set up for many languages of the Russian Federation, including those of the Turkic peoples. Faculty members, postgraduate and undergraduate students at Tuvan State University and Siberian Federal University are working on the National corpus of Tuvan language. This article describes the structure of a dictionary entry in the National corpus of Tuvan language. The corpus database comprises the following tables: MAIN (the headword table); RUS, ENG, GER (translations of the headword into three languages); and MORPHOLOGY (the table containing morphological data on the headword). The database is built in Microsoft Office Access. Working with the corpus dictionary includes the following functions: adding, editing and removing an entry, entry search (with transcription), and setting and visualizing morphological features of a headword. The project allows us to view the corpus dictionary as a multi-structure entity with a complex hierarchical structure and a dictionary entry as its key component. The corpus dictionary we developed can be used for studying the Tuvan language in its pronunciation, orthography and word analysis, as well as for searching for words and collocations in the texts included in the corpus.
This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects all over Great Britain.
Polyethylene glycol (PEG) has been shown to restore axonal continuity after peripheral nerve transection in animal models. We hypothesized that PEG can also restore axonal continuity in the central nervous system. In this experiment, coronal sectioning of the brains of Sprague-Dawley rats was performed after animal sacrifice. 3Brain high-resolution microelectrode arrays (MEA) were used to measure mean firing rate (MFR) and peak amplitude across the corpus callosum of the ex-vivo brain slices. The corpus callosum was subsequently transected and repeated measurements were performed. The cut ends of the corpus callosum were still apposed at this time. A PEG solution was applied to the injury site and repeated measurements were performed. MEA measurements showed that PEG was capable of restoring electrophysiological signaling after transection of central nerves. Before injury, the average MFRs at the ipsilateral, midline, and contralateral corpus callosum were 0.76, 0.66, and 0.65 spikes/second, respectively, and the average peak amplitudes were 69.79, 58.68, and 49.60 μV, respectively. After injury, the average MFRs were 0.71, 0.14, and 0.25 spikes/second, respectively, and peak amplitudes were 52.11, 8.98, and 16.09 μV, respectively. After application of PEG, there were spikes in MFR and peak amplitude at the injury site and contralaterally. The average MFRs were 0.75, 0.55, and 0.47 spikes/second at the ipsilateral, midline, and contralateral corpus callosum, respectively, and peak amplitudes were 59.44, 45.33, and 40.02 μV, respectively. There were statistically significant differences in the average MFRs and peak amplitudes between the midline and non-midline corpus callosum groups (P < 0.01, P < 0.05). These findings suggest that PEG restores axonal conduction between severed central nerves, potentially representing axonal fusion.
Scholl, Gerd; Berger, Gerald; Freytag, Elisabeth
experience with a knowledge brokerage system comprised of two intertwined building blocks, a series of “policy meets research” workshops which attracted almost 300 professionals from all over Europe, and a web platform named “SCP Knowledge Hub” which evolved into a major knowledge repository for almost 900...... registered users. We identify three design principles for effective knowledge brokerage and overcoming of the translation barriers between science and policy - i.e. participatory, activating and modular - and formulate practical recommendations for brokering knowledge through an online medium....
The purpose of this study was to analyze the curriculum of one chiropractic college in order to discover if there were any implicit consensus definitions of the term subluxation. Using the software WordSmith Tools, the corpus of an undergraduate chiropractic curriculum was analyzed by reviewing collocated terms and through discourse analysis of text blocks containing words based on the root 'sublux.' It was possible to identify 3 distinct concepts which were each referred to as 'subluxation:' i) an acute or instantaneous injurious event; ii) a clinical syndrome which manifested post-injury; iii) a physical lesion, i.e. an anatomical or physiological derangement which in most instances acted as a pain generator. In fact, coherent implicit definitions of subluxation exist and may enjoy broad but subconscious acceptance. However, confusion likely arises from failure to distinguish which concept an author or speaker is referring to when they employ the term subluxation.
António M. Galvão
In adults, physiological angiogenesis is a rare event, with few exceptions, such as the vasculogenesis needed for tissue growth and function in female reproductive organs. Particularly in the corpus luteum (CL), regulation of the angiogenic process seems to be tightly controlled by opposing actions resulting from the balance between pro- and antiangiogenic factors. It is the extremely rapid sequence of events that determines the dramatic changes in vascular and nonvascular structures, qualifying the CL as a great model for angiogenesis studies. Using the mare CL as a model, reports on locally produced cytokines, such as tumor necrosis factor α (TNF), interferon gamma (IFNG), or Fas ligand (FASL), pointed out their role in modulating angiogenic activity throughout the luteal phase. Thus, the main purpose of this review is to highlight the interaction between immune, endothelial, and luteal steroidogenic cells, regarding vascular dynamics/changes during establishment and regression of the equine CL.
Gabriel de Ávila Othero
In this paper, we present the results of our work with automatic morphological annotation of excerpts from a corpus of spoken language, belonging to the VARSUL project, using the free morphosyntactic tagger Aelius. We present 20 texts containing 154,530 words, annotated automatically and corrected manually. This paper presents the tagger Aelius and our work of manual review of the texts, as well as our suggestions for improvements of the tool, concerning aspects of oral texts. We verify the performance of morphosyntactic tagging on a spoken language corpus, an unprecedented challenge for the tagger. Based on the errors of the tagger, we try to infer certain patterns of annotation to overcome limitations presented by the program, and we propose suggestions for implementations in order to allow Aelius to tag spoken language corpora more effectively, especially in cases such as interjections, apheresis, onomatopoeia and conversational markers.
Shemilt, Ian; Simon, Antonia; Hollands, Gareth J.; Marteau, Theresa M.; Ogilvie, David; O'Mara-Eves, Alison; Kelly, Michael P.; Thomas, James
In scoping reviews, boundaries of relevant evidence may be initially fuzzy, with refined conceptual understanding of interventions and their proposed mechanisms of action an intended output of the scoping process rather than its starting point. Electronic searches are therefore sensitive, often retrieving very large record sets that are…
Van Niekerk, DR
With the increasing prominence and maturity of corpus-based techniques for speech synthesis, the process of system development has in some ways been simplified considerably. However, the dependence on sufficient amounts of relevant speech data...
Hardin, J. S.; Sarkis, G.; URC, P. .
We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article's focus on centrality, students can explore…
Abstract: Language corpora are now indispensable to dictionary compilation. They help broaden the role of the dictionary from standardizing the vocabulary to recording a language. The trilingual corpus generated by the Hong Kong Polytechnic University gives a record of business languages used in Hong Kong. It differs from other corpora in that (1) it includes English, Chinese and Japanese; (2) it shows local characteristics; and (3) it focuses on a specific area (financial services, including banking, accounting, auditing, insurance and investment). The paper discusses various issues of setting up a tricorpus, and how to make full use of the data to generate a trilingual lexicon.
Keywords: MULTILINGUAL, SPECIAL PURPOSE, CORPUS, LEXICON
Long William J
Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale de-identification. Methods: We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either
Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael
We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
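The document-term matrices described above can be illustrated with a minimal Python sketch. It is not the authors' released dataset or code: the function name `document_term_matrix`, the whitespace tokenisation, and the two toy documents are all assumptions made for illustration.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] is the count of term j in document i.

    Tokenisation is a plain lowercase whitespace split, an assumption of this
    sketch rather than the preprocessing used for the published corpus.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # One column per unique term, in alphabetical order.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for terms absent from this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Invented toy documents, not taken from the human rights corpus.
docs = ["torture was reported", "no torture was reported no arrests"]
vocab, dtm = document_term_matrix(docs)
for row in dtm:
    print(dict(zip(vocab, row)))
```

Each row corresponds to one document and each column to one unique term, which is the layout the abstract describes; a real pipeline would normally store such a matrix in a sparse format.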
Jéssyca Camargo Cruz
This article presents a quantitative study and analysis of the use of prepositions that are less frequent (underused) in a corpus of learners of Spanish as a foreign language. We observed the use of contra, hacia, enfrente de, excepto and tras through Corpus Linguistics by contrasting this lexical set with a supplementary corpus, composed of normative and descriptive Spanish grammars and an online reference corpus of Spanish (CREA). We present analyses made on a corpus of 276 writings (85,729 words), gathered from two groups of freshman Language/Letras students from 2011 to 2013. The data were collected with the aid of the WordSmith Tools (version 6) software. Its tools WordList and Concord enabled us to extract the frequency list of the prepositions in the corpus of study, as well as to observe and analyse their respective uses based on the lines of concordance.
Background/Aim. Changes in the morphology and the size of the corpus callosum are related to various pathological conditions. An analysis of these changes requires data about sexual dimorphism of the corpus callosum, which we tried to obtain in our study. We also investigated the method of digital morphometry and compared the obtained results with the results of other authors obtained by magnetic resonance imaging or by planimetry. Methods. The morphological research included 34 human brains (cadavers) of both sexes, 19 female and 15 male, aged 26–72 years. By digital morphometry using AutoCAD software we performed measurements of the corpus callosum: the length (L), width at half of its length (WW’), length of its cortical margin (LCM), area and perimeter of the anterior and posterior callosal segments, as well as the area and perimeter of the corpus callosum section area. The investigated parameters were analyzed and compared between the females and males. Results. There was no statistically significant difference between the males and females in the investigated parameters of the corpus callosum (t test; p > 0.05), including the mean values of the two most important parameters, the surface of its midsagittal section area (males 654.11 mm²; females 677.40 mm²) and its perimeter (males 19.61 cm; females 19.72 cm). The results obtained by digital morphometry were in the range of the results of other authors obtained by magnetic resonance and by planimetry. However, the value of the Pearson coefficient of linear correlation between the section surface area and perimeter of the corpus callosum in the males was highly significant (rxy = 0.6943, p < 0.01), while in the females this value was statistically insignificant. Conclusion. Digital morphometry is an accurate method in encephalometric investigations. Our results suggest that the problem of sexual dimorphism of the corpus callosum is very complex, because the identical variables (section
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
In Geographical Information Systems, geo-coding is used for the task of mapping from implicitly geo-referenced data to explicitly geo-referenced coordinates. At present, an enormous amount of implicitly geo-referenced information is hidden in unstructured text, e.g., Wikipedia, social data and news. Toponym recognition is the foundation of mining this useful geo-referenced information by identifying words as toponyms in text. In this paper, we propose an adapted toponym recognition approach based on a deep belief network (DBN) by exploring two key issues: word representation and model interpretation. A Skip-Gram model is used in the word representation process to represent words with contextual information that is ignored by current word representation models. We then determine the core hyper-parameters of the DBN model by illustrating the relationship between the performance and the hyper-parameters, e.g., vector dimensionality, DBN structures and probability thresholds. The experiments evaluate the performance of the Skip-Gram model implemented by the Word2Vec open-source tool, determine stable hyper-parameters and compare our approach with a conditional random field (CRF)-based approach. The experimental results show that the DBN model outperforms the CRF model on a smaller corpus. When the corpus size is large enough, their statistical metrics converge. However, their recognition results show differences and complementarity on different kinds of toponyms. More importantly, combining their results can directly improve the performance of toponym recognition relative to their individual performances. It seems that the scale of the corpus has an obvious effect on the performance of toponym recognition. Generally, there is no adequate tagged corpus for specific toponym recognition tasks, especially in the era of Big Data. In conclusion, we believe that the DBN-based approach is a promising and powerful method to extract geo
Arthur M. Jacobs
This paper describes a corpus of about 3,000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author, or various text-analytic metrics for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music,” concerning, e.g., lexical diversity or sentiment analysis. The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as a training and test corpus for stimulus development and control in empirical studies.
In the 20th century structuralism established itself as the central linguistic theory, in the first half mainly through its originator Ferdinand de Saussure, and in the second half with the figure of Noam Chomsky. The latter consistently refused to acknowledge the analysis of large quantities of text as a valuable method, and favoured the linguistic intuition of a native speaker instead. In parallel with structuralism, other trends in linguistics emerged which pointed to the inadequacy of the prevailing linguistic paradigm and to theoretical insights which were only possible after the systematic analysis of large quantities of text. The paper discusses some of the dilemmas stemming from this dichotomy and places corpus linguistics in a broader linguistic context.
Mahantesh K. Pattanshetti
Learning has become a life-long endeavor in the information age. It is no longer restricted to the confines of formal classrooms. Consequently, a student is not restricted to traditional learning resources like teachers, textbooks or printed content. Digital resources available on the Internet form a very significant component of self-learning. Copious volumes of learning resources without legal barriers to self-learning reside in digital repositories, educational institution portals and on numerous websites. Learners wishing to utilize the web for personalized learning are faced with a daunting array of content to wade through in order to select the resources that suit their learning objectives. Therefore, it is not a question of availability; it is one of relevance and suitability. Typically, in addition to time constraints, learners lack the expertise to screen content for effective eLearning. Adaptive hypermedia systems (AHSs) offer a path to harnessing this large volume of learning resources for personalized learning. This review paper provides a concise and coherent discussion of the evolution of AHSs along with the challenges that need to be addressed for effectively harnessing openly available educational resources, referred to as open corpus resources (OCRs).
At present, the mainstream lexicalized English writing methods take only the corpus dependence between words into consideration, without introducing corpus collocation and other issues. “Drive” is a relatively essential feature of words. Once the drive structure of a word is determined, it becomes relatively clear which kinds of words it collocates with, hence the structure of the sentence can be derived relatively directly. In this paper, an English writing model that relies on the computer network corpus drive model is put forward. In this model, a rich English corpus is introduced in the decomposition of the rules and the calculation of the probability, which includes not only the corpus dependence information, but also the drive structure and other corpus collocation information. An improved computer network corpus drive model is used to carry out the English writing teaching experiment. The experimental results show that the precision and the recall rate are 88.76% and 87.43%, respectively. The F value of the comprehensive index is improved by 6.65% compared with the Collins headword driven English modes of writing.
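The abstract above reports precision and recall but not the F value itself or the formula behind it. As a hedged illustration, the sketch below assumes the usual F-measure, the weighted harmonic mean of precision and recall, with beta = 1; the function name `f_measure` is invented for this example, and the result is not a figure taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (88.76% and 87.43%).
f1 = f_measure(0.8876, 0.8743)
print(f"F1 = {f1:.4f}")  # about 0.8809 under the balanced-F assumption
```

Under that assumption the balanced F value would be roughly 88.1%; whether this matches the paper's "comprehensive index" cannot be confirmed from the abstract alone.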
It was long believed that a particular population of enteric neurons, referred to as intrinsic primary afferent neurons (IPANs), encodes mechanical stimulation. We recently proposed a new concept suggesting that there are, in addition, mechanosensitive enteric neurons (MEN) that are multifunctional. Based on firing pattern, MEN behaved as rapidly, slowly or ultra-slowly adapting RAMEN, SAMEN or USAMEN, respectively. We aimed to validate this concept in the myenteric plexus of the gastric corpus, a region where IPANs were not identified and the existence of enteric sensory neurons was even questioned. The gastric corpus is characterized by a particularly dense extrinsic sensory innervation. Neuronal activity was recorded with voltage sensitive dye imaging after deformation of ganglia by compression (intraganglionic volume injection or von Frey hair) or tension (ganglionic stretch). We demonstrated that 27% of the gastric neurons were MEN and responded to intraganglionic volume injection. Of these, 73% were RAMEN, 25% SAMEN and 2% USAMEN, with a firing frequency of 1.7 (1.1/2.2) Hz, 5.1 (2.2/7.7) Hz and 5.4 (5.0/15.5) Hz, respectively. The responses were reproducible and stronger with increased stimulus strength. Even after adaptation, another deformation evoked spike discharge again, suggesting a resetting mode of the mechanoreceptors. All MEN received fast synaptic input. 55% of all MEN were cholinergic and 45% nitrergic. Responses in some MEN significantly decreased after perfusion of TTX, low Ca++/high Mg++ Krebs solution, capsaicin-induced nerve defunctionalization and capsazepine, indicating the involvement of TRPV1-expressing extrinsic mechanosensitive nerves. Half of gastric MEN responded to intraganglionic volume injection as well as to ganglionic stretch, and 23% responded to stretch only. Tension-sensitive MEN were to a large proportion USAMEN (44%). In summary, we demonstrated for the first time compression- and tension-sensitive MEN in the stomach
Srinivasa Rao, A.; Rao, V.R.K.; Ravi Mandalam, K.; Gupta, A.K.; Kumar, S.; Joseph, S.; Unni, M.
Computed tomographic and plain X-ray observations in a patient with corpus callosum lipoma associated with frontal encephalocele are reported. The rarity of the lesion and the specific diagnostic criteria on CT are emphasised. (orig.)
The role of astrology in Arnau de Vilanova's medical work is revisited with special attention to the problems of authorship posed by the astrological writings of Arnau's corpus and to their hypothetical chronology.
Abstract: This article presents various approaches used in corpus-based computational lexicography. A claim is made that in order for computational lexicography to be efficient, precise and comprehensive, it should utilize the method where the corpus text is first analysed, and the results of this analysis are then processed further to meet the needs of a dictionary. This method has several advantages, including high precision and recall, as well as the possibility to automate the process much further than with more traditional computational methods. The frequency list obtained by using the lemma (the equivalent of the headword) as basis helps in selecting the words to be included in the dictionary. The approach is demonstrated through various phases by applying SALAMA (the Swahili Language Manager) to the process. Manual work will be needed in the phase when examples of use are selected from the corpus, and possibly modified. However, the list of examples of use, arranged alphabetically according to the corresponding headword, can also be produced automatically. Thus the alphabetical list of headwords with examples of use is the material on which the lexicographer works manually. The article deals with problems encountered in compiling traditional printed dictionaries, and it excludes electronic dictionaries and thesauri.
Keywords: LEXICOGRAPHY, DICTIONARY, LANGUAGE TECHNOLOGY, COMPUTATIONAL LINGUISTICS, AUTOMATIC COMPILATION, DICTIONARY TESTING, INFORMATION RETRIEVAL, MORPHOLOGICAL ANALYSIS, SEMANTIC ANALYSIS, DISAMBIGUATION, HEURISTICS
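The lemma-based frequency list mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the analysed corpus is represented as (surface form, lemma) pairs, which is a hypothetical format and not SALAMA's actual output, and the function name `lemma_frequency_list` and the six Swahili tokens are invented for the example.

```python
from collections import Counter


def lemma_frequency_list(analysed_tokens):
    """Count lemmas in an analysed corpus and rank them for headword selection.

    `analysed_tokens` is an iterable of (surface_form, lemma) pairs; this input
    format is an assumption of the sketch, not the output format of SALAMA.
    """
    counts = Counter(lemma for _surface, lemma in analysed_tokens)
    # Most frequent first; ties broken alphabetically for a stable list.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented toy analysis: inflected Swahili forms mapped to their lemmas.
analysed = [
    ("anasoma", "soma"), ("walisoma", "soma"), ("anasoma", "soma"),
    ("kitabu", "kitabu"), ("vitabu", "kitabu"),
    ("mtoto", "mtoto"),
]
for lemma, freq in lemma_frequency_list(analysed):
    print(lemma, freq)
```

Counting by lemma rather than by surface form groups inflected variants under one candidate headword, which is why the analysis step has to come before the frequency count.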
Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie
The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
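The abstract above names a fingerprinting-based algorithm for removing redundant notes without describing it. The Python sketch below shows one generic way such a filter could work, hashing overlapping word n-grams and dropping notes whose fingerprints mostly repeat earlier ones. It is not the authors' algorithm: the function names, the n-gram length, the MD5 hash, the 0.5 overlap threshold and the toy notes are all assumptions.

```python
import hashlib


def fingerprints(text, n=5):
    """Return the set of hashed word n-grams (shingles) of a note."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        shingles = [" ".join(words)]  # a short note becomes a single shingle
    else:
        shingles = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}


def drop_redundant(notes, n=5, max_overlap=0.5):
    """Keep notes in order, skipping those that mostly repeat kept notes.

    A note is dropped when more than `max_overlap` of its fingerprints were
    already seen in previously kept notes. Empty notes are always dropped.
    """
    kept, seen = [], set()
    for note in notes:
        fp = fingerprints(note, n)
        overlap = len(fp & seen) / len(fp) if fp else 1.0
        if overlap <= max_overlap:
            kept.append(note)
            seen |= fp
    return kept


# Invented notes for one hypothetical patient; the second copies the first.
notes = [
    "patient reports mild chest pain since monday morning",
    "patient reports mild chest pain since monday morning and nausea",
    "follow up visit shows normal blood pressure and no complaints today",
]
for note in drop_redundant(notes, n=3):
    print(note)
```

In this toy run the second note shares six of its eight 3-word shingles with the first and is therefore skipped, while the unrelated third note is kept. A production system would tune the shingle length and threshold against the corpus and would usually sample fingerprints rather than store them all.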
Corpus callosum agenesis (CCA) has been evaluated by ultrasound examination and magnetic resonance imaging (MRI) in many studies. Ultrasonography could raise suspicion of CCA through indirect signs, but it achieved a definitive diagnosis only in rare cases. MRI was able to diagnose complete CCA in the majority of cases. Additional neurological abnormalities, including heterotopia, gyration anomaly, asymmetry of the cerebral hemispheres, and Dandy-Walker variant, as well as an ocular anomaly, were documented by MRI examination. Prenatal counseling for fetal agenesis of the corpus callosum is difficult because the prognosis is uncertain. The association with other cerebral abnormalities increases the likelihood of a poor outcome, and ultrasonographic assessment of the fetal brain is limited. We found MRI to be a safe and useful additional procedure to complement ultrasonographic diagnosis or suspicion of CCA.
In this paper I explore the potential of a corpus stylistic approach to the study of literary translation. The study focuses on the translation of children’s literature with its specific constraints, and illustrates two corpus linguistic techniques: keyword and cluster analysis, both treated as specific cases of repetition. In a broader sense, the paper therefore discusses the phenomenon of repetition in different literary (stylistic) traditions. These are illustrated by examples from two children’s classics aimed at two different age groups, the Harry Potter and the Winnie the Pooh books, and their translations into Czech. Various shifts in translation, especially in the translation of children’s literature, are often explained by the operation of so-called ‘translation universals’. Though ‘repetition’ as such does not belong to the commonly discussed set of translation universals, the stylistic norms opposing repetition seem to be a strong explanation for the translation shifts identified.
Allan F. Lauder
This paper looks at the nature of data for lexicography and in particular at the central role that electronic corpora can play in providing it. Data has traditionally come from existing dictionaries, from citations, and from the lexicographer’s own knowledge of words, through introspection. Each of these is examined and evaluated. Then the electronic corpus is considered. Different kinds of corpora are described and key design criteria are explained, in particular the size of corpus needed for lexicography as well as the issue of representativeness and sampling. The advantages and disadvantages of corpora are weighed and compared against the other types of data. While each of these has benefits, it is argued that corpora are a requirement, not an option, as data for dictionary making.
Christenson Lane K
The synthesis of progesterone by the corpus luteum is essential for the establishment and maintenance of early pregnancy. Regulation of luteal steroidogenesis can be broken down into three major events: luteinization (i.e., conversion of an ovulatory follicle), luteal regression, and pregnancy-induced luteal maintenance/rescue. While the factors that control these events and dictate the final steroid end products vary widely among species, the composition of the corpus luteum (luteinized thecal and granulosa cells) and the enzymes and proteins involved in the steroidogenic pathway are relatively similar among all species. The key factors involved in luteal steroidogenesis and several exciting new observations regarding the regulation of luteal steroidogenic function are discussed in this review.
Knepper Timothy D.
Is the Dionysian God, or an experience of the Dionysian God, absolutely ineffable? Does the Dionysian corpus assert or perform such ineffability? This paper will argue that the answer to each of these questions is no. The Dionysian God is known hyper-nous as the hyper-ousia cause of all. And the Dionysian corpus unambiguously refers to, asserts of, and metaphorizes about this God just so. In arguing these points, this paper will call upon both the speech act theory of John Searle and the metaphor theory of George Lakoff and Mark Johnson. More particularly, it will look to Searle’s rules of reference and predication and conditions of illocutionary acts, as well as Lakoff and Johnson’s schematization of metaphor gestalt and entailment to show how Dionysian expressions of inexpressibility are rule-governed and the Dionysian God is thereby (relatively) effable.
This paper describes a corpus-based approach to teaching and learning spoken grammar for English for Academic Purposes with reference to Bhatia’s (2002) multi-perspective model for discourse analysis: a textual perspective, a genre perspective and a social perspective. From a textual perspective, corpus-informed instruction helps students identify grammar items through statistical frequencies, collocational patterns, context-sensitive meanings and discoursal uses of words. From a genre perspective, corpus observation provides students with exposure to recurrent lexico-grammatical patterns across different academic text types (genres). From a social perspective, corpus models can be used to raise learners’ awareness of how speakers’ different discourse roles, discourse privileges and power statuses are enacted in their grammar choices. The paper describes corpus-based instructional procedures, gives samples of learners’ linguistic output, and provides comments on the students’ response to this method of instruction. Data resulting from the assessment process and student production suggest that corpus-informed instruction grounded in Bhatia’s multi-perspective model can constitute a pedagogical approach that (i) obtains positive student responses from input and authentic samples of grammar use, (ii) helps students identify and understand the textual, genre and social aspects of grammar in real contexts of use, and therefore (iii) helps develop students’ ability to use grammar accurately and appropriately.
This article describes research undertaken in order to design a methodology for the reticular representation of knowledge of a specific discourse community. To achieve this goal, a representative corpus of the scientific production of the members of this discourse community (Universidad Politécnica de Valencia, UPV) was created. The article presents the practical analysis (frequency, keyword, collocation and cluster analysis) that was carried out in the initial phases of the study, aimed at establishing the theoretical and practical background and framework for our matrix and network analysis of the scientific discourse of the UPV. The methodology section describes the processes that allowed us to extract from the corpus the linguistic elements needed to develop co-occurrence matrices, as well as the computer tools used in the research. From these co-occurrence matrices, semantic networks of subject and discipline knowledge were generated. Finally, based on the results obtained, we suggest that it may be viable to extract and to represent the intellectual capital of an academic institution using corpus linguistics methods in combination with the formulations of network theory. In this article we describe the research carried out to design a methodology for the reticular representation of the knowledge generated within an institution, based on a representative corpus of the scientific production of the members of that discourse community, the Universidad Politécnica de Valencia. To this end, we present the actions carried out in the initial phases of the study, aimed at establishing the theoretical and practical framework for our analysis. The methodology section describes the computer tools used, as well as the processes that allowed us to obtain those elements present in the corpus that would lead us to the development of
This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features, such as punctuation, words and grammatical categories that occur in the sentences, in order to describe the specific characteristics of each category; secondly, it compares characteristics across the entire data set in order to determine stance similarities. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.
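The kind of surface-level profile described above (sentence length, punctuation, share of alphabetical characters, digits) can be illustrated with a small sketch. The feature names and the whitespace tokenization below are assumptions made for illustration; the abstract does not state which tools or exact feature definitions the authors used.

```python
import string


def sentence_features(sentence):
    """Compute a few formal (surface) features of a single sentence.

    Tokenization is plain whitespace splitting, a deliberate simplification.
    """
    tokens = sentence.split()
    chars = [c for c in sentence if not c.isspace()]
    n_chars = len(chars)
    # Word lengths are measured after stripping surrounding ASCII punctuation.
    word_lengths = [len(t.strip(string.punctuation)) for t in tokens]
    return {
        "n_words": len(tokens),
        "mean_word_length": sum(word_lengths) / len(tokens) if tokens else 0.0,
        "n_punctuation": sum(c in string.punctuation for c in chars),
        "n_digits": sum(c.isdigit() for c in chars),
        "alpha_ratio": sum(c.isalpha() for c in chars) / n_chars if n_chars else 0.0,
    }


features = sentence_features("We must leave, but not in 2019.")
```

Averaging such per-sentence dictionaries within each stance category would give a rough formal profile per category, which could then be compared across categories.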
Aaboud, M; Aad, G; Abbott, B; Abdallah, J; Abdinov, O; Abeloos, B; Abidi, S H; AbouZeid, O S; Abraham, N L; Abramowicz, H; Abreu, H; Abreu, R; Abulaiti, Y; Acharya, B S; Adachi, S; Adamczyk, L; Adelman, J; Adersberger, M; Adye, T; Affolder, A A; Agatonovic-Jovin, T; Agheorghiesei, C; Aguilar-Saavedra, J A; Ahlen, S P; Ahmadov, F; Aielli, G; Akatsuka, S; Akerstedt, H; Åkesson, T P A; Akimov, A V; Alberghi, G L; Albert, J; Albicocco, P; Alconada Verzini, M J; Aleksa, M; Aleksandrov, I N; Alexa, C; Alexander, G; Alexopoulos, T; Alhroob, M; Ali, B; Aliev, M; Alimonti, G; Alison, J; Alkire, S P; Allbrooke, B M M; Allen, B W; Allport, P P; Aloisio, A; Alonso, A; Alonso, F; Alpigiani, C; Alshehri, A A; Alstaty, M; Alvarez Gonzalez, B; Álvarez Piqueras, D; Alviggi, M G; Amadio, B T; Amaral Coutinho, Y; Amelung, C; Amidei, D; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amundsen, G; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, J K; Anderson, K J; Andreazza, A; Andrei, V; Angelidakis, S; Angelozzi, I; Angerami, A; Anisenkov, A V; Anjos, N; Annovi, A; Antel, C; Antonelli, M; Antonov, A; Antrim, D J; Anulli, F; Aoki, M; Aperio Bella, L; Arabidze, G; Arai, Y; Araque, J P; Araujo Ferraz, V; Arce, A T H; Ardell, R E; Arduh, F A; Arguin, J-F; Argyropoulos, S; Arik, M; Armbruster, A J; Armitage, L J; Arnaez, O; Arnold, H; Arratia, M; Arslan, O; Artamonov, A; Artoni, G; Artz, S; Asai, S; Asbah, N; Ashkenazi, A; Asquith, L; Assamagan, K; Astalos, R; Atkinson, M; Atlay, N B; Aubry, L; Augsten, K; Avolio, G; Axen, B; Ayoub, M K; Azuelos, G; Baas, A E; Baca, M J; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Bagnaia, P; Bahrasemani, H; Baines, J T; Bajic, M; Baker, O K; Baldin, E M; Balek, P; Balli, F; Balunas, W K; Banas, E; Banerjee, Sw; Bannoura, A A E; Barak, L; Barberio, E L; Barberis, D; Barbero, M; Barillari, T; Barisits, M-S; Barklow, T; Barlow, N; Barnes, S L; Barnett, B M; Barnett, R M; Barnovska-Blenessy, Z; Baroncelli, A; Barone, G; Barr, A J; 
Barranco Navarro, L; Barreiro, F; Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartos, P; Basalaev, A; Bassalat, A; Bates, R L; Batista, S J; Batley, J R; Battaglia, M; Bauce, M; Bauer, F; Bawa, H S; Beacham, J B; Beattie, M D; Beau, T; Beauchemin, P H; Bechtle, P; Beck, H P; Becker, K; Becker, M; Beckingham, M; Becot, C; Beddall, A J; Beddall, A; Bednyakov, V A; Bedognetti, M; Bee, C P; Beermann, T A; Begalli, M; Begel, M; Behr, J K; Bell, A S; Bella, G; Bellagamba, L; Bellerive, A; Bellomo, M; Belotskiy, K; Beltramello, O; Belyaev, N L; Benary, O; Benchekroun, D; Bender, M; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez, J; Benjamin, D P; Benoit, M; Bensinger, J R; Bentvelsen, S; Beresford, L; Beretta, M; Berge, D; Bergeaas Kuutmann, E; Berger, N; Beringer, J; Berlendis, S; Bernard, N R; Bernardi, G; Bernius, C; Bernlochner, F U; Berry, T; Berta, P; Bertella, C; Bertoli, G; Bertolucci, F; Bertram, I A; Bertsche, C; Bertsche, D; Besjes, G J; Bessidskaia Bylund, O; Bessner, M; Besson, N; Betancourt, C; Bethani, A; Bethke, S; Bevan, A J; Beyer, J; Bianchi, R M; Biebel, O; Biedermann, D; Bielski, R; Biesuz, N V; Biglietti, M; Bilbao De Mendizabal, J; Billoud, T R V; Bilokon, H; Bindi, M; Bingul, A; Bini, C; Biondi, S; Bisanz, T; Bittrich, C; Bjergaard, D M; Black, C W; Black, J E; Black, K M; Blair, R E; Blazek, T; Bloch, I; Blocker, C; Blue, A; Blum, W; Blumenschein, U; Blunier, S; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Bock, C; Boehler, M; Boerner, D; Bogavac, D; Bogdanchikov, A G; Bohm, C; Boisvert, V; Bokan, P; Bold, T; Boldyrev, A S; Bolz, A E; Bomben, M; Bona, M; Boonekamp, M; Borisov, A; Borissov, G; Bortfeldt, J; Bortoletto, D; Bortolotto, V; Boscherini, D; Bosman, M; Bossio Sola, J D; Boudreau, J; Bouffard, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Boutle, S K; Boveia, A; Boyd, J; Boyko, I R; Bracinik, J; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Breaden Madden, W D; 
Brendlinger, K; Brennan, A J; Brenner, L; Brenner, R; Bressler, S; Briglin, D L; Bristow, T M; Britton, D; Britzger, D; Brochu, F M; Brock, I; Brock, R; Brooijmans, G; Brooks, T; Brooks, W K; Brosamer, J; Brost, E; Broughton, J H; Bruckman de Renstrom, P A; Bruncko, D; Bruni, A; Bruni, G; Bruni, L S; Brunt, B H; Bruschi, M; Bruscino, N; Bryant, P; Bryngemark, L; Buanes, T; Buat, Q; Buchholz, P; Buckley, A G; Budagov, I A; Buehrer, F; Bugge, M K; Bulekov, O; Bullock, D; Burch, T J; Burckhart, H; Burdin, S; Burgard, C D; Burger, A M; Burghgrave, B; Burka, K; Burke, S; Burmeister, I; Burr, J T P; Busato, E; Büscher, D; Büscher, V; Bussey, P; Butler, J M; Buttar, C M; Butterworth, J M; Butti, P; Buttinger, W; Buzatu, A; Buzykaev, A R; Cabrera Urbán, S; Caforio, D; Cairo, V M; Cakir, O; Calace, N; Calafiura, P; Calandri, A; Calderini, G; Calfayan, P; Callea, G; Caloba, L P; Calvente Lopez, S; Calvet, D; Calvet, S; Calvet, T P; Camacho Toro, R; Camarda, S; Camarri, P; Cameron, D; Caminal Armadans, R; Camincher, C; Campana, S; Campanelli, M; Camplani, A; Campoverde, A; Canale, V; Cano Bret, M; Cantero, J; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capua, M; Carbone, R M; Cardarelli, R; Cardillo, F; Carli, I; Carli, T; Carlino, G; Carlson, B T; Carminati, L; Carney, R M D; Caron, S; Carquin, E; Carrá, S; Carrillo-Montoya, G D; Carvalho, J; Casadei, D; Casado, M P; Casolino, M; Casper, D W; Castelijn, R; Castillo Gimenez, V; Castro, N F; Catinaccio, A; Catmore, J R; Cattai, A; Caudron, J; Cavaliere, V; Cavallaro, E; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Celebi, E; Ceradini, F; Cerda Alberich, L; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cervelli, A; Cetin, S A; Chafaq, A; Chakraborty, D; Chan, S K; Chan, W S; Chan, Y L; Chang, P; Chapman, J D; Charlton, D G; Chau, C C; Chavez Barajas, C A; Che, S; Cheatham, S; Chegwidden, A; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, S; Chen, S; Chen, X; Chen, Y; Cheng, H 
C; Cheng, H J; Cheplakov, A; Cheremushkina, E; Cherkaoui El Moursli, R; Chernyatin, V; Cheu, E; Chevalier, L; Chiarella, V; Chiarelli, G; Chiodini, G; Chisholm, A S; Chitan, A; Chiu, Y H; Chizhov, M V; Choi, K; Chomont, A R; Chouridou, S; Christodoulou, V; Chromek-Burckhart, D; Chu, M C; Chudoba, J; Chuinard, A J; Chwastowski, J J; Chytka, L; Ciftci, A K; Cinca, D; Cindro, V; Cioara, I A; Ciocca, C; Ciocio, A; Cirotto, F; Citron, Z H; Citterio, M; Ciubancan, M; Clark, A; Clark, B L; Clark, M R; Clark, P J; Clarke, R N; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Colasurdo, L; Cole, B; Colijn, A P; Collot, J; Colombo, T; Conde Muiño, P; Coniavitis, E; Connell, S H; Connelly, I A; Constantinescu, S; Conti, G; Conventi, F; Cooke, M; Cooper-Sarkar, A M; Cormier, F; Cormier, K J R; Corradi, M; Corriveau, F; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Cottin, G; Cowan, G; Cox, B E; Cranmer, K; Crawley, S J; Creager, R A; Cree, G; Crépé-Renaudin, S; Crescioli, F; Cribbs, W A; Cristinziani, M; Croft, V; Crosetti, G; Cueto, A; Cuhadar Donszelmann, T; Cukierman, A R; Cummings, J; Curatolo, M; Cúth, J; Czirr, H; Czodrowski, P; D'amen, G; D'Auria, S; D'eramo, L; D'Onofrio, M; Da Cunha Sargedas De Sousa, M J; Da Via, C; Dabrowski, W; Dado, T; Dai, T; Dale, O; Dallaire, F; Dallapiccola, C; Dam, M; Dandoy, J R; Daneri, M F; Dang, N P; Daniells, A C; Dann, N S; Danninger, M; Dano Hoffmann, M; Dao, V; Darbo, G; Darmora, S; Dassoulas, J; Dattagupta, A; Daubney, T; Davey, W; David, C; Davidek, T; Davies, M; Davis, D R; Davison, P; Dawe, E; Dawson, I; De, K; de Asmundis, R; De Benedetti, A; De Castro, S; De Cecco, S; De Groot, N; de Jong, P; De la Torre, H; De Lorenzi, F; De Maria, A; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Vasconcelos Corga, K; De Vivie De Regie, J B; Dearnaley, W J; Debbe, R; Debenedetti, C; Dedovich, D V; Dehghanian, N; Deigaard, I; Del Gaudio, M; Del Peso, J; Del Prete, T; Delgove, D; Deliot, F; Delitzsch, C M; 
Dell'Acqua, A; Dell'Asta, L; Dell'Orso, M; Della Pietra, M; Della Volpe, D; Delmastro, M; Delporte, C; Delsart, P A; DeMarco, D A; Demers, S; Demichev, M; Demilly, A; Denisov, S P; Denysiuk, D; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deterre, C; Dette, K; Devesa, M R; Deviveiros, P O; Dewhurst, A; Dhaliwal, S; Di Bello, F A; Di Ciaccio, A; Di Ciaccio, L; Di Clemente, W K; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Micco, B; Di Nardo, R; Di Petrillo, K F; Di Simone, A; Di Sipio, R; Di Valentino, D; Diaconu, C; Diamond, M; Dias, F A; Diaz, M A; Diehl, E B; Dietrich, J; Díez Cornell, S; Dimitrievska, A; Dingfelder, J; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; Djuvsland, J I; do Vale, M A B; Dobos, D; Dobre, M; Doglioni, C; Dolejsi, J; Dolezal, Z; Donadelli, M; Donati, S; Dondero, P; Donini, J; Dopke, J; Doria, A; Dova, M T; Doyle, A T; Drechsler, E; Dris, M; Du, Y; Duarte-Campderros, J; Dubreuil, A; Duchovni, E; Duckeck, G; Ducourthial, A; Ducu, O A; Duda, D; Dudarev, A; Dudder, A Chr; Duffield, E M; Duflot, L; Dührssen, M; Dumancic, M; Dumitriu, A E; Duncan, A K; Dunford, M; Duran Yildiz, H; Düren, M; Durglishvili, A; Duschinger, D; Dutta, B; Dyndal, M; Eckardt, C; Ecker, K M; Edgar, R C; Eifert, T; Eigen, G; Einsweiler, K; Ekelof, T; El Kacimi, M; El Kosseifi, R; Ellajosyula, V; Ellert, M; Elles, S; Ellinghaus, F; Elliot, A A; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Endner, O C; Ennis, J S; Erdmann, J; Ereditato, A; Ernis, G; Ernst, M; Errede, S; Escalier, M; Escobar, C; Esposito, B; Estrada Pastor, O; Etienvre, A I; Etzion, E; Evans, H; Ezhilov, A; Ezzi, M; Fabbri, F; Fabbri, L; Facini, G; Fakhrutdinov, R M; Falciano, S; Falla, R J; Faltova, J; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farina, C; Farina, E M; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Faucci Giannelli, M; Favareto, A; Fawcett, W J; Fayard, L; Fedin, O L; Fedorko, W; Feigl, S; Feligioni, L; 
Feng, C; Feng, E J; Feng, H; Fenton, M J; Fenyuk, A B; Feremenga, L; Fernandez Martinez, P; Fernandez Perez, S; Ferrando, J; Ferrari, A; Ferrari, P; Ferrari, R; Ferreira de Lima, D E; Ferrer, A; Ferrere, D; Ferretti, C; Fiedler, F; Filipčič, A; Filipuzzi, M; Filthaut, F; Fincke-Keeler, M; Finelli, K D; Fiolhais, M C N; Fiorini, L; Fischer, A; Fischer, C; Fischer, J; Fisher, W C; Flaschel, N; Fleck, I; Fleischmann, P; Fletcher, R R M; Flick, T; Flierl, B M; Flores Castillo, L R; Flowerdew, M J; Forcolin, G T; Formica, A; Förster, F A; Forti, A; Foster, A G; Fournier, D; Fox, H; Fracchia, S; Francavilla, P; Franchini, M; Franchino, S; Francis, D; Franconi, L; Franklin, M; Frate, M; Fraternali, M; Freeborn, D; Fressard-Batraneanu, S M; Freund, B; Froidevaux, D; Frost, J A; Fukunaga, C; Fusayasu, T; Fuster, J; Gabaldon, C; Gabizon, O; Gabrielli, A; Gabrielli, A; Gach, G P; Gadatsch, S; Gadomski, S; Gagliardi, G; Gagnon, L G; Galea, C; Galhardo, B; Gallas, E J; Gallop, B J; Gallus, P; Galster, G; Gan, K K; Ganguly, S; Gao, Y; Gao, Y S; Garay Walls, F M; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gascon Bravo, A; Gasnikova, K; Gatti, C; Gaudiello, A; Gaudio, G; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Gee, C N P; Geisen, J; Geisen, M; Geisler, M P; Gellerstedt, K; Gemme, C; Genest, M H; Geng, C; Gentile, S; Gentsos, C; George, S; Gerbaudo, D; Gershon, A; Geßner, G; Ghasemi, S; Ghneimat, M; Giacobbe, B; Giagu, S; Giannetti, P; Gibson, S M; Gignac, M; Gilchriese, M; Gillberg, D; Gilles, G; Gingrich, D M; Giokaris, N; Giordani, M P; Giorgi, F M; Giraud, P F; Giromini, P; Giugni, D; Giuli, F; Giuliani, C; Giulini, M; Gjelsten, B K; Gkaitatzis, S; Gkialas, I; Gkougkousis, E L; Gkountoumis, P; Gladilin, L K; Glasman, C; Glatzer, J; Glaysher, P C F; Glazov, A; Goblirsch-Kolb, M; Godlewski, J; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gonçalo, R; Goncalves Gama, R; Goncalves Pinto Firmino Da Costa, J; Gonella, G; 
Gonella, L; Gongadze, A; González de la Hoz, S; Gonzalez-Sevilla, S; Goossens, L; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorini, B; Gorini, E; Gorišek, A; Goshaw, A T; Gössling, C; Gostkin, M I; Gottardo, C A; Goudet, C R; Goujdami, D; Goussiou, A G; Govender, N; Gozani, E; Graber, L; Grabowska-Bold, I; Gradin, P O J; Gramling, J; Gramstad, E; Grancagnolo, S; Gratchev, V; Gravila, P M; Gray, C; Gray, H M; Greenwood, Z D; Grefe, C; Gregersen, K; Gregor, I M; Grenier, P; Grevtsov, K; Griffiths, J; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grivaz, J-F; Groh, S; Gross, E; Grosse-Knetter, J; Grossi, G C; Grout, Z J; Grummer, A; Guan, L; Guan, W; Guenther, J; Guescini, F; Guest, D; Gueta, O; Gui, B; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gumpert, C; Guo, J; Guo, W; Guo, Y; Gupta, R; Gupta, S; Gustavino, G; Gutierrez, P; Gutierrez Ortiz, N G; Gutschow, C; Guyot, C; Guzik, M P; Gwenlan, C; Gwilliam, C B; Haas, A; Haber, C; Hadavand, H K; Haddad, N; Hadef, A; Hageböck, S; Hagihara, M; Hakobyan, H; Haleem, M; Haley, J; Halladjian, G; Hallewell, G D; Hamacher, K; Hamal, P; Hamano, K; Hamilton, A; Hamity, G N; Hamnett, P G; Han, L; Han, S; Hanagaki, K; Hanawa, K; Hance, M; Haney, B; Hanke, P; Hansen, J B; Hansen, J D; Hansen, M C; Hansen, P H; Hara, K; Hard, A S; Harenberg, T; Hariri, F; Harkusha, S; Harrington, R D; Harrison, P F; Hartmann, N M; Hasegawa, M; Hasegawa, Y; Hasib, A; Hassani, S; Haug, S; Hauser, R; Hauswald, L; Havener, L B; Havranek, M; Hawkes, C M; Hawkings, R J; Hayakawa, D; Hayden, D; Hays, C P; Hays, J M; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heidegger, K K; Heim, S; Heim, T; Heinemann, B; Heinrich, J J; Heinrich, L; Heinz, C; Hejbal, J; Helary, L; Held, A; Hellman, S; Helsens, C; Henderson, R C W; Heng, Y; Henkelmann, S; Henriques Correia, A M; Henrot-Versille, S; Herbert, G H; Herde, H; Herget, V; Hernández Jiménez, Y; Herten, G; Hertenberger, R; Hervas, L; Herwig, T C; Hesketh, G G; Hessey, N P; Hetherly, J W; 
Higashino, S; Higón-Rodriguez, E; Hill, E; Hill, J C; Hiller, K H; Hillier, S J; Hils, M; Hinchliffe, I; Hirose, M; Hirschbuehl, D; Hiti, B; Hladik, O; Hoad, X; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoenig, F; Hohn, D; Holmes, T R; Homann, M; Honda, S; Honda, T; Hong, T M; Hooberman, B H; Hopkins, W H; Horii, Y; Horton, A J; Hostachy, J-Y; Hou, S; Hoummada, A; Howarth, J; Hoya, J; Hrabovsky, M; Hrdinka, J; Hristova, I; Hrivnac, J; Hryn'ova, T; Hrynevich, A; Hsu, P J; Hsu, S-C; Hu, Q; Hu, S; Huang, Y; Hubacek, Z; Hubaut, F; Huegging, F; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Huo, P; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibragimov, I; Iconomidou-Fayard, L; Idrissi, Z; Iengo, P; Igonkina, O; Iizawa, T; Ikegami, Y; Ikeno, M; Ilchenko, Y; Iliadis, D; Ilic, N; Introzzi, G; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Isacson, M F; Ishijima, N; Ishino, M; Ishitsuka, M; Issever, C; Istin, S; Ito, F; Iturbe Ponce, J M; Iuppa, R; Iwasaki, H; Izen, J M; Izzo, V; Jabbar, S; Jackson, P; Jacobs, R M; Jain, V; Jakobi, K B; Jakobs, K; Jakobsen, S; Jakoubek, T; Jamin, D O; Jana, D K; Jansky, R; Janssen, J; Janus, M; Janus, P A; Jarlskog, G; Javadov, N; Javůrek, T; Javurkova, M; Jeanneau, F; Jeanty, L; Jejelava, J; Jelinskas, A; Jenni, P; Jeske, C; Jézéquel, S; Ji, H; Jia, J; Jiang, H; Jiang, Y; Jiang, Z; Jiggins, S; Jimenez Pena, J; Jin, S; Jinaru, A; Jinnouchi, O; Jivan, H; Johansson, P; Johns, K A; Johnson, C A; Johnson, W J; Jon-And, K; Jones, R W L; Jones, S D; Jones, S; Jones, T J; Jongmanns, J; Jorge, P M; Jovicevic, J; Ju, X; Juste Rozas, A; Köhler, M K; Kaczmarska, A; Kado, M; Kagan, H; Kagan, M; Kahn, S J; Kaji, T; Kajomovitz, E; Kalderon, C W; Kaluza, A; Kama, S; Kamenshchikov, A; Kanaya, N; Kanjir, L; Kantserov, V A; Kanzaki, J; Kaplan, B; Kaplan, L S; Kar, D; Karakostas, K; Karastathis, N; Kareem, M J; Karentzos, E; Karpov, S N; Karpova, Z M; Karthik, K; Kartvelishvili, V; Karyukhin, A N; 
Kasahara, K; Kashif, L; Kass, R D; Kastanas, A; Kataoka, Y; Kato, C; Katre, A; Katzy, J; Kawade, K; Kawagoe, K; Kawamoto, T; Kawamura, G; Kay, E F; Kazanin, V F; Keeler, R; Kehoe, R; Keller, J S; Kempster, J J; Kendrick, J; Keoshkerian, H; Kepka, O; Kerševan, B P; Kersten, S; Keyes, R A; Khader, M; Khalil-Zada, F; Khanov, A; Kharlamov, A G; Kharlamova, T; Khodinov, A; Khoo, T J; Khovanskiy, V; Khramov, E; Khubua, J; Kido, S; Kilby, C R; Kim, H Y; Kim, S H; Kim, Y K; Kimura, N; Kind, O M; King, B T; Kirchmeier, D; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kiuchi, K; Kivernyk, O; Kladiva, E; Klapdor-Kleingrothaus, T; Klein, M H; Klein, M; Klein, U; Kleinknecht, K; Klimek, P; Klimentov, A; Klingenberg, R; Klingl, T; Klioutchnikova, T; Kluge, E-E; Kluit, P; Kluth, S; Kneringer, E; Knoops, E B F G; Knue, A; Kobayashi, A; Kobayashi, D; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koffas, T; Koffeman, E; Köhler, N M; Koi, T; Kolb, M; Koletsou, I; Komar, A A; Komori, Y; Kondo, T; Kondrashova, N; Köneke, K; König, A C; Kono, T; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A A; Korolkov, I; Korolkova, E V; Kortner, O; Kortner, S; Kosek, T; Kostyukhin, V V; Kotwal, A; Koulouris, A; Kourkoumeli-Charalampidi, A; Kourkoumelis, C; Kourlitis, E; Kouskoura, V; Kowalewska, A B; Kowalewski, R; Kowalski, T Z; Kozakai, C; Kozanecki, W; Kozhin, A S; Kramarenko, V A; Kramberger, G; Krasnopevtsev, D; Krasny, M W; Krasznahorkay, A; Krauss, D; Kremer, J A; Kretzschmar, J; Kreutzfeldt, K; Krieger, P; Krizka, K; Kroeninger, K; Kroha, H; Kroll, J; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Krumnack, N; Kruse, M C; Kubota, T; Kucuk, H; Kuday, S; Kuechler, J T; Kuehn, S; Kugel, A; Kuger, F; Kuhl, T; Kukhtin, V; Kukla, R; Kulchitsky, Y; Kuleshov, S; Kulinich, Y P; Kuna, M; Kunigo, T; Kupco, A; Kupfer, T; Kuprash, O; Kurashige, H; Kurchaninov, L L; Kurochkin, Y A; Kurth, M G; Kus, V; Kuwertz, E S; Kuze, 
M; Kvita, J; Kwan, T; Kyriazopoulos, D; La Rosa, A; Navarro, J L La Rosa; La Rotonda, L; Lacasta, C; Lacava, F; Lacey, J; Lacker, H; Lacour, D; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Lammers, S; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lanfermann, M C; Lang, V S; Lange, J C; Langenberg, R J; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Lapertosa, A; Laplace, S; Laporte, J F; Lari, T; Lasagni Manghi, F; Lassnig, M; Laurelli, P; Lavrijsen, W; Law, A T; Laycock, P; Lazovich, T; Lazzaroni, M; Le, B; Le Dortz, O; Le Guirriec, E; Le Quilleuc, E P; LeBlanc, M; LeCompte, T; Ledroit-Guillon, F; Lee, C A; Lee, G R; Lee, S C; Lee, L; Lefebvre, B; Lefebvre, G; Lefebvre, M; Legger, F; Leggett, C; Lehan, A; Lehmann Miotto, G; Lei, X; Leight, W A; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Leney, K J C; Lenz, T; Lenzi, B; Leone, R; Leone, S; Leonidopoulos, C; Lerner, G; Leroy, C; Lesage, A A J; Lester, C G; Levchenko, M; Levêque, J; Levin, D; Levinson, L J; Levy, M; Lewis, D; Li, B; Li, C; Li, H; Li, L; Li, Q; Li, S; Li, X; Li, Y; Liang, Z; Liberti, B; Liblong, A; Lie, K; Liebal, J; Liebig, W; Limosani, A; Lin, S C; Lin, T H; Lindquist, B E; Lionti, A E; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lister, A; Litke, A M; Liu, B; Liu, H; Liu, H; Liu, J K K; Liu, J; Liu, J B; Liu, K; Liu, L; Liu, M; Liu, Y L; Liu, Y; Livan, M; Lleres, A; Llorente Merino, J; Lloyd, S L; Lo, C Y; Sterzo, F Lo; Lobodzinska, E M; Loch, P; Loebinger, F K; Loesle, A; Loew, K M; Loginov, A; Lohse, T; Lohwasser, K; Lokajicek, M; Long, B A; Long, J D; Long, R E; Longo, L; Looper, K A; Lopez, J A; Lopez Mateos, D; Lopez Paz, I; Lopez Solis, A; Lorenz, J; Lorenzo Martinez, N; Losada, M; Lösel, P J; Lou, X; Lounis, A; Love, J; Love, P A; Lu, H; Lu, N; Lu, Y J; Lubatti, H J; Luci, C; Lucotte, A; Luedtke, C; Luehring, F; Lukas, W; Luminari, L; Lundberg, O; Lund-Jensen, B; Luzi, P M; Lynn, D; Lysak, R; Lytken, E; Lyubushkin, V; Ma, H; Ma, L L; Ma, Y; Maccarrone, G; Macchiolo, 
A; Macdonald, C M; Maček, B; Machado Miguens, J; Madaffari, D; Madar, R; Mader, W F; Madsen, A; Maeda, J; Maeland, S; Maeno, T; Maevskiy, A S; Magradze, E; Mahlstedt, J; Maiani, C; Maidantchik, C; Maier, A A; Maier, T; Maio, A; Majersky, O; Majewski, S; Makida, Y; Makovec, N; Malaescu, B; Malecki, Pa; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyukov, S; Mamuzic, J; Mancini, G; Mandelli, L; Mandić, I; Maneira, J; Manhaes de Andrade Filho, L; Manjarres Ramos, J; Mann, A; Manousos, A; Mansoulie, B; Mansour, J D; Mantifel, R; Mantoani, M; Manzoni, S; Mapelli, L; Marceca, G; March, L; Marchese, L; Marchiori, G; Marcisovsky, M; Marjanovic, M; Marley, D E; Marroquim, F; Marsden, S P; Marshall, Z; Martensson, M U F; Marti-Garcia, S; Martin, C B; Martin, T A; Martin, V J; Martin Dit Latour, B; Martinez, M; Martinez Outschoorn, V I; Martin-Haugh, S; Martoiu, V S; Martyniuk, A C; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, L; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Mättig, P; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Maznas, I; Mazza, S M; Mc Fadden, N C; Mc Goldrick, G; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McClymont, L I; McDonald, E F; Mcfayden, J A; Mchedlidze, G; McMahon, S J; McNamara, P C; McPherson, R A; Meehan, S; Megy, T J; Mehlhase, S; Mehta, A; Meideck, T; Meier, K; Meirose, B; Melini, D; Mellado Garcia, B R; Mellenthin, J D; Melo, M; Meloni, F; Menary, S B; Meng, L; Meng, X T; Mengarelli, A; Menke, S; Meoni, E; Mergelmeyer, S; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, J-P; Meyer, J; Meyer Zu Theenhausen, H; Miano, F; Middleton, R P; Miglioranzi, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Milesi, M; Milic, A; Miller, D W; Mills, C; Milov, A; Milstead, D A; Minaenko, A A; Minami, Y; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Minegishi, Y; Ming, Y; Mir, L M; Mistry, K P; Mitani, T; 
Mitrevski, J; Mitsou, V A; Miucci, A; Miyagawa, P S; Mizukami, A; Mjörnmark, J U; Mkrtchyan, T; Mlynarikova, M; Moa, T; Mochizuki, K; Mogg, P; Mohapatra, S; Molander, S; Moles-Valls, R; Monden, R; Mondragon, M C; Mönig, K; Monk, J; Monnier, E; Montalbano, A; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Morange, N; Moreno, D; Moreno Llácer, M; Morettini, P; Morgenstern, S; Mori, D; Mori, T; Morii, M; Morinaga, M; Morisbak, V; Morley, A K; Mornacchi, G; Morris, J D; Morvaj, L; Moschovakos, P; Mosidze, M; Moss, H J; Moss, J; Motohashi, K; Mount, R; Mountricha, E; Moyse, E J W; Muanza, S; Mudd, R D; Mueller, F; Mueller, J; Mueller, R S P; Muenstermann, D; Mullen, P; Mullier, G A; Munoz Sanchez, F J; Murray, W J; Musheghyan, H; Muškinja, M; Myagkov, A G; Myska, M; Nachman, B P; Nackenhorst, O; Nagai, K; Nagai, R; Nagano, K; Nagasaka, Y; Nagata, K; Nagel, M; Nagy, E; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Naranjo Garcia, R F; Narayan, R; Narrias Villar, D I; Naryshkin, I; Naumann, T; Navarro, G; Nayyar, R; Neal, H A; Nechaeva, P Yu; Neep, T J; Negri, A; Negrini, M; Nektarijevic, S; Nellist, C; Nelson, A; Nelson, M E; Nemecek, S; Nemethy, P; Nessi, M; Neubauer, M S; Neumann, M; Newman, P R; Ng, T Y; Nguyen Manh, T; Nickerson, R B; Nicolaidou, R; Nielsen, J; Nikolaenko, V; Nikolic-Audit, I; Nikolopoulos, K; Nilsen, J K; Nilsson, P; Ninomiya, Y; Nisati, A; Nishu, N; Nisius, R; Nitsche, I; Nobe, T; Noguchi, Y; Nomachi, M; Nomidis, I; Nomura, M A; Nooney, T; Nordberg, M; Norjoharuddeen, N; Novgorodova, O; Nowak, S; Nozaki, M; Nozka, L; Ntekas, K; Nurse, E; Nuti, F; O'connor, K; O'Neil, D C; O'Rourke, A A; O'Shea, V; Oakham, F G; Oberlack, H; Obermann, T; Ocariz, J; Ochi, A; Ochoa, I; Ochoa-Ricoux, J P; Oda, S; Odaka, S; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohman, H; Oide, H; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Oleiro Seabra, L F; Olivares Pino, S A; Oliveira Damazio, D; Olszewski, A; Olszowska, J; Onofre, A; Onogi, K; Onyisi, P U 
E; Oreglia, M J; Oren, Y; Orestano, D; Orlando, N; Orr, R S; Osculati, B; Ospanov, R; Otero Y Garzon, G; Otono, H; Ouchrif, M; Ould-Saada, F; Ouraou, A; Oussoren, K P; Ouyang, Q; Owen, M; Owen, R E; Ozcan, V E; Ozturk, N; Pachal, K; Pacheco Pages, A; Pacheco Rodriguez, L; Padilla Aranda, C; Pagan Griso, S; Paganini, M; Paige, F; Palacino, G; Palazzo, S; Palestini, S; Palka, M; Pallin, D; St Panagiotopoulou, E; Panagoulias, I; Pandini, C E; Panduro Vazquez, J G; Pani, P; Panitkin, S; Pantea, D; Paolozzi, L; Papadopoulou, Th D; Papageorgiou, K; Paramonov, A; Paredes Hernandez, D; Parker, A J; Parker, M A; Parker, K A; Parodi, F; Parsons, J A; Parzefall, U; Pascuzzi, V R; Pasner, J M; Pasqualucci, E; Passaggio, S; Pastore, Fr; Pataraia, S; Pater, J R; Pauly, T; Pearson, B; Pedraza Lopez, S; Pedro, R; Peleganchuk, S V; Penc, O; Peng, C; Peng, H; Penwell, J; Peralva, B S; Perego, M M; Perepelitsa, D V; Perini, L; Pernegger, H; Perrella, S; Peschke, R; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petroff, P; Petrolo, E; Petrov, M; Petrucci, F; Pettersson, N E; Peyaud, A; Pezoa, R; Phillips, F H; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Pickering, M A; Piegaia, R; Pilcher, J E; Pilkington, A D; Pin, A W J; Pinamonti, M; Pinfold, J L; Pirumov, H; Pitt, M; Plazak, L; Pleier, M-A; Pleskot, V; Plotnikova, E; Pluth, D; Podberezko, P; Poettgen, R; Poggi, R; Poggioli, L; Pohl, D; Polesello, G; Poley, A; Policicchio, A; Polifka, R; Polini, A; Pollard, C S; Polychronakos, V; Pommès, K; Ponomarenko, D; Pontecorvo, L; Pope, B G; Popeneciu, G A; Poppleton, A; Pospisil, S; Potamianos, K; Potrap, I N; Potter, C J; Poulard, G; Poulsen, T; Poveda, J; Pozo Astigarraga, M E; Pralavorio, P; Pranko, A; Prell, S; Price, D; Price, L E; Primavera, M; Prince, S; Proklova, N; Prokofiev, K; Prokoshin, F; Protopopescu, S; Proudfoot, J; Przybycien, M; Puri, A; Puzo, P; Qian, J; Qin, G; Qin, Y; Quadt, A; 
Queitsch-Maitland, M; Quilty, D; Raddum, S; Radeka, V; Radescu, V; Radhakrishnan, S K; Radloff, P; Rados, P; Ragusa, F; Rahal, G; Raine, J A; Rajagopalan, S; Rangel-Smith, C; Rashid, T; Raspopov, S; Ratti, M G; Rauch, D M; Rauscher, F; Rave, S; Ravinovich, I; Rawling, J H; Raymond, M; Read, A L; Readioff, N P; Reale, M; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reed, R G; Reeves, K; Rehnisch, L; Reichert, J; Reiss, A; Rembser, C; Ren, H; Rescigno, M; Resconi, S; Resseguie, E D; Rettie, S; Reynolds, E; Rezanova, O L; Reznicek, P; Rezvani, R; Richter, R; Richter, S; Richter-Was, E; Ricken, O; Ridel, M; Rieck, P; Riegel, C J; Rieger, J; Rifki, O; Rijssenbeek, M; Rimoldi, A; Rimoldi, M; Rinaldi, L; Ripellino, G; Ristić, B; Ritsch, E; Riu, I; Rizatdinova, F; Rizvi, E; Rizzi, C; Roberts, R T; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Rocco, E; Roda, C; Rodina, Y; Rodriguez Bosca, S; Rodriguez Perez, A; Rodriguez Rodriguez, D; Roe, S; Rogan, C S; Røhne, O; Roloff, J; Romaniouk, A; Romano, M; Romano Saez, S M; Romero Adam, E; Rompotis, N; Ronzani, M; Roos, L; Rosati, S; Rosbach, K; Rose, P; Rosien, N-A; Rossi, E; Rossi, L P; Rosten, J H N; Rosten, R; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rühr, F; Ruiz-Martinez, A; Rurikova, Z; Rusakovich, N A; Russell, H L; Rutherfoord, J P; Ruthmann, N; Ryabov, Y F; Rybar, M; Rybkin, G; Ryu, S; Ryzhov, A; Rzehorz, G F; Saavedra, A F; Sabato, G; Sacerdoti, S; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Saha, P; Sahinsoy, M; Saimpert, M; Saito, M; Saito, T; Sakamoto, H; Sakurai, Y; Salamanna, G; Salazar Loyola, J E; Salek, D; Sales De Bruin, P H; Salihagic, D; Salnikov, A; Salt, J; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sammel, D; Sampsonidis, D; Sampsonidou, D; Sánchez, J; Sanchez Martinez, V; Sanchez Pineda, A; Sandaker, H; Sandbach, R L; Sander, C O; Sandhoff, M; Sandoval, C; Sankey, D P C; Sannino, M; Sansoni, A; 
Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapronov, A; Saraiva, J G; Sarrazin, B; Sasaki, O; Sato, K; Sauvan, E; Savage, G; Savard, P; Savic, N; Sawyer, C; Sawyer, L; Saxon, J; Sbarra, C; Sbrizzi, A; Scanlon, T; Scannicchio, D A; Scarcella, M; Scarfone, V; Schaarschmidt, J; Schacht, P; Schachtner, B M; Schaefer, D; Schaefer, L; Schaefer, R; Schaeffer, J; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Schiavi, C; Schier, S; Schildgen, L K; Schillo, C; Schioppa, M; Schlenker, S; Schmidt-Sommerfeld, K R; Schmieden, K; Schmitt, C; Schmitt, S; Schmitz, S; Schnoor, U; Schoeffel, L; Schoening, A; Schoenrock, B D; Schopf, E; Schott, M; Schouwenberg, J F P; Schovancova, J; Schramm, S; Schuh, N; Schulte, A; Schultens, M J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwartzman, A; Schwarz, T A; Schweiger, H; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Sciandra, A; Sciolla, G; Scuri, F; Scutti, F; Searcy, J; Seema, P; Seidel, S C; Seiden, A; Seixas, J M; Sekhniaidze, G; Sekhon, K; Sekula, S J; Semprini-Cesari, N; Senkin, S; Serfon, C; Serin, L; Serkin, L; Sessa, M; Seuster, R; Severini, H; Sfiligoj, T; Sforza, F; Sfyrla, A; Shabalina, E; Shaikh, N W; Shan, L Y; Shang, R; Shank, J T; Shapiro, M; Shatalov, P B; Shaw, K; Shaw, S M; Shcherbakova, A; Shehu, C Y; Shen, Y; Sherafati, N; Sherwood, P; Shi, L; Shimizu, S; Shimmin, C O; Shimojima, M; Shipsey, I P J; Shirabe, S; Shiyakova, M; Shlomi, J; Shmeleva, A; Shoaleh Saadi, D; Shochet, M J; Shojaii, S; Shope, D R; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sickles, A M; Sidebo, P E; Sideras Haddad, E; Sidiropoulou, O; Sidoti, A; Siegert, F; Sijacki, Dj; Silva, J; Silverstein, S B; Simak, V; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simon, M; Sinervo, P; Sinev, N B; Sioli, M; Siragusa, G; Siral, I; Sivoklokov, S Yu; Sjölin, J; Skinner, M B; Skubic, P; Slater, M; Slavicek, T; Slawinska, 
M; Sliwa, K; Slovak, R; Smakhtin, V; Smart, B H; Smiesko, J; Smirnov, N; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, J W; Smith, M N K; Smith, R W; Smizanska, M; Smolek, K; Snesarev, A A; Snyder, I M; Snyder, S; Sobie, R; Socher, F; Soffer, A; Soh, D A; Sokhrannyi, G; Solans Sanchez, C A; Solar, M; Soldatov, E Yu; Soldevila, U; Solodkov, A A; Soloshenko, A; Solovyanov, O V; Solovyev, V; Sommer, P; Son, H; Sopczak, A; Sosa, D; Sotiropoulou, C L; Soualah, R; Soukharev, A M; South, D; Sowden, B C; Spagnolo, S; Spalla, M; Spangenberg, M; Spanò, F; Sperlich, D; Spettel, F; Spieker, T M; Spighi, R; Spigo, G; Spiller, L A; Spousta, M; St Denis, R D; Stabile, A; Stamen, R; Stamm, S; Stanecka, E; Stanek, R W; Stanescu, C; Stanitzki, M M; Stapf, B S; Stapnes, S; Starchenko, E A; Stark, G H; Stark, J; Stark, S H; Staroba, P; Starovoitov, P; Stärz, S; Staszewski, R; Steinberg, P; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stewart, G A; Stockton, M C; Stoebe, M; Stoicea, G; Stolte, P; Stonjek, S; Stradling, A R; Straessner, A; Stramaglia, M E; Strandberg, J; Strandberg, S; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Stroynowski, R; Strubig, A; Stucci, S A; Stugu, B; Styles, N A; Su, D; Su, J; Suchek, S; Sugaya, Y; Suk, M; Sulin, V V; Sultan, D M S; Sultansoy, S; Sumida, T; Sun, S; Sun, X; Suruliz, K; Suster, C J E; Sutton, M R; Suzuki, S; Svatos, M; Swiatlowski, M; Swift, S P; Sykora, I; Sykora, T; Ta, D; Tackmann, K; Taenzer, J; Taffard, A; Tafirout, R; Taiblum, N; Takai, H; Takashima, R; Takasugi, E H; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A A; Tanaka, J; Tanaka, M; Tanaka, R; Tanaka, S; Tanioka, R; Tannenwald, B B; Tapia Araya, S; Tapprogge, S; Tarem, S; Tartarelli, G F; Tas, P; Tasevsky, M; Tashiro, T; Tassi, E; Tavares Delgado, A; Tayalati, Y; Taylor, A C; Taylor, G N; Taylor, P T E; Taylor, W; Teixeira-Dias, P; Temple, D; Ten Kate, H; Teng, P K; Teoh, J J; Tepel, F; Terada, S; Terashi, K; Terron, J; Terzo, S; Testa, M; 
Teuscher, R J; Theveneaux-Pelzer, T; Thomas, J P; Thomas-Wilsker, J; Thompson, P D; Thompson, A S; Thomsen, L A; Thomson, E; Tibbetts, M J; Ticse Torres, R E; Tikhomirov, V O; Tikhonov, Yu A; Timoshenko, S; Tipton, P; Tisserant, S; Todome, K; Todorova-Nova, S; Tojo, J; Tokár, S; Tokushuku, K; Tolley, E; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Tong, B; Tornambe, P; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Treado, C J; Trefzger, T; Tresoldi, F; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Trischuk, W; Trocmé, B; Trofymov, A; Troncon, C; Trottier-McDonald, M; Trovatelli, M; Truong, L; Trzebinski, M; Trzupek, A; Tsang, K W; Tseng, J C-L; Tsiareshka, P V; Tsipolitis, G; Tsirintanis, N; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsui, K M; Tsukerman, I I; Tsulaia, V; Tsuno, S; Tsybychev, D; Tu, Y; Tudorache, A; Tudorache, V; Tulbure, T T; Tuna, A N; Tupputi, S A; Turchikhin, S; Turgeman, D; Turk Cakir, I; Turra, R; Tuts, P M; Ucchielli, G; Ueda, I; Ughetto, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Unverdorben, C; Urban, J; Urquijo, P; Urrejola, P; Usai, G; Usui, J; Vacavant, L; Vacek, V; Vachon, B; Valderanis, C; Valdes Santurio, E; Valentinetti, S; Valero, A; Valéry, L; Valkar, S; Vallier, A; Valls Ferrer, J A; Van Den Wollenberg, W; van der Graaf, H; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; van Woerden, M C; Vanadia, M; Vandelli, W; Vaniachine, A; Vankov, P; Vardanyan, G; Vari, R; Varnes, E W; Varni, C; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vasquez, J G; Vasquez, G A; Vazeille, F; Vazquez Schroeder, T; Veatch, J; Veeraraghavan, V; Veloce, L M; Veloso, F; Veneziano, S; Ventura, A; Venturi, M; Venturi, N; Venturini, A; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, A T; Vermeulen, J C; Vetterli, M C; Viaux Maira, N; Viazlo, O; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; Vigani, L; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, 
M G; Vinogradov, V B; Vishwakarma, A; Vittori, C; Vivarelli, I; Vlachos, S; Vlasak, M; Vogel, M; Vokac, P; Volpi, G; von der Schmitt, H; von Toerne, E; Vorobel, V; Vorobev, K; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Vranjes Milosavljevic, M; Vrba, V; Vreeswijk, M; Vuillermet, R; Vukotic, I; Wagner, P; Wagner, W; Wagner-Kuhr, J; Wahlberg, H; Wahrmund, S; Wakabayashi, J; Walder, J; Walker, R; Walkowiak, W; Wallangen, V; Wang, C; Wang, C; Wang, F; Wang, H; Wang, H; Wang, J; Wang, J; Wang, Q; Wang, R; Wang, S M; Wang, T; Wang, W; Wang, W; Wang, Z; Wanotayaroj, C; Warburton, A; Ward, C P; Wardrope, D R; Washbrook, A; Watkins, P M; Watson, A T; Watson, M F; Watts, G; Watts, S; Waugh, B M; Webb, A F; Webb, S; Weber, M S; Weber, S W; Weber, S A; Webster, J S; Weidberg, A R; Weinert, B; Weingarten, J; Weirich, M; Weiser, C; Weits, H; Wells, P S; Wenaus, T; Wengler, T; Wenig, S; Wermes, N; Werner, M D; Werner, P; Wessels, M; Whalen, K; Whallon, N L; Wharton, A M; White, A S; White, A; White, M J; White, R; Whiteson, D; Whitmore, B W; Wickens, F J; Wiedenmann, W; Wielers, M; Wiglesworth, C; Wiik-Fuchs, L A M; Wildauer, A; Wilk, F; Wilkens, H G; Williams, H H; Williams, S; Willis, C; Willocq, S; Wilson, J A; Wingerter-Seez, I; Winkels, E; Winklmeier, F; Winston, O J; Winter, B T; Wittgen, M; Wobisch, M; Wolf, T M H; Wolff, R; Wolter, M W; Wolters, H; Wong, V W S; Worm, S D; Wosiek, B K; Wotschack, J; Wozniak, K W; Wu, M; Wu, S L; Wu, X; Wu, Y; Wyatt, T R; Wynne, B M; Xella, S; Xi, Z; Xia, L; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yamaguchi, D; Yamaguchi, Y; Yamamoto, A; Yamamoto, S; Yamanaka, T; Yamatani, M; Yamauchi, K; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, Y; Yang, Z; Yao, W-M; Yap, Y C; Yasu, Y; Yatsenko, E; Yau Wong, K H; Ye, J; Ye, S; Yeletskikh, I; Yigitbasi, E; Yildirim, E; Yorita, K; Yoshihara, K; Young, C; Young, C J S; Yu, J; Yu, J; Yuen, S P Y; Yusuff, I; Zabinski, B; Zacharis, G; Zaidan, R; Zaitsev, A M; Zakharchuk, N; Zalieckas, J; Zaman, A; Zambito, S; 
Zanzi, D; Zeitnitz, C; Zemla, A; Zeng, J C; Zeng, Q; Zenin, O; Ženiš, T; Zerwas, D; Zhang, D; Zhang, F; Zhang, G; Zhang, H; Zhang, J; Zhang, L; Zhang, L; Zhang, M; Zhang, P; Zhang, R; Zhang, R; Zhang, X; Zhang, Y; Zhang, Z; Zhao, X; Zhao, Y; Zhao, Z; Zhemchugov, A; Zhou, B; Zhou, C; Zhou, L; Zhou, M; Zhou, M; Zhou, N; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhukov, K; Zibell, A; Zieminska, D; Zimine, N I; Zimmermann, C; Zimmermann, S; Zinonos, Z; Zinser, M; Ziolkowski, M; Živković, L; Zobernig, G; Zoccoli, A; Zou, R; Zur Nedden, M; Zwalinski, L
Results of a search for physics beyond the Standard Model in events containing an energetic photon and large missing transverse momentum with the ATLAS detector at the Large Hadron Collider are reported. As the number of events observed in data, corresponding to an integrated luminosity of 36.1 fb[Formula: see text] of proton-proton collisions at a centre-of-mass energy of [Formula: see text], is in agreement with the Standard Model expectations, model-independent limits are set on the fiducial cross section for the production of events in this final state. Exclusion limits are also placed in models where dark-matter candidates are pair-produced. For dark-matter production via an axial-vector or a vector mediator in the s-channel, this search excludes mediator masses below 750-[Formula: see text] for dark-matter candidate masses below 230-[Formula: see text] at 95% confidence level, depending on the couplings. In an effective theory of dark-matter production, the limits restrict the value of the suppression scale [Formula: see text] to be above [Formula: see text] at 95% confidence level. A limit is also reported on the production of a high-mass scalar resonance by processes beyond the Standard Model, in which the resonance decays to [Formula: see text] and the Z boson subsequently decays into neutrinos.
Stockwell, Peter; Mahlberg, Michaela
We suggest an innovative approach to literary discourse by using corpus linguistic methods to address research questions from cognitive poetics. In this article, we focus on the way that readers engage in mind-modelling in the process of characterisation. The article sets out our cognitive poetic model of characterisation that emphasises the continuity between literary characterisation and real-life human relationships. The model also aims to deal with the modelling of the author's mind in line with the modelling of the minds of fictional characters. Crucially, our approach to mind-modelling is text-driven. Therefore we are able to employ corpus linguistic techniques systematically to identify textual patterns that function as cues triggering character information. In this article, we explore our understanding of mind-modelling through the characterisation of Mr. Dick from David Copperfield by Charles Dickens. Using the CLiC tool (Corpus Linguistics in Cheshire) developed for the exploration of 19th-century fiction, we investigate the textual traces in non-quotations around this character, in order to draw out the techniques of characterisation other than speech presentation. We show that Mr. Dick is a thematically and authorially significant character in the novel, and we move towards a rigorous account of the reader's modelling of authorial intention.
The study of the world’s verbal arts offers an opportunity to consider ways that computational analysis and modeling of narratives may lead to new understandings of how they are constructed, their dynamics and relationships. Similarly, as corpus linguistics operations must define metrics, it offers an occasion to review basic interpretive concepts such as “units of analysis, context, and genre.” My essay begins with an admittedly cursory overview, from a novice perspective, of what capabilities corpus linguistics currently possesses for the analysis and modeling of narratives. Consideration is given to the epistemological issue in the social sciences with the positivistic prescription or empiricist description of units of analysis and the potential pitfalls or advantages corpus linguistics encounters in searching for adequate equivalent terms. This review leads naturally to reflection on the crucial determinative action of context on meaning and the extent to which current computational interfaces are able to account for, and integrate into a global analysis, linguistic and performance dimensions such as performer, intonation, gesture, diction, idioms and figurative language, setting, audience, time, and occasion. As a tentative conclusion from this review, it can be stated that artificial intelligence for modeling narratives or devising narrative algorithms must develop capacities to account for performance dimensions in order to fulfill its analytical potential.
In this paper, we investigate the relationship between log file records and corpus frequency. The study was motivated by practical considerations of how best to keep an already existing corpus-based dictionary updated. Should the next word in the dictionary be the one that follows next on a list of declining corpus frequency? Or the one that users most frequently look up but don’t find? In order to establish manageable criteria, we analysed log files for The Danish Dictionary from 2009 to 2012 and compared the list of most popular words looked up by the users with the frequency of the same words in the corpus underlying The Danish Dictionary. The users’ actual search behaviour was analysed in order to find answers to questions such as these: Are there words which are never looked up? If so, can we say something meaningful about their corpus frequency patterns – do they belong to particular parts of speech, are they particularly frequent or infrequent, could it even be that the pattern is cumulative, in such a way that a particular threshold can be identified? Ultimately, the question is whether it makes sense to use corpus frequency as a criterion for lemma selection.
Marco González T
The objective of the study was to determine the volume, weight, measurements, ovarian location and shape of the corpus luteum in pregnant and non-pregnant zebu cows of the Colombian tropics. A total of 528 reproductive tracts were collected (264 pregnant and 264 non-pregnant) from cows slaughtered at the local slaughterhouse in Monteria, Córdoba, Colombia. Samples were collected over a period of three months. After collection of each reproductive tract, the ovaries were separated, identified as right or left, weighed and measured. The location of the corpus luteum on the ovary was then drawn on the corresponding form according to previously established anatomical planes. Subsequently the corpus luteum was removed so that it could be measured and weighed and its shape recorded. There were statistical differences between the locations of the corpus luteum in the ovary: anterior pole, posterior pole, free edge, upper face and lower face (p≤0.05). The weight and volume of the gestational corpus luteum were greater by 30% and 27.9%, respectively, than those of the corpus luteum of non-pregnant cows. The predominant shape of the corpus luteum in both pregnant and non-pregnant cows was oval, followed by pyramidal and finally rounded. No gestation was observed contralateral to the location of the corpus luteum.
Increasing class sizes to gain economies of scale have resulted in less interaction between lecturers and students during lectures. This paper presents the results of a pilot study that set out to examine the use of applications on personally owned devices (APODs) to enhance student interaction, participation and engagement in large lectures. The pilot study commenced with the development and trial of a text-messaging-based application and, after a survey of students regarding ownership levels of mobile devices, concluded with the trial of an application developed for mobile devices. The conclusions of the paper highlight that the use of APODs can significantly increase student interaction, participation and engagement in large lectures, and identify implications and opportunities for further research.
Vydiswaran, V G Vinod; Mei, Qiaozhu; Hanauer, David A; Zheng, Kai
Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text.
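The frequency-ratio idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the smoothing constant, the function names (`frequency_ratio`, `label_pair`) and the decision rule below are assumptions made for illustration, not the authors' measure.

```python
from collections import Counter


def frequency_ratio(term, consumer_counts, professional_counts, smoothing=1.0):
    """Relative frequency of `term` in consumer text divided by its
    relative frequency in professional text.

    `smoothing` is a hypothetical additive constant that keeps the ratio
    finite when a term is absent from one of the two collections.
    """
    consumer_total = sum(consumer_counts.values())
    professional_total = sum(professional_counts.values())
    consumer_rate = (consumer_counts.get(term, 0) + smoothing) / (consumer_total + smoothing)
    professional_rate = (professional_counts.get(term, 0) + smoothing) / (professional_total + smoothing)
    return consumer_rate / professional_rate


def label_pair(term_a, term_b, consumer_counts, professional_counts):
    """Given two synonymous terms, label the one with the higher
    consumer-to-professional ratio as the consumer term."""
    ratio_a = frequency_ratio(term_a, consumer_counts, professional_counts)
    ratio_b = frequency_ratio(term_b, consumer_counts, professional_counts)
    if ratio_a >= ratio_b:
        return {"consumer": term_a, "professional": term_b}
    return {"consumer": term_b, "professional": term_a}


if __name__ == "__main__":
    # Toy counts standing in for a health forum and a set of abstracts.
    forum = Counter({"heart attack": 40, "myocardial infarction": 2, "other": 58})
    abstracts = Counter({"heart attack": 5, "myocardial infarction": 45, "other": 50})
    print(label_pair("myocardial infarction", "heart attack", forum, abstracts))
```

In practice the two count tables would come from tokenised forum posts and abstracts; the toy counts here only show the direction of the comparison.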
Pechenick, Eitan Adam; Danforth, Christopher M; Dodds, Peter Sheridan
It is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800-2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
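A divergence comparison of this kind can be sketched as follows. The abstract does not name the divergence measure, so the Jensen-Shannon divergence used here is an assumption, and the function names and toy word counts are hypothetical. The per-word terms show which words contribute most to the difference between two periods.

```python
import math
from collections import Counter


def normalize(counts):
    """Convert raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jsd_contributions(counts_a, counts_b):
    """Per-word contributions to the Jensen-Shannon divergence (in bits)
    between the word distributions of two data sets."""
    p = normalize(counts_a)
    q = normalize(counts_b)
    contributions = {}
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / mw)
        contributions[word] = term
    return contributions


def jensen_shannon_divergence(counts_a, counts_b):
    """Total divergence: 0 for identical distributions, 1 bit for disjoint ones."""
    return sum(jsd_contributions(counts_a, counts_b).values())


if __name__ == "__main__":
    # Toy counts standing in for two decades of a corpus.
    decade_a = Counter({"time": 50, "love": 30, "figure": 20})
    decade_b = Counter({"time": 40, "love": 10, "figure": 50})
    top = sorted(jsd_contributions(decade_a, decade_b).items(),
                 key=lambda item: item[1], reverse=True)
    print(jensen_shannon_divergence(decade_a, decade_b), top)
```

Sorting the per-word terms is one way to surface the "major contributions" to a shift between decades; with real n-gram counts the same functions would be applied to each pair of decades.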
Jelena Kuvač Kraljević
Interest in spoken-language corpora has increased over the past two decades, leading to the development of new corpora and the discovery of new facets of spoken language. These types of corpora represent the most comprehensive data source about the language of ordinary speakers. Such corpora are based on spontaneous, unscripted speech defined by a variety of styles, registers and dialects. The aim of this paper is to present the Croatian Adult Spoken Language Corpus (HrAL), its structure and its possible applications in different linguistic subfields. HrAL was built by sampling spontaneous conversations among 617 speakers from all Croatian counties, and it comprises more than 250,000 tokens and more than 100,000 types. Data were collected during three time slots: from 2010 to 2012, from 2014 to 2015 and during 2016. HrAL is today available within TalkBank, a large database of spoken-language corpora covering different languages (https://talkbank.org), in the Conversational Analyses corpora within the subsection titled Conversational Banks. Data were transcribed, coded and segmented using the transcription format Codes for Human Analysis of Transcripts (CHAT) and the Computerised Language Analysis (CLAN) suite of programmes within the TalkBank toolkit. Speech streams were segmented into communication units (C-units) based on syntactic criteria. Most transcripts were linked to their source audios. TalkBank is free and public, i.e. all data stored in it can be shared by the wider community in accordance with the basic rules of TalkBank. HrAL provides information about spoken grammar and lexicon, discourse skills, error production and productivity in general. It may be useful for sociolinguistic research and studies of synchronic language changes in Croatian.
ABSTRACT: On the basis of a sample analysis of a Czech adjective, a definition based on the data drawn from the Czech National Corpus (cf. Čermák and Schmiedtová 2003) is gradually compiled and finally offered, pointing at the drawbacks of definitions found in traditional dictionaries. Steps undertaken here are then generalized and used, in an ordered sequence (similar to a work-flow ordering), as topics, briefly discussed in the second part, to which lexicographers of monolingual dictionaries should pay attention. These are supplemented by additional remarks and caveats useful in the compilation of a dictionary. Thus, a brief survey of some of the major steps of dictionary compilation is presented here, supplemented by the original Czech data, analyzed in their raw, though semiotically classified, form.
SUMMARY: Notes on the compilation of a corpus-based dictionary. On the basis of a sample analysis of a Czech adjective, a definition based on data drawn from the Czech National Corpus (cf. Čermák and Schmiedtová 2003) is gradually compiled and finally offered, pointing at the shortcomings of definitions found in traditional dictionaries. The steps undertaken here are then generalized and used in an ordered sequence (similar to a work-flow ordering), as topics, briefly discussed in the second part, to which lexicographers of monolingual dictionaries should pay attention. They are supplemented by additional remarks and caveats useful in the compilation of a dictionary. In this way, a brief survey of some of the main steps of dictionary compilation is presented here, supplemented by the original Czech data, analysed in their raw, though semiotically classified, form.
Keywords: MONOLINGUAL DICTIONARIES, CORPUS LEXICOGRAPHY, SYNTAGMATICS AND PARADIGMATICS IN DICTIONARIES, DICTIONARY ENTRY, TYPES OF LEMMAS, PRAGMATICS, TREATMENT OF
The aim of this article is to establish what perspectives exist on inner change within the “Corpus Paulinum” and how they should be applied in pastoral counselling. The Scriptural guidelines of change that will be examined for the purposes of this article are found in the following references: Ephesians 4:22-24, Colossians 3:8-10, and Romans 12:1-2. The work of the Holy Spirit as “Agent of change” will also be discussed and finally some pointers on inner change and the implications for pastoral counselling will be proposed.
The article also presents the ‘sketch grammar’ (the basis for the word sketches) in detail, describes the process of building and processing the corpus, and considers the role of the corpus in additional research on Arabic.
McEnery and Wilson (1996: 32) stress the importance of a corpus: 'As a stan- ... close to five million running words, and the Ndebele corpus at around three ... since their introduction and reinforcement through the second form of contact.
Hu, Chunyu; Liu, Huijie
A historical perspective on economy metaphor can shed new light on economic thought. Based on the TIME Magazine Corpus (TMC), this paper investigates inflation metaphor over 83 years and compares findings against the economic data over the relatively corresponding period. The results show how inflation, an abstract concept and a normal economic…
Shirani, Shapour; Rekabi, Vahab; Kamalian, Naser
Sirenomelia is a very rare anomaly presented with fusion of the lower limbs. Genitourinary, neural tube, and vertebral anomalies are found in most cases. We report a case of sirenomelia with agenesis of corpus callosum, which has not been reported previously.
Beek, Leonoor Johanneke van der
In this dissertation, corpus data is applied in various kinds of linguistic analyses. The data serves as a source of examples and counterexamples in a theoretical linguistic analysis of the Dutch cleft construction, as the source of quantitative data in a probabilistic account of the dative
Erwich, C.M.; Kingham, Cody
Text-Fabric (TF) is a promising new framework for the Eep Talstra Center for Bible and Computer corpus plus (linguistic) annotations. TF is a Python 3.x software package that provides scientific, accessible and reproducible ways of processing Biblical Hebrew text data. It also allows sharing the
The aim of the paper is to provide a quantitative description of legal Chinese. This study adopts a corpus-based approach and reports basic statistical parameters of legal texts in Chinese, namely sentence length, the proportions of parts of speech, etc. The research is conducted on the Chinese monolingual corpus Hanku. The paper also discusses issues of statistical data processing across various corpora, e.g. tokenisation and part-of-speech tagging, and their relevance to the study of register variation.
Bingel, Joachim; Diewald, Nils
In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized...
In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed ...
With a good corpus, data can be provided giving an authoritative body of linguistic evidence which can support generalisations and against which hypotheses can be tested. As this proves the invaluable status of a corpus, the article assesses the processing of the Shona corpus and discusses how some aspects of the ...
Many translation scholars have proposed the use of corpora to allow professional translators to produce high quality texts which read like originals. Yet, the diffusion of this methodology has been modest, one reason being the fact that software for corpus analysis has been developed with the linguist in mind, which means that it is generally complex and cumbersome, offering many advanced features, but lacking the level of usability and the specific features that meet translators’ needs. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual corpora from the web; it includes a concordancer with a query system similar to a search engine; it uses basic statistical measures to indicate the reliability of results; it accesses the original documents directly for more contextual information; it includes a statistical and linguistic terminology extraction utility to extract the relevant terminology of the domain and the typical collocations of a given term. Designed to be easy and intuitive to use, the tool may help translation students as well as professionals to increase their translation quality by adhering to the specific linguistic variety of the target text corpus.
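The concordancer mentioned in this abstract can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is a generic illustration, not TranslatorBank's implementation: the function name `kwic`, the simple regular-expression tokeniser and the `width` parameter are all assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) triples for every
    occurrence of `keyword`, with up to `width` tokens on each side.

    Tokenisation is a simple lowercase word split, which is an assumption;
    a real tool would use a language-aware tokeniser.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for index, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, index - width):index])
            right = " ".join(tokens[index + 1:index + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "The corpus is large. A corpus helps translators."
    for left, word, right in kwic(sample, "corpus", width=2):
        # Right-align the left context so the keyword lines up in a column.
        print(f"{left:>20} [{word}] {right}")
```

Aligning the keyword in a column, as the usage lines do, is the conventional way to let a translator scan typical collocations of a term at a glance.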
The aim of HuLCC (the human language chorus corpus) is to provide a resource of sufficient size to facilitate inter-language analysis by incorporating languages from all the major language families: for the first time all aspects of typology will be incorporated within a single corpus, adhering to a consistent grammatical classification and granularity, where corpora have historically adopted a plethora of disparate schemes. An added feature will be the inclusion of a common text element, which will be translated across all languages, to provide a precise comparable thread for detailed linguistic analysis of translation strategies and a mechanism by which these mappings can be explicitly achieved. Methods developed to solve unambiguous mappings across these languages can then be adopted for any subsequent message authored by the SETI community. Initially, it is planned to provide at least 20,000 words for each chosen language, as this amount of text exceeds the point where randomly generated text can be disambiguated from natural language and is of sufficient size to be useful for message transmission (Elliot, 2002). This paper details the design of this resource, which ultimately will be made available to SETI upon its completion, and discusses issues 'core' to any message construction.
Nundy, Shantanu; Razi, Rabia R; Dick, Jonathan J; Smith, Bryan; Mayo, Ainoa; O'Connor, Anne; Meltzer, David O
There is increasing interest in finding novel approaches to reduce health disparities in readmissions for acute decompensated heart failure (ADHF). Text messaging is a promising platform for improving chronic disease self-management in low-income populations, yet is largely unexplored in ADHF. The purpose of this pre-post study was to assess the feasibility and acceptability of a text message-based (SMS: short message service) intervention in a largely African American population with ADHF and explore its effects on self-management. Hospitalized patients with ADHF were enrolled in an automated text message-based heart failure program for 30 days following discharge. Messages provided self-care reminders and patient education on diet, symptom recognition, and health care navigation. Demographic and cell phone usage data were collected on enrollment, and an exit survey was administered on completion. The Self-Care of Heart Failure Index (SCHFI) was administered preintervention and postintervention and compared using sample t tests (composite) and Wilcoxon rank sum tests (individual). Clinical data were collected through chart abstraction. Of 51 patients approached for recruitment, 27 agreed to participate and 15 were enrolled (14 African-American, 1 White). Barriers to enrollment included not owning a personal cell phone (n=12), failing the Mini-Mental exam (n=3), needing a proxy (n=2), hard of hearing (n=1), and refusal (n=3). Another 3 participants left the study for health reasons and 3 others had technology issues. A total of 6 patients (5 African-American, 1 White) completed the postintervention surveys. The mean age was 50 years (range 23-69) and over half had Medicaid or were uninsured (60%, 9/15). The mean ejection fraction for those with systolic dysfunction was 22%, and at least two-thirds had a prior hospitalization in the past year. Participants strongly agreed that the program was easy to use (83%), reduced pills missed (66%), and decreased salt intake
Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi
Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually-created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052
Andrew J Reagan
The emergence and global adoption of social media have rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text’s lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale.
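The two conditions named in the abstract above (frequency-weighted coverage and continuous word scores) can be illustrated with a minimal sketch. The function name `score_text` and the lexicon values are hypothetical and are not taken from any of the dictionaries the study evaluates; the sketch only shows how a frequency-weighted mean score and a coverage figure might be computed for one text.

```python
from collections import Counter

def score_text(tokens, lexicon):
    """Return (frequency-weighted mean score, coverage) for a list of tokens.

    `lexicon` maps a word to a continuous sentiment score (hypothetical values).
    Coverage is the share of token occurrences that the lexicon can score.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Keep only the words that the dictionary actually scores.
    matched = {word: n for word, n in counts.items() if word in lexicon}
    matched_total = sum(matched.values())
    if matched_total == 0:
        return None, 0.0
    mean = sum(lexicon[word] * n for word, n in matched.items()) / matched_total
    return mean, matched_total / total

# Hypothetical continuous-scale lexicon, for illustration only.
toy_lexicon = {"happy": 8.0, "sad": 2.0}
mean_score, coverage = score_text(["happy", "happy", "sad", "the"], toy_lexicon)
```

A low coverage value is a signal that the mean score rests on a small part of the text and should be read with caution.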
Gilles-Maurice de Schryver
Abstract: In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.
Keywords: LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYSTEM (DWS), CORPUS QUERY PACKAGE (CQP), TSHWANELEX, CORPUS, CORPUS ANNOTATION, PART-OF-SPEECH TAGGER (POS-TAGGER), MACHINE LEARNING, NORTHERN SOTHO (SESOTHO SA LEBOA)
Summary (translated from the Dutch Samenvatting): Dictionary writing system + corpus query package: a study of TshwaneLex. In this article the integrated corpus query package of the dictionary writing system TshwaneLex is analysed. Attention is given to the processing of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of intellectual effort, machine learning techniques can be successfully employed to create corpora for lexicographic purposes in which the parts of speech are explicitly marked. All steps of the argument are illustrated with data from English and Northern Sotho. The tools and techniques themselves, however, are all language-independent, so that the promising results of this study are far-reaching.
Keywords (translated from the Dutch Sleutelwoorden): LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYSTEM, CORPUS QUERY PACKAGE, TSHWANELEX, CORPUS, CORPUS ANNOTATION, PART-OF-SPEECH TAGGER, MACHINE LEARNING, NORTHERN SOTHO
Several reports have described magnetic resonance (MR) findings in canine and feline lysosomal storage diseases such as gangliosidoses and neuronal ceroid lipofuscinosis. Although most of those studies described the signal intensities of white matter in the cerebrum, findings of the corpus callosum were not described in detail. A retrospective study was conducted on MR findings of the corpus callosum as well as the rostral commissure and the fornix in 18 cases of canine and feline lysosomal storage diseases. This included 6 Shiba Inu dogs and 2 domestic shorthair cats with GM1 gangliosidosis; 2 domestic shorthair cats, 2 familial toy poodles, and a golden retriever with GM2 gangliosidosis; and 2 border collies and 3 chihuahuas with neuronal ceroid lipofuscinoses, to determine whether changes of the corpus callosum are an imaging indicator of those diseases. The corpus callosum and the rostral commissure were difficult to recognize in all cases of juvenile-onset gangliosidoses (GM1 gangliosidosis in Shiba Inu dogs and domestic shorthair cats and GM2 gangliosidosis in domestic shorthair cats) and GM2 gangliosidosis in toy poodles with late juvenile-onset. In contrast, the corpus callosum and the rostral commissure were confirmed in cases of GM2 gangliosidosis in a golden retriever and canine neuronal ceroid lipofuscinoses with late juvenile- to early adult-onset, but were extremely thin. Abnormal findings of the corpus callosum on midline sagittal images may be a useful imaging indicator for suspecting lysosomal storage diseases, especially hypoplasia (underdevelopment) of the corpus callosum in juvenile-onset gangliosidoses.
Kaewphan, Suwisa; Van Landeghem, Sofie; Ohta, Tomoko; Van de Peer, Yves; Ginter, Filip; Pyysalo, Sampo
Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/. Contact: firstname.lastname@example.org PMID:26428294
...-AA00 Safety Zone; Naval Air Station Corpus Christi Air Show, Oso Bay, Corpus Christi, TX AGENCY: Coast... zone on the navigable waters of Oso Bay in Corpus Christi, Texas in support of the 2011 Naval Air... entities and very few recreational fisherman utilize this section of Oso Bay, the restriction of vessel...
Matharu, J; Hale, B; Ammar, M; Brennan, P A
With the widespread use of smartphones, text messaging has become an accepted form of communication for both social and professional use in medicine. To our knowledge no published studies have assessed the prevalence and use of short message service (SMS) texting by doctors on call. We have used an online questionnaire to seek information from doctors in a large NHS Trust in the UK about their use of texting while on call, what they use it for, and whether they send images relevant to patients' care. We received 302 responses (43% response rate), of whom 166 (55%) used SMS while on call. There was a significant association between SMS and age group (p=0.005), with the 20-30-year-old group using it much more than the other age groups. Doctors in the surgical specialties used it significantly less than those in other speciality groups (p […]) […] call was deemed to be safe and reliable (p […]) […] communication to use when on call. Copyright © 2016 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.
The research problems addressed in this study are (1) how lexicogrammar plays a role in determining the polarity of the F-word and (2) how to formalize it for corpus processing. The data are obtained from the Corpus of Contemporary American English (COCA). In this corpus, the F-word is shown to be highest in frequency as compared to its distribution across corpora. Corpus methodology is applied by sending queries to the COCA interface to retrieve F-words. Combinations of tokens surrounding F-words yielded the phrase and clause units accompanying F-words, which are significant cues for determining F-word polarity. The polarity is later shown to be not necessarily negative. I also designed a computational resource to allow the retrieval of F-words offline so that users might apply it to any digital text collection.
Wail Hamood KHALED
The quantity of text published in Arabic on the web calls for effective techniques for extracting and classifying the relevant information contained in large text corpora. In this paper we present an implementation of an enhanced k-NN Arabic text classifier. We apply the traditional k-NN and Naive Bayes classifiers from the Weka Toolkit for comparison. Our proposed modified k-NN algorithm features an improved decision rule that skips the classes that are less similar and identifies the right class from the k nearest neighbours, which increases accuracy. The study evaluates the improved decision rule using recall, precision and F-measure as the basis of comparison. We conclude that the proposed classifier is promising and outperforms the classical k-NN classifier.
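For orientation, the classical k-NN baseline and the evaluation measures mentioned in the abstract above can be sketched as follows. This is not the authors' modified decision rule, which the abstract does not specify, and it does not use Weka; it is a plain-Python illustration that assumes whitespace tokenisation, raw term counts and cosine similarity. The function names and the toy data are hypothetical, and real Arabic text would need proper normalisation and tokenisation first.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Classical k-NN: majority vote among the k most similar training texts.

    `train` is a list of (text, label) pairs.
    """
    query = Counter(text.split())
    ranked = sorted(
        train,
        key=lambda item: cosine(query, Counter(item[0].split())),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy training data, for illustration only.
toy_train = [
    ("goal match team", "sport"),
    ("team player goal", "sport"),
    ("bank market stock", "economy"),
    ("stock price market", "economy"),
]
label = knn_predict(toy_train, "team goal", k=3)
```

Any improved decision rule would replace the majority vote in `knn_predict`; the evaluation function stays the same, which is what makes the comparison between classifiers possible.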
This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and t...
Gisele Almeida Lima Veiga
The expression of genes encoding the receptors for estrogen (ERα mRNA) and oxytocin (OTR mRNA) was studied in the corpus luteum during pregnancy and parturition in dogs. Real-time PCR was performed to quantify the levels of ERα mRNA and OTR mRNA in the corpus luteum of bitches during Early (up to 20 days of gestation), Mid (20 to 40 days) and Late Pregnancy (40 to 60 days), and Parturition (first stage of labor). The corpus luteum expressed mRNA for OTR; however, ERα mRNA was not detected. There was a reduction of OTR mRNA expression in the corpus luteum from gestational Day 20 onward, which suggests an important role of OTR mRNA in the mechanism of pregnancy recognition in dogs. We concluded that the expression of OTR mRNA in the canine corpus luteum varies over time, which supports the idea that the sensitivity and response to hormone therapy can vary over the course of pregnancy and labor. Moreover, the canine CL lacks ERα mRNA expression during pregnancy.
Aims of study. The aim of the present study was to investigate whether an ethanol extract of Scutellaria baicalensis (ESB) relaxes penile corpus cavernosum muscle in organ bath experiments. Materials and methods. Changes in tension of cavernous smooth muscle strips were determined in a penile strip chamber model and in a penile perfusion model. Isolated endothelium-intact rabbit corpus cavernosum was precontracted with phenylephrine (PE) and then treated with ESB. Results. ESB relaxed penile smooth muscle in a dose-dependent manner, and this was inhibited by pre-treatment with NG-nitro-l-arginine methyl ester (l-NAME), a nitric oxide (NO) synthase inhibitor, and 1H-[1, 2, 4]-oxadiazolo-[4,3-α]-quinoxalin-1-one (ODQ), a soluble guanylyl cyclase (sGC) inhibitor. ESB-induced relaxation was significantly attenuated by pretreatment with tetraethylammonium (TEA), a nonselective K+ channel blocker, and charybdotoxin, a selective Ca2+-dependent K+ channel inhibitor. ESB increased the cGMP levels of rabbit corpus cavernosum in a concentration-dependent manner without changes in cAMP levels. In a perfusion model of penile tissue, ESB also relaxed penile corpus cavernosum smooth muscle in a dose-dependent manner. Conclusion. Taken together, these results suggest that ESB relaxed rabbit cavernous smooth muscle via the NO/cGMP system and Ca2+-sensitive K+ channels in the corpus cavernosum.
Chen, Yu-Hua; Bruncak, Radovan
With the advances in technology, wordlists retrieved from computer corpora have become increasingly popular in recent years. The lexical items in those wordlists are usually selected, according to a set of robust frequency and dispersion criteria, from large corpora of authentic and naturally occurring language. Corpus wordlists are of great value…
important step towards the creation of a truly representative large corpus of SAE and ... Census data which elicit information about home language do not tell .... ISAE has absorbed lexical items such as robot (traffic light), dagha (mud), baba- ..... used their access to existing social networks to identify other contributors to.
Learning English as a foreign language is an additional burden for art majors. This study aimed to examine high frequency words in art research articles to improve the efficiency of art majors’ English learning, especially their academic reading and writing. To this end, the study built a corpus, analyzed data from art research articles and compared the data with three base word lists. We found that the General Service List (GSL) and the Academic Word List (AWL) had a high coverage in our corpus, and that the order of high frequency words differed in the Art Research Article Corpus (ARAC). These findings have implications for teaching English to art majors.
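The notion of word-list coverage used in the abstract above can be made concrete with a short sketch. It assumes coverage is computed over lowercased word forms, whereas studies of this kind usually count word families; the function names, the sample tokens and the stand-in word list are hypothetical and do not reproduce the GSL, the AWL or the ARAC data.

```python
from collections import Counter

def coverage(tokens, word_list):
    """Share of running tokens whose lowercased form appears in `word_list`."""
    if not tokens:
        return 0.0
    listed = sum(1 for token in tokens if token.lower() in word_list)
    return listed / len(tokens)

def high_frequency_words(tokens, n=10):
    """The n most frequent lowercased word forms, with their counts."""
    return Counter(token.lower() for token in tokens).most_common(n)

# Hypothetical sample text and stand-in word list, for illustration only.
sample_tokens = ["The", "artist", "analyses", "the", "colour"]
toy_word_list = {"the", "colour"}
share = coverage(sample_tokens, toy_word_list)
top_words = high_frequency_words(sample_tokens, n=2)
```

Comparing the ranked output of `high_frequency_words` for a specialised corpus against the order of a general word list is one simple way to see the kind of frequency-order difference the study reports.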
The ANR project "Corpus of religious monuments prior to the year 1000" [Corpus architecturae religiosae europeae/CARE – IV-X saec.] began in January 2008. It represents France's contribution to an international programme initiated in 2002 by the IRCLAMA in Zagreb (Croatia). The aim of this corpus is to inventory the religious buildings of Europe between the 4th century and the very beginning of the 11th century. It already covers Italy, Spain, Croatia, Central Europe and, probably soon, Ireland...
Gloria Valencia Mendoza
This paper grows out of a life experience of Gloria Valencia Mendoza and her path of study and research. Her ongoing engagement with the ideas and philosophy of the master Edgar Willems is presented from the beginning of her training as a music educator at the Universidad Nacional de Colombia (Bogotá) and later through her direct relationship with the Maestro in training courses in Switzerland. Willems grounds his approach in a profound vision of the human being; through the study of the different kinds of consciousness and the connection with music in its essence and its existence, he offers a philosophical platform joined to a didactic proposal of great importance for those of us who are travelling the path of music pedagogy. It is also important to find some links with Jean Piaget.
Trybula, Walter J.
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
This study looks at demonstrative descriptions, regarding them as text-deictic procedures which contribute to weave discourse reference. Text deixis is thought of as a metaphorical referential device which maps the ground of utterance onto the text itself. Demonstrative expressions with textual antecedent-triggers, considered as the most important text-deictic units, are identified in a narrative corpus consisting of J. M. Barrie’s Peter Pan and its translation into Catalan. Some linguistic and discourse variables related to DemNPs are analysed to characterise text deixis adequately. It is shown that this referential device is usually combined with abstract nouns, thus categorising and encapsulating (non-nominal) complex discourse entities as nouns, while performing a referential cohesive function by means of the text deixis + general noun type of lexical cohesion.
Ectopic neurohypophysis is a pituitary gland abnormality, which can accompany growth hormone deficiency associated with dwarfism. Here we present magnetic resonance imaging (MRI) findings of a rare case of coexisting ectopic neurohypophysis, corpus callosum dysgenesis, and periventricular neuronal heterotopia, with a review of the literature.
The SWISS TEXT CORPUS (CHTK) has made it its goal to extensively document the German language of the 20th century in Switzerland. In this way, and in its parallel function as a sub-corpus of the Corpus C4, which will consist of 20 million text words (tokens) each from Germany, Austria, Italy/South Tirol and, as already mentioned, Switzerland, it represents a classical reference corpus both for the standard German language in Switzerland and for the entire German-speaking area of Western Europe. A reference corpus should meet the requirement of comprehensively depicting the central repertoire of a language, i.e. the generally used vocabulary of this language, which is why questions of corpus structure and general planning (corpus design) play a decisive role (cf. Lemnitzer/Zinsmeister (2006: 106), where the type of the reference corpus is contrasted with the special corpus). Four and a half years after the start of the project, the SWISS TEXT CORPUS was made available to the general public in April 2009 as a research instrument. The following article outlines in brief the history of this research project and deals with fundamental and specific decisions that had to be made in the design of such a reference corpus, and with how the CHTK is compiled. Together with a concluding overview of some retrieval and analysis options offered by the CHTK, this article also provides an overview of the potential of this new research instrument and supplies the background knowledge required to work with the CHTK. For reasons of space, the methods of working, the corpus-driven approaches, cannot be thematised here (cf. Bubenhofer 2008, 2006).
Alvaro, Nestor; Miyao, Yusuke; Collier, Nigel
Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. ©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017.
Trull, Timothy J; Vergés, Alvaro; Wood, Phillip K; Jahng, Seungmin; Sher, Kenneth J
We examined the latent structure underlying the criteria for DSM-IV-TR (American Psychiatric Association, 2000, Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: Author.) personality disorders in a large nationally representative sample of U.S. adults. Personality disorder symptom data were collected using a structured diagnostic interview from approximately 35,000 adults assessed over two waves of data collection in the National Epidemiologic Survey on Alcohol and Related Conditions. Our analyses suggested that a seven-factor solution provided the best fit for the data, and these factors were marked primarily by one or at most two personality disorder criteria sets. A series of regression analyses that used external validators tapping Axis I psychopathology, treatment for mental health problems, functioning scores, interpersonal conflict, and suicidal ideation and behavior provided support for the seven-factor solution. We discuss these findings in the context of previous studies that have examined the structure underlying the personality disorder criteria as well as the current proposals for DSM-5 personality disorders. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
Satoru Takahashi, M.D.
The corpus callosum is the major commissural pathway connecting the cerebral hemispheres. This pathway receives its blood supply from the anterior communicating artery, the pericallosal artery, and the posterior pericallosal artery. However, in some cases, the entire corpus callosum is supplied by the median callosal artery; thus, occlusion of this artery can lead to infarction of the entire corpus callosum. Few reports have described this type of infarction, and no reports after subarachnoid hemorrhage (SAH) exist. Here, we report on a 42-year-old female who was diagnosed with SAH after two aneurysms were discovered at the bifurcation of the left anterior cerebral artery (A1-A2). After successful clipping was performed, the patient was alert and had no neurological deficits; moreover, the computed tomography images that were acquired after the operation showed no evidence of infarction. Nine days after admittance to the hospital, drowsiness and weakness of the left limbs with brain swelling appeared and decompressive hemi-craniectomy was performed. Diagnostic cerebral angiography revealed vasospasms in both the anterior and middle cerebral arteries, thus fasudil hydrochloride was administered intra-arterially. While blood flow in all arteries improved, diffusion-weighted magnetic resonance imaging detected infarction along the entire length of the corpus callosum and in the medial region of the right frontal lobe. We believe this infarction was due to secondary ischemia of the median callosal artery. This case reminded us of the anatomical variation wherein the median callosal artery is the sole blood supply for the corpus callosum and demonstrated that infarction of the entire corpus callosum is possible.
Sikveland, A.; Öttl, A.; Amdal, I.; Ernestus, M.; Svendsen, T.; Edlund, J.
Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On basis of t...
The first three chapters of the book offer relevant information on the new methodological approach, learner corpus profiling, and the exemplifying case, Romanian Learner English. The description of the Romanian Corpus of Learner English is also given special attention. The following three chapters include corpus-based frequency analyses of selected grammatical categories (articles, prepositions, genitives), combined with error analyses. In the concluding discussion, the book summarizes the features compiled as lexico-grammatical profiles.
... Petroleum Gas, the waters within a 500 yard radius of the LPG carrier while the vessel transits the Corpus Christi Ship Channel to the LPG receiving facility. The safety zone remains in effect until the LPG vessel is moored at the LPG receiving facility. (2) For outgoing tank vessels loaded with LPG, the waters...
Lábadi, Beatrix; Beke, Anna Maria
Agenesis of the corpus callosum is a relatively frequent congenital cerebral malformation that includes dysplasia and total or partial absence of the corpus callosum. Agenesis of the corpus callosum can occur in isolated form, without accompanying somatic or central nervous system abnormalities, or it can be associated with other central nervous system malformations. The behavioral and cognitive outcome is more favorable for patients with isolated agenesis of the corpus callosum than for those with the syndromic form. The aim of this study is to review recent research on behavioral and social-cognitive functions in individuals with agenesis of the corpus callosum. Developmental delay is common, especially in higher-order cognitive and social functions. An internet database search was performed to identify publications on the subject. Fifty-five publications in English met the criteria. These studies reported deficits in language, social cognition and emotions in individuals with agenesis of the corpus callosum, which is known as primary corpus callosum syndrome. The results indicate that individuals with agenesis of the corpus callosum have deficiencies in the social-cognitive domain (recognition of emotions, weakness in paralinguistic aspects of language and mentalizing abilities). The impaired social cognition can manifest in behavioral problems such as autism and attention deficit hyperactivity disorder.
Doğan, Rezarta Islamaj; Leaman, Robert; Lu, Zhiyong
Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information; however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed the use of pre-annotations as a pre-step to manual annotations. Fourteen annotators were randomly paired, and differing annotations were discussed to reach a consensus in two annotation phases. In this setting, a high inter-annotator agreement was observed. Finally, all results were checked against annotations of the rest of the corpus to ensure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. In order to help researchers use the corpus to design and test disease identification methods, we have prepared the corpus as training, testing and development sets. To demonstrate its utility, we conducted a benchmarking experiment where we compared three different
Heron José de Santana Gordilho
Full Text Available This essay presents a comparison between human evolution and legal developments, trying to demonstrate how the Darwinian theory of evolution by natural selection has caused changes in the legal world, to the point that today some lawyers use recent discoveries about the genetic similarity between humans and the great primates to claim the extension of human rights to chimpanzees, bonobos, gorillas and orangutans. It also shows that many animal rights activists have considered litigation an important strategy, whether to give new meaning to legal institutes such as Habeas Corpus, hitherto used only to ensure human freedom, or to strengthen the movement and raise the general population's awareness of the importance of recognizing animals as holders of basic rights.
Full Text Available At the end of 2007, the project submitted to the Agence nationale de la recherche (ANR) and devoted to building a corpus of religious monuments (CARE) predating the year 1000 was selected. It corresponds to the component specific to France. Several countries, including Italy, Spain, the Czech Republic, Slovakia, Poland and Croatia, began the preparatory work for this ambitious undertaking two years ago; Greece has since expressed interest, as has Al...
Liptak Amy R
Full Text Available Chemokines are small molecular weight peptides responsible for adhesion, activation, and recruitment of leukocytes into tissues. Leukocytes are thought to influence follicular atresia, ovulation, and luteal function. Many studies in recent years have focused attention on the characterization of leukocyte populations within the ovary, the importance of leukocyte-ovarian cell interactions, and more recently, the mechanisms of ovarian leukocyte recruitment. Information about the role of chemokines and leukocyte trafficking (chemotaxis) during ovarian function is important to understanding paracrine-autocrine relationships shared between reproductive and immune systems. Recent advances regarding chemokine expression and leukocyte accumulation within the ovulatory follicle and the corpus luteum are the subject of this mini-review.
Full Text Available The question of whether epistemic modals contribute to the truth conditions of the sentences they appear in is a matter of active debate in the literature. Fueling this debate is the lack of consensus about the extent to which epistemics can appear in the scope of other operators. This corpus study investigates the distribution of epistemics in naturalistic data. Our results indicate that they do embed, supporting the view that they contribute semantic content. However, their distribution is limited, compared to that of other modals. This limited distribution seems to call for a nuanced account: while epistemics are semantically contentful, they may require special licensing conditions. http://dx.doi.org/10.3765/sp.5.4
Christine P. Petersen
Full Text Available Intestinal-type gastric adenocarcinoma evolves in a field of pre-existing metaplasia. Over the past 20 years, a number of murine models have been developed to address aspects of the physiology and pathophysiology of metaplasia induction. Although none of these models has achieved true recapitulation of the induction of adenocarcinoma, they have led to important insights into the factors that influence the induction and progression of metaplasia. Here, we review the pathologic definitions relevant to alterations in gastric corpus lineages and classification of metaplasia by specific lineage markers. In addition, we review present murine models of the induction and progression of spasmolytic polypeptide (TFF2)-expressing metaplasia, the predominant metaplastic lineage observed in murine models. These models provide a basis for the development of a broader understanding of the physiological and pathophysiological roles of metaplasia in the stomach. Keywords: SPEM, Intestinal Metaplasia, Gastric Cancer, TFF2, Chief Cell, Hyperplasia
Pafilis, Evangelos; Frankild, Sune P; Schnetzer, Julia; Fanini, Lucia; Faulwetter, Sarah; Pavloudi, Christina; Vasileiadou, Katerina; Leary, Patrick; Hammock, Jennifer; Schulz, Katja; Parr, Cynthia Sims; Arvanitidis, Christos; Jensen, Lars Juhl
The association of organisms to their environments is a key issue in exploring biodiversity patterns. This knowledge has traditionally been scattered, but textual descriptions of taxa and their habitats are now being consolidated in centralized resources. However, structured annotations are needed to facilitate large-scale analyses. Therefore, we developed ENVIRONMENTS, a fast dictionary-based tagger capable of identifying Environment Ontology (ENVO) terms in text. We evaluate the accuracy of the tagger on a new manually curated corpus of 600 Encyclopedia of Life (EOL) species pages. We use the tagger to associate taxa with environments by tagging EOL text content monthly, and integrate the results into the EOL to disseminate them to a broad audience of users. The software and the corpus are available under the open-source BSD and the CC-BY-NC-SA 3.0 licenses, respectively, at http://environments.hcmr.gr. © The Author 2015. Published by Oxford University Press.
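The idea of a dictionary-based tagger, as described in the abstract above, can be illustrated with a minimal sketch. This is not the ENVIRONMENTS implementation: the dictionary entries, the placeholder identifiers, and the function name `tag` are all assumptions made for illustration, and a real tagger would need far more careful handling of morphology, ambiguity, and speed.

```python
import re

# Hypothetical mini-dictionary: surface form -> identifier.
# The identifiers are placeholders, NOT real ENVO term IDs.
DICTIONARY = {
    "coral reef": "ENVO:EXAMPLE_1",
    "reef": "ENVO:EXAMPLE_2",
    "salt marsh": "ENVO:EXAMPLE_3",
}


def tag(text, dictionary=DICTIONARY):
    """Return (start, end, matched_text, identifier) for each dictionary hit."""
    # Try longer terms first so "coral reef" wins over the shorter "reef".
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag("Found on a Coral Reef near a salt marsh."):
        print(hit)
```

Sorting terms longest-first is one simple way to prefer the most specific match; whether the published tagger resolves overlaps this way is not stated in the abstract.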
Y. B. Abdullin
Full Text Available Sentiment analysis of short texts such as Twitter messages and comments in news portals is challenging due to the lack of contextual information. We propose a deep neural network model that uses bilingual word embeddings to effectively solve the sentiment classification problem for a given pair of languages. We apply our approach to two corpora of two different language pairs: English-Russian and Russian-Kazakh. We show how to train a classifier in one language and predict in another. Our approach achieves 73% accuracy for English and 74% accuracy for Russian. For Kazakh sentiment analysis, we propose a baseline method that achieves 60% accuracy, and a method to learn bilingual embeddings from a large unlabeled corpus using bilingual word pairs.
Full Text Available This paper introduces corpus methods and their application to media text analysis. The researcher collected 1,363 macroeconomic reports from three major Taiwanese newspapers, Apple Daily, The Liberty Times, and The United Daily, as the corpora. The research shows that corpus-assisted media text analysis enables the researcher to calculate the frequency of vocabulary and analyze the lexical structure of the text via concordance and collocation. Using macroeconomic news as the study case, this paper also found that news reports tend to simplify the GDP figure as a mission, prefer attributing local economic performance to a systematic problem of the global economy, and treat the economy as a manageable task by attributing it to the government. All these ideologies and values are reflected in the vocabulary and discursive practice of the media.
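The three operations named in the abstract above (word frequency, concordance, collocation) can be sketched in a few lines. This is an illustrative sketch only: the tokenizer, the function names `tokenize`, `kwic` and `collocates`, the window size, and the sample sentence are assumptions, not the tools or data used in the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=2):
    """Key-word-in-context lines: (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    sample = ("The government aims to protect GDP growth. "
              "Weak GDP growth is blamed on the global economy.")
    toks = tokenize(sample)
    print(Counter(toks).most_common(3))   # raw word frequencies
    print(kwic(toks, "gdp"))              # concordance lines
    print(collocates(toks, "gdp"))        # neighbouring words
```

Raw co-occurrence counts are the simplest collocation measure; corpus tools usually add an association score on top, which is omitted here.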
Kristina HMELJAK SANGAWA
Full Text Available The paper presents a set of integrated on-line language resources targeted at Japanese language learners, primarily those whose mother tongue is Slovene. The resources consist of the on-line Japanese-Slovene learners’ dictionary jaSlo and two corpora, a 1 million word Japanese-Slovene parallel corpus and a 300 million word corpus of web pages, where each word and sentence is marked by its difficulty level; this corpus is furthermore available as a set of five distinct corpora, each one containing sentences of the particular level. The corpora are available for exploration through NoSketch Engine, the open source version of the commercial state-of-the-art corpus analysis software Sketch Engine. The dictionary is available for Web searching, and dictionary entries have direct links to examples from the corpora, thus offering (a) a wider picture of possible translations in concrete contextualised examples, and (b) monolingual Japanese usage examples of different difficulty levels to support language learning.
Choudhary, Prakash; Nain, Neeta; Ahmed, Mushtaq
This paper presents a methodology for the development of an Urdu handwritten text image corpus and the application of corpus linguistics in the field of OCR and information retrieval from handwritten documents. Compared to other language scripts, Urdu script is somewhat complicated for data entry: entering a single character requires a combination of multiple key presses. Here, a mixed approach is proposed and demonstrated for building an Urdu corpus for OCR and demographic data collection. The demographic part of the database could be used to train a system to fetch the data automatically, which would help simplify the existing manual data-processing tasks involved in data collection, such as input forms for Passport, Ration Card, Voting Card, AADHAR, Driving Licence, Indian Railway Reservation, Census data, etc. This would increase the participation of the Urdu language community in understanding and benefiting from Government schemes. To make the database available and applicable across a wide area of corpus linguistics, we propose a methodology for data collection, mark-up, digital transcription, and XML metadata information for benchmarking.
Full Text Available This paper responds to the article by Christopher White and Ian Quinn, in which these authors introduce the Yale-Classical Archives Corpus (YCAC). I begin by making some general observations about the corpus, especially with regard to ramifications of the keyboard-performance origins of many pieces in the original MIDI collection. I then assess the accuracy of the scale-degree and local-key fields in the database, which were generated by the Bellman-Budge key-finding algorithm. I point out that some of the inaccuracies from the key-finding algorithm's output may influence the results we obtain from statistical studies of this corpus. I also offer an alternative analysis to the authors' finding that the ratio of V7 to V chords increases over time in common-practice music. Specifically, I conjecture that this finding may be the result of (or related to) increasing instrumental resources over time. I close with some recommendations for future versions of the corpus, such as enabling end users to help repair transcription errors as well as offer ground truths for harmonic analyses and key area information.
Comparative study on corpus development for Malay investment fraud detection in website. ... Journal of Fundamental and Applied Sciences ... The aim of this research is to develop a corpus for Malay investment fraud so that it can be used in ...
An analysis of the problems that most corpus builders face shows that more problems are likely to be encountered when dealing with spoken corpora than with written corpora. The paper demonstrates that tagging is an important component of corpus building as it makes it easier for a researcher to extract relevant data.
Marx, M.; Schuth, A.
A corpus called DutchParl is created which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains documents from the parliaments of The Netherlands, Flanders and Belgium. The corpus is divided along three dimensions: per
Kim, Hyoung Sub; Kim, Jong Chul; Kang, Yong Soo; Lee, Young Hwan; Kim, Young Wol
To measure the mean size of the various portions of the corpus callosum in normal Korean children, using MR imaging. Our subjects were 166 children (male : female = 100 : 66) aged under 15 whose findings on MR imaging and neurologic examination were normal. Using midsagittal T1-weighted imaging, we measured the length of the brain and corpus callosum, the height of the latter, and the thickness of its genu, body, transitional zone and splenium. The measurements were statistically analysed according to age and sex. Brain length and the size of the various portions of the corpus callosum tended to increase relatively rapidly during the first three years of life, but the rate of growth tended to decrease with age. The mean length of the brain and corpus callosum and the mean thickness of the splenium of the corpus callosum did not differ according to sex. The mean thickness of the genu, body and transitional zone of the corpus callosum was greater in males than in females. The ratio of the length of the corpus callosum to the anteroposterior diameter of the brain was significantly greater in females than in males (alpha=0.05). Using MR imaging, we measured the mean sizes of the various portions of the corpus callosum in normal children; these values may provide a useful basis for determining changes occurring in its structure
This primarily methodological article makes a proposition for linguistic exploration of textual resources available through the "Google Scholar" search engine. These resources ("Google Scholar virtual corpus") are significantly larger than any existing corpus of academic writing. "Google Scholar", however, was not designed for linguistic searches…
Christensen, Lars Rune
Actors in the building process are critically dependent on a corpus of written text that draws the distributed work tasks together. This paper introduces, on the basis of a field study, the concepts of corpus, intertext and intertextuality to the analysis of text in cooperative work practice. This paper shows that actors in the building process create intertext (connections) between complementary texts, in a particular situation and for a particular task. This has an integrating effect on the building process. Several types of intertextuality, including the complementary type, the intratextual type and the mediated type, may constitute the intertext of a particular task. By employing the concepts of corpus, intertext and intertextuality with respect to the study of the building process, this paper outlines an approach to the investigation of text in cooperative work.
Ersanli, Ceylan Yangin
This study reports on the insights from an EFL learner corpus (a total of 151 essays and 49,690 words) generated from essays collected over the years in a Turkish state university from freshman students enrolled in the Advanced Writing course. The comparison of cohesive devices in the non-native corpus (NNC) with those in a native corpus (NC)…
Chen, Qi; Guang-Chun, Ge
We conducted a lexical study on the word frequency and the text coverage of the 570 word families from Coxhead's Academic Word List (AWL) in medical research articles (RAs) based on a corpus of 50 medical RAs written in English with 190425 running words. By computer analysis, we found that the text coverage of the AWL words accounted for around…
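Text coverage, as used in the abstract above, is the share of running words in a corpus that belong to a given word list. A minimal sketch follows; the function name `text_coverage`, the toy sentence, and the tiny word list are assumptions for illustration. The study counts word families, whereas this sketch assumes every family member has already been listed as a separate form.

```python
def text_coverage(tokens, word_list):
    """Percentage of running words (tokens) that appear in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok.lower() in word_list)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # Toy data: a hypothetical mini word list, not the actual AWL.
    word_list = {"analyse", "analysis", "data", "significant"}
    tokens = "We analyse the data and the analysis is significant".split()
    print(f"{text_coverage(tokens, word_list):.1f}% of running words covered")
```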
Askehave, Inger; Kastberg, Peter
is derived from another text or to establish what aspects of the text have been derived, one must gain control over external variables that are not easily controllable. In our approach, we suggest a method that - while controlling external variables - is designed to isolate a suitable text corpus. Contrary...
Full Text Available Recently there has been growing interest, in both academia and industry, in applications of machine learning and artificial intelligence techniques to agricultural problems. Text mining and techniques related to natural language processing have rarely been used to solve agricultural problems, and even less so for the Portuguese language. One factor that may contribute to the scarce use of text mining techniques to analyse Portuguese texts and solve agricultural problems is the lack of a freely available annotated corpus. To address the lack of an agricultural corpus in Portuguese, we are releasing a Brazilian Portuguese resource focused on agriculture, described in this article. The corpus covers a partially continuous period between 1996 and 2016 and consists of news items in Brazilian Portuguese annotated with the following types of information: causal, sentiment, and named entities, including temporal expressions. The corpus has additional resources such as a treebank, lists of frequent terms (without stop-words: unigrams, bigrams and trigrams), and words or phrases identified by journalists as domain-specific. It is hoped that the release of the corpus will encourage the adoption of text mining in agriculture within the Portuguese-speaking research community.
Naughton, Felix; Cooper, Sue; Foster, Katharine; Emery, Joanne; Leonardi-Bee, Jo; Sutton, Stephen; Jones, Matthew; Ussher, Michael; Whitemore, Rachel; Leighton, Matthew; Montgomery, Alan; Parrott, Steve; Coleman, Tim
To estimate the effectiveness of pregnancy smoking cessation support delivered by short message service (SMS) text message and key parameters needed to plan a definitive trial. Multi-centre, parallel-group, single-blinded, individual randomized controlled trial. Sixteen antenatal clinics in England. Four hundred and seven participants were randomized to the intervention (n = 203) or usual care (n = 204). Eligible women were 5 pre-pregnancy), were able to receive and understand English SMS texts and were not already using text-based cessation support. All participants received a smoking cessation leaflet; intervention participants also received a 12-week programme of individually tailored, automated, interactive, self-help smoking cessation text messages (MiQuit). Seven smoking outcomes, including validated continuous abstinence from 4 weeks post-randomization until 36 weeks gestation, design parameters for a future trial and cost-per-quitter. Using the validated, continuous abstinence outcome, 5.4% (11 of 203) of MiQuit participants were abstinent versus 2.0% (four of 204) of usual care participants [odds ratio (OR) = 2.7, 95% confidence interval (CI) = 0.93-9.35]. The Bayes factor for this outcome was 2.23. Completeness of follow-up at 36 weeks gestation was similar in both groups; provision of self-report smoking data was 64% (MiQuit) and 65% (usual care) and abstinence validation rates were 56% (MiQuit) and 61% (usual care). The incremental cost-per-quitter was £133.53 (95% CI = -£395.78 to 843.62). There was some evidence, although not conclusive, that a text-messaging programme may increase cessation rates in pregnant smokers when provided alongside routine NHS cessation care. © 2017 The Authors. Addiction published by John Wiley & Sons Ltd on behalf of Society for the Study of Addiction.
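The abstinence proportions in the abstract above follow directly from the reported counts (11 of 203 and 4 of 204). The sketch below recomputes them and also derives a crude (unadjusted) odds ratio from the same counts. The function names are assumptions, and the crude figure need not equal the reported OR of 2.7, which presumably comes from the trial's own statistical model.

```python
def proportion(events, total):
    """Share of participants with the event, as a percentage."""
    return 100.0 * events / total


def crude_odds_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted odds ratio of group A versus group B from raw counts."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


if __name__ == "__main__":
    print(f"MiQuit abstinent:     {proportion(11, 203):.1f}%")
    print(f"Usual care abstinent: {proportion(4, 204):.1f}%")
    print(f"Crude odds ratio:     {crude_odds_ratio(11, 203, 4, 204):.2f}")
```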
Full Text Available Background: Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE© titles and abstracts by applying a probabilistic topic model. Results: The latent Dirichlet allocation (LDA) model was applied to the corpus. Based on the Bayesian model selection, 300 major topics were extracted from the corpus. The majority of identified topics/concepts were found to be semantically coherent and most represented biological objects or concepts. The identified topics/concepts were further mapped to the controlled vocabulary of the Gene Ontology (GO) terms based on mutual information. Conclusion: The major and recurring biological concepts within a collection of MEDLINE documents can be extracted by the LDA model. The identified topics/concepts provide a parsimonious and semantically enriched representation of the texts in a semantic space with reduced dimensionality and can be used to index text.
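The abstract above maps topics to Gene Ontology terms "based on mutual information" but does not say how that quantity was estimated. One simple reading, shown here as an assumption, treats each document as carrying two binary flags (is the topic present, is the GO term annotated) and computes the mutual information between the two flags across documents. The function name and the toy vectors are hypothetical.

```python
import math


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_xy = sum(1 for i, j in zip(x, y) if i == a and j == b) / n
            p_x = sum(1 for i in x if i == a) / n
            p_y = sum(1 for j in y if j == b) / n
            if p_xy > 0:  # empty cells contribute nothing
                mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    topic_present = [1, 1, 0, 0]   # hypothetical per-document topic flags
    go_annotated = [1, 1, 0, 0]    # hypothetical per-document GO-term flags
    print(mutual_information(topic_present, go_annotated))
```

Under this reading, a topic would be mapped to the GO term with which it shares the highest mutual information; identical flag vectors give the maximum for a balanced split, and independent vectors give zero.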
Ogura, Koichiro; Yamamoto, Isao; Hara, Makoto; Suzuki, Yoshio; Nakane, Toshichi; Watanabe, Masao.
The value of computerized tomography (CT) in the diagnosis of intracerebral hematoma has been well documented. However, there are few reports on the CT findings of hematoma of the corpus callosum. This report presents two cases of traumatic hematoma in the corpus callosum and discusses their CT findings. The two patients, a 52-year-old male and a 40-year-old male, had blunt mechanical head trauma accompanied by neither skull fracture nor scalp injury. In both cases, surgery verified that the hematoma occupied the region from the genu to the body of the corpus callosum, and axial CT revealed the following two similar findings. First, the hematoma in the genu of the corpus callosum appeared as a crescent-shaped high-density mass. This finding seems to be due to the anatomical structure: the genu of the corpus callosum is located just in front of the anterior horns of the lateral ventricles, in a shape that is convex posteriorly. Second, because the midportion of the body of the corpus callosum tends to appear narrow between the two lateral ventricles, the hematoma that extended from the genu towards the body of the corpus callosum appeared as a dumbbell-shaped high-density mass. (author)
Konnova Mariya Nikolaevna
Full Text Available In the scope of a complex cognitive and linguoculturological approach, aiming at investigating the triple unity of language, mind and culture, the author analyzes cognitive mechanisms of change in the meaning of the New Testament saying "Dovleet dnevi zloba yego" (Mf. 6: 34) / "Sufficient for the day is the evil thereof" (St. Matthew 6: 34). This approach provides deeper insight into the essence of mental schemes underlying the process of lexicalisation of biblical micro-texts both as fixed phrases (quotations) and idioms. The semantic shifts of microdiachronic character, which affected the semantic structure of biblical idiomatic expressions in the 19th-20th centuries and led to substantial restructuring of axiological and temporal components of meaning, are analyzed on the data of the Russian National Corpus. The author proves that the use of biblical quotations outside their original context leads to their complete semantic transformation. The loss of original meaning is connected with the loss of key axiological and temporal characteristics typical for New Testament texts.
Full Text Available The theme of the article is the preparation of a stemming algorithm for Slovenian library science texts. The procedure consisted of three phases: learning, testing and evaluation. The preparation of the optimal stemmer for Slovenian texts from the field of library science is presented, along with its testing and comparison with two other stemmers for the Slovenian language: the Popovič stemmer and the Generic stemmer. A corpus of 790,000 words from the field of library science was used for learning. Lists of stems, word endings and stop-words were built. In the testing phase, the component parts of the algorithm were tested on an additional corpus of 167,000 words. In the evaluation phase, a comparison of the three stemmers processing the same word corpus was made. The results of each stemmer were compared with an intellectually prepared control result of the stemming of the corpus. It consisted of groups of semantically connected words with no errors. Understemming was especially monitored, measured as the number of stems an algorithm produced for semantically connected words. The results were statistically processed with the Kruskal-Wallis test. The Optimal stemmer produced the best results. It matched best with the reference results and also gave the smallest number of stems for one semantic meaning. The Popovič stemmer followed closely. The Generic stemmer proved to be the least accurate. The procedures described in the thesis can represent a platform for the development of tools for automatic indexing and retrieval of library science texts in the Slovenian language.
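The understemming measure described above (how many distinct stems an algorithm produces for a group of semantically connected words) can be sketched as follows. The toy suffix-stripping stemmer, its English suffix list, and the word groups are assumptions made for illustration; they are unrelated to the Slovenian stemmers compared in the article.

```python
def strip_suffix_stemmer(word, suffixes=("ies", "ing", "es", "s")):
    """Toy stemmer: remove the first matching suffix, keeping at least 3 letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems_per_group(groups, stemmer):
    """Number of distinct stems per semantic group; 1 is ideal, more means understemming."""
    return {
        name: len({stemmer(word) for word in words})
        for name, words in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical reference groups of semantically connected words.
    groups = {
        "library": ["library", "libraries"],
        "index": ["index", "indexes", "indexing"],
    }
    print(stems_per_group(groups, strip_suffix_stemmer))
```

A group that yields more than one stem signals understemming for that meaning; comparing these counts across stemmers on the same reference groups mirrors the evaluation idea in the abstract, without the statistical test.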
Corpus linguistics, systemic functional grammar and literary meaning: a critical analysis of Harry Potter and the Philosopher’s Stone
Full Text Available The research reported in this paper has two aims. First, to show how corpus linguistics, using word frequency and concordance data, which is then analysed according to transitivity systems of systemic functional grammar (SFG), can be useful to the enterprise of critical linguistics. Second, to investigate to what extent this critical corpus linguistics (CCL) gives a valid representation of the meanings and ideologies of a literary text. The hypothesis tested is that semiotic models of communication, in this case of popular children’s literature, with their emphasis on the encoding and decoding of meanings, lend themselves to a corpus linguistics approach, but that, in fact, these mutually reinforcing approaches (SFG and CCL), with their reliance on what is encoded as text, cannot entirely succeed in accounting for how literature, in particular, is understood and interpreted, and how ideology works within it and behind it. For a richer critical discourse analysis we need a pragmatic account, for example an analysis of presupposition, inference and propositional attitude. The issues here will be discussed in the light of recent debate between Michael Stubbs and Henry Widdowson on the strengths and limitations of corpus linguistics in critical discourse analysis.
Full Text Available This paper aims to offer a new approach to the aspectual category of states based on Catalan data extracted from corpus. The goal is twofold: firstly, to point out that states constitute a gradual category; and secondly, to highlight that syntactic variability within the stative predicates category receives a more understandable and clear explanation if the different possibilities of situation conceptualization are taken into account.
Ankara : The Program of Teaching English as a Foreign Language Bilkent University, 2015. Thesis (Master's) -- Bilkent University, 2015. Includes bibliographical references leaves 83-91. This study investigated the effectiveness of the use of a concordance software and concordance lines as a pedagogical tool to learn the target vocabulary of a text book. The purpose of the study was to compare the effects of corpus-aided vocabulary instruction with traditional vocabulary teac...
This article reviews current approaches to the use of the web as a linguistic corpus and emphasizes the advantages (as well as the inevitable risks) that these can bring to the translator's work. To illustrate this point, an example is given of the different ways in which a web-derived corpus can be profitably applied to a specialized translation task.
The main emphasis is placed on the description of dictionary entries, which includes significant practical and theoretical remarks. The individual parts of a dictionary entry are discussed, followed by a commentary on various issues connected with the given part (e.g., the choice of the headword and the presentation of spelling variants) and on the solutions adopted. Particular attention is paid to the head of the entry, the explanation of meaning derived from examples occurring in the corpus, the different kinds of collocations and their presentation in the dictionary, as well as etymological information. The paper closes with a concise overview of the TLex 2013 dictionary software, based on the authors' experience gained while working with this tool.
Sahraian, Mohammad Ali; Moghadasi, Abdorreza Naser; Owji, Mahsa; Naghshineh, Hoda; Minagar, Alireza
Neuromyelitis optica is a demyelinating disease of the central nervous system with various patterns of brain lesions. The corpus callosum may be involved in both multiple sclerosis and neuromyelitis optica. Previous case reports have demonstrated that callosal lesions in neuromyelitis optica are usually large and edematous and have a heterogeneous intensity showing a "marbled pattern" in the acute phase. Their size and intensity may reduce with time or disappear in the chronic stages. In this report, we describe a case of a 25-year-old Caucasian man with neuromyelitis optica who presented clinically with optic neuritis and myelitis. His brain magnetic resonance imaging demonstrated linear enhancement of the corpus callosum. Brain images with contrast agent added also showed linear ependymal layer enhancement of the lateral ventricles, which has been reported in this disease previously. Linear enhancement of the corpus callosum in magnetic resonance imaging with contrast agent could help in diagnosing neuromyelitis optica and differentiating it from other demyelinating diseases, especially multiple sclerosis.
Bruner, Emiliano; de la Cuétara, José Manuel; Colom, Roberto; Martin-Loeches, Manuel
The corpus callosum displays considerable morphological variability between individuals. Although some characteristics are thought to differ between male and female brains, there is no agreement regarding the source of this variation. Biomedical imaging and geometric morphometrics have provided tools to investigate shape and size variation in terms of integration and correlation. Here we analyze variations at the midsagittal outline of the corpus callosum in a sample of 102 young adults in order to describe and quantify the pattern of covariation associated with its morphology. Our results suggest that the shape of the corpus callosum is characterized by low levels of morphological integration, which explains the large variability. In larger brains, a minor allometric component involves a relative reduction of the splenium. Small differences between males and females are associated with this allometric pattern, induced primarily by size variation rather than gender-specific characteristics. PMID:22296183
Patricia Sotelo Dios
In this article I present a research project that consists of compiling and exploiting the Veiga corpus, a multimedia corpus of subtitles in English and Galician. The project is still under development and is intended to serve as a tool for studying and researching certain aspects of the practice of intralingual subtitling in English and interlingual subtitling from English into Galician. Although Veiga forms part of the CLUVI parallel corpus, it goes beyond the textual level typical of the other CLUVI subcorpora and allows subtitles to be observed in their natural state, that is, as part of an audiovisual product. In addition to issues related to the construction of the corpus and the search system, I mention some of the possible uses of this corpus for subtitling practice, research and training.
A number of methodological and epistemological challenges now confront scientific research oriented towards work on digital corpora. Each virtual platform presents a specific ecology (Paveau, 2013a, 2013b) that shapes a different approach to both the object and the corpus. The Facebook (FBK) environment, an essentially multiform surface, thus calls for a perspective capable of grasping its semiotic and enunciative heterogeneity. In this work we aim, first, to redefine the notion of corpus as a "matrix of meaning" (Mayaffre, 2011: 11), which makes it possible to bring into focus the scientific issues raised by the design of digital corpora drawn from Web 2.0, and from FBK in particular; and second, to describe certain fundamental methodological and epistemological concepts (linearity, technodiscourse, seriality, reticularity) that help in building and managing FBK corpora.
Mammas, Ioannis N.; Spandidos, Demetrios A.
Hippocrates (Island of Kos, 460 B.C.-Larissa, 370 B.C.) is the founder of the most famous medical school of classical antiquity. In acknowledgement of his pioneering contribution to the new scientific field of Paediatric Virology, this article provides a systematic analysis of the Hippocratic Corpus, with particular focus on viral infections predominating in neonates and children. A mumps epidemic, affecting the island of Thasos in the 5th century B.C., is described in detail. ‘Herpes’, a medical term derived from the ancient Greek word ‘ἕρπειν’, meaning ‘to creep’ or ‘crawl’, is used to describe the spreading of cutaneous lesions in both childhood and adulthood. Cases of children with exanthema ‘resembling mosquito bites’ are presented in reference to varicella or smallpox infection. A variety of upper and lower respiratory tract viral infections are described with impressive accuracy, including rhinitis, pharyngitis, tonsillitis, laryngitis, bronchiolitis and bronchitis. The ‘cough of Perinthos’ epidemic, an influenza-like outbreak in the 5th century B.C., is also recorded and several cases complicated with pneumonia or fatal outcomes are discussed. Hippocrates, moreover, describes conjunctivitis, otitis, lymphadenitis, meningoencephalitis, febrile convulsions, gastroenteritis, hepatitis, poliomyelitis and skin warts, along with proposed treatment directions. Almost 2,400 years later, Hippocrates' systematic approach and methodical innovations can inspire paediatric trainees and future Paediatric Virology subspecialists. PMID:27446241
Li, Peipei; He, Lu; Wang, Haiyan; Hu, Xuegang; Zhang, Yuhong; Li, Lei; Wu, Xindong
Short text streams such as search snippets and micro blogs have become popular on the Web with the emergence of social media. Unlike traditional text streams, these data are characterized by short length, weak signal, high volume, high velocity, topic drift, etc. Short text stream classification is hence a very challenging and significant task. However, this challenge has received little attention from the research community. Therefore, a new feature extension approach is proposed for short text stream classification with the help of a large-scale semantic network obtained from a Web corpus. It is built on an incremental ensemble classification model for efficiency. First, more semantic contexts based on the senses of terms in short texts are introduced to compensate for data sparsity using the open semantic network, in which all terms are disambiguated by their semantics to reduce the impact of noise. Second, a concept cluster-based topic drift detection method is proposed to effectively track hidden topic drifts. Finally, extensive studies demonstrate that, compared with several well-known concept drift detection methods for data streams, our approach can detect topic drifts effectively, and that it handles short text streams effectively while maintaining efficiency compared with several state-of-the-art short text classification approaches.
Chinese students are the largest international student group in UK universities today, yet little is known about their undergraduate writing and the challenges they face. Drawing on the British Academic Written English corpus - a large corpus of proficient undergraduate student writing collected in the UK in the early 2000s - this study explores Chinese students' written assignments in English in a range of university disciplines, contrasting these with assignments from British students. The study is supplemented by questionnaire and interview datasets with discipline lecturers, writing tutors and students, and provides a comprehensive picture of the Chinese student writer today. Theoretically framed through work within academic literacies and lexical priming, the author seeks to explore what we know about Chinese students' writing and to extend these findings to undergraduate writing more generally. In a globalized educational environment, it is important for educators to understand differences in writing st...
Ottenhof, Sarah R; de Graaf, Petra; Soeterik, Timo F W; Neeter, Lidewij M F H; Zilverschoon, Marijn; Spinder, Matty; Bosch, J L H Ruud; Bleys, Ronald L A W; Heck-de Kort, Laetitia
PURPOSE: Urethral reconstruction is performed for urethral stricture or hypospadias correction. Research on urethral tissue engineering is increasing. Because the corpus spongiosum is important to support the urethra, urethral tissue engineering should ideally be combined with reconstruction of a
In our paper we present a corpus of transcribed Lithuanian parliamentary speeches. The corpus is prepared in a specific format, appropriate for different authorship identification tasks. The corpus consists of approximately 111 thousand texts (24 million words). Each text matches one parliamentary speech produced during an ordinary session from the period of 7 parliamentary terms starting on March 10, 1990 and ending on December 23, 2013. The texts are grouped into 147 categories corresponding to individual authors, so they can be used for authorship attribution tasks; they are also grouped according to age, gender and political views, so they are suitable for author profiling tasks as well. Because short texts complicate recognition of an author's speaking style and are ambiguous in relation to the style of other authors, we incorporated only texts containing at least 100 words into the corpus. In order to make each category as comprehensive and representative as possible, we included only those authors who produced speeches at least 200 times. All the texts are lemmatized, morphologically and syntactically annotated, and tokenized into character n-grams. Statistical information on the corpus is also available. We have also demonstrated that the created corpus can be effectively used in authorship attribution and author profiling tasks with supervised machine learning methods. The corpus structure also allows it to be used with unsupervised machine learning methods, for the creation of rule-based methods, and in different linguistic analyses.
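The selection thresholds and the character n-gram tokenization mentioned in this abstract can be illustrated with a minimal Python sketch. The function names, the `(author, text)` input shape and the order in which the two filters are applied are assumptions made for illustration; the abstract does not describe the authors' actual tooling.

```python
from collections import Counter, defaultdict

MIN_WORDS = 100      # minimum words per text, as stated in the abstract
MIN_SPEECHES = 200   # minimum speeches per author, as stated in the abstract


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return every overlapping character n-gram of `text`."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def filter_speeches(speeches, min_words=MIN_WORDS, min_speeches=MIN_SPEECHES):
    """Keep long-enough texts, then authors with enough remaining texts.

    `speeches` is a list of (author, text) pairs. Applying the length
    filter before the author filter is an assumption of this sketch.
    """
    by_author = defaultdict(list)
    for author, text in speeches:
        if len(text.split()) >= min_words:
            by_author[author].append(text)
    return {a: t for a, t in by_author.items() if len(t) >= min_speeches}


def ngram_profile(texts, n: int = 3) -> Counter:
    """Count character n-grams over all texts of one author."""
    profile = Counter()
    for text in texts:
        profile.update(char_ngrams(text.lower(), n))
    return profile
```

A per-author profile of this kind is one common input to supervised authorship attribution; which classifier and features the authors used is not stated here.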
Kantz, Deirdre; Marenzi, Ivana
This article presents the findings of a field experiment in medical English with first-year medical students at the University of Pavia, Northern Italy. Working in groups of 8-10, the students were asked to produce a corpus of medical texts in English demonstrating how the human body is itself a meaningful text (Baldry and Thibault 2006: Ch. 1).…
The article offers a practical and methodological reflection on the exploitation of a corpus of tweets, considered a complex corpus because of its particular characteristics (including the presence of metadata and the possibility of relating it to more traditional corpora). The model of quantitative and qualitative analysis, tested on the 2013 debate over same-sex marriage in France and in particular on the formula « mariage pour tous » ("marriage for all"), treated here as both hashtag and formula, aims to lay the foundations for new methods of exploiting data in discourse analysis.
Christodouloupoulos, Christos; Steedman, Mark
We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.
Lee, Eun Ja; Kim, Ji Chang [The Catholic University of Korea, Seoul (Korea, Republic of); Kim, Jong Chul [School of Medicine, Chungnam National University, Taejon (Korea, Republic of); And Others
To evaluate, using magnetic resonance (MR) imaging, the clinical significance of the corpus callosum by measuring the size of various portions of the corpus callosum in children with cerebral palsy, and in paired controls. Fifty-two children (30 boys and 22 girls aged between six and 96 (median, 19) months) in whom cerebral palsy was clinically diagnosed underwent MR imaging. There were 23 term patients and 29 preterm, and the control group was selected by age and sex matching. Clinical subtypes of cerebral palsy were classified as hemiplegia (n=14), spastic diplegia (n=22), or spastic quadriplegia (n=16), and according to the severity of motor palsy, the condition was also classified as mild (n=26), moderate (n=13), or severe (n=13). In addition to the length and height of the corpus callosum, the thickness of its genu, body, transitional zone and splenium, as seen on midsagittal T1-weighted MR images, was also measured. Differences in the measured values of the two groups were statistically analysed, and differences in the size of the corpus callosum according to the clinical severity and subtypes of cerebral palsy, and gestational age, were also assessed. Except for height, the measured values of the corpus callosum in patients with cerebral palsy were significantly less than those of the control group (p < 0.05). Its size decreased according to the severity of motor palsy. Compared with term patients, the corpus callosum in preterm patients was considerably smaller (p < 0.05). There was a statistically significant correlation between the severity of motor palsy and the size of the corpus callosum. Quantitative evaluation of the corpus callosum might be a good indicator of neurologic prognosis, and a sensitive marker for assessing the extent of brain injury.
In this article it is shown how a corpus-based dictionary grammar may be compiled, that is, a mini-grammar fully based on corpus data and specifically written for use in and integrated with a dictionary. Such an effort is, to the best of our knowledge, a world's first. We exemplify our approach for a Northern Sotho ...
Describes a project to make a corpus of English spoken as a lingua franca in university settings in Finland. This corpus is one of the first to address the need for corpora that show the target for English-as-a-Foreign-Language learners whose goal is not to speak with native speakers but to interact in communities where English is a lingua franca.…
The aim of the present study is to demonstrate the usage of an annotated corpus in the field of experimental psycholinguistics. Specifically, we demonstrate how the manually annotated Corpus of Serbian Language (Kostić, Đ. 2001) can be used for probability estimates of grammatical forms, which allow the control of independent variables in psycholinguistic experiments. We address the issue of processing Serbian inflected forms within two subparadigms of feminine nouns. In regression analysis, almost all processing variability of inflected forms has been accounted for by the amount of information (i.e. bits) carried by the presented forms. In spite of the fact that probability distributions of inflected forms for the two paradigms differ, it was shown that the best prediction of processing variability is obtained by the probabilities derived from the predominant subparadigm, which encompasses about 80% of feminine nouns. The relevance of annotated corpora in experimental psycholinguistics is discussed in more detail.
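As a rough illustration of how corpus counts translate into "bits", the following Python sketch estimates form probabilities from frequencies and converts them to Shannon surprisal, -log2(p). This is the textbook measure only; the exact information measure used in the study may differ, and the function names and any counts passed in are hypothetical.

```python
import math


def form_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw corpus counts of inflected forms into relative frequencies."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {form: count / total for form, count in counts.items()}


def surprisal_bits(probability: float) -> float:
    """Shannon information content of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def information_per_form(counts: dict[str, int]) -> dict[str, float]:
    """Map each form to the bits it carries under the estimated distribution."""
    return {f: surprisal_bits(p) for f, p in form_probabilities(counts).items()}
```

Under this measure a form that accounts for a quarter of all occurrences carries 2 bits, and rarer forms carry more, which is the quantity a regression on processing times would use as a predictor.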
Hardeep Singh Malhotra
Transient signal abnormality in the splenium of the corpus callosum on magnetic resonance imaging (MRI) is occasionally encountered in clinical practice. It has been reported in various clinical conditions apart from patients with epilepsy. We describe 4 patients with different etiologies presenting with signal changes in the splenium of the corpus callosum. They were diagnosed as having progressive myoclonic epilepsy (case 1), localization-related epilepsy (case 2), hemicrania continua (case 3), and postinfectious parkinsonism (case 4). While three patients had complete involvement of the splenium on diffusion-weighted images ("boomerang sign"), the patient with hemicrania continua showed semilunar involvement ("mini-boomerang") on T2-weighted and FLAIR images. All the cases had noncontiguous involvement of the splenium. We herein discuss these cases with transient splenial involvement and stress that such patients do not need aggressive diagnostic and therapeutic interventions. An attempt has been made to review the literature regarding the pathophysiology, etiology, and outcome of such lesions.
SARDINHA Tony Berber
This paper offers a retrospective of Corpus Linguistics, an area of research that has grown very rapidly in recent years and has had a considerable impact on linguistics. The retrospective includes both a historical overview and a position on current debates and future developments in the field. The main concepts currently in use in the area are presented and discussed. The paper also comments on the most notable developments in Corpus Linguistics with regard to theory and practice, listing the main corpora in existence as well as the most important contributions in the field of computer programs for analysing and exploring these corpora.
Pilar Mur Dueñas
The ultimate aim of intercultural analyses in English for Academic Purposes is to help non-native scholars function successfully in the international disciplinary community in English. The aim of this paper is to show how corpus-based intercultural analyses can be useful to design EAP materials on a particular metadiscourse category, logical markers, in research article writing. The paper first describes the analysis carried out of additive, contrastive and consecutive logical markers in a corpus of research articles in English and in Spanish in a particular discipline, Business Management. Differences were found in their frequency and also in the use of each of the sub-categories. Then, five activities designed on the basis of these results are presented. They are aimed at raising Spanish Business scholars' awareness of the specific uses and pragmatic function of frequent logical markers in international research articles in English.
Fusing features from separate sources is a current technical difficulty in cross-corpus speech emotion recognition. The purpose of this paper is to use Deep Belief Nets (DBN) from deep learning to exploit the emotional information hidden in the speech spectrogram as image features, and then to fuse these with traditional emotion features. First, based on spectrogram analysis with the STB/Itti model, new spectrogram features are extracted from color, brightness, and orientation, respectively; then two alternative DBN models fuse the traditional and spectrogram features, which increases the scale of the feature subset and its ability to characterize emotion. In experiments on the ABC database and Chinese corpora, the new feature subset improves the cross-corpus recognition result by 8.8% compared with traditional speech emotion features. The proposed method provides a new idea for feature fusion in emotion recognition.
There are many practical reasons why experiences of a given musical work tend to be heard repeatedly at the same pitch transposition level, especially recordings of musical works. Yet here, a corpus study is presented that challenges this very basic assumption of music perception. In 2011, an initial corpus of 100 user-posted YouTube videos was collected in order to investigate the prevalence of transposition and tempo alterations within these videos. Results showed that 42% of these videos contained nominal changes of pitch (36%) and/or tempo (22%). Using the same methodology, a follow-up study was performed in 2015 and found that only 24% of user-posted videos contained these same alterations. Implications of these observations are discussed in light of musical communication models, YouTubeology, and absolute pitch memory.
Wang, Huaquan; Guo, Lifang; Shao, Zonghong
Aplastic anemia is a rare hematopoietic stem-cell disorder that results in pancytopenia and hypocellular bone marrow. Women with aplastic anemia are usually at increased risk of corpus luteum rupture due to thrombocytopenia and infection. Here we report two cases of hemoperitoneum from corpus luteum rupture in patients with aplastic anemia at our center. Case 1 involved two episodes of hemoperitoneum resulting from rupture of the corpus luteum in a 23-year-old unmarried female with severe aplastic anemia. This patient was managed conservatively with platelet and packed red cell transfusion. Case 2 involved two episodes of hemoperitoneum resulting from rupture of the corpus luteum in a 33-year-old married patient with aplastic anemia. Emergency laparoscopy revealed massive hemoperitoneum. Bilateral salpingo-oophorectomy was performed successively, with platelet and packed red cell transfusion. Hemoperitoneum resulting from a ruptured corpus luteum is a life-threatening condition in patients with aplastic anemia. Prompt and appropriate evaluation of corpus luteum rupture and emergent therapy are needed.
Kwary, Deny A
This data article presents a corpus (i.e. a large collection of words in electronic form) and a concordancer (i.e. a tool that shows a word in its context of use) of academic journal articles. As the title suggests, the data were collected from research articles published in academic journals. The corpus contains 5,686,428 words selected from 895 journal articles published by Elsevier in 2011-2015. The corpus is classified into four subject areas: Health Sciences, Life Sciences, Physical Sciences, and Social Sciences, following the classifications of Scopus, which is the largest abstract and citation database of peer-reviewed scientific journals, books and conference proceedings. To ease access to and use of the corpus, a program that produces key word in context (KWIC) lines and word frequencies was created and placed on the website: corpus.kwary.net. The corpus is a valuable resource for researchers, teachers, and translators working on academic English.
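The two outputs named in this abstract, key word in context (KWIC) lines and word frequencies, can be sketched in a few lines of Python. This is an illustrative sketch only, not the program hosted on the website; the tokenizer, the function names and the context width are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens shown on each side of the keyword.
    """
    tokens = tokenize(text)
    key = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def word_frequency(text: str) -> Counter:
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))
```

Returning the three parts separately lets a caller align the keyword in a fixed column, which is the usual concordance display.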
Beasley, Robert E.
The purpose of this study was to investigate the use of symbolic expressions (e.g., "BTW," "LOL," "UR") in an SMS text messaging corpus consisting of over 10,000 text messages. More specifically, the purpose was to determine, not only how frequently these symbolic expressions are used, but how they are utilized in terms of the language functions…
The Canary Islands (Spain) have always been in close contact with the Anglo-Saxon world, which has had important consequences for the economy but also at the socio-cultural, linguistic and literary levels. A review of the English bibliography on the Canaries reveals, among other aspects, a tendency in most authors to use hispanicisms and canarianisms in their texts. This article offers a record of those words which appear in a corpus of fourteen works taken from this extensive bibliography. Apart from providing an overview of the studies on hispanicisms in English, this paper's main aim is to highlight the contribution of Canarian Spanish to the enrichment of the vocabulary of English by checking which of the hispanicisms in our corpus that are actually canarianisms have been included in the lexical repertoire of the Shorter Oxford English Dictionary on Historical Principles (2007).
Webster M. Mavhu
Abstract: With specific reference to Shona monolingual lexicography, this article discusses how corpus-based lexicographers might, in some instances, decide not strictly to adhere to the corpus when it comes to headword and sense treatment. The writer is a member of the African Languages Research Institute (ALRI), formerly known as the African Languages Lexical (ALLEX) Project. ALRI is a nonfaculty interdisciplinary unit dedicated to research on and the development of African languages in Zimbabwe. The writer is part of the six-member team that compiled the now published Shona monolingual, synchronic, medium-sized and general-purpose dictionary Duramazwi Guru ReChiShona (2001). The article originates from the writer's experience of working on this dictionary. The article highlights the fact that being corpus-based does not necessarily imply being corpus-bound.
Keywords: CORPUS, CORPUS-BASED, FREQUENCY, HEADWORD, LEXICOGRAPHY, SENSE, SHONA, SLANG, SYNONYMS
Summary (Opsomming): Disregarding the corpus: Headword and sense treatment in Shona monolingual lexicography. With specific reference to Shona monolingual lexicography, this article discusses how corpus-based lexicographers may in some cases decide not to keep strictly to the corpus when it comes to headword and sense treatment. The writer is a member of the African Languages Research Institute (ALRI), formerly known as the African Languages Lexical (ALLEX) Project. ALRI is a non-faculty interdisciplinary unit devoted to research on and the development of the African languages in Zimbabwe. The writer is part of a team of six members who compiled the already published Shona monolingual, synchronic, medium-sized and multi-purpose dictionary Duramazwi Guru ReChiShona (2001). The article arose from the writer's experience of working on this dictionary. The article highlights the fact that being corpus-based does not necessarily
Song, Dong Hoon; Chang, Seung Kuk; Kim, Jong Deok; Eun, Tchoong Kie; Park, Dong Woo
To measure the size of each portion of the normal corpus callosum using an objective and reproducible MRI method, and to evaluate morphological change of the corpus callosum by grade of hydrocephalus. Midsagittal T1-weighted MR imaging of the corpus callosum was investigated in 41 normal Korean adult volunteers and 19 patients with hydrocephalus. The corpus callosum was measured for anteroposterior length (A), height (B), and the thickness of the genu (C), body (D), splenium (E), and the narrowest portion of the body (F). The morphology and signal intensity of the corpus callosum were also evaluated. Hydrocephalus was graded as mild, moderate, or severe, and the thickness of each portion was compared with that of the normal corpus callosum. The mean length and height were 72.3 mm and 28.6 mm in males, and 70.7 mm and 28.9 mm in females. The mean dimensions for C, D, E and F were 13.1 mm, 8 mm, 13.2 mm and 5.2 mm in males, and 12.8 mm, 7.5 mm, 12.3 mm and 5 mm in females. The normal corpus callosum was 'hook' shaped on midline sagittal T1-weighted images. Narrowing at the posterior third of the body was present in 30 cases (73.2%), and the body was even in thickness in 11 cases (26.8%). The signal intensity of the corpus callosum on midsagittal T1-weighted spin echo images of normal cases was homogeneously hyperintense compared with cerebral gray matter. In hydrocephalus, A and B were increased and the other portions were decreased in thickness. The genu and the narrowest portion of the body showed significant differences in thickness according to the grade of hydrocephalus. The mean dimensions of all portions of the corpus callosum were larger in males than in females except for callosal height, but the differences were not statistically significant except for the splenium. Hydrocephalus leads to morphological change of the corpus callosum. Among the portions of the corpus callosum, the genu and the narrowest portion of the body were thought to be the most sensitive indicators of the degree of hydrocephalus.
Amiri, Ali Akbar; Gilanpour, Hassan; Veshkini, Abbas
The aim of this study was to determine the drainage routes of the corpus cavernosum penis and the corpus spongiosum penis in the cat using contrast cavernosography. Five male cats, 1.5-2.5 years old, weighing between 4.5 and 5.5 kg were investigated. The cats were anesthetized and the root and the proximal part of the penis were exposed by an incision on the perineum reaching the scrotum. Each cat was radiographed in lateral and dorsal recumbency before and during injection of contrast medium into the erectile bodies. The corpus spongiosum penis was injected at the bulb of the penis and the corpus cavernosum penis at the root. Injection of contrast media into the cavernous bodies showed that both the external and internal iliac veins drain the erectile bodies into the caudal vena cava. Drainage from the corpus spongiosum penis was from the bulb for the proximal part and from the glans for the distal part. The corpus cavernosum penis was drained only proximally, from the crura. There was a network of veins above the pelvic symphysis, and the drainage of the erectile bodies was through various routes into the internal and external iliac veins.
Axon diameter is an important neuroanatomical characteristic of the nervous system that alters in the course of neurological disorders such as multiple sclerosis. Axon diameters vary, even within a fiber bundle, and are not normally distributed. An accurate distribution function is therefore beneficial, either to describe axon diameters that are obtained from a direct measurement technique (e.g., microscopy), or to infer them indirectly (e.g., using diffusion-weighted MRI). The gamma distribution is a common choice for this purpose (particularly for the inferential approach) because it resembles the distribution profile of measured axon diameters, which has been consistently shown to be non-negative and right-skewed. In this study we compared a wide range of parametric probability distribution functions against empirical data obtained from electron microscopy images. We observed that the gamma distribution fails to accurately describe the main characteristics of the axon diameter distribution, such as location and scale of the mode and the profile of distribution tails. We also found that the generalized extreme value distribution consistently fitted the measured distribution better than other distribution functions. This suggests that there may be distinct subpopulations of axons in the corpus callosum, each with their own distribution profiles. In addition, we observed that several other distributions outperformed the gamma distribution, yet had the same number of unknown parameters; these were the inverse Gaussian, log normal, log logistic and Birnbaum-Saunders distributions.
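For reference, the two distributions contrasted in this abstract have the following standard textbook densities. The symbols are conventional choices, not notation taken from the paper: $d$ is axon diameter, $k$ and $\theta$ are the gamma shape and scale, and $\mu$, $\sigma$, $\xi$ are the generalized extreme value (GEV) location, scale and shape.

```latex
% Gamma density (two parameters: shape k > 0, scale \theta > 0), for d > 0
f_{\Gamma}(d; k, \theta) = \frac{d^{\,k-1} e^{-d/\theta}}{\Gamma(k)\,\theta^{k}}

% GEV density (three parameters: location \mu, scale \sigma > 0, shape \xi)
f_{\mathrm{GEV}}(d; \mu, \sigma, \xi) = \frac{1}{\sigma}\, t(d)^{\xi + 1} e^{-t(d)},
\qquad
t(d) =
\begin{cases}
\left(1 + \xi\,\dfrac{d - \mu}{\sigma}\right)^{-1/\xi}, & \xi \neq 0,\\[2ex]
\exp\!\left(-\dfrac{d - \mu}{\sigma}\right), & \xi = 0,
\end{cases}
```

The GEV density is defined where $1 + \xi (d - \mu)/\sigma > 0$. Note that the GEV has one more free parameter than the gamma, whereas the inverse Gaussian, log normal, log logistic and Birnbaum-Saunders distributions each have two, which is why the abstract singles them out as outperforming the gamma at equal parameter count.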
Timothy J Herron
Full Text Available The corpus callosum includes the majority of fibers that connect the two cortical hemispheres. Studies of cross-sectional callosal morphometry and area have revealed developmental, gender, and hemispheric differences in healthy populations and callosal deficits associated with neurodegenerative disease and brain injury. However, accurate quantification of the callosum using magnetic resonance imaging is complicated by intersubject variability in callosal size, shape, and location and often requires manual outlining of the callosum in order to achieve adequate performance. Here we describe an objective, fully automated protocol that utilizes voxel-based images to quantify the area and thickness both of the entire callosum and of different callosal compartments. We verify the method’s accuracy, reliability, robustness and multisite consistency and make comparisons with manual measurements using public brain-image databases. An analysis of age-related changes in the callosum showed increases in length and reductions in thickness and area with age. A comparison of older subjects with and without mild dementia revealed that reductions in anterior callosal area independently predicted poorer cognitive performance after factoring out Mini-Mental Status Examination scores and normalized whole brain volume. Open-source software implementing the algorithm is available at www.nitrc.org/projects/c8c8.
Full Text Available Currently there are very few specialised corpora of literary texts that are tailored to the needs of literary critics who are interested in corpus stylistic analyses of prose fiction. Many existing corpora including literary texts were compiled for linguistic research interests and are often unsuitable for corpus stylistic purposes. The paper addresses three of the main problems: the absence of labelling of the texts for literary genre, the use of extracts, and the prevalence of linguistic periodisation schemes. C18P is a corpus of prose fiction designed specifically to address these issues. It traces the early development of the novel from 1700 up until the Victorian era. It can, for instance, be used for an analysis of the characteristic linguistic features of individual literary genres and forms. The following paper introduces the design of the corpus as well as some of its potential uses.
Full Text Available Background: Using a rat model we have found that the bioflavonoid silymarin (SY) ameliorates some of the negative consequences of in utero exposure to ethanol (EtOH). In the current study our aim was to determine if laterality preference and corpus callosum development were altered in rat offspring whose mothers were provided with a concomitant administration of SY with EtOH throughout gestation. Methods: We provided pregnant Fisher/344 rats with liquid diets containing 35% ethanol derived calories (EDC) throughout the gestational period. A silymarin/phospholipid compound containing 29.8% silybin was co-administered with EtOH to a separate experimental group. We tested the offspring for laterality preference at age 12 weeks. After testing, the rats were sacrificed and their brains perfused for later corpus callosum extraction. Results: We observed incomplete development of the splenium in the EtOH-only offspring. Callosal development was complete in all other treatment groups. Rats from the EtOH-only group displayed a left paw preference, whereas control rats were evenly divided between right and left paw preference. Inexplicably, both SY groups were largely right paw preferring. Conclusions: The addition of SY to the EtOH liquid diet did confer some ameliorative effects upon the developing fetal rat brain.
Full Text Available The rising prevalence of high throughput screening and the general inability of (1) two-dimensional (2D) cell culture and (2) in vitro release studies to predict in vivo neurobiological and pharmacokinetic responses in humans has led to greater interest in more realistic three-dimensional (3D) benchtop platforms. Advantages of 3D human cell culture over its 2D analogue, or even animal models, include taking the effects of microgeometry and long-range topological features into consideration. In the era of personalized medicine, it has become increasingly valuable to screen candidate molecules and synergistic therapeutics at a patient-specific level, in particular for diseases that manifest in highly variable ways. The lack of established standards and the relatively arbitrary choice of probing conditions has limited in vitro drug release to a largely qualitative assessment as opposed to a predictive, quantitative measure of pharmacokinetics and pharmacodynamics in tissue. Here we report the methods used in the rapid, low-cost development of a 3D model of a mucopolysaccharidosis type I patient’s corpus callosum, which may be used for cell culture and drug release. The CAD model is developed from in vivo brain MRI tracing of the corpus callosum using open-source software, printed with poly(lactic acid) on a Makerbot Replicator 5X, UV-sterilized, and coated with poly(lysine) for cellular adhesion. Adaptations of material and 3D printer for expanded applications are also discussed.
Silva, Catarina Helena Branco Simões da
Doctoral thesis in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra. In recent decades the availability and importance of texts in digital format have increased exponentially, and such texts are now present in almost every aspect of modern life. Text classification is therefore an active research area, justified by many real-world applications. Even so, dealing with the overload of texts in digital format and...
Full Text Available Abstract – Conjunctions as fundamental elements in the construction of discourse cohesion represent a relatively neglected research area, due to their complexity and the bewildering number of “conjunctive relations” (Halliday and Hasan 1976: 226) that they may express in context, as also highlighted in Christiansen (2011). In addition to this, there does not seem to be a shared view as far as the classification and denomination of the different kinds of conjunctions are concerned (cf. Halliday and Hasan 1976; Vande Kopple 1985; Martin and Rose 2003; Hyland 2005b). The selection of a specific type of conjunction acquires more importance because conjunctions are typically open to so many different interpretations, especially when the participants in the speech event come from diverse lingua-cultural backgrounds (cf. Guido 2007; Guido 2008; Cogo et al. 2011). Following the taxonomy provided by Halliday and Hasan (1976) for conjunctions, our study attempts to shed light on the usage of conjunctions by ELF speakers in specific contexts. We shall consider ten transcripts taken from the VOICE Corpus (Seidlhofer et al. 2013), namely five interviews and five conversations in multicultural academic contexts (approximately 4,000 words each), and analyze the number of instances for each type of conjunction (additive, adversative, causal, temporal) as well as continuatives in depth, by adopting a quantitative as well as a qualitative method and by using TextSTAT 2.9 (Huning 2012). We shall then move on to the analysis of conjunctions with respect to their internal properties/collocates and eventually see the occurrence of conjunctions by comparing them with the two different speech events which are chosen as the subject of our study, i.e. interviews and conversations. We shall see the extent to which certain conjunctions are more restricted than others in terms of usage (cf. Leung 2005) in both types of speech events, despite the great number of options available to the
Full Text Available Purpose: To investigate whether relaxation of the rat penile corpus cavernosum could be controlled with NOBL-1, a novel, light-controllable nitric oxide (NO) releaser. Materials and Methods: Fifteen-week-old male Wistar-ST rats were used. The penile corpus cavernosum was prepared and used in an isometric tension study. After noradrenaline (10⁻⁵ M) achieved precontraction, the penile corpus cavernosum was irradiated by light (470–500 nm) with and without NOBL-1 (10⁻⁶ M). In addition, we noted rats’ responses to light with vardenafil (10⁻⁶ M), a phosphodiesterase-5 (PDE-5) inhibitor. Next, responses to light in the presence of a guanylate cyclase inhibitor, ODQ (1H-[1,2,4]oxadiazolo[4,3-a]quinoxalin-1-one) (10⁻⁵ M), were measured. All measurements were performed in pretreated L-NAME (10⁻⁴ M) conditions to inhibit endogenous NO production. Results: Corpus cavernosal smooth muscle, precontracted with noradrenaline, was unchanged by light irradiation in the absence of NOBL-1. However, in the presence of NOBL-1, corpus cavernosal smooth muscle, precontracted with noradrenaline, relaxed in response to light irradiation. After blue light irradiation ceased, tension returned. In addition, the light response was clearly enhanced in the presence of a PDE-5 inhibitor. Conclusions: This study showed that rat corpus cavernosal smooth muscle relaxation can be light-controlled using NOBL-1, a novel, light-sensitive NO releaser. Though further in vivo studies are needed to investigate possible usefulness, NOBL-1 may prove to be a useful tool for erectile dysfunction therapy, specifically in the field of penile rehabilitation.
Full Text Available Background and Aim: The purpose of the present study was to investigate the effect of vestibulo-proprioceptive stimulations of sensory integration theory on the development of gross and fine motor, language and personal-social functions in a child with agenesis of the corpus callosum. Case: We report a 10.5-month-old boy with agenesis of the corpus callosum. The intervention was administered based on sensory integration theory for an hour a week for 20 weeks. The exercise intervention consisted of proprioceptive and linear, sustained and low-frequency vestibular stimulations on a suspension device and physio roll. Denver Developmental Screening-II and milestone skill testing were completed pre-intervention and monthly. Post-intervention, the age levels of gross motor, fine motor adaptive, language, and personal-social functions significantly improved. Based on milestone skills, maintenance of gross motor functions (e.g. sitting and quadruped position) improved. The child could roll from side to side and released objects voluntarily. The reaction time to auditory stimulations became less than 2 seconds. Conclusion: Vestibulo-proprioceptive stimulation, using the neuroplasticity of the central nervous system, is effective for development of gross and fine motor, language, and personal-social functions. These exercises can be administered for a child with agenesis of the corpus callosum.
Full Text Available In the present study, we aimed to investigate the difference in white matter between smokers and nonsmokers. In addition, we examined relationships between white matter integrity and nicotine dependence parameters in smoking subjects. Nineteen male smokers were enrolled in this study. Eighteen age-matched nonsmokers with no current or past psychiatric history were included as controls. Diffusion tensor imaging scans were performed, and the analysis was conducted using a tract-based spatial statistics approach. Compared with nonsmokers, smokers exhibited a significant decrease in fractional anisotropy (FA) throughout the whole corpus callosum. There were no significant differences in radial diffusivity or axial diffusivity between the two groups. There was a significant negative correlation between FA in the whole corpus callosum and the amount of tobacco use (cigarettes/day; R = -0.580, p = 0.023). These results suggest that the corpus callosum may be one of the key areas influenced by chronic smoking.
Natalia Veniaminovna Khokhlova
Full Text Available The research aimed at studying the use of abstract nouns in the Englishmen’s speech from the standpoint of sociolinguistics. The article introduces a new, sociolinguistic approach to research on abstract nouns; it is also the first time they are studied in a language corpus. The first stage of the research was based on literary fiction: abstract nouns were extracted for analysis from the statements of characters belonging to opposite social classes. Later, these data were compared with the results of the original corpus research based on the British National Corpus: sentences with nouns were selected from the conversational subcorpus of the BNC and were further sorted into abstract nouns, concrete nouns and words denoting people. Then, their frequency and vocabulary were studied with regard to speakers’ age, gender and social standing. The results revealed that abstract words are used more often than concrete ones regardless of the speaker’s social characteristics; however, the size and content of the vocabulary differ (it is generally more substantial in the speech of women and representatives of higher social classes). The results of this research can be used in elaborating a course of the English language or in teaching general linguistics, sociolinguistics and country studies.
Alessio Palmero Aprosio
Full Text Available In this work, we describe a methodology to interpret large persons’ networks extracted from text by classifying cliques using the DBpedia ontology. The approach relies on a combination of NLP, Semantic web technologies, and network analysis. The classification methodology that first starts from single nodes and then generalizes to cliques is effective in terms of performance and is able to deal also with nodes that are not linked to Wikipedia. The gold standard manually developed for evaluation shows that groups of co-occurring entities share in most of the cases a category that can be automatically assigned. This holds for both languages considered in this study. The outcome of this work may be of interest to enhance the readability of large networks and to provide an additional semantic layer on top of cliques. This would greatly help humanities scholars when dealing with large amounts of textual data that need to be interpreted or categorized. Furthermore, it represents an unsupervised approach to automatically extend DBpedia starting from a corpus.
Full Text Available
Summary: Corpora form the basis of most research in linguistics, and particularly in lexicography. Compiling a corpus is a specialised activity on which the result of the research in question depends. The subject of this article is the compilation of a lexicographic corpus in languages with an oral tradition, which requires a different approach from that used for languages with a long written tradition. The latter have substantial documentation that can serve as a basis for many research topics. The author proposes as an approach an analysis that would better account for the lexical and semantic specificities of languages with an oral tradition. By means of free oral production, the author bases the research hypotheses on an experiment in the Fang-Mekè dialect, a linguistic variant located in Gabon. The results make it possible to emphasise two essential elements of the compilation process in languages with an oral tradition: the informants and the representativeness of the corpus. The latter, which must be expressed through lexical fields that are diversified but also balanced, would make it possible to produce dictionaries in which the speakers, who are their first users, should recognise themselves.
Keywords: CORPUS, LEXICOGRAPHY, LANGUAGES WITH AN ORAL TRADITION, LANGUAGES WITH A WRITTEN TRADITION, INFORMANTS, EXHAUSTIVENESS, REPRESENTATIVENESS, LEXICAL FIELDS, ORALITY, WRITING, METHOD, FANG-MEKÈ DIALECT, BALANCED CORPUS.
Abstract: The Lexicographic Corpus in Languages with an Oral Tradition: The Case of the Dialect Fang-Mekè. Corpora form the basis of most linguistic and especially lexicographic research. The compilation of a corpus is a specialised activity on which depends the result of the research to be undertaken. The subject of this article is the compilation of a lexicographic corpus in languages with an oral tradition
Mølgaard, Lasse Lohilahti; Larsen, Jan; Goutte, Cyril
Detecting and tracking of temporal data is an important task in multiple applications. In this paper we study temporal text mining methods for Music Information Retrieval. We compare two ways of detecting the temporal latent semantics of a corpus extracted from Wikipedia, using a stepwise...
Full Text Available This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.
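The bottom-up paradigm mentioned above can be illustrated with a minimal sketch. It assumes bag-of-words vectors, cosine similarity, and average-link merging; these choices, the function names, and the toy documents are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(docs, k):
    """Bottom-up clustering: merge the two most similar clusters until k remain.

    Cluster similarity is the average pairwise cosine similarity (average
    link), which is an assumption of this sketch. Expects 1 <= k <= len(docs).
    """
    vectors = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start from singleton clusters

    def similarity(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        # Pick the pair of clusters with the highest average similarity.
        a, b = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy documents (hypothetical) with two obvious topics.
docs = [
    "corpus callosum mri thickness",
    "corpus callosum mri area",
    "patent corpus chemical annotation",
    "patent corpus chemical entities",
]
result = agglomerate(docs, 2)
print(result)
```

A top-down partitioning technique would instead start from the whole corpus and split it repeatedly; the pairwise search here is quadratic per merge, so it is only suitable for small collections.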
Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A
In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document specific enhancements, such as negation rules for general practitioners' entries and a regular expression based temporality module. The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
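To make the trigger-based idea concrete, the following is a simplified sketch of negation detection with a forward scope, together with the F-score used for evaluation. The trigger and termination lists are hypothetical English examples, not the 41 Dutch triggers of ContextD, and the scope rule is a deliberate simplification of ConText-style rules.

```python
import re

# Hypothetical trigger lists for illustration only; not ContextD's triggers.
NEGATION_TRIGGERS = ("no", "denies", "without")
TERMINATION_TERMS = ("but", "however")


def is_negated(sentence, term):
    """Return True if `term` lies in the forward scope of a negation trigger.

    Scope runs from a trigger to the next termination term or the end of the
    sentence (a simplifying assumption of this sketch).
    """
    tokens = re.findall(r"\w+", sentence.lower())
    target = term.lower().split()
    in_scope = False
    for i, tok in enumerate(tokens):
        if tok in TERMINATION_TERMS:
            in_scope = False  # a termination term closes the open scope
        elif tok in NEGATION_TRIGGERS:
            in_scope = True  # a trigger opens a new scope
        elif in_scope and tokens[i:i + len(target)] == target:
            return True
    return False


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


sentence = "Patient denies chest pain but reports fever"
print(is_negated(sentence, "chest pain"), is_negated(sentence, "fever"))
```

In an evaluation of this kind, each annotated term counts as a true positive, false positive, or false negative for the negation property, and `f_score` summarizes the three counts.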
Hayakawa, K.; Kanda, T.; Hashimoto, K.; Okuno, Y.; Yamori, Y.; Yuge, M.; Ando, R.; Ozaki, N.; Tamamoto, A.
Purpose: The MR findings in patients with spastic diplegia were investigated and the role of MR imaging in assessing the extent of brain injury was evaluated. Material and Methods: 39 male and 24 female patients (preterm/term 43/20) were imaged using a 0.5 T MR system. Results: The MR findings in term patients were quite different from those in preterm patients; 55% of the term patients showed normal and minimal changes on MR, whereas 90.7% of the 43 preterm children had periventricular leucomalacia. The deep cerebral white matter was the most frequently involved site. Objective measurements revealed significant reductions of the entire sagittal area of corpus callosum in diplegic patients in comparison with normal controls. The motor palsy severity correlated well with the extent of corpus callosum involvement. Conclusion: The corpus callosum appears to be a sensitive marker site for the assessment of the extent of white matter injury. (orig.)
Kim, Ham Gyum
The size of the corpus callosum in normal Korean people was measured using MRI, and the following conclusions were obtained. 1. Maximum, minimum, and mean values by region in all subjects 1) Anteroposterior length had a mean of 69.30 mm, a minimum of 50.70 mm, and a maximum of 80.40 mm. 2) Diameter of genu had a mean of 11.93 mm, a minimum of 6.00 mm, and a maximum of 18.50 mm. 3) Diameter of mid body had a mean of 7.00 mm, a minimum of 3.40 mm, and a maximum of 10.40 mm. 4) Diameter of narrowing portion had a mean of 4.51 mm, a minimum of 0.80 mm, and a maximum of 9.50 mm. 5) Diameter of splenium had a mean of 12.17 mm, a minimum of 6.90 mm, and a maximum of 17.20 mm. 2. Comparison by region according to gender in all subjects 1) Anteroposterior length was greater in men than in women, and the difference by gender was significant. 2) Diameter of genu, diameter of mid body, and diameter of narrowing portion were greater in men than in women, but there was no significant difference. 3) Diameter of splenium was greater in men than in women, and the difference was statistically significant. 3. Comparison by region according to age in all subjects 1) Anteroposterior length was greatest in subjects in their 50s and was smaller in those in their 10s than at other age levels. In addition, the difference by age was significant. 2) Diameter of genu and diameter of mid body were greatest in subjects in their 30s and were smaller in those in their 60s than at other age levels, and the difference was statistically significant. 3) Diameter of narrowing portion was thickest in subjects in their 20s and was thinner in those in their 60s than at other age levels, and the difference by age was significant. 4) Diameter of splenium was thickest in subjects in their 30s and was thinner in those in their 10s than at other age levels. And, the statistically
Ito, Shoichi; Makino, Takahiro; Shirai, Wakako; Hattori, Takamichi [Department of Neurology, Graduate School of Medicine, Chiba University (Japan)
Progressive supranuclear palsy (PSP) is a neurodegenerative disease featuring parkinsonism, supranuclear ophthalmoplegia, dysphagia, and frontal lobe dysfunction. The corpus callosum, which consists of many commissural fibers, probably reflects cerebral cortical function. Several previous reports showed atrophy or diffusion abnormalities of the anterior corpus callosum in PSP patients, but the partitioning method used in these studies was based on data obtained in nonhuman primates. In this study, we performed a diffusion tensor analysis using a new partitioning method for the human corpus callosum. Seven consecutive patients with PSP were compared with 29 age-matched patients with Parkinson's Disease (PD) and 19 age-matched healthy control subjects. All subjects underwent diffusion tensor magnetic resonance imaging, and the corpus callosum was partitioned into five areas on the mid-sagittal plane according to a recently established topography of the human corpus callosum (CC1-prefrontal area, CC2-premotor and supplementary motor area, CC3-motor area, CC4-sensory area, CC5-parietal, temporal, and occipital area). Fractional anisotropy (FA) and apparent diffusion coefficient (ADC) were measured in each area and differences between groups were analyzed. In the PSP group, FA values were significantly decreased in CC1 and CC2, and ADC values were significantly increased in CC1 and CC2. Receiver operating characteristic analysis showed excellent reliability of FA and ADC analyses of CC1 for differentiating PSP from PD. The anterior corpus callosum corresponding to the prefrontal, premotor, and supplementary motor cortices is affected in PSP patients. This analysis can be an additional test for further confirmation of the diagnosis of PSP.
Bamba, Ravinder; Riley, D Colton; Boyer, Richard B; Pollins, Alonda C; Shack, R Bruce; Thayer, Wesley P
Polyethylene glycol (PEG) has been shown to restore axonal continuity after peripheral nerve transection in animal models. We hypothesized that PEG can also restore axonal continuity in the central nervous system. In this experiment, coronal sectioning of the brains of Sprague-Dawley rats was performed after animal sacrifice. 3Brain high-resolution microelectrode arrays (MEA) were used to measure mean firing rate (MFR) and peak amplitude across the corpus callosum of the ex vivo brain slices. The corpus callosum was subsequently transected and repeated measurements were performed. The cut ends of the corpus callosum were still apposed at this time. A PEG solution was applied to the injury site and repeated measurements were performed. MEA measurements showed that PEG was capable of restoring electrophysiological signaling after transection of central nerves. Before injury, the average MFRs at the ipsilateral, midline, and contralateral corpus callosum were 0.76, 0.66, and 0.65 spikes/second, respectively, and the average peak amplitudes were 69.79, 58.68, and 49.60 μV, respectively. After injury, the average MFRs were 0.71, 0.14, and 0.25 spikes/second, respectively, and peak amplitudes were 52.11, 8.98, and 16.09 μV, respectively. After application of PEG, there were spikes in MFR and peak amplitude at the injury site and contralaterally. The average MFRs were 0.75, 0.55, and 0.47 spikes/second at the ipsilateral, midline, and contralateral corpus callosum, respectively, and peak amplitudes were 59.44, 45.33, and 40.02 μV, respectively. There were statistically significant differences in the average MFRs and peak amplitudes between the midline and non-midline corpus callosum groups (P < 0.01, P < 0.05). These findings suggest that PEG restores axonal conduction between severed central nerves, potentially representing axonal fusion.
Moya-Sánchez, E; Medina-Benítez, A; Medina-Salas, V; Fernández-Navarro, L
Partial segmental thrombosis of the corpus cavernosum is an unusual clinical condition of unknown origin that mainly affects young males. Its characteristic presentation is unexplained perineal pain associated with a palpable perineal mass. The condition consists of thrombosis in the perineal portion of the corpus cavernosum, usually unilateral, and it is associated with underlying malignant pathologies and predisposing factors such as microtrauma. With adequate adherence to conservative treatment, complications such as erectile dysfunction are very uncommon. Copyright © 2018 SERAM. Published by Elsevier España, S.L.U. All rights reserved.
Adult learners are an under-researched group in music education. Although music education research often uses texts (interviews, autobiographical accounts, survey responses), linguistic analysis has not yet been used in this area. Meanwhile, the internet has become a source of support and expression for adult music learners, through blogs and forums. This presentation describes part of the research undertaken for my MA in English Language, which uses a corpus of online texts to investigate di...
Matteo La Grassa
Full Text Available Nouns signed by hearing adults in LIS: a preliminary survey on the LISAU corpus. The results of an analysis of part of the LISAU (LIS of Hearing Adults) corpus, concerning the signed production of the noun phrase in LIS by hearing informants who learned LIS as an L2 in adulthood, are presented. The purpose of the investigation was to begin to outline a line of research in acquisitional linguistics on the acquisition of LIS as an L2 by hearing adults. The LISAU corpus comprises the signing of 7 hearing informants with a homogeneous level of competence who completed a third-level course at the Ente Nazionale Sordi in Prato. LISAU also includes the signing of 2 deaf native signers, considered the control group. The analysis focuses on first- and second-class nouns, including non-citation forms, on plural forms, and on noun-adjective agreement. Most of the analyzed data reveal the informants’ full competence in producing noun phrases.
Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Agnes Pisanski Peterlin
Full Text Available In recent decades the increasing reliance on computer technology and the emergence of electronic publishing have precipitated changes in both the production and reception of academic writing. At the same time, the dominance of English as the medium of academic communication has been asserted in all fields of study. While many scholars write their own texts in English, it is not exceptional for others to have their papers translated into English. It is interesting, however, that translation of academic discourse has received relatively little research attention so far. The study presented here addresses the question of how translated academic texts differ from comparable original English academic texts. To explore this question, a 700,000-word corpus comprising 104 research articles (Slovene-English translations and comparable English originals) is analyzed in terms of references to the entire text itself. The results show considerable differences between the translated texts and the comparable English-language originals.
See Min Choi
Full Text Available Purpose: We studied the effects of alcohol administration on the corpus cavernosum (CC) using an animal model. Materials and Methods: CC sections and the aortic ring of rabbits were used in an organ bath study. After acute alcohol administration, changes in blood alcohol concentration and electrical stimulation induced intracavernosal pressure/mean arterial pressure (ICP/MAP) percentage were compared in rats. Cyclic adenosine monophosphate (cAMP) and cyclic guanosine monophosphate (cGMP) levels in the CC were measured using immunoassays. After chronic alcohol administration, ICP/MAP percentage, cAMP and cGMP were compared in rats. Histological changes were examined using the Masson trichrome stain and the Sircol collagen assay. Endothelial nitric oxide synthase (eNOS) expression was examined using immunohistochemistry and Western blotting. Results: Alcohol relaxed the CC in a dose-dependent manner, and the relaxation response was suppressed when pretreated with propranolol, indomethacin, glibenclamide, and 4-aminopyridine. In rats with acute alcohol exposure, the cAMP level in the CC was significantly greater than was observed in the control group (p < 0.05). In rats with chronic alcohol exposure, however, changes in cAMP and cGMP levels were insignificant, and the CC showed markedly smaller areas of smooth muscle and greater amounts of dense collagen (p < 0.05). Immunohistochemical analysis of eNOS showed a less intense response, and Western blotting showed that eNOS expression was significantly lower in this group (p < 0.05). Conclusions: Acute alcohol administration activated the cAMP pathway with positive effects on erectile function. In contrast, chronic alcohol administration changed the ultrastructures of the CC and suppressed eNOS expression, thereby leading to erectile dysfunction.
Full Text Available To preserve digital information it is vital that the format of that information can be identified in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to help further develop tools to aid this purpose. Following that call, an attempt has been made to develop the suite. This paper initially outlines a methodology to generate a skeleton corpus using simple user-generated digital objects. It then explores the lessons learnt in the generation of a corpus using scripting language techniques from the file format signatures described in The National Archives' PRONOM technical registry. It also discusses the use of the digital signature for this purpose and the benefits of developing a test corpus using this technique. Finally, this paper outlines a methodology for future research before exploring how the community can best make use of the output of this project and how this project needs to be taken forward to completion.
Full Text Available This paper addresses the characteristics of Chinese text classification. Documents are segmented with ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences), the data are cleaned and stop words are filtered out, and document features are selected using the information gain and document frequency feature selection algorithms. On this basis, a text classifier based on the Naive Bayesian algorithm is implemented, and the system is tested and analyzed experimentally on the Chinese corpus of Fudan University.
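The pipeline described in this abstract (stop-word filtering, document frequency and information gain feature selection, then a Naive Bayes classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy tokens, the stop-word list, and the choice of a multinomial model with add-one smoothing are all assumptions made for illustration, and the documents are assumed to be already segmented (the ICTCLAS step is not reproduced).

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of"}  # hypothetical stop-word list


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    base = entropy(list(Counter(labels).values()))
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for part in (present, absent):
        if part:
            conditional += len(part) / len(labels) * entropy(list(Counter(part).values()))
    return base - conditional


def select_features(docs, labels, k, min_df=1):
    """Drop terms below a document-frequency threshold, then keep the k terms with highest IG."""
    df = Counter(t for d in docs for t in set(d))
    candidates = [t for t, n in df.items() if n >= min_df]
    ranked = sorted(candidates, key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing; returns log-priors and log-likelihoods."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in d if t in vocab)
    model = {}
    for y in class_docs:
        total = sum(term_counts[y].values())
        loglik = {t: math.log((term_counts[y][t] + 1) / (total + len(vocab))) for t in vocab}
        model[y] = (math.log(class_docs[y] / len(labels)), loglik)
    return model


def predict(model, doc):
    """Return the class with the highest posterior log-score; out-of-vocabulary terms are ignored."""
    def score(y):
        prior, loglik = model[y]
        return prior + sum(loglik[t] for t in doc if t in loglik)
    return max(sorted(model), key=score)


# Toy, already-segmented documents (stand-ins for segmenter output).
raw_docs = [
    ["the", "stock", "market", "rise"],
    ["market", "bank", "loan"],
    ["team", "match", "goal"],
    ["goal", "of", "the", "coach", "team"],
]
labels = ["finance", "finance", "sports", "sports"]
docs = [[t for t in d if t not in STOP_WORDS] for d in raw_docs]

vocab = select_features(docs, labels, k=4, min_df=2)
model = train_nb(docs, labels, vocab)
print(predict(model, ["market", "loan", "rate"]))
print(predict(model, ["team", "goal"]))
```

Filtering by document frequency before ranking by information gain keeps the ranking from being dominated by terms that occur in a single document; whether the original system combined the two criteria in this order is not stated in the abstract.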
Salway, Andrew; Graham, Mike; Tomadaki, Eleftheria; Xu, Yan
The ongoing TIWO project is investigating the synthesis of language technologies, like information extraction and corpus-based text analysis, video data modeling and knowledge representation. The aim is to develop a computational account of how video and text can be integrated by representations of narrative in multimedia systems. The multimedia domain is that of film and audio description – an emerging text type that is produced specifically to be informative about the events and objects dep...
Alquraishi, Mohammed Abdulrahman
The purpose of this study is to investigate the functions of lexical bundles in two corpora: a corpus of engineering academic texts and a corpus of IEP advanced writing class texts. This study is concerned with the nature of formulaic language in Pathway IEPs and engineering texts, and whether those types of texts show similar or distinctive formulaic functions. The study looked into lexical bundles found in a 1.26 million-word engineering corpus and a 65,000-word ESL corpus using a concordancing program. It then analyzed the functions of those lexical bundles and compared them statistically using chi-square tests. The results of this investigation showed 236 unique frequent lexical bundles in the engineering corpus and 37 bundles in the pathway corpus. The study also identified several differences between the density and functions of lexical bundles in the two corpora. These differences were evident in the distribution of functions of lexical bundles and the minimal overlap of lexical bundles found in the two corpora. The results of this study call for more attention to formulaic language in ESP and EAP programs.
Full Text Available Starting from Jacques Anis's book "Texte et ordinateur. L’écriture réinventée ?" (1998), we seek to trace the trajectory of French work in the language sciences on technology-mediated corpora up to the present day. Online communication takes various forms, depending on whether it involves the production of fixed texts (for example, websites, emails) or forms centered more on processes of interaction and communication (for example, chat, videoconferencing), which can therefore be studied from the standpoint of discourse analysis as well as that of conversation analysis. In this article we set out to show to what extent the two traditions in the language sciences have found material to exploit in these online corpora, each sometimes encroaching on the other's "territory". From this perspective, we begin by bringing out the contribution of researchers who claim affiliation with discourse analysis, then that of researchers working in interaction analysis, and we show the areas of overlap between the two currents. In a final part, we turn to the legal, technical and epistemological challenges facing the linguist who seeks to study online multimodal corpora, which take increasingly sophisticated and complex forms.
Full Text Available This case study was carried out in the English Education Department of State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading texts and to determine whether the texts are useful for reading skill development. A descriptive qualitative design was applied to obtain the data. For this purpose, some available computer programs were used to find the description of vocabulary in the texts. It was found that the 20 texts containing 7,945 words are dominated by low frequency words, which account for 16.97% of the words in the texts. The high frequency words occurring in the texts were dominated by function words. In the case of word levels, it was found that the texts have a very limited number of words from the GSL (General Service List of English Words) (West, 1953). The proportion of the first 1,000 words of the GSL only accounts for 44.6%. The data also show that the texts contain too large a proportion of words which are not in the three levels (the first 2,000 words and the UWL). These words account for 26.44% of the running words in the texts. It is believed that the constraints are due to the selection of the texts, which are made up of a series of short, unrelated texts. This kind of text is subject to the accumulation of low frequency words, especially content words, and contains a limited number of words from the GSL. It could also hinder the development of students' reading skills and vocabulary enrichment.
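The word-level percentages reported in this abstract are coverage figures: the share of running words (tokens) that fall into each frequency band. A minimal sketch of that calculation follows. The band names, the tiny word sets, and the sample sentence are placeholders invented for illustration; they are not the GSL or UWL lists, and the programs used in the study are not identified in the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage_profile(tokens, bands):
    """Percentage of running words per band.

    `bands` maps a band name to a set of words, in priority order; each token
    is counted in the first band that contains it, otherwise as 'off-list'.
    """
    if not tokens:
        raise ValueError("no tokens to profile")
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    total = len(tokens)
    return {name: round(100 * counts[name] / total, 2) for name in [*bands, "off-list"]}


# Placeholder word lists (hypothetical, far smaller than any real list).
bands = {
    "first_1000": {"the", "of", "read", "book"},
    "second_1000": {"library"},
    "uwl": {"analyze"},
}
tokens = tokenize("The students read the book of the library and analyze")
profile = coverage_profile(tokens, bands)
print(profile)
```

Counting tokens rather than distinct word types matches the abstract's reference to "running words"; a type-based profile would divide by the number of distinct words instead.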
Luciane Corrêa Ferreira
Full Text Available This study concerns the use of corpus linguistics methodology in psycholinguistics research. Ten linguistic metaphors were selected from English and American newspapers. After that, we identified the underlying conceptual metaphor based on the conceptual metaphor inventory by Lakoff and Johnson (1980, 1999). We seek to investigate what sort of knowledge EFL learners use when trying to understand a linguistic metaphor. We examined how EFL learners comprehend linguistic metaphors, first without using the context and then using the context. The sample comprised 221 Brazilian students and 16 American students at UCSC. We have also carried out empirical research using WebCorp.
Background and development of Corpus Juris: development of the protection of financial interests, history of Corpus Juris; the legal basis of Corpus Juris (Treaty of Amsterdam), its organization and structure (principle of European territoriality, principle of judicial control, principle of "adversarial" proceedings, principle of subsidiarity of national law)
Full Text Available Taking the Michigan Corpus of Academic Spoken English, this paper explores the pragmatic behavior of one-word tags – a common feature in conversational English – in academic speech. The analysis indicates that university professors use tags within textual metadiscourse patterns to signpost their audiences and facilitate comprehension. In addition, tags correlate with interpersonal metadiscourse elements typical of conversation that help lecturers adopt stances, convey solidarity and socialize with their undergraduates. The conclusion section relates the interpersonal semiotics of lectures to the communicative goals of university talk and suggests the need to approach listening comprehension through students’ awareness of genres as social actions.
Kristoffersen, Jette Hedegaard; Troelsgård, Thomas; Langer, Gabriele
In a combined corpus-dictionary project, you would need one lexical database that could serve as a shared “backbone” for both corpus annotation and dictionary editing, but it is not that easy to define a database structure that applies satisfactorily to both these purposes. In this paper, we will exemplify the problem and present ideas on how to model structures in a lexical database that facilitate corpus annotation as well as dictionary editing. The paper is a joint work between the DGS Corpus Project and the DTS Dictionary Project. The two projects come from opposite sides of the spectrum (one adjusting a lexical database grown from dictionary making for corpus annotating, one building a lexical database in parallel with corpus annotation and editing a corpus-based dictionary), and we will consider requirements and feasible structures for a database that can serve both corpus and dictionary.
One of the many new features of English language learners' dictionaries derived from the technological developments that have taken place over recent decades is the presence of corpus-based examples to illustrate the use of words in context. However, empirical studies have generally not been able to produce conclusive evidence about their…
De Bra, P.M.E.; Smits, D.; Pechenizkiy, M.; Knutov, E.; Yudelson, M.; Abel, F.; Houben, G.J.P.M.; Herder, E.
"Open" has quickly become the hottest topic in any field related to information, including open government data, open learning resources, open user models, … Open Corpus Adaptation has been defined as the ability to perform adaptation to resources located anywhere on the Web. This leaves the
Enemy Combatant Detainees: Habeas Corpus Challenges in Federal Court. Contents include: Separation of Powers Issues; Eliminating Federal Court Jurisdiction Where There Is No State Court Review; Conclusion. In Rasul v. Bush, 542 U.S. 466 (2004), a divided Supreme Court declared that “a state
English legal commentator William Blackstone described the writ of habeas corpus as a second Magna Carta, and Supreme Court Chief Justice John Marshall called it the "great writ." It has been part of the Anglo-American common law tradition since the Middle Ages. In the United States, it has been a source of tension between state and…
Frederiksen, Kristian Steen; Garde, Ellen; Skimminge, Arnold
Several studies have found atrophy of the corpus callosum (CC) in patients with Alzheimer's disease (AD). However, it remains unclear whether callosal atrophy is already present in the early stages of AD, and to what extent it may be associated with other structural changes in the brain, such as age-related white matter changes (ARWMC) and progression of the disease.
Interaction as 'involvement' in writing for students: a corpus linguistic analysis of a key readability feature. E Hilton Hubbard. Abstract. The rapid change in the demographics of South Africa's tertiary level student population over the last decade — and most specifically the huge increase in those who have to study at a ...
As an important yet intricate linguistic feature of the English language, synonymy poses a great challenge for second language learners. Using the 100 million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as an analyzing tool, this article compares the usage of "learn" and "acquire" used in natural…
Koops, Hendrik Vincent; Volk, A.; de Haas, W.B.
This paper presents a corpus-based study on rhythmic patterns in the RAG-collection of approximately 11,000 symbolically encoded ragtime pieces. While characteristic musical features that define ragtime as a genre have been debated since its inception, musicologists argue that specific syncopation
This paper presents two chatbot systems, ALICE and Elizabeth, illustrating the dialogue knowledge representation and pattern matching techniques of each. We discuss the problems which arise when using the Corpus of Spoken Afrikaans (Korpus Gesproke Afrikaans) to retrain the ALICE chatbot system with human ...
Paggio, Patrizia; Navarretta, Costanza
, specifically head movements, facial expressions, and body posture. The corpus has served as the empirical basis for a number of studies of communication phenomena related to turn management, feedback exchange, information packaging and the expression of emotional attitudes. We describe the annotation scheme...
Coronel-Molina, Serafin M.
The discussion of corpus planning for the Southern Quechua language variety of Peru examines issues of graphization, standardization, modernization, and renovation of Quechua in the face of increasing domination by the Spanish language. The efforts of three major groups of linguists and other scholars working on language planning in Peru, and the…
Institute, University of Zimbabwe, Harare, Zimbabwe (email@example.com). Abstract: In this article the writer ... sentative" in terms of size in order to be appropriately used as basis for such corpus-based dictionaries, the ISN editors .... (e) the format should suggest a preference rather than a restriction. For COBUILD, a good ...
As far as traditionally published Swahili language dictionaries are concerned, throughout the long history of Swahili lexicography, most new dictionaries were based on their predecessors. Thus far the only innovative traditionally printed corpus-based dictionary has been published by Finnish scholars (Abdulla et al. 2002).
This article investigates the extent to which four representatives of the latest generation of English-French / French-English dictionaries present "real English", i.e. actually used meanings of actually used English word patterns. The findings of a corpus study of the verb CONSIDER are confronted with the entries for this verb ...
In order to illustrate the feasibility of corpus applications for the African languages at present, the article first considers 'fundamental linguistic research' in the fields of phonetics and question particles. It is shown how that research was boosted as a result of the utilisation of corpora. In a second section 'language teaching ...
Sigurbjörnsson, B.; Kamps, J.; de Rijke, M.
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites.
Kok, Rianne; Lucassen, Nicole; Bakermans-Kranenburg, Marian J; van IJzendoorn, Marinus H; Ghassabian, Akhgar; Roza, Sabine J; Govaert, Paul; Jaddoe, Vincent W; Hofman, Albert; Verhulst, Frank C; Tiemeier, Henning
In this longitudinal population-based study (N = 544), we investigated whether early parenting and corpus callosum length predict child executive function abilities at 4 years of age. The length of the corpus callosum in infancy was measured using postnatal cranial ultrasounds at 6 weeks of age. At 3 years, two aspects of parenting were observed: maternal sensitivity during a teaching task and maternal discipline style during a discipline task. Parents rated executive function problems at 4 years of age in five domains of inhibition, shifting, emotional control, working memory, and planning/organizing, using the Behavior Rating Inventory of Executive Function-Preschool Version. Maternal sensitivity predicted fewer executive function problems at preschool age. A significant interaction was found between corpus callosum length in infancy and maternal use of positive discipline in predicting child inhibition problems: the association between a relatively shorter corpus callosum in infancy and child inhibition problems was reduced in children who experienced more positive discipline. Our results point to the buffering potential of positive parenting for children with biological vulnerability.
Full Text Available Introduction: Spontaneous speech samples in individuals with aphasia (IWA) have been analyzed to examine many different psycholinguistic features. The present study focused on how IWA use verbs in spontaneous speech. Some verbs can occur in more than one argument structure, but are biased to occur more frequently in one frame than another. For example, "watch" appears in transitive and intransitive structures, but is usually used transitively. This is known as a transitivity bias. It is unknown whether IWA show the same transitivity biases in production as those reported in previous corpus studies with unimpaired individuals. Studies of sentence comprehension have shown that IWA are sensitive to verb biases (e.g., DeDe, 2013). In addition, IWA have shown an overall preference for transitive structures, which are the most frequent structures in English (Roland, Dick, & Elman, 2007). The present study investigated whether IWA show the same pattern of transitive and intransitive biases in spontaneous speech as unimpaired individuals. Method. Participants: 278 interviews with IWA were taken from AphasiaBank. The IWA represented a range of aphasia types. Participants were omitted if they spoke English as a second language. Materials: 54 verbs were coded. We chose verbs with the goal of representing different bias types (e.g., transitive, intransitive, sentential complement). Of these, data from 11 transitively biased and 11 intransitively biased verbs (matched for frequency of use and number of syllables) are presented here. Coding: All productions of the 54 verbs were coded. The coding protocol was based on Gahl, Jurafsky, and Roland (2004). We implemented an additional level of coding to indicate erroneous verb productions, such as ungrammatical structures and verb agreement errors. Results: The (in)transitivity biases for IWA were compared to biases from a previously published corpus study (Gahl et al., 2004). The IWA used transitively biased verbs in
Schildhauer, M.; Adams, B.; Rebich Hespanha, S.
There is a clear need for better semantic representation of Earth and environmental concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. In order to develop general-purpose Earth and environmental science ontologies, however, it is necessary to represent concepts and relationships that span usage across multiple disciplines and scientific specialties. Traditional knowledge modeling through ontologies utilizes expert knowledge but inevitably favors the particular perspectives of the ontology engineers, as well as the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy, while also missing important relationships among concepts that can be extremely useful for working scientists to be aware of. In this presentation we will discuss methods we have developed that utilize statistical topic modeling on a large corpus of Earth and environmental science articles, to expand coverage and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse, to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics. Because we base our methods directly on what working scientists are communicating about their
Abreu Junior, Luiz de; Borri, Maria Lucia; Wolosker, Angela Maria Borri; Hartmann, Luiz Guilherme de Carvalho; Galvao Filho, Mario de Melo; D'Ippolito, Giuseppe
The corpus callosum is the major system of association fibers that permits communication between the two cerebral hemispheres. Magnetic resonance imaging has improved the study of brain malformations, including corpus callosum dysgenesis. Lipoma is a common finding in the spectrum of corpus callosum dysgenesis. The purpose of this study was to review the embryologic events and the magnetic resonance imaging aspects related to corpus callosum dysgenesis and to the formation of the related lipoma. (author)
This plenary paper showcases current corpus-based research on written academic English, illustrating the tight links that exist between corpus research and pedagogic applications. I first explicate Sinclair's concept of the "lexical approach", which underpins much corpus research and pedagogy. I then discuss studies which focus on…
Full Text Available This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence‐chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4‐times the number of n‐grams with superior performance for English text.
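The triple construction mentioned in this abstract, where the first two sentences of a chain feed the encoder and the third is the decoder target, can be sketched as a sliding window over a sentence list. The abstract does not say how chains are formed, so treating consecutive sentences as a chain is an assumption here, as are the function names. The second helper shows how a relative perplexity improvement is computed; the perplexity values passed to it are invented purely to illustrate the arithmetic and are not results from the study.

```python
def sentence_triples(sentences):
    """Slide a window of three over the sentences.

    Each item is ((s1, s2), s3): s1 and s2 are the encoder input and s3 is
    the decoder target. Assumes consecutive sentences form a chain.
    """
    return [
        ((sentences[i], sentences[i + 1]), sentences[i + 2])
        for i in range(len(sentences) - 2)
    ]


def relative_perplexity_improvement(baseline_ppl, new_ppl):
    """Relative reduction in perplexity, as a percentage of the baseline."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return 100 * (baseline_ppl - new_ppl) / baseline_ppl


triples = sentence_triples(["s1", "s2", "s3", "s4"])
print(triples)

# Hypothetical perplexities, chosen only to show the calculation.
print(relative_perplexity_improvement(100.0, 92.4))
```

A list of n sentences yields n - 2 triples under this windowing, so very short documents contribute nothing to the training set.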
Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich
In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found
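The curator disagreement figure in this abstract is a kappa statistic, which corrects raw agreement for the agreement expected by chance. A minimal sketch follows, assuming Cohen's kappa for two annotators (the abstract says only "kappa-statistic"); the label sequences are invented for illustration and are not the corpus annotations.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label rates.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: "D" = disease mention, "O" = other.
curator_1 = ["D", "D", "D", "O", "O", "O", "D", "O"]
curator_2 = ["D", "D", "O", "O", "O", "D", "D", "O"]
kappa = cohen_kappa(curator_1, curator_2)
print(kappa)
```

In this toy case the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, so kappa is 0.5, which shows how a seemingly high raw agreement can correspond to a moderate kappa.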
Data-driven learning (DDL), or corpus-based language learning, involves the learner in an exploratory task to discover appropriate expressions or collocates regarding his writing. However, the problematic units of meaning in each learner's writing are so diverse that conventional corpora often prove futile. The search engine Google with the…
Wang, Yan; Melton, Genevieve B.; Pakhomov, Serguei
Although anaphoric expressions are very common in biomedical and clinical documents, little work has been done to systematically characterize their use in clinical text. Samples of ‘it’, ‘this’, and ‘that’ expressions occurring in inpatient clinical notes from four metropolitan hospitals were analyzed using a combination of semi-automated and manual annotation techniques. We developed a rule-based approach to filter potential non-referential expressions. A physician then manually annotated 1000 potential referential instances to determine referent status and the antecedent of each referent expression. A distributional analysis of the three referring expressions in the entire corpus of notes demonstrates a high prevalence of anaphora and large variance in distributions of referential expressions with different notes. Our results confirm that anaphoric expressions are common in clinical texts. Effective co-reference resolution with anaphoric expressions remains an important challenge in medical natural language processing research. PMID:22195211
Full Text Available The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging to a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (the news and modern literature) and more informal discourse (social media and spoken data); the texts are available under the CC BY-NC-SA 3.0 license.
Full Text Available The assertion of the centrality and supremacy of man, or rather, of the idea(l) of humanity, during the Renaissance period, inevitably entailed the repudiation of the animal and the beginning of the great human-animal divide. What was seen, at the time, as the re-birth of man, was also the birth of a rampant anthropocentrism which, until the recent so-called “animal turn” in critical and literary studies, went unquestioned. Taking this into account, one would expect to find an almost exclusive focus on the human or what is/was perceived as being human in most works from that period. Yet, surprisingly, throughout Shakespeare’s plays, one encounters a plethora of figures of animality leaping, running, crawling, flying, swimming, or advancing, as Derrida would say, “à pas de loup”. From dogs, bears, lions, apes and foxes to birds, fish, worms and reptiles, Shakespeare the humanist paradoxically unfolds a veritable bestiary of nonhuman presences. Using corpus-based analysis that focuses on animal similes built with the preposition “like” and a critical angle largely informed by posthumanist theory, we take a closer look at the forms, roles and functions of both nonhuman and human animality in Shakespeare, as well as the intricate relationship between anthropocentrism and anthropomorphism.
Full Text Available Using diffusion-tensor MRI and fiber tractography, the topographic organization of the corpus callosum (CC) has been described to comprise 5 segments with fibers projecting into prefrontal (I), premotor and supplementary motor (II), primary motor (III), and primary sensory areas (IV), as well as into parietal, temporal, and occipital cortical areas (V). In order to more rapidly characterize the underlying anatomy of these segments, this study used a novel single-shot T1 mapping method to quantitatively determine T1 relaxation times in the human CC. A region-of-interest analysis revealed a tendency for the lowest T1 relaxation times in the genu and the highest T1 relaxation times in the somatomotor region of the CC. This observation separates regions dominated by myelinated fibers with large diameters (somatomotor area) from densely packed smaller axonal bundles (genu) with less myelin. The results indicate that characteristic T1 relaxation times in callosal profiles provide an additional means to monitor differences in fiber anatomy, fiber density, and gray matter in respective neocortical areas. In conclusion, rapid T1 mapping allows for a characterization of the axonal architecture in an individual CC in less than 10 s. The approach emerges as a valuable means for studying neocortical brain anatomy with possible implications for the diagnosis of neurodegenerative processes.
Liu, Wenyu; An, Dongmei; Niu, Running; Gong, Qiyong; Zhou, Dong
To investigate the quantitative diffusion properties of the corpus callosum (CC) in a large group of patients with periventricular nodular heterotopia (PNH) related epilepsy and to further investigate the effect of Filamin A (FLNA) mutation on these properties. Patients with PNH (n = 34), subdivided into FLNA-mutated (n = 11) and FLNA-nonmutated patients (n = 23), and healthy controls (n = 34) underwent 3.0 T structural MRI and a diffusion imaging scan (64 directions). Fractional anisotropy (FA) and mean diffusivity (MD) were measured in the three major subdivisions of the CC (genu, body and splenium). Correlations between DTI metric changes and clinical parameters were also evaluated. Furthermore, the effect of FLNA mutation on structural integrity of the corpus callosum was examined. Patients with PNH and epilepsy had significant reductions in FA for the genu and splenium of the CC, accompanied by increases in MD for the splenium, as compared to healthy controls. There were no correlations between clinical parameters of epilepsy and MD. The FA value in the splenium negatively correlated with epilepsy duration. Interestingly, FLNA-mutated patients showed significantly decreased FA for all three major subdivisions of the CC, and increased MD for the genu and splenium, as compared to healthy controls and FLNA-nonmutated patients. These findings support the conclusion that patients with epilepsy secondary to PNH present widespread microstructural changes found in the corpus callosum that extend beyond the macroscopic MRI-visible lesions. This study also indicates that FLNA may affect white matter integrity in this disorder.
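For readers less familiar with the two DTI metrics reported in this abstract, the standard textbook definitions in terms of the diffusion tensor eigenvalues are sketched below. The symbols \lambda_1, \lambda_2, \lambda_3 are notation introduced here for illustration; the abstract itself does not state which formulas or software were used.

\[
\mathrm{MD} = \frac{\lambda_1 + \lambda_2 + \lambda_3}{3},
\qquad
\mathrm{FA} = \sqrt{\frac{3}{2}}\;
\frac{\sqrt{(\lambda_1 - \mathrm{MD})^2 + (\lambda_2 - \mathrm{MD})^2 + (\lambda_3 - \mathrm{MD})^2}}
     {\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}
\]

Under these definitions FA lies between 0 (fully isotropic diffusion) and 1 (diffusion along a single axis), so a reduced FA together with an increased MD is usually read as less directionally organized tissue.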
This paper studies transmutation theory as found in the texts attributed to Zosimus of Panopolis, "the philosopher Synesius," and "the philosopher Olympiodorus of Alexandria." It shows that transmutation theory (i.e. a theory explaining the complete transformation of substances) is mostly absent from the work attributed to these three authors. The text attributed to Synesius describes a gilding process, which is similar to those described by Pliny and Vitruvius. The commentary attributed to Olympiodorus is the only text studied here that describes something similar to a transmutation theory. It is unclear, however, if this was a theory of transmutation or if the writer meant something more like the literal meaning of the word "ekstrophē," a term used to describe the transformation of metals, as the "turning inside-out" of what is hidden in a substance. A similar conception of ekstrophē can be found in the works of Zosimus, who discussed transmutation to make an analogy with self-purification processes, which, from the perspective of his own anthropogony, consisted in the "turning inside-out" of the "inner human" (esō anthrōpos).
Davies, Florence; Greene, Terry
This paper describes Directed Activities Related to Text (DART), procedures that were developed and are used in the Reading for Learning Project at the University of Nottingham (England) to enhance learning from texts and that fall into two broad categories: (1) text analysis procedures, which require students to engage in some form of analysis of…
Ke, Guiyao
Thematic comparable corpora group texts on the same topic written in several languages; the texts are highly similar but are not mutual translations. Compared with parallel corpora, which group pairs of translations, comparable corpora have three advantages: first, they are rich and large resources, both in volume and in the period covered; second, they provide original-language and thematic resources; finally, they are less expensive to develop than parallel corpora. With the consider...
The present paper aims to provide an overview of some of the advantages of creating and working with a DIY corpus, i.e. a corpus compiled by the linguist, as groundwork for a PhD thesis. Collected in order to investigate the grammatical and pragmatic behavior in historical Romanian of some so-called parenthetical verbs (a zice/a spune ‘to say’, a crede ‘to think’, a şti ‘to know’) within 5 types of texts from the 16th/17th to the 20th centuries, this DIY corpus represents a necessary alternative as a database of Romanian texts. Although its creation demanded some additional steps (e.g. the selection of the texts, which is determined by various diachronic factors), such a corpus proves to be relevant for investigating parenthetical verbs in literary, historical and law texts, as well as in formal and informal letters. In order to do so, the paradigm of the aforementioned verbs has to be systematized in relation to a precise word frequency per text type.
This study investigated current obesity prevalence and associations between musculoskeletal fitness test scores and the odds of being underweight, overweight, or obese compared to having a healthy weight in elementary school children in Corpus Christi, Texas. The sample analyzed consisted of 492 public elementary school children between kindergarten and fifth grade. Their ages ranged from 5 to 11 years. Trunk lift, 90° push-up, curl-up, and back saver sit and reach tests were administered. Weight status was determined using BMI scores and the CDC growth charts. Obesity prevalence remains high among elementary school-aged children in Corpus Christi, Texas. Higher 90° push-up test scores were most consistently associated with decreased odds of being obese as compared to being overweight and having healthy weight except in kindergarten. Conversely, higher trunk lift test scores were associated with increased odds of being obese in second and fourth grades. When children achieved the minimum score to be classified in the Healthy Fitness Zone, those with healthy weight had similarly low musculoskeletal fitness (i.e., abdominal strength and endurance, hamstring flexibility, and trunk extensor strength and flexibility) as peers with overweight and obesity, especially in the lower grades. It was concluded that increased obesity prevalence in higher grades may be precipitated (at least in part) by low musculoskeletal fitness in the lower grades, especially kindergarten. Given previous associations in the literature, low musculoskeletal fitness may be symptomatic of poor motor skill competence in the current sample. These findings suggest a need for early and focused school-based interventions that leverage both known and novel strategies to combat pediatric obesity in Corpus Christi.
One theory to account for neglect symptoms in patients with right focal damage invokes a release of inhibition of the right parietal cortex over the left parieto-frontal circuits, by disconnection mechanism. This theory is supported by transcranial magnetic stimulation studies showing the existence of asymmetric inhibitory interactions between the left and right posterior parietal cortex, with a right hemispheric advantage. These inhibitory mechanisms are mediated by direct transcallosal projections located in the posterior portions of the corpus callosum. The current study, using diffusion imaging and tract-based spatial statistics (TBSS), aims at assessing, in a data-driven fashion, the contribution of structural disconnection between hemispheres in determining the presence and severity of neglect. Eleven patients with right acute stroke and 11 healthy matched controls underwent MRI at 3T, including diffusion imaging, and T1-weighted volumes. TBSS was modified to account for the presence of the lesion and used to assess the presence and extension of changes in diffusion indices of microscopic white matter integrity in the left hemisphere of patients compared to controls, and to investigate, by correlation analysis, whether this damage might account for the presence and severity of patients' neglect, as assessed by the Behavioural Inattention Test (BIT). None of the patients had any macroscopic abnormality in the left hemisphere; however, 3 cases were discarded due to image artefacts in the MRI data. Conversely, TBSS analysis revealed widespread changes in diffusion indices in most of their left hemisphere tracts, with a predominant involvement of the corpus callosum and its projections on the parietal white matter. A region of association between patients' scores at BIT and brain FA values was found in the posterior part of the corpus callosum. This study strongly supports the hypothesis of a major role of structural disconnection between the
Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
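The inter-annotator agreement reported in this abstract is expressed as a median F-score. A minimal sketch of how such a figure can be computed follows, assuming exact matching on span and concept identifier (the abstract does not say which matching criterion was used). The function names, the tuple layout and the sample annotations are hypothetical.

```python
from itertools import combinations
from statistics import median


def f_score(annotations_a, annotations_b):
    """F-score of annotator A measured against annotator B (exact match)."""
    set_a, set_b = set(annotations_a), set(annotations_b)
    matches = len(set_a & set_b)
    if matches == 0:
        # Covers empty annotation sets and sets with no overlap.
        return 0.0
    precision = matches / len(set_a)
    recall = matches / len(set_b)
    return 2 * precision * recall / (precision + recall)


def median_pairwise_f(annotators):
    """Median F-score over all unordered pairs of annotators."""
    scores = [f_score(a, b) for a, b in combinations(annotators, 2)]
    return median(scores)


# Hypothetical annotations: (start offset, end offset, concept identifier).
ann_1 = {(0, 5, "C1"), (10, 15, "C2"), (20, 25, "C3")}
ann_2 = {(0, 5, "C1"), (10, 15, "C2"), (30, 35, "C4"), (40, 45, "C5")}
ann_3 = {(0, 5, "C1"), (10, 15, "C2"), (20, 25, "C3")}

print(median_pairwise_f([ann_1, ann_2, ann_3]))
```

Because precision and recall simply swap roles when the two annotators are exchanged, the F-score is symmetric, which is why each pair only needs to be scored once.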
Blanco, R.; De Tejada, S.; Goldstein, I.; Krane, R.J.; Wotiz, H.H.; Cohen, R.A.
Physiological and histochemical evidence indicates that cholinergic nerves may participate in mediating penile erection. Acetylcholine synthesis and release was studied in isolated human corporal tissue. Human corpus cavernosum incubated with [3H]choline accumulated [3H]choline and synthesized [3H]acetylcholine in a concentration-dependent manner. [3H]Acetylcholine accumulation by the tissue was inhibited by hemicholinium-3, a specific antagonist of the high-affinity choline transport in cholinergic nerves. Transmural electrical field stimulation caused release of [3H]acetylcholine which was significantly diminished by inhibiting neurotransmission with calcium-free physiological salt solution or tetrodotoxin. These observations provide biochemical and physiological evidence for the existence of cholinergic innervation in human corpus cavernosum.
Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard with which to choose the texts based on demand and abundance of materials. The performance of the classifier varies with the user's choice.
Abu Shawar, Bayan; Atwell, Eric
International research in NLP is dominated by work on English. NLP techniques and systems can be ported to other natural languages, but this is generally a labour-intensive task, requiring scarce computational and linguistic expertise; hence minority languages are poorly represented in NLP technology. We present an automated approach to porting an NLP technology, the AIML-based chatbot, to new languages, by using a corpus in the target language to retrain the chatbot. We have s...
Mousten, Birthe; Laursen, Anne Lise
Lay investors and semi-professionals lean on professional stock bloggers and stock analysts for advice on stock investments; semi-professionals and professionals write about investments globally, and stock information has to be available in many local markets. Using the correct terminology......’s critical sense is not enough to make the right choices. Our corpus-linguistic tool can be a help in this specialized field....
HMELJAK SANGAWA, Kristina
The paper presents a set of integrated on-line language resources targeted at Japanese language learners, primarily those whose mother tongue is Slovene. The resources consist of the on-line Japanese-Slovene learners’ dictionary jaSlo and two corpora, a 1 million word Japanese-Slovene parallel corpus and a 300 million word corpus of web pages, where each word and sentence is marked by its difficulty level; this corpus is furthermore available as a set of five distinct corpora, each one containing sentences of the particular level. The corpora are available for exploration through NoSketch Engine, the open source version of the commercial state-of-the-art corpus analysis software Sketch Engine. The dictionary is available for Web searching, and dictionary entries have direct links to examples from the corpora, thus offering (a) a wider picture of possible translations in concrete contextualised examples, and (b) monolingual Japanese usage examples of different difficulty levels to support language learning. The article presents the Japanese-Slovene dictionary jaSlo, an on-line dictionary for Slovene-speaking learners of Japanese, and the integration of examples from two corpora by means of the open-source corpus query tool NoSketch Engine. The two corpora are jaSlo (one million words; a parallel corpus of Japanese and Slovene texts built for this purpose, containing mostly literary, web and academic texts) and JpWaC-L (300 million words; a corpus of web texts split into sentences that are ranked by difficulty level). By clearly linking corpus examples and dictionary headwords in a bilingual dictionary for learners of Japanese as a foreign language, the system offers user-friendly access to dictionary data, i.e. representative translation equivalents, and to corpus data, which offer (a) a wider picture of possible translation equivalents in concrete examples with context and (b) monolingual examples of the use of Japanese words in sentences of different...
The aim of this corpus-based study is to identify the functions that selected expressions of futurality can express in professional economic texts. The classification of functions is established on the corpus of seven economic books. Excerpted instances of futural constructions are analysed with respect to textual and interpersonal functions as defined by Halliday. Futurality is interpreted broadly to include all lexical and grammatical means referring to the future. This approach makes it also possible to analyse futurality as a means of text coherence. Hence the core grammatical means are interpreted along with co-occurring lexical means under the two categories of functions to provide a comprehensive model of text coherence with regard to futurality. Frequency analysis shows that core futural expressions are not distributed equally throughout the corpus. While some expressions (e.g., will and the present simple tense) dominate, others prove to be rather insignificant (e.g., be on the point/verge of, the present progressive tense). In addition, both lexical and grammatical constructions regularly co-occur in clusters, contributing to the coherence of the economic texts.
Sakamoto, Masanobu; Takeda, Katsuhiko; Bandou, Mitsuaki; Murayama, Shigeo; Sakuta, Manabu
We have reported a case of agenesis of the corpus callosum, in which NMR-CT revealed a complete defect of it, and have examined the localization of the speech center of this patient. The patient is a right-handed 26-year-old man who has complained of headache on the parietal region. His neurological examination revealed only a mild mental difficulty (IQ 77). X-ray CT showed the lateral ventricles to be separated widely and the posterior horns dilated, which were compatible with the agenesis of the corpus callosum. Further, NMR-CT has revealed a total agenesis of the corpus callosum. NMR-CT seems to be highly useful for the detection of the degree of the callosal defect. We have carried out the intracarotid amobarbital injection (Wada's test) for the determination of the lateralization of cerebral speech dominance. It had been reported by some authors that when it comes to the cerebral speech dominance, acallosal patients had no difference between each hemisphere. However, our results have demonstrated a left sided dominance. (author)
Martens, Marilee A; Wilson, Sarah J; Chen, Jian; Wood, Amanda G; Reutens, David C
Williams syndrome is a neurodevelopmental genetic disorder caused by a hemizygous deletion on chromosome 7q11.23, resulting in atypical brain structure and function, including abnormal morphology of the corpus callosum. An influence of handedness on the size of the corpus callosum has been observed in studies of typical individuals, but handedness has not been taken into account in studies of callosal morphology in Williams syndrome. We hypothesized that callosal area is smaller and the size of the splenium and isthmus is reduced in individuals with Williams syndrome compared to healthy controls, and examined age, sex, and handedness effects on corpus callosal area. Structural magnetic resonance imaging scans were obtained on 25 individuals with Williams syndrome (18 right-handed, 7 left-handed) and 25 matched controls. We found that callosal thickness was significantly reduced in the splenium of Williams syndrome individuals compared to controls. We also found novel evidence that the callosal area was smaller in left-handed participants with Williams syndrome than their right-handed counterparts, with opposite findings observed in the control group. This novel finding may be associated with LIM-kinase hemizygosity, a characteristic of Williams syndrome. The findings may have significant clinical implications in future explorations of the Williams syndrome cognitive phenotype.
Lee, Myung Seob; Kim, Myung Soon; Park, Hyun Ju [Wonju College of Medicine, Yonsei University, Wonju (Korea, Republic of)
Measurement of various portions of the corpus callosum was performed on magnetic resonance (MR) images of 114 subjects with no known or suspected corpus callosal disorders. Midsagittal T1-weighted images were used for measurements, and mean diameters of various portions in each age and sex group were obtained. Measures of five portions were made: (A) the anteroposterior length, (B) the diameter of the genu, (C) the diameter of the splenium, (D) the diameter of the mid-body portion, (E) the diameter of a narrow portion at the body of the corpus callosum. The mean values in each gender group for A, B, C, D and E were 68.8 mm, 12.1 mm, 12.3 mm, 6.9 mm, 4.1 mm in males and 69.9 mm, 12.0 mm, 12.1 mm, 6.4 mm, 4.1 mm in females, respectively. The groups of 0-9 years of both genders showed the minimum mean value in each portion.
To what extent may non-recorded oral corpora constitute objects of analysis of pragmatic meaning? These corpora are heard by chance: on the radio, on television, in the street, a shop, a means of transport or generally in any conversational interaction in which the linguist participates, but had not previously planned to record for his research. The problem of the use of these corpora in linguistics is all the more crucial since the aim, in phonopragmatics, is to discover the functions and significations of their phonic part. I shall attempt to answer the following questions:
– The accuracy of the transcription with respect to the original. To what extent can we ignore our own phonological code, our regional variants, mastered/partly known styles of speech?
– The reliability of the oral reproduction carried out by the linguist, for example during a talk at a conference. What is his capacity for deferred mimicry?
– The relation between a significant discrepancy and the elocutionary habits of the speaker.
– The relation between the comprehension of the external auditors and the effect produced on the 'real' person addressed.
Considering that transparency is (sometimes? often?) an illusion, I shall also examine what precautions should be taken so that these corpora offer guarantees as to their veracity.
Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi
Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results: To address these challenges, we constructed new resources to link the text with a model pathway; they are the GENIA pathway corpus with event annotation and the NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions: We believe that the creation of such rich resources and their detailed analysis is a significant first step for accelerating the research of the automatic construction of pathways from text. PMID: 18426550
Laura Hidalgo Downing
In this paper we examine the role played by metaphor in a corpus of sixty abstracts on immunology from Scientific American. We focus on the distinction between conventional metaphors and culturally adapted new metaphors and discuss the role played by metaphor choice in the communicative purposes of the abstracts and their register features. We argue that one of the main strategies used to attract the reader’s attention is the combination of highly conventionalized metaphors, which occur more frequently in the corpus, together with what we call “culturally adapted new metaphors”, which display different degrees of creativity and are less frequent in the corpus. Conventional metaphors typically reinforce the world view shared by the scientific community and introduce basic ideas on the subject of immunology. Culturally adapted new metaphors include a cline from slightly new perspectives of conventional models, to highly creative uses of metaphor. Culturally adapted new metaphors appeal primarily to a general readership and not to the scientific community, as they tap human emotions and mythic constructions. These play a crucial role in the abstracts, as they contribute to persuasive and didactic communicative functions in the text.
Paul, Lynn K; Corsello, Christina; Kennedy, Daniel P; Adolphs, Ralph
The corpus callosum, with its ∼200 million axons, remains enigmatic in its contribution to cognition and behaviour. Agenesis of the corpus callosum is a congenital condition in which the corpus callosum fails to develop; such individuals exhibit localized deficits in non-literal language comprehension, humour, theory of mind and social reasoning. These findings together with parent reports suggest that behavioural and cognitive impairments in subjects with callosal agenesis may overlap with the profile of autism spectrum disorders, particularly with respect to impairments in social interaction and communication. To provide a comprehensive test of this hypothesis, we directly compared a group of 26 adults with callosal agenesis to a group of 28 adults with a diagnosis of autism spectrum disorder but no neurological abnormality. All participants had full-scale intelligence quotient scores >78 and groups were matched on age, handedness, and gender ratio. Using the Autism Diagnostic Observation Schedule together with current clinical presentation to assess autistic symptomatology, we found that 8/26 (about a third) of agenesis subjects presented with autism. However, more formal diagnosis additionally involving recollective parent-report measures regarding childhood behaviour showed that only 3/22 met complete formal criteria for an autism spectrum disorder (parent reports were unavailable for four subjects). We found no relationship between intelligence quotient and autism symptomatology in callosal agenesis, nor evidence that the presence of any residual corpus callosum differentiated those who exhibited current autism spectrum symptoms from those who did not. Relative to the autism spectrum comparison group, parent ratings of childhood behaviour indicated children with agenesis were less likely to meet diagnostic criteria for autism, even for those who met autism spectrum criteria as adults, and even though there was no group difference in parent report of current
This monograph presents in great detail a large number of both unpublished and previously published Babylonian mathematical texts in the cuneiform script. It is a continuation of the work A Remarkable Collection of Babylonian Mathematical Texts (Springer 2007) written by Jöran Friberg, the leading expert on Babylonian mathematics. Focussing on the big picture, Friberg explores in this book several Late Babylonian arithmetical and metro-mathematical table texts from the sites of Babylon, Uruk and Sippar, collections of mathematical exercises from four Old Babylonian sites, as well as a new text from Early Dynastic/Early Sargonic Umma, which is the oldest known collection of mathematical exercises. A table of reciprocals from the end of the third millennium BC, differing radically from well-documented but younger tables of reciprocals from the Neo-Sumerian and Old-Babylonian periods, as well as a fragment of a Neo-Sumerian clay tablet showing a new type of a labyrinth are also discussed. The material is presen...
This interesting and informative review by Liu and colleagues in this issue covers the full spectrum of research on the idea that in natural language, dependency distance tends to be small. The authors discuss two distinct research threads: experimental work from psycholinguistics on online processes in comprehension and production, and text-corpus studies of dependency length distributions.
Purificación Sánchez Hernández
Grammars and dictionaries usually offer relevant and accurate information to students of a second language. However, the meaning of a textual element is often dynamic and that information is not always based on real usage patterns. New occurrences on the object level in new contexts can introduce novel semantic potentials, so that existing interpretations may be superseded by new ones. Concordancing has been shown to be one of the most important tools to facilitate the understanding of the usage patterns of a language. In this paper we examine the differences between amount, quantity and body as terms expressing magnitude, sum and size in a corpus of Biology. According to some popular dictionaries and grammars, the terms amount and quantity have always been considered synonymous terms for expressing magnitude, size and sum. We demonstrate that, according to our records, they cannot always be used as synonymous terms since they have different patterns of usage. On the other hand, there are other forms, such as body, that appear in our corpus implying magnitude, size and sum but are not usually described as having such meanings in dictionaries.
M. Pınar Babanoğlu
Contracted forms in English mostly occur in speech and informal writing and are generally avoided in formal writing types such as academic prose, business reports and journal articles; therefore, most teachers discourage their use in academic essays (Biber, Johansson, Leech, Conrad and Finegan 1999). Contractions in English are of two types: negative contractions (isn’t, haven’t, doesn’t) and verb contractions (I’m, they’ve, that’s). This corpus-based study investigates contraction usage in learner and native English speaker essays. The major goal is to examine whether learners observe essay-writing rules with respect to contractions, which are considered inappropriate for academic prose style. Five corpora, three learner and two native English, were used to analyze verb and not-contraction forms. Frequency counts of contraction forms in each corpus were compared via log-likelihood measurement for statistical significance. Results revealed that learners use considerably more contraction forms, especially negative ones, than native English students in their argumentative essays.
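The log-likelihood comparison mentioned in this abstract is commonly computed in corpus linguistics as a G2 statistic over the frequency of an item in two corpora. A minimal sketch follows, assuming the usual two-corpus formulation; the abstract does not give the exact variant or tool used, and the function name and all counts below are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 statistic for one item's frequency in corpus A versus corpus B."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: contractions in a learner corpus and a native corpus.
learner_hits, learner_tokens = 30, 1000
native_hits, native_tokens = 10, 1000

g2 = log_likelihood(learner_hits, learner_tokens, native_hits, native_tokens)
# 3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom.
print(g2, g2 > 3.84)
```

The statistic only signals that the two relative frequencies differ; which corpus overuses the form is read from the raw relative frequencies.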
Recent years have seen the rise of musical corpus studies, primarily detailing harmonic tendencies of tonal music. This article extends this scholarship by addressing a new genre (rap music) and a new parameter of focus (rhythm). More specifically, I use corpus methods to investigate the relation between metric ambivalence in the instrumental parts of a rap track (i.e., the beat) and an emcee's rap delivery (i.e., the flow). Unlike virtually every other rap track, the instrumental tracks of Outkast's "Mainstream" (1996) simultaneously afford hearing both a four-beat and a three-beat metric cycle. Because three-beat durations between rhymes, phrase endings, and reiterated rhythmic patterns are rare in rap music, an abundance of them within a verse of "Mainstream" suggests that an emcee highlights the three-beat cycle, especially if that emcee is not prone to such durations more generally. Through the construction of three corpora, one representative of the genre as a whole, and two that are artist specific, I show how the emcee T-Mo Goodie's expressive practice highlights the rare three-beat affordances of the track.
Ebtisam Saleh Aluthman
The present study is conducted within the borders of lexicographic research, where corpora have increasingly become all-pervasive. The overall goal of this study is to compile an open-source OPEC Word List (OWL) that is available for lexicographic research and vocabulary learning related to English language learning for the purpose of oil marketing and oil industries. To achieve this goal, an OPEC Monthly Reports Corpus (OMRC) comprising 1,004,542 words was compiled. The OMRC consists of 40 OPEC monthly reports released between 2003 and 2015. Consideration was given to both range and frequency criteria when compiling the OWL, which consists of 255 word types. Along with this basic goal, this study aims to investigate the coverage of the most well-recognised word lists, the General Service List of English Words (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000), in the OMRC corpus. The 255 word types included in the OWL do not overlap with either the AWL or the GSL. Results suggest the necessity of making this discipline-specific word list for ESL students of oil marketing industries. The availability of the OWL has significant pedagogical contributions to curriculum design, learning activities and the overall process of vocabulary learning in the context of teaching English for specific purposes (ESP). OPEC stands for Organisation of Petroleum Exporting Countries.
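The range and frequency criteria and the coverage check described in this abstract can be sketched roughly as follows. This is an illustration only: the thresholds, the tiny sample documents and the stand-in set for the GSL/AWL are hypothetical, and the abstract does not state the cut-offs or tools actually used.

```python
from collections import Counter


def build_word_list(documents, known_words, min_freq, min_range):
    """Select word types that are frequent, widely spread and not already listed."""
    freq = Counter()       # total occurrences of each word type
    doc_range = Counter()  # number of documents each word type occurs in
    for tokens in documents:
        lowered = [t.lower() for t in tokens]
        freq.update(lowered)
        doc_range.update(set(lowered))
    return sorted(
        w for w in freq
        if freq[w] >= min_freq
        and doc_range[w] >= min_range
        and w not in known_words
    )


def coverage(documents, word_list):
    """Share of running tokens in the corpus that belong to the word list."""
    words = set(word_list)
    tokens = [t.lower() for doc in documents for t in doc]
    return sum(t in words for t in tokens) / len(tokens)


# Hypothetical mini-corpus of three tokenized reports.
reports = [
    ["crude", "oil", "prices", "rose"],
    ["crude", "output", "rose"],
    ["the", "crude", "barrel", "prices"],
]
# Hypothetical stand-in for words already covered by the GSL or AWL.
general_words = {"the", "oil", "rose", "prices"}

specialised = build_word_list(reports, general_words, min_freq=2, min_range=2)
print(specialised, coverage(reports, specialised))
```

Requiring a minimum range as well as a minimum frequency keeps out words that are frequent only because a single report repeats them.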
Barrios Rozúa, Juan Manuel
The Corpus Christi hospital of Granada was a victim of prejudices against the baroque on the part of influential historians. Nevertheless, the building is an interesting example of hospital architecture with a frankly original temple. Thanks to the exhaustive analysis of the institution’s very complete archive, it can be determined that some thirty artists worked there, including Alonso Cano and his disciple Juan Luis de Ortega, whose architectural works are evaluated here.
Corpus luteum cyst rupture with consequent hemoperitoneum is a common disorder in women of reproductive age. This condition should be promptly recognized and treated because a delayed diagnosis may significantly reduce women’s fertility and intra-abdominal bleeding may be life-threatening. Many imaging modalities play a key role in the diagnosis of acute pelvic pain from gynecological causes. Ultrasound study (USS) is usually the first imaging technique for initial evaluation. USS is used to confirm or to exclude the presence of intraperitoneal fluid, but it has some limitations in the identification of the bleeding source. Contrast-enhanced computed tomography (CT) is the imaging modality that can be used in the acute setting to recognize gynecological emergencies and to establish correct management. Magnetic resonance imaging (MRI) is nowadays the most useful technique for studying the pelvis, but its low availability and long image acquisition time limit its usefulness in characterizing acute gynecological complications. We report a case of a young patient with hemoperitoneum from a hemorrhagic corpus luteum correctly identified by transabdominal USS and contrast-enhanced CT.
BACKGROUND The Corpus Callosum (CC) can best be seen in the mid-sagittal section of the brain, both in cadavers and on MRI. Its morphometric measurements are of use in neurosurgical procedures. Sexual dimorphism and age-related changes in its measurements remain controversial. To date, no studies have been done on the corpus callosum in Kerala. MATERIALS AND METHODS Measurements of the CC were taken and studied in detail in 24 formalin-fixed brains from the Department of Anatomy and 48 MR images from the Department of Radiology. The changes according to age and sex were analysed. RESULTS The mean length of the CC in the cadavers was 7.24 cm; it lay 3.38 cm posterior to the frontal pole and 5.73 cm anterior to the occipital pole. In MR images, the mean length was 7.10 in males and 6.76 in females. The difference was not statistically significant. The length increased with age. Thickness of the genu and body decreased as age advanced, but the splenial thickness was found to increase with age. There was significant correlation between the thicknesses of various parts of the CC. CONCLUSION The values were almost similar to those in previous studies. Morphometrically, a significant gender difference was not identified in the present study. There were changes according to age in both males and females.
Pham Thuy Dung
The recent yet powerful emergence of e-learning and the use of online resources in learning EFL (English as a Foreign Language) has helped promote learner autonomy in language acquisition, including learners' self-correction of their mistakes. This pilot study, although conducted on a modest sample of 25 second-year students majoring in Business English at Hanoi Foreign Trade University, is an initial attempt to investigate the feasibility of using corpus-based websites to promote learner autonomy in correcting collocation errors in EFL writing. The data were collected using a pre-questionnaire and a post-interview aiming to find out the participants’ change in belief and attitude toward learner autonomy in correcting collocation errors in writing, the extent of their success in using the corpus-based websites to self-correct the errors, and the change in their confidence in self-correcting the errors using the websites. The findings show that a significant majority of students shifted their belief and attitude toward a more autonomous mode of learning, enjoyed fair success in using the websites to self-correct the errors, and became more confident. The study also implies that face-to-face training in how to use these online tools is vital to the later confidence and success of the learners.
Yu. S. Hetsevich
The article focuses on problems in text-to-speech synthesis. Different morphological, lexical and syntactic elements were localized with the help of the Belarusian module of the NooJ program. The types of errors that occur in Belarusian texts were analyzed and corrected. A language model and a part-of-speech tagging model were built. Natural language processing of a Belarusian corpus was carried out with the developed algorithm, which uses machine learning. The precision of the developed machine learning models was 80–90%. The dictionary was enriched with new words for further use in Belarusian speech synthesis systems.
Pafilis, Evangelos; Pletscher-Frankild, Sune; Fanini, Lucia
The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster and as accurate as existing tools. The precision and recall was assessed both on an existing gold-standard corpus and on a new corpus...
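As a rough illustration of what a dictionary-based approach to named entity recognition involves, the following Python sketch matches a small name dictionary against text. It is not the SPECIES implementation, whose algorithm is not described in this excerpt; the function name `tag_species`, the tiny dictionary and the `taxon:` identifier format are assumptions for illustration.

```python
import re

def tag_species(text, dictionary):
    """Return (start, end, matched_text, identifier) for every dictionary hit.

    Longer names are tried first so that "Escherichia coli" is preferred
    over any shorter name that could match at the same position.
    """
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hypothetical mini-dictionary mapping names and synonyms to identifiers.
species_names = {
    "Escherichia coli": "taxon:562",
    "E. coli": "taxon:562",
    "Homo sapiens": "taxon:9606",
}
sentence = "Escherichia coli infects Homo sapiens; E. coli is common."
for hit in tag_species(sentence, species_names):
    print(hit)
```

A single compiled alternation is adequate for a handful of names; a dictionary with millions of entries would call for a more efficient lookup structure, which is presumably where the speed gains reported in the abstract come from.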
This paper discusses the feasibility of using Corpus Linguistics tools in the analysis of pedagogic discourse. To this end, two case studies are presented. The first focuses on the discourse of English language teachers at a well-known language school in Rio de Janeiro about the implementation of technological resources in the classroom. The second study, in turn, seeks to identify the position held by university professors of literatures in English with regard to literature and its teaching. The results point to the richness of contextual data that can be inferred from an empirically based linguistic analysis. All in all, the paper reveals the importance and flexibility of the corpus approach in discourse analysis, which may be applied to numerous contexts.
Other investigators have reported that amphetamine administered to rodents results in an increase in the in vivo accumulation of either of the tritiated dopamine receptor ligands, spiperone or pimozide, in the dopaminergic corpus striatum (specific binding) while not altering that in the sparsely dopaminergically innervated cerebellum (non-specific binding). Experiments were undertaken to determine if the results could be replicated and if some other drugs would modify the effect. Male mice were injected with [³H]-spiperone (20 μCi/kg, 0.0003 mg/kg) s.c. and killed 2 hrs later for determination of radioactivity in corpus striatum and cerebellum. Amphetamine (20 mg/kg, i.p.), given 15 min before [³H]-spiperone, increased accumulation in striatum but not cerebellum. The increase was inhibited by α-methyltyrosine (α-MT), haloperidol, reserpine or amantadine. It is suggested that the amphetamine-induced increase in accumulation of [³H]-spiperone in corpus striatum (specific binding) depends on release of large amounts of dopamine, which then must be able to interact with the dopamine receptor. The antagonism of the effect by α-MT or reserpine can be explained by dopamine depletion, that of haloperidol by antagonism for binding at the receptor site. It is suggested that amantadine acts by a dual mechanism: (1) as a low efficacy agonist, it competes for binding to the receptor and (2) it has some ability to block dopamine release.
Spencer, Brenda H.
Notes that a text map is an instructional approach designed to help students gain fluency in reading content area materials. Discusses how the goal is to teach students about the important features of the material and how the maps can be used to build new understandings. Presents the procedures for preparing and using a text map. (SG)
Michael G. Newbrey
Cardabiodon ricki and Cardabiodon venator were large lamniform sharks with a patchy but global distribution in the Cenomanian and Turonian. Their teeth are generally rare and skeletal elements are less common. The centra of Cardabiodon ricki can be distinguished from those of other lamniforms by their unique combination of characteristics: medium length, round articulating outline with a very thick corpus calcareum, a corpus calcareum with a laterally flat rim, robust radial lamellae, thick radial lamellae that occur in low density, concentric lamellae absent, small circular or subovate pores concentrated next to each corpus calcareum, and papillose circular ridges on the surface of the corpus calcareum. The large diameter and robustness of the centra of two examined specimens suggest that Cardabiodon was large, had a rigid vertebral column, and was a fast swimmer. The sectioned corpora calcarea show both individuals deposited 13 bands (assumed to represent annual increments) after the birth ring. The identification of the birth ring is supported in the holotype of Cardabiodon ricki as the back-calculated tooth size at age 0 is nearly equal to the size of the smallest known isolated tooth of this species. The birth ring size (5–6.6 mm radial distance [RD]) overlaps with that of Archaeolamna kopingensis (5.4 mm RD) and the range of variation of Cretoxyrhina mantelli (6–11.6 mm RD) from the Smoky Hill Chalk, Niobrara Formation. The revised, reconstructed lower jaw dentition of the holotype of Cardabiodon ricki contains four anterior and 12 lateroposterior files. Total body length is estimated at 5.5 m based on 746 mm lower jaw bite circumference reconstructed from associated teeth of the holotype.
Leung Ray C. H.
Contextualized within immigrants’ acquisition of specialized knowledge about the host country at the institutional level, this article examines a 64,295-word corpus of textbooks written for participants of the orientation course in German politics, history and culture. Corpus-based techniques (“keyness,” collocation and qualitative examination of concordance lines) are deployed to explore the corpus. The findings reveal that the collocational patterns of the identified keywords construct particular world views vis-à-vis Germany. For instance, the keyword DDR [German Democratic Republic (GDR), aka East Germany] frequently co-occurs with negatively connoted lexis, while collocates of the keywords denoting present-day Germany (e.g., Bundesrepublik Deutschland [Federal Republic of Germany] and Staat [nation, country, state]) facilitate the portrayal of Germany as a nurturing welfare state that is popular among foreigners. It is argued that such discursively construed opposition between the “bad” GDR and the “good” Federal Republic of Germany helps to legitimize the German reunification. Furthermore, it is found that certain keywords (e.g., Sie [you], Kurs [course, class] and z.B. [e.g.]) are “metadiscourse resources” (Hyland, 2005). Their pedagogic effects are discussed in relation to the ideological implications of the research findings.
Aims: To emphasize the functional vision characteristics in visually impaired multiple disabled (MDVI) children aged 2 to 9 years in relation to brain damage seen on magnetic resonance imaging in different cortical and subcortical areas and in the corpus callosum region. Material and Method: 12 MDVI children with severe and mild neurological disorders were medically and neuropsychologically assessed. Clinical (psychological, neurological and ophthalmological) and paraclinical (visual evoked potential [VEP] and magnetic resonance imaging [MRI]) methods were used to outline the complete profile of each child. The assessment was completed by morphometric measurement of the corpus callosum and brain. Results: 10 of the infants with severe neurological disorders showed ocular disorders such as ocular motility and visual function abnormalities. Severe cognitive and psychomotor retardation was associated with visual disorders in MDVI children. Significant correlations between neurological disorders, neuropsychological evaluation [τ(12) = 0.783, p = 0.001] and visual acuity [τ(12) = 0.783, p = 0.001] were found in multiple disabled children. A significant difference in the diameter [t(22) = -4.858, p = 0.000] and surface [t(22) = -6.254, p = 0.000] of the corpus callosum was found in multiple disabled children compared with the control group. Conclusion: Structured assessment, as early as possible, of children who are visually impaired due to neurological disorders is key to revealing the child's functioning and outlining appropriate developmental and educational rehabilitation.
Dr. Paweł Rutkowski is head of the Section for Sign Linguistics at the University of Warsaw. He is a general linguist and a specialist in the field of syntax of natural languages, carrying out research on Polish Sign Language (polski język migowy, PJM). He has been awarded a number of prizes, grants and scholarships by such institutions as the Foundation for Polish Science, Polish Ministry of Science and Higher Education, National Science Centre, Poland, Polish–U.S. Fulbright Commission, Kosciuszko Foundation and DAAD. Dr. Rutkowski leads the team developing the Corpus of Polish Sign Language and the Corpus-based Dictionary of Polish Sign Language, the first dictionary of this language prepared in compliance with modern lexicographical standards. The dictionary is an open-access publication, available freely at the following address: http://www.slownikpjm.uw.edu.pl/en/. This interview took place at eLex 2017, a biennial conference on electronic lexicography, where Dr. Rutkowski was awarded the Adam Kilgarriff Prize and gave a keynote address entitled "Sign language as a challenge to electronic lexicography: The Corpus-based Dictionary of Polish Sign Language and beyond". The interview was conducted by Dr. Victoria Nyst from Leiden University, Faculty of Humanities, and Dr. Iztok Kosem from the University of Ljubljana, Faculty of Arts.
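Keyness is commonly scored by comparing a word's frequency in the study corpus with its frequency in a reference corpus. The abstract does not say which statistic was used, so the sketch below assumes the widely used log-likelihood measure; the word counts and the reference corpus size are hypothetical, and only the 64,295-word corpus size comes from the abstract.

```python
import math

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness score for one word.

    Expected frequencies assume the word is equally likely in both corpora;
    a larger score means a larger departure from that assumption.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts for one candidate keyword.
score = log_likelihood(freq_study=50, freq_ref=100,
                       size_study=64_295, size_ref=1_000_000)
print(round(score, 1))
```

A score above roughly 3.84 corresponds to p < 0.05 with one degree of freedom, which is a common (though not universal) threshold for treating a word as key.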
María Cristina Patiño-González
After almost three and a half years in which Colombia had no statutory regulation of habeas corpus, Statutory Law 1095 was enacted on 2 November 2006, regulating Article 30 of the Political Charter. This body of law provided that habeas corpus has the legal nature of both a fundamental right and a constitutional action that protects personal liberty when someone is deprived of that liberty in violation of constitutional and legal guarantees. However, in application of the constitutional block, the Statutory Law's own provisions and the case law of the Constitutional Court, habeas corpus also stands as the fundamental guarantee that protects the collateral fundamental rights of detainees, and it has the nature of an amparo remedy. The article offers a study of the provisions of the Statutory Law on Habeas Corpus regarding definition, jurisdiction, guarantees for the exercise of the action, content of the petition, its processing, the decision and the available means of challenge, and it critically analyzes Constitutional Court Judgment C-187/06, which carried out the prior constitutionality review. It also offers a series of contributions toward an interpretation of the institution that is more protective of rights, and makes observations de lege ferenda.
R has gained explicit text mining support with the tm package, enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data that can be processed on a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data there is to analyze, the greater the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing data sets beyond what would fit into memory by using a distributed file system possibly spanning several machines, e.g., in a cluster of workstations. In this paper we present a plug-in package to tm called tm.plugin.dc, which implements a distributed corpus class that can take advantage of the Hadoop MapReduce library for large-scale text mining tasks. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
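The MapReduce programming model referred to here can be sketched in a few lines. The example below is plain single-process Python, not R and not the tm.plugin.dc or Hadoop API; the function names and the toy document chunks are assumptions chosen to show the map and reduce phases of a term-frequency count.

```python
from collections import Counter
from functools import reduce

def map_chunk(documents):
    """Map phase: count terms within one chunk of the corpus.

    In a distributed setting each chunk would live on a different node,
    so no node has to hold the whole corpus in memory.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def reduce_counts(left, right):
    """Reduce phase: merge two partial term counts."""
    return left + right

def term_frequencies(chunks):
    """Run the map phase over every chunk, then fold the partial results."""
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())

# Hypothetical corpus split into two chunks.
chunks = [
    ["text mining in R", "text corpora"],
    ["distributed text mining"],
]
print(term_frequencies(chunks).most_common(2))
```

Because the reduce step is associative, partial counts can be merged in any order, which is what lets a framework such as Hadoop schedule the work across machines.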
Lee, Myoung Seok; Moon, Min Hoan; Woo, Hyun Sik; Sung, Chang Kyu; Jeon, Hye Won; Lee, Taek Sang [SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul (Korea, Republic of)
To evaluate the determinant pretreatment CT findings that can predict surgical intervention for patients suffering from corpus luteal cyst rupture with hemoperitoneum. From January 2009 to December 2014, a total of 106 female patients (mean age, 26.1 years; range, 17–44 years) who visited the emergency room of our institute for acute abdominal pain and were subsequently diagnosed with ruptured corpus luteal cyst with hemoperitoneum were included in the retrospective study. The analysis of CT findings included cyst size, cyst shape, sentinel clot sign, ring of fire sign, hemoperitoneum depth, active bleeding in portal phase and attenuation of hemoperitoneum. The comparison of CT findings between the surgery and conservative management groups was performed with the Mann-Whitney U test or chi-square test. Logistic regression analysis was used to determine significant CT findings in predicting surgical intervention for a ruptured cyst. Comparative analysis revealed that the presence of active bleeding and the hemoperitoneum depth were significantly different between the surgery and conservative management groups and were confirmed as significant CT findings for predicting surgery, with adjusted odds ratio (ORs) of 3.773 and 1.318, respectively (p < 0.01). On the receiver-operating characteristic curve analysis for hemoperitoneum depth, the optimal cut-off value was 5.8 cm with 73.7% sensitivity and 58.6% specificity (Az = 0.711, p = 0.004). In cases with a hemoperitoneum depth > 5.8 cm and concurrent active bleeding, the OR for surgery increased to 5.786. The presence of active bleeding and the hemoperitoneum depth on a pretreatment CT scan can be predictive warning signs of surgery for a patient with a ruptured corpus luteal cyst with hemoperitoneum.
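The sensitivity and specificity reported for the 5.8 cm cut-off follow from a simple count of true and false positives and negatives. The Python sketch below shows that calculation on made-up measurements; the depths and outcomes are hypothetical and do not reproduce the study's data or its reported 73.7% and 58.6%.

```python
def sensitivity_specificity(values, had_surgery, cutoff):
    """Classify value > cutoff as 'predicts surgery' and score the rule."""
    pairs = list(zip(values, had_surgery))
    true_pos = sum(1 for v, s in pairs if v > cutoff and s)
    false_neg = sum(1 for v, s in pairs if v <= cutoff and s)
    true_neg = sum(1 for v, s in pairs if v <= cutoff and not s)
    false_pos = sum(1 for v, s in pairs if v > cutoff and not s)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical hemoperitoneum depths (cm) and whether surgery followed.
depths = [7.0, 6.2, 4.0, 9.1, 3.5, 6.0, 5.0, 2.2]
surgery = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(depths, surgery, cutoff=5.8))
```

Repeating the calculation over a range of cut-offs and plotting sensitivity against 1 minus specificity yields the receiver-operating characteristic curve mentioned in the abstract.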
The discourse of science and technology in the Chilean press: an approximation to the DICIPE-2004 corpus
The communication of science and technology (S&T) has gained great relevance in recent years, initially through scientific articles and currently through the mass media. In this context, the objectives of this article are: (a) to determine and quantify, in comparative terms, the space that a group of five Chilean newspapers devote to the dissemination of S&T topics; (b) to determine the types of journalistic texts through which S&T is disseminated in the written press; and (c) to identify the macro-topics, subtopics and disciplines present in the corpus. The corpus was collected over three months and comprises 411 texts. The occurrence of texts and words, the text types, and the macro-topics, topics and disciplines to which each text belongs were calculated and normalized. The findings show, among other things, that S&T dissemination occupies on average 1% of what is published in these five newspapers and that texts related to medical sciences, astronomy and astrophysics, and life sciences predominate.
Laursen, Anne Lise; Pellón, Ismael Arinas
The chapter describes the corpus analysis strategies used with the translation master’s students at the Department of Business Communication at the Faculty of Business and Social Sciences (formerly Aarhus School of Business or ASB). The short time available for the training of specialized translators only allows for teaching the students methods that they can apply systematically to several professional tasks. The chapter illustrates how the traditional translation training strategies can be combined with the use of concordancing software to cope with translations.
Park, Sung Eun; Choi, Dae Seob; Shin, Hwa Seon; Baek, Hye Jin; Choi, Ho Cheol; Kim, Ji Eun; Choi, Hye Young; Park, Min Jung [Dept. of Radiology, Gyeongsang National University School of Medicine, Jinju (Korea, Republic of)
The corpus callosum (CC) is the largest white matter structure in the brain, consisting of more than 200–250 million axons that provide a large connection mainly between homologous cerebral cortical areas in mirror image sites. The posterior end of the CC is the thickest part, which is called the splenium. Various diseases, ranging from congenital to acquired lesions and including congenital anomalies, traumatic lesions, ischemic diseases, tumors, and metabolic, toxic, degenerative and demyelinating diseases, can involve the splenium of the CC, and their clinical symptoms and signs are also variable. Therefore, knowledge of the disease entities and the imaging findings of lesions involving the splenium is valuable in clinical practice. MR imaging is useful for the detection and differential diagnosis of splenial lesions of the CC. In this study, we classify the disease entities and describe imaging findings of lesions involving the splenium of the CC based on our experiences and a review of the literature.
Elster, A.D.; DiPersio, D.A.; Moody, D.M.
Magnetic resonance (MR) imaging was performed in 120 normal right-handed individuals (60 males, 60 females) to clarify existing contradictory data concerning possible sexual dimorphism of the human corpus callosum (CC). Five linear and three area measurements of the CC and brain were obtained directly at the MR scanner console from midline sagittal T1-weighted images. The anteroposterior length of the CC was significantly larger in males than in females (p=0.0005). No other differences in absolute callosal measurements between the sexes could be demonstrated. However, several size ratios did achieve statistical significance (p<0.05), being consistently larger in females: splenial width/length CC, splenial width/brain length, and area of CC/area of brain. Where no statistically significant differences were obtained, precision, tolerance, and confidence interval calculations are presented. The data in this large series support a limited but definite sexual dimorphism of the CC in right-handed individuals. (author)
Bernicot, J.; Goumi, A.; Bert-Erboul, A.; Volckaert-Legrier, O.
The link between students' spelling level and their text-messaging practice gives rise to numerous questions from teachers, parents and the media. A corpus of 4524 text messages produced in daily-life situations by students in sixth and seventh grade (n = 19, 11-12 years of age) was compiled. None of the participants had ever owned or used a…
Hollaar, L A
The Utah Text Retrieval project seeks well-engineered solutions to the implementation of large, inexpensive, rapid text information retrieval systems. The project has three major components. Perhaps the best known is the work on the specialized processors, particularly search engines, necessary to achieve the desired performance and cost. The other two concern the user interface to the system and the system's internal structure. The work on user interface development is not only concentrating on the syntax and semantics of the query language, but also on the overall environment the system presents to the user. Environmental enhancements include convenient ways to browse through retrieved documents, access to other information retrieval systems through gateways supporting a common command interface, and interfaces to word processing systems. The system's internal structure is based on a high-level data communications protocol linking the user interface, index processor, search processor, and other system modules. This allows them to be easily distributed in a multi- or specialized-processor configuration. It also allows new modules, such as a knowledge-based query reformulator, to be added. 15 references.
Ibrahim G Alghamdi,1 Issam I Hussain,1 Mohamed S Alghamdi,2 Mohamed A El-Sheemy1,3 1University of Lincoln, Brayford Pool, Lincoln, UK; 2Ministry of Health, General Directorate of Health Affairs, Al-Baha, Kingdom of Saudi Arabia; 3Research and Development, Lincoln Hospital, Lincolnshire Hospitals NHS Trust, Lincoln, UK Background: The present study reviews the epidemiological data on corpus uteri cancer among Saudi women, including its frequency, crude incidence rate, and age-standardized incidence rate (ASIR), adjusted by region and year of diagnosis. Methods: A retrospective, descriptive epidemiological analysis was conducted of all the corpus uteri cancer cases recorded in the Saudi Cancer Registry between January 2001 and December 2008. The statistical analyses were performed using descriptive statistics, analysis of variance, Poisson regression, and a simple linear model. Results: A total of 1,060 corpus uteri cancer cases were included. Women aged 60–74 years were most affected by the disease. The region of Riyadh in Saudi Arabia had the highest overall ASIR, at 4.4 cases per 100,000 female patients, followed by the eastern region, at 4.2, and Makkah, at 3.7. Jazan, Najran, and Qassim had the lowest average ASIRs, ranging from 0.8 to 1.4. A Poisson regression model using Jazan as the reference revealed that the corpus uteri cancer incidence rate ratio was significantly higher for the regions of Makkah, at 16.5 times (95% confidence interval [CI]: 8.0–23.0), followed by Riyadh, at 16.0 times (95% CI: 9.0–22.0), and the eastern region, at 9.9 times (95% CI: 5.6–17.6). The northern region experienced the highest changes in ASIRs of corpus uteri cancer among female Saudi patients between 2001 and 2008. Conclusion: There was a slight increase in the crude incidence rates and ASIRs for corpus uteri cancer in Saudi Arabia between 2001 and 2008. Older Saudi women were most affected by the disease. Riyadh, the eastern region, and Makkah
One aspect of short message service (SMS) communication through a cell phone is the use of politeness strategies. As it is extensively argued that females are more polite language users, the present study sought to describe the strategies used by male and female English as a foreign language (EFL) learners and to find out whether there is any significant difference between the two groups in the use of positive and negative politeness strategies when sending SMS messages to their professors, considering the asymmetric power relation and social distance between them. To this end, a corpus of 300 L1 (Persian) and L2 (English) request messages was compiled. Results of qualitative and quantitative data analysis showed no significant difference between the two groups. The results of the study have implications for politeness research.
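An age-standardized incidence rate is typically obtained by direct standardization: age-specific rates are weighted by a standard population and summed. The sketch below shows that arithmetic in Python under stated assumptions; the three age bands, the case counts, the populations and the weights are all hypothetical, and the abstract does not say which standard population the registry used.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate per `per` persons.

    cases[i] / population[i] is the age-specific rate for age band i;
    standard_weights[i] is that band's share of the standard population.
    """
    total_weight = sum(standard_weights)
    weighted_sum = sum(
        c / p * w for c, p, w in zip(cases, population, standard_weights)
    )
    return weighted_sum / total_weight * per

# Hypothetical region with three age bands (young, middle, older).
cases = [2, 10, 30]
population = [200_000, 100_000, 50_000]
standard_weights = [60, 30, 10]
print(round(age_standardized_rate(cases, population, standard_weights), 2))
```

Standardizing in this way makes rates comparable between regions whose populations differ in age structure, which matters for a disease that mostly affects older women.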
Zangana Hero M
Background: Bilateral ischemic infarction involving the corpus striatum is a rare event which usually results from global cerebral hypoxia, intoxications, and drug abuse. Case presentation: We report a 28-year-old Caucasian woman who presented with progressive obtundation and later developed severe expressive dysphasia and Parkinsonism after sustaining ischemic stroke of both corpora striata. Hemorrhagic transformation developed on day four of admission. Conclusion: This is a rare case of bilateral basal ganglia infarction with hemorrhagic transformation in a young patient. Our patient's work-up did not reveal any cause behind this stroke; however, advanced investigations (such as genetic testing and conventional angiography) were not done. The damage resulted in motor dysphasia and Parkinsonism. Neither dystonia nor other involuntary movements developed, and cognitive function was not assessed because of the language disorder.
Maritza Fernanda Ortega
Full Text Available The present paper reports the implementation of syllabus innovations in EFL teacher education in Chile after diagnosing a lack of language achievement standards common to all EFL teacher training programs offered in public and private universities alike. The aim of this study is to collect linguistic data in natural and artificial social contexts – EFL trainees’ intermediate status between their native language (Spanish and the target language (English – in order to create the first Chilean corpus of spoken English as a foreign language, in the interest of analyzing the errors that are most likely to be made and fossilized by native speakers of Chilean Spanish. Once the results of this exercise are available, EFL trainers, professors, and SLA researchers will be able to design a newly sequenced syllabus based on the Content-based Approach and tailored to students’ needs so as to enhance oral performance in L2 English.
Soon Young Ko
Full Text Available Reversible focal lesions on the splenium of the corpus callosum (SCC) have been reported in patients with mild encephalitis/encephalopathy caused by various infectious agents, such as influenza, mumps, adenovirus, Varicella zoster, Escherichia coli, Legionella pneumophila, and Staphylococcus aureus. We report a case of a reversible SCC lesion causing reversible encephalopathy in nonfulminant hepatitis A. A 30-year-old healthy male with dysarthria and fever was admitted to our hospital. After admission his mental status became confused, and so we performed electroencephalography (EEG) and magnetic resonance imaging (MRI) of the brain, which revealed an intensified signal on diffusion-weighted imaging (DWI) at the SCC. His mental status improved 5 days after admission, and the SCC lesion had completely disappeared 15 days after admission.
Full Text Available Japanese language learners aim to acquire reading, listening, writing and speaking skills. We at the Hinoki project (https://hinoki-project.org/) have recently been working on the Natsume collocation search system (https://hinoki-project.org/natsume/), the Natane learner corpus to support Natsume (https://hinoki-project.org/natane/) and the Nutmeg writing support system (http://hinoki-project.org/nutmeg/). In order to test the effectiveness of Nutmeg, we conducted an online experiment with 36 participants who used the system's register misuse identification feature to correct four writing assignments. Results show that Nutmeg can be an effective tool in correcting common register-related errors, especially those involving auxiliary verbs. However, the accuracy of verb and adverb identification was too low, suggesting the need for improvements in the variety of corpora used for identifying register misuse.
Full Text Available English verbs have built-in properties that determine how they behave syntactically and generate the meaning associated with them. Because of these inherent properties, some verbs can occur only in certain syntactic structures and others only in different ones. Observation of the verb COOK in an English corpus has revealed its lexical properties covering syntax, semantics, and collocation, suggesting behaviours that distinguish it from other verbs. Having found the lexical properties of COOK, this article concludes that the acquisition of lexicon should include lexical properties that reflect their level of competence. It also argues that the acquisition of lexical properties should be implicit, not through meta-linguistic knowledge. This would render early grammar teaching unnecessary. The acquisition of lexical properties should take place through a subconscious process, not explicit grammar instruction. Many of these properties are grammatical aspects such as word order, sentence construction, and grammatical and lexical collocations.
Full Text Available The present study analyzed different types of errors in EFL learners’ IELTS essays. In order to determine the major types of errors, a corpus of 70 IELTS examinees’ writings was collected, and their errors were extracted and categorized qualitatively. Errors were categorized into 13 aspects based on a researcher-developed error-coding scheme. Based on descriptive statistical analyses, the frequency of each error type was calculated and the commonest errors committed by the EFL learners in IELTS essays were identified. The results indicated that the two most frequent errors that IELTS candidates committed were related to word choice and verb forms. Based on the research results, pedagogical implications highlight analyzing EFL learners’ writing errors as a useful basis for instructional purposes, including creating teaching materials that are in line with learners’ linguistic strengths and weaknesses.
Full Text Available The paper describes the findings of a bibliometric study of the Transformación corpus aimed at identifying the main sources consulted by the authors and their correspondence to the main contributions of the scientific community to the development of pedagogical sciences in Cuba. All issues and articles published between 2011 and 2013 were selected as the sample. A total of 933 consulted works were studied, and the references were grouped into those belonging and not belonging to campus authors. Likewise, the paper reports the number of Ph.D. and master's dissertations, research reports and pedagogical journal articles used as references. Finally, the findings were correlated with the selection of books suggested by Chávez, Deler y Suárez in “La actualidad de la pedagogía y la didáctica en Cuba” (2009). Key words: bibliometry, bibliometric indicators, bibliographic references.
Anderson, Luke B; Paul, Lynn K; Brown, Warren S
People with agenesis of the corpus callosum (AgCC) with normal general intelligence have deficits in complex cognitive processing, as well as in social cognition. The extent to which impoverished processing of emotions may contribute to social processing deficiencies is uncertain. We used the Mayer-Salovey-Caruso Emotional Intelligence Test to clarify the nature of emotional intelligence in 16 adults with AgCC. As hypothesized, persons with AgCC exhibited greater disparities from norms on tests involving more socially complex aspects of emotions. The AgCC group did not differ from norms on the Experiential subscale, but they were significantly below norms on the Strategic subscale. These findings suggest that the corpus callosum is not essential for experiencing and thinking about basic emotions in a "normal" way, but is necessary for more complex processes involving emotions in the context of social interactions. © The Author 2017. Published by Oxford University Press. All rights reserved.
Gómez, Nicolás Garófalo; Hamad, Ana Paula; Marinho, Murilo; Tavares, Igor M; Carrete, Henrique; Caboclo, Luís Otávio; Yacubian, Elza Márcia; Centeno, Ricardo
Startle epilepsy is a syndrome of reflex epilepsy in which the seizures are precipitated by a sudden and surprising, usually auditory, stimulus. We describe herein a girl who had been suffering with startle-induced seizures since 2 years of age. She had focal, tonic and tonic-clonic seizures, refractory to antiepileptic treatment. Daily tonic seizures led to very frequent falls and morbidity. Neurologically, she had no deficit. Interictal EEG showed slow waves and epileptiform discharges in central and fronto-central regions. Video-polygraphic recordings of seizures, triggered by stimuli, showed generalised symmetric tonic posturing with ictal EEG, characterised by an abrupt and diffuse electrodecremental pattern of fast activity, followed by alpha-theta rhythm superimposed by epileptic discharges predominantly over the vertex and anterior regions. Magnetic resonance imaging showed no abnormalities. Corpus callosotomy was performed when the patient was 17. Since surgery, the patient (one year follow-up) has remained seizure-free. Corpus callosotomy may be considered in patients with startle epilepsy and tonic seizures, in the absence of focal lesions amenable to surgery. [Published with video sequences].
Devoto, Luigi; Henríquez, Soledad; Kohen, Paulina; Strauss, Jerome F
The human corpus luteum (CL) is a temporary endocrine gland derived from the ovulated follicle. Its formation and limited lifespan are critical for steroid hormone production required to support menstrual cyclicity, endometrial receptivity for successful implantation, and the maintenance of early pregnancy. Endocrine and paracrine-autocrine molecular mechanisms associated with progesterone production throughout the luteal phase are critical for the development, maintenance, regression, and rescue by hCG which sustains CL function into early pregnancy. However, the signaling systems driving the regression of the primate corpus luteum in non-conception cycles are not well understood. Recently, there has been interest in the functional roles of estradiol metabolites (EMs), mostly in estrogen-producing tissues. The human CL produces a number of EMs, and it has been postulated that the EMs acting via paracrine-autocrine pathways affect angiogenesis or LH-mediated events. The present review describes advances in understanding the role of EMs in the functional lifespan and regression of the human CL in non-conception cycles. Copyright © 2017 Elsevier Inc. All rights reserved.
Bénézit, Audrey; Hertz-Pannier, Lucie; Dehaene-Lambertz, Ghislaine; Monzalvo, Karla; Germanaud, David; Duclap, Delphine; Guevara, Pamela; Mangin, Jean-François; Poupon, Cyril; Moutard, Marie-Laure; Dubois, Jessica
Isolated corpus callosum dysgenesis (CCD) is a congenital malformation which occurs during early development of the brain. In this study, we aimed to identify and describe its consequences beyond the lack of callosal fibres, on the morphology, microstructure and asymmetries of the main white matter bundles with diffusion imaging and fibre tractography. Seven children aged between 9 and 13 years old and seven age- and gender-matched control children were studied. First, we focused on bundles within the mesial region of the cerebral hemispheres: the corpus callosum, Probst bundles and cingulum which were selected using a conventional region-based approach. We demonstrated that the Probst bundles have a wider connectivity than the previously described rostrocaudal direction, and a microstructure rather distinct from the cingulum but relatively close to callosal remnant fibres. A sigmoid bundle was found in two partial ageneses. Second, the corticospinal tract, thalamic radiations and association bundles were extracted automatically via an atlas of adult white matter bundles to overcome bias resulting from a priori knowledge of the bundles' anatomical morphology and trajectory. Despite the lack of callosal fibres and the colpocephaly observed in CCD, all major white matter bundles were identified with a relatively normal morphology, and preserved microstructure (i.e. fractional anisotropy, mean diffusivity) and asymmetries. Consequently the bundles' organisation seems well conserved in brains with CCD. These results await further investigations with functional imaging before apprehending the cognition variability in children with isolated dysgenesis. Copyright © 2014 Elsevier Ltd. All rights reserved.
This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects
Camino Rea Rizzo
Full Text Available Wireless is the word selected to illustrate a model of analysis designed to determine the specialized character of a lexical unit. Wireless belongs to the repertoire of specialized vocabulary automatically extracted from a corpus of telecommunication engineering English (TEC). This paper describes the procedure followed in the analysis, which is intended to fulfil a twofold purpose: first, to validate the automatic classification; and second, to gain a better insight into the lexical profile of telecommunication English. The statistical information provided by the variables of frequency, distribution and keyness is combined with the data extracted from the exploration of the surrounding co-text, in order to describe the syntagmatic relations established.
The term wireless has been selected to illustrate a method of analysis whose purpose is to determine the nature of a lexical unit. Wireless is a specialized term, automatically extracted from a corpus of English for telecommunications (TEC). This work describes the procedure followed to meet a twofold objective: first, to validate the automatic classification; second, to refine the characterization of English for telecommunications. The statistical information obtained from the variables of frequency, distribution and keyness is combined with data extracted from analysis of the co-text, in order to describe the existing syntagmatic relations.
Andronikou, Savvas; Pillay, Tanyia; Gabuza, Lungile; Mahomed, Nasreen; Naidoo, Jaishree; Tebogo Hlabangana, Linda [University of the Witwatersrand, Radiology Department, Faculty of Health Sciences, Johannesburg (South Africa)]; Du Plessis, Vicci [University of KwaZulu-Natal, Radiology Department, Faculty of Health Sciences, Durban (South Africa)]; Prabhu, Sanjay P. [Harvard Medical School, Department of Radiology, Boston Children's Hospital, Boston, MA (United States)]
Thickening of the corpus callosum is an important feature of development, whereas thinning of the corpus callosum can be the result of a number of diseases that affect development or cause destruction of the corpus callosum. Corpus callosum thickness reflects the volume of the hemispheres and responds to changes through direct effects or through Wallerian degeneration. It is therefore not only important to evaluate the morphology of the corpus callosum for congenital anomalies but also to evaluate the thickness of specific components or the whole corpus callosum in association with other findings. The goal of this pictorial review is to raise awareness that the thickness of the corpus callosum can be a useful feature of pathology in pediatric central nervous system disease and must be considered in the context of the stage of development of a child. Thinning of the corpus callosum can be primary or secondary, and generalized or focal. Primary thinning is caused by abnormal or failed myelination related to the hypomyelinating leukoencephalopathies, metabolic disorders affecting white matter, and microcephaly. Secondary thinning of the corpus callosum can be caused by diffuse injury such as hypoxic-ischemic encephalopathy, human immunodeficiency virus (HIV) encephalopathy, hydrocephalus, dysmyelinating conditions and demyelinating conditions. Focal disturbance of formation or focal injury also causes localized thinning, e.g., callosal dysgenesis, metabolic disorders with localized effects, hypoglycemia, white matter injury of prematurity, HIV-related atrophy, infarction and vasculitis, trauma and toxins. The corpus callosum might be too thick because of a primary disorder in which the corpus callosum finding is essential to diagnosis; abnormal thickening can also be secondary to inflammation, infection and trauma.
Full Text Available The analysis of electronic health records for an automated detection of adverse drug reactions is an approach to solve the problems that arise from traditional methods like spontaneous reporting or manual chart review. Algorithms addressing this task should be modeled on the criteria for a standardized case causality assessment defined by the World Health Organization. One of these criteria is the temporal relationship between drug intake and the occurrence of a reaction or a laboratory test abnormality. Appropriate data that would allow for developing or validating related algorithms is not publicly available, though. In order to provide such data, retrospective routine data of drug administrations and temporally corresponding laboratory observations from a university clinic were extracted, transformed and evaluated by experts in terms of a reasonable time relationship between drug administration and lab value alteration. The result is a data corpus of 400 episodes of normalized laboratory parameter values in temporal context with drug administrations. Each episode has been manually classified whether it contains data that might indicate a temporal correlation between the drug administration and the change of the lab value course, whether such a change is not observable or whether a decision between those two options is not possible due to the data. In addition, each episode has been assigned a concordance value which indicates how difficult it is to assess. This is the first open data corpus of a computable ground truth of temporal correlations between drug administration and lab value alterations. The main purpose of this data corpus is the provision of data for further research and the provision of a ground truth which allows for comparing the outcome of other assessments of this data with the outcome of assessments made by human experts. It can serve as a contribution towards systematic, computerized ADR detection in retrospective data. With
Full Text Available Documented associations between corpus callosum size and cognitive ability have heretofore been inconsistent, potentially owing to differences in sample characteristics, differing methodologies in measuring CC size, or the use of absolute versus relative measures. We investigated the relationship between CC size and intelligence quotient (IQ) in the NIH MRI Study of Normal Brain Development sample, a large cohort of healthy children and adolescents (aged six to 18, n = 198) recruited to be representative of the US population. CC midsagittal area was measured using an automated system that partitioned the CC into 25 subregions. IQ was measured using the Wechsler Abbreviated Scale of Intelligence (WASI). After correcting for total brain volume and age, a significant negative correlation was found between total CC midsagittal area and IQ (r = -0.147; p = 0.040). Post hoc analyses revealed a significant negative correlation in children (age<12) (r = -0.279; p = 0.004) but not in adolescents (age≥12) (r = -0.005; p = 0.962). Partitioning the subjects by gender revealed a negative correlation in males (r = -0.231; p = 0.034) but not in females (r = 0.083; p = 0.389). Results suggest that the association between CC and intelligence is mostly driven by male children. In children, a significant gender difference was observed for FSIQ and PIQ, and in males, a significant age-group difference was observed for FSIQ and PIQ. These findings suggest that the correlation between CC midsagittal area and IQ may be related to age and gender.
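The abstract above reports a correlation "after correcting for total brain volume and age". One common way to make such a correction is a partial correlation; whether the study used exactly this procedure is not stated, so the following is only a minimal sketch under that assumption. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, controls):
    """Correlation of x and y with every series in `controls` partialled out.

    Uses the standard recursive formula: remove one control variable at a
    time, each time working with correlations that already exclude the rest.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (invented for illustration): both variables rise with the
# control z, but their z-independent parts move in opposite directions.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [1, 5, 7, 7]
raw = pearson(x, y)                 # positive, driven by the shared control
adjusted = partial_corr(x, y, [z])  # negative once z is partialled out
```

In the toy data the raw correlation is positive while the adjusted one is negative, which illustrates why the choice of absolute versus relative (covariate-adjusted) measures can change the sign of a reported association.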
Tittarelli, Ana María; Piacente, Irma Telma
This paper sets out to report on findings about features of task-specific reformulation observed in university students in the middle stretch of the Psychology degree course (N=58) and in a reference group of students from the degree courses in Modern Languages, Spanish and Library Studies (N=33) from the National University of La Plata (Argentina). Three types of reformulation were modeled: summary reformulation, comprehensive and productive reformulation. The study was based on a corpus of 6...
Robert Paul (1), Lorrie Henry (2), Stuart M Grieve (3), Thomas J Guilmette (2,4), Raymond Niaura (4), Richard Bryant (5), Steven Bruce (1), Leanne M Williams (3,6), Clark C Richard (7), Ronald A Cohen (4), Evian Gordon (3,7)
(1) University of Missouri, St. Louis, St. Louis, MO, USA; (2) Providence College, Providence, RI, USA; (3) The Brain Resource International Database, The Brain Resource Company, Ultimo, NSW, Australia; (4) Brown Medical School, Department of Psychiatry, Providence, RI, USA; (5) School of Psychology, University of New South Wales, Sydney, NSW, Australia; (6) Brain Dynamics Centre, Westmead Millennium Institute, Westmead Hospital, Westmead, NSW, Australia; (7) Cognitive Neuroscience Laboratory and School of Psychology, Flinders University, Adelaide, SA, Australia
Full Text Available Background: Previous studies have examined the impact of early life stress (ELS) on the gross morphometry of brain regions, including the corpus callosum. However, studies have not examined the relationship between ELS and the microstructural integrity of the brain. Methods: In the present study we evaluated this relationship in healthy non-clinical participants using diffusion tensor imaging (DTI) and self-reported history of ELS. Results: Regression analyses revealed significant reductions in fractional anisotropy (FA) within the genu of the corpus callosum among those exposed to the greatest number of early life stressors, suggesting reduced microstructural integrity associated with increased ELS. These effects were most pronounced in the genu of the corpus callosum compared to the body and splenium, and were evident for females rather than males despite no differences in total ELS exposure between the sexes. In addition, a further comparison of those participants who were exposed to no ELS vs. three or more ELS events revealed lower FA in the genu of the corpus callosum among the ELS-exposed group, with trends of FA reduction in the body and the whole corpus callosum. By contrast, there were no relationships between ELS
Samin A Sajan
Full Text Available Agenesis of the corpus callosum (ACC), cerebellar hypoplasia (CBLH), and polymicrogyria (PMG) are severe congenital brain malformations with largely undiscovered causes. We conducted a large-scale chromosomal copy number variation (CNV) discovery effort in 255 ACC, 220 CBLH, and 147 PMG patients, and 2,349 controls. Compared to controls, significantly more ACC, but unexpectedly not CBLH or PMG patients, had rare genic CNVs over one megabase (p = 1.48×10⁻³; odds ratio [OR] = 3.19; 95% confidence interval [CI] = 1.89-5.39). Rare genic CNVs were those that impacted at least one gene in less than 1% of the combined population of patients and controls. Compared to controls, significantly more ACC but not CBLH or PMG patients had rare CNVs impacting over 20 genes (p = 0.01; OR = 2.95; 95% CI = 1.69-5.18). Independent qPCR confirmation showed that 9.4% of ACC patients had de novo CNVs. These, in comparison to inherited CNVs, preferentially overlapped de novo CNVs previously observed in patients with autism spectrum disorders (p = 3.06×10⁻⁴; OR = 7.55; 95% CI = 2.40-23.72). Interestingly, numerous reports have shown a reduced corpus callosum area in autistic patients, and diminished social and executive function in many ACC patients. We also confirmed and refined previously known CNVs, including significantly narrowing the 8p23.1-p11.1 duplication present in 2% of our current ACC cohort. We found six novel CNVs, each in a single patient, that are likely deleterious: deletions of 1p31.3-p31.1, 1q31.2-q31.3, 5q23.1, and 15q11.2-q13.1; and duplications of 2q11.2-q13 and 11p14.3-p14.2. One ACC patient with microcephaly had a paternally inherited deletion of 16p13.11 that included NDE1. Exome sequencing identified a recessive maternally inherited nonsense mutation in the non-deleted allele of NDE1, revealing the complexity of ACC genetics. This is the first systematic study of CNVs in congenital brain malformations, and
McIntosh, Tara; Curran, James R
The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
Full Text Available Abstract: The aim of the present study is to show the potential of corpora of real oral discourse as a didactic resource for teaching and learning mitigation in Spanish as a second language (S/SL) classes. We also want to highlight how students can improve their pragmatic and mitigation skills through conscious learning of the strategies that help them communicate in different communicative contexts as native speakers do. Keywords: real oral corpus, mitigation, Spanish as a second language (S/SL)
Wilder Yesid Escobar
Full Text Available Recognizing that developing the competences needed to appropriately use linguistic resources according to contextual characteristics (pragmatics) is as important as the culturally embedded linguistic knowledge itself (semantics), and that both are equally essential to form competent speakers of English in foreign language contexts, this research relies on corpus linguistics to analyze both the scope and the limitations of the sociolinguistic knowledge and the communicative skills of English students at the university level. To this end, a linguistic corpus was assembled, compared to an existing corpus of native speakers, and analyzed in terms of the frequency, overuse, underuse, misuse, ambiguity, success, and failure of the linguistic parameters used in speech acts. The findings herein describe the linguistic configurations employed to modify levels and degrees of descriptions (the salient semantic theme exhibited in the EFL learners' corpus), appealing to the sociolinguistic principles governing meaning making and language use, which are constructed under the social conditions of the environments where the language is naturally spoken for sociocultural exchange.
Michael J. Harris
Full Text Available In this study, we address various measures that have been employed to distinguish between syllable-timed and stress-timed languages. This study differs from all previous ones by (i) exploring and comparing multiple metrics within a quantitative and multifactorial perspective and by (ii) also documenting the impact of corpus-based word frequency. We begin with the basic distinctions of speech rhythms, dealing with the differences between syllable-timed languages and stress-timed languages and several methods that have been used to attempt to distinguish between the two. We then describe how these metrics were used in the current study comparing the speech rhythms of Mexican Spanish speakers and bilingual English/Spanish speakers (speakers born to Mexican parents in California). More specifically, we evaluate how well various metrics of vowel duration variability, as well as the so far understudied factor of corpus-based frequency, allow speakers to be classified as monolingual or bilingual. A binary logistic regression identifies several main effects and interactions. Most importantly, our results call the utility of a particular rhythm metric, the PVI, into question and indicate that corpus data in the form of lemma frequencies interact with two metrics of durational variability, suggesting that durational variability metrics should ideally be studied in conjunction with corpus-based frequency data.
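For readers unfamiliar with the PVI mentioned above, the pairwise variability index summarizes how much successive durations differ from one another. The sketch below computes the raw and the normalized variant as they are commonly defined in the rhythm literature; which variant and which intervals the study itself used is not stated in the abstract, so the function names and the example durations (in milliseconds) are purely illustrative assumptions.

```python
def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def normalized_pvi(durations):
    """Normalized PVI: each difference is divided by the mean of the pair,
    which reduces the influence of speech rate; the result is scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Hypothetical vowel durations in milliseconds (not data from the study).
even_rhythm = [100, 100, 100, 100]    # no variability between neighbours
alternating = [100, 300, 100, 300]    # strong long/short alternation
```

A sequence of equal durations yields an index of zero, while a strongly alternating sequence yields a high value, which is why the metric has been used to separate syllable-timed from stress-timed speech.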
Vu H. Nguyen
Full Text Available We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigrams to five-grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes according to its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five-grams, obtaining dictionaries with a total size of 12 GB. In order to evaluate our method, we collected a testing set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
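The dictionary-based encoding described above can be illustrated with a greedy longest-match sketch. This is not the authors' implementation: it assumes whitespace tokenization, represents each dictionary as a hypothetical mapping from token tuples to integer codes, and emits tagged pairs instead of packing codes into two to four bytes. Tokens missing from every dictionary are kept as literals.

```python
def encode(tokens, dictionaries, max_n=5):
    """Greedy encoding: at each position try the longest n-gram first.

    `dictionaries` maps n -> {tuple_of_n_tokens: integer_code}.
    Output items are (n, code) for dictionary hits or (0, token) for literals.
    """
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            code = dictionaries.get(n, {}).get(gram)
            if code is not None:
                out.append((n, code))
                i += n
                break
        else:
            # No dictionary contains this token: store it verbatim.
            out.append((0, tokens[i]))
            i += 1
    return out


def decode(encoded, dictionaries):
    """Invert `encode` by looking codes up in reversed dictionaries."""
    reverse = {n: {code: gram for gram, code in d.items()}
               for n, d in dictionaries.items()}
    tokens = []
    for n, payload in encoded:
        if n == 0:
            tokens.append(payload)
        else:
            tokens.extend(reverse[n][payload])
    return tokens


# Tiny hypothetical dictionaries; the real ones are built from a large corpus.
dictionaries = {2: {("xin", "chào"): 7}, 1: {("bạn",): 3}}
tokens = "xin chào các bạn".split()
encoded = encode(tokens, dictionaries)
```

Trying the longest n-gram first means a frequent five-word phrase costs a single code rather than five, which is where most of the saving in such a scheme would come from.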
Ohtake, Hiroshi; Fujita, Nobuyuki; Kawamoto, Takeshi; Morren, Brian; Ugawa, Yoshihiro; Kaneko, Shuji
We developed an English Collocations On Demand system offering on-line corpus and concordance information to help Japanese researchers acquire a better command of English collocation patterns. The Life Science Dictionary Corpus consists of approximately 90,000,000 words collected from life science related research papers published in academic…
Heide, Solveig; Keren, Boris; Billette de Villemeur, Thierry; Chantot-Bastaraud, Sandra; Depienne, Christel; Nava, Caroline; Mignot, Cyril; Jacquette, Aurélia; Fonteneau, Eric; Lejeune, Elodie; Mach, Corinne; Marey, Isabelle; Whalen, Sandra; Lacombe, Didier; Naudion, Sophie; Rooryck, Caroline; Toutain, Annick; Caignec, Cédric Le; Haye, Damien; Olivier-Faivre, Laurence; Masurel-Paulet, Alice; Thauvin-Robinet, Christel; Lesne, Fabien; Faudet, Anne; Ville, Dorothée; des Portes, Vincent; Sanlaville, Damien; Siffroi, Jean-Pierre; Moutard, Marie-Laure; Héron, Delphine
To evaluate the role that chromosomal micro-rearrangements play in patients with both corpus callosum abnormality and intellectual disability, we analyzed copy number variations (CNVs) in patients with corpus callosum abnormality/intellectual disability. STUDY DESIGN: We screened 149 patients with corpus callosum abnormality/intellectual disability using Illumina SNP arrays. In 20 patients (13%), we identified at least 1 CNV that likely contributes to the corpus callosum abnormality/intellectual disability phenotype. We confirmed that the most common rearrangement in corpus callosum abnormality/intellectual disability is inverted duplication with terminal deletion of the 8p chromosome (3.2%). In addition to the identification of known recurrent CNVs, such as deletions 6qter, 18q21 (including TCF4), 1q43q44, 17p13.3, 14q12, 3q13, 3p26, and 3q26 (including SOX2), our analysis allowed us to refine the 2 known critical regions associated with 8q21.1 deletion and 19p13.1 duplication relevant for corpus callosum abnormality; report a novel 10p12 deletion including ZEB1 recently implicated in corpus callosum abnormality with corneal dystrophy; and report a novel pathogenic 7q36 duplication encompassing SHH. In addition, 66 variants of unknown significance encompassing candidate genes were identified in 57 patients. Our results confirm the relevance of using microarray analysis as a first-line test in patients with corpus callosum abnormality/intellectual disability. Copyright © 2017 Elsevier Inc. All rights reserved.
Slovene Sign Language (SZJ) has as yet received little attention from linguists. This article presents some basic facts about SZJ, its history, current status, and a description of the Slovene Sign Language Corpus and Pilot Grammar (SIGNOR) project, which compiled and annotated a representative corpus of SZJ. Finally, selected quantitative data…
Merlini, Laura; Anooshiravani, Mehrak; Kanavaki, Aikaterini; Hanquinet, Sylviane [University of Geneva Children's Hospital, Pediatric Radiology Unit, Geneva (Switzerland)]
Thickened corpus callosum is a rare finding and its pathophysiology is not well known. An anomalous supracallosal bundle has been depicted by fiber tracking in some cases but no diffusion tensor imaging metrics of thickened corpus callosum have been reported. To use diffusion tensor imaging (DTI) in cases of thickened corpus callosum to help in understanding its clinical significance. During a 7-year period five children (ages 6 months to 15 years) with thickened corpus callosum were studied. We determined DTI metrics of fractional anisotropy (FA), mean diffusivity, and axial (λ1) and radial (λ2, λ3) diffusivity and performed 3-D fiber tracking reconstruction of the thickened corpus callosum. We compared our results with data from the literature and 24 age-matched controls. Brain abnormalities were seen in all cases. All children had at least three measurements of corpus callosum thickness above the 97th percentile according to age. In all children 3-D fiber tracking showed an anomalous supracallosal bundle and statistically significant decrease in FA (P = 0.003) and λ1 (P = 0.001) of the corpus callosum compared with controls, but no significant difference in mean diffusivity and radial diffusivity. Thickened corpus callosum was associated with abnormal bundles, suggesting underlying axonal guidance abnormality. DTI metrics suggested abnormal fiber compactness and density, which may be associated with alterations in cognition. (orig.)
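The DTI metrics named in this abstract are all derived from the three eigenvalues of the diffusion tensor. The sketch below uses the standard textbook definitions. The function name `dti_metrics` and the sample eigenvalues are hypothetical and are not taken from the study.

```python
import math


def dti_metrics(l1, l2, l3):
    """Compute common DTI scalars from tensor eigenvalues (l1 >= l2 >= l3)."""
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (along the main axis)
    rd = (l2 + l3) / 2.0               # radial diffusivity (perpendicular to it)
    spread = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    norm = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy: 0 for isotropic diffusion, 1 for purely linear diffusion.
    fa = math.sqrt(1.5 * spread / norm) if norm > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


# Illustrative eigenvalues in units of 1e-3 mm^2/s (made up for the example).
print(dti_metrics(1.7, 0.4, 0.3))
```

Because FA depends on how far the eigenvalues spread around their mean, a drop in λ1 with unchanged radial diffusivity lowers FA, which matches the pattern the abstract reports.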
WonHo Yoo, Isaiah
To ascertain whether what ESL/EFL grammars say is informed by what scholars discuss in the literature and supported by what corpus findings actually show, this paper first presents a brief overview of the literature on the English definite article and then compares popular ESL/EFL grammars' coverage of "the" and corpus findings on definite article…
... Jose Island Airport, TX (Lat. 27[deg]56'40'' N., long. 96[deg]59'06'' W.) Rockport, Aransas County... Meacham Blvd., Fort Worth, TX 76137; telephone (817) 321- 7716. SUPPLEMENTARY INFORMATION: History On... Corpus Christi, TX [Amended] Corpus Christi International Airport, TX (Lat. 27[deg]46'13'' N., long. 97...
Samaie, Mahmoud; Malmir, Bahareh
This article exploits the synergy of critical discourse studies and Corpus Linguistics to study the pervasive representation of Islam and Muslims in an approximate 670,000-word corpus of US news media stories published between 2001 and 2015. Following collocation and concordance analysis of the most frequent topics or categories which revolve…
Laheij, R.J.F.; Rossum, L.G.M. van; Boer, W.A. de; Jansen, J.B.M.J.
BACKGROUND: A high level of gastric acid secretion is considered to be a risk factor for reflux oesophagitis or Barrett's oesophagus. Corpus gastritis may have a protective effect on the oesophagus, because of decreased gastric acid output. AIM: To determine if corpus gastritis is associated with
Barbieri, Federica; Eckhardt, Suzanne E. B.
Arguing that the introduction of corpus linguistics in teaching materials and the language classroom should be informed by theories and principles of SLA, this paper presents a case study illustrating how corpus-based findings on reported speech can be integrated into a form-focused model of instruction. After overviewing previous work which…
Frederiksen, Kristian S; Garde, Ellen; Skimminge, Arnold
To examine the impact of corpus callosum (CC) tissue loss on the development of global cognitive and motor impairment in the elderly.
Belli, Serap Atasever
This study was designed to investigate whether contemporary corpus-informed grammar textbooks written for English language learners and teachers presented the progressive use of stative verbs and if yes, which stative verbs were presented to occur with the progressive aspect and for which functions they took this aspect. A corpus of six electronic…
Fuertes-Olivera, Pedro A.
This article investigates lexical gender in specialized communication. The key method of analysis is that of forms of address, professional titles, and "generic man" in a 10 million word corpus of written Business English. After a brief introduction and literature review on both gender in specialized communication and similar corpus-based views of…
Jerry Cheng-Yen Lai
Conclusion: According to the observed changes in incidence rate, the burden of uterine corpus cancer in the general female population is expected to increase in the near future. From a public-health perspective, care providers should develop strategies for the prevention, early detection, and intervention to reduce the rapidly increasing incidence of uterine corpus cancer in Taiwan.
This paper combines corpus processing tools to investigate the cultural elements of Saudi education of English as a foreign language (EFL). The latest Saudi EFL textbooks (2016 onwards) are available in researchable PDF formats. This helps process them through corpus search software tools. The method adopted is based on analysing 20 cultural…
Tourte, Gregory J L
Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...
Mehdi, Mohamad; Okoli, Chitu; Mesgari, Mostafa
Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge base that has been continuously exploited by researchers in a wide variety of domains. This article systematically reviews the scholarly studies that have used Wikipedia as a data source, and investigates the means by which Wikipedia has been employed in three main computer science research areas: information retrieval, natural language processing, and ontology building. We report and discuss the research trends of the identified and examined studies. We further identify and classify a list of tools that can be used to extract data from Wikipedia, and compile a list of currently available data sets extracted from Wikipedia.
In order to reach far in the work for sustainable development, communication in foreign languages prior to strategic decisions is required from international partners. In this communication English has become the lingua franca. Even though the use of EFL (English as a foreign language) is widely spread, it is clear that in some geographical…
Text-Fabric is a Python3 package for Text plus Annotations. It provides a data model, a text file format, and a binary format for (ancient) text plus (linguistic) annotations. The emphasis of this all is on: data processing; sharing data; and contributing modules. A defining characteristic is that
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Riggs, Ken Roger
Discusses problems with marking free text, text that is either natural language or semigrammatical but unstructured, that prevent well-formed XML from marking text for readily available meaning. Proposes a solution to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. (Author/LRW)
The abundance of grammatical categories in Slavonic and their overlap are particularly evident in the agreement between conjoined subjects and predicate. When they are accompanied by agreement conditions, such as word order and animacy in Slavic languages, different agreement patterns, dependent also on concrete context and speaker, are to be expected. In this paper the study of the agreement between conjoined subjects and predicate is based on an analysis of the medieval Glagolitic Croatian Church Slavonic corpus. Number, gender, and person are grammatical categories, i.e., features of conjoined noun phrases and predicate agreement. The analysis includes noun phrases conjoined by coordinating and some non-coordinating conjunctions as well as noun phrases conjoined by a gradational ‘not only [...] but also’ structure. Comitative and reciprocal noun phrases are included as well. The research in the given corpus shows that the conjoined noun phrases with predicate agreement can be syntactic (predicate showing agreement with one conjunct) or semantic (predicate showing agreement with all conjuncts). Syntactic agreement appears as the so-called contact agreement (predicate showing agreement with the closest conjunct) and as distant agreement (predicate showing agreement with the most distant conjunct). Semantic agreement is applied mostly in accordance with G. G. Corbett’s resolution rules for Slavic languages. However, the analysis shows that some resolution rules for number should be revised due to dual number. Although absent from the majority of contemporary Slavic languages, it is precisely in historical Slavic idioms that dual number reveals its identity, highlighted in agreement study as well.
This article presents the evolution and contributions of corpus-oriented translation studies research in Brazil. It reviews the early work carried out at the Laboratório Experimental de Tradução (LETRA), showing that most of it adopted a contrastive-linguistics approach to translation and that the research gradually evolved toward a concern with translational stylistics and the style of the literary translator. It also reports the compilation of a corpus for the study of translation style, ESTRA, designed exclusively for this purpose. It shows how ESTRA corpus research promotes interdisciplinarity in translation studies and introduces the triangulation of results from analyses carried out with the methodological procedures of the different approaches used to study style. New methodological procedures are described, in particular the tagging of the corpus for some of the style categories. The article closes with a critical view of what has been done to date and presents future perspectives for research on translational stylistics at LETRA.
perform a monolingual run in the target language to act as a baseline. Thirteen groups participated in the TREC-6 CLIR track. Three major...language; the use of machine-readable bilingual dictionaries or other existing linguistic resources; and the use of corpus resources to train or...formance for each method. In general, the best cross-language performance was between 50%-75% as effective as a quality monolingual run. The TREC-7
Lipi, Afia Akhter; Yamaoka, Yuji; Rehm, Matthias
When encountering people who have a different cultural background from our own, many of us feel uncomfortable because gestures and facial expressions may not be familiar to us. Thus, to enhance the believability of conversational agents, culture-specific nonverbal behaviors should be implemented into the agents. In our previous study, with the goal of building a user interface that incorporates a user’s cultural background, we collected a comparative conversation corpus in Germany and Japan, and investigated the differences in gestures and posture shifts between these two countries. This paper reports a more detailed analysis about posture shifts, and proposes a chat system with an embodied conversational agent (ECA) that can act as a language trainer.
Edqvist, L.E.; Fredriksson, G.; Kindahl, H.
Following parturition in cattle, prostaglandin levels are high for 10-20 days. The duration and possibly the magnitude of the release seem to be related to the time required for completion of uterine involution. Animals showing clinical signs of postpartum uterine disorder have a prolonged release of prostaglandin. The intravenous administration of an endotoxin from Salmonella typhimurium to goats induces a massive prostaglandin release terminating corpus luteum function, resulting in short oestrous cycles in non-pregnant animals and abortions in pregnant animals. The possibility exists that postpartum uterine infections may be partly responsible for the postpartum prostaglandin release and that this bacteriologic/endocrine interrelationship represents a way in which the uterus eliminates infectious agents, particularly gram-negative bacteria. (author)
Aug 26, 2016 ... Three case studies are presented, namely, `Evolving role of diabetes educators', `Cancer risk assessment' and `Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...
A patient with a severe amnesic syndrome following a glioma of the splenium of the corpus callosum is reported. The long-term memory deficit involved anterograde as well as retrograde events dating back to 40 years and causing topographical disorientation. Short-term memory test performance was in the normal range, with the exception of tactile memory which was severely impaired. The patient also showed disconnection symptoms, due to severing of occipito-parietal and parieto-temporal connections, while parieto-parietal connections were undamaged.
Julia Sanmartín Sáez
This paper proposes, as a starting point, to compile a corpus of tourism discourse extracted from a database designed for that purpose. In a further stage, an analysis will be carried out in order to study the usage and contextualization of terms related to bedroom typologies on hotel web sites, and whether or not they conform to their definitions in current tourism dictionaries and ISO hotel regulations. This methodology makes it possible to design and build hotel promotion glossaries that include the different shades of meaning and the usage remarks that are useful and appropriate for the glossaries’ final users. The analysis of usage contexts is thus a determining element when constructing definitions.
We provide a detailed morphometric analysis of eight transmission electron micrographs (TEMs) obtained from the corpus callosum of one cynomolgus macaque. The raw TEM images are included in the article, along with the distributions of the axon caliber and the myelin g-ratio in each image. The distributions are analyzed to determine the relationship between axon caliber and g-ratio, and compared against the aggregate metrics (myelin volume fraction, fiber volume fraction, and the aggregate g-ratio), as defined in the accompanying research article entitled ‘In vivo histology of the myelin g-ratio with magnetic resonance imaging’ (Stikov et al., NeuroImage, 2015).
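As a rough illustration of the quantities this abstract refers to, the sketch below computes a per-fiber g-ratio (axon diameter over total fiber diameter) and an aggregate g-ratio from the myelin and fiber volume fractions. The aggregate relation g = sqrt(1 - MVF/FVF) is the form commonly given in the g-ratio imaging literature. Treating it as the exact definition used in the accompanying article is an assumption here, and the function names and sample values are hypothetical.

```python
import math


def fiber_g_ratio(axon_diameter, fiber_diameter):
    """Per-fiber g-ratio: inner (axon) diameter over outer (axon + myelin) diameter."""
    if axon_diameter <= 0 or fiber_diameter < axon_diameter:
        raise ValueError("expected 0 < axon_diameter <= fiber_diameter")
    return axon_diameter / fiber_diameter


def aggregate_g_ratio(mvf, fvf):
    """Aggregate g-ratio from myelin volume fraction (MVF) and fiber volume fraction (FVF).

    Assumes FVF = MVF + axon volume fraction, so g = sqrt(1 - MVF / FVF).
    """
    if fvf <= 0 or mvf < 0 or mvf > fvf:
        raise ValueError("expected 0 <= mvf <= fvf and fvf > 0")
    return math.sqrt(1.0 - mvf / fvf)


# Illustrative values only (not measurements from the micrographs).
print(fiber_g_ratio(0.7, 1.0))
print(aggregate_g_ratio(0.3, 0.6))
```

Comparing the distribution of per-fiber values against the single aggregate value is the kind of check the abstract describes.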
Puerto Moro, Laura
This work offers an exhaustive analysis of a body of dramatic works that is crucial for a better understanding of the path leading to our classical theater: the urban comedy of the Celestinesque type, also understood by critics as comedia naharresca or as Romantic comedy. The study deals with the establishment of the corpus, its chronology and its ritual contextualization, as well as the structure of the texts analyzed and the motifs that recur in them. ...
Rehman, Zobia; Anwar, Waqas; Bajwa, Usama Ijaz; Xuan, Wang; Chaoying, Zhou
Text tokenization is a fundamental pre-processing step for almost all information processing applications. This task is nontrivial for scarce-resourced languages such as Urdu, as there is inconsistent use of space between words. In this paper a morpheme matching based approach is proposed for Urdu text tokenization, along with some other algorithms to solve the additional issues of boundary detection of compound words, affixation, reduplication, names and abbreviations. This study resulted in 97.28% precision, 93.71% recall, and 95.46% F1-measure when tokenizing a corpus of 57,000 words using a morpheme list with 6,400 entries.
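The three figures reported in this abstract are mutually consistent if the F1-measure is taken to be the usual harmonic mean of precision and recall. A minimal check, where the function name `f1_score` is just an illustrative helper:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f1_score(97.28, 93.71), 2))  # prints 95.46
```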
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content
Duong, Hieu N.; Snasel, Vaclav
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, first, the proposed method splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigram to five grams to obtain the best encoding stream. Each n-gram is encoded by two to four bytes accordingly based on its corresponding n-gram dictionary. We collected 2.5 GB text corpus from some Vietnamese news agencies to build n-gram dictionaries from unigram to five grams and achieve dictionaries with a size of 12 GB in total. In order to evaluate our method, we collected a testing set of 10 different text files with different sizes. The experimental results indicate that our method achieves compression ratio around 90% and outperforms state-of-the-art methods. PMID:27965708
Finnemann, Niels Ole
text can be defined by taking as point of departure the digital format in which everything is represented in the binary alphabet. While the notion of text, in most cases, lends itself to be independent of medium and embodiment, it is also often tacitly assumed that it is, in fact, modeled around the print medium, rather than written text or speech. In the late 20th century, the notion of text was subject to increasing criticism, as in the question raised within literary text theory: is there a text in this class? At the same time, the notion was expanded by including extra linguistic sign modalities...
This article reports on a corpus-based exploration of the role that fictional dialogue plays in characterisation. The focus is on the two main characters of Austen’s Sense and Sensibility and (a) the extent to which certain features of their dialogue can be said to tie in with general perceptions that Elinor represents the “sense” and Marianne the “sensibility” of the novel’s title; and (b) the extent to which Austen can be said to have exploited these features to enable the sisters to speak with subtly differing voices. The features themselves were drawn from two linguistic frameworks, namely cohesion in text linguistics (specifically, the category of conjunctive cohesion as originated by Halliday and Hasan (1976)), and the category of “involvement” in register analysis (most prominently, Biber 1988). The density of these features in each dialogue was calculated, compared statistically and salient differences considered in relation to the focal issues of the study. Although two of the five hypotheses formulated were not supported, the results overall provided strong indications that Austen successfully distinguishes between the sisters through their dialogue, and often in ways that link with less subtle, more explicit cues to their character that are given in the text. The study thus reveals how certain text-linguistic and register features can underpin characterisation in fiction, and in so doing explicates aspects of what it is that readers and literary critics respond to when they comment on characterisation in a novel.
Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
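To make the classification step in this abstract concrete, here is a small, self-contained sketch of one of the three model families it names, a multinomial Naïve Bayes text classifier with add-one smoothing. It is written in plain Python for illustration only: it does not use Spark or Cassandra, it is not the SparkText code, and the class name, toy documents, and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (made-up snippets, not PubMed data).
train_docs = [
    "breast tumor mammography screening",
    "brca1 mutation breast carcinoma",
    "prostate psa screening biopsy",
    "androgen prostate carcinoma",
    "lung smoking nodule",
    "lung adenocarcinoma egfr mutation",
]
train_labels = ["breast", "breast", "prostate", "prostate", "lung", "lung"]

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict("egfr mutation in lung adenocarcinoma"))
print(model.predict("psa biopsy prostate"))
```

A distributed framework would run the same counting and scoring steps in parallel over partitions of the article collection, which is where the reported speed-up would come from.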
... text. What's the Big Deal? The problem is multitasking. No matter how young and agile we are, ... on something other than the road. In fact, driving while texting (DWT) can be more dangerous than ...
Recent evidence shows that chronic ethanol consumption increases endothelin (ET-1)-induced sustained contraction of trabecular smooth muscle cells of the corpora cavernosa in corpus cavernosum of rats by a mechanism that involves increased expression of ETA and ETB receptors. Our goal was to evaluate the effects of alcohol and diabetes and their relationship to miRNA-155, miRNA-199 and endothelin receptors in the corpus cavernosum and blood of rats submitted to the experimental model of diabetes mellitus and chronic alcoholism. Forty-eight male Wistar rats were divided into four groups: control (C), alcoholic (A), diabetic (D), and alcoholic-diabetic (AD). Samples of the corpus cavernosum were prepared to study the protein expression of endothelin receptors by immunohistochemistry and expression of miRNAs-155 and -199 in serum and the cavernous tissue. Immunostaining for endothelin receptors was markedly higher in the A, D, and AD groups than in the C group. Moreover, a significant hypoexpression of the miRNA-199 in the corpus cavernosum tissue from the AD group was observed, compared to the C group. When analyzing the microRNA profile in blood, a significant hypoexpression of miRNA-155 in the AD group was observed compared to the C group. The miRNA-199 analysis demonstrated significant hypoexpression in D and AD groups compared to the C group. Our findings in corpus cavernosum showed downregulated miRNA-155 and miRNA-199 levels associated with upregulated protein expression and unaltered mRNA expression of ET receptors, suggesting decreased ET receptor turnover, which can contribute to erectile dysfunction in diabetic rats exposed to high alcohol levels.
Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.
This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of…
In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…
Automatic text correction is an essential problem for today's text processors and editors. This paper introduces a novel algorithm for automating contextual text correction using a Linguistic Habit Graph (LHG), which is also introduced in this paper. A specialized internet crawler has been constructed to search through web sites in order to build a Linguistic Habit Graph from text corpora gathered from Polish web sites. The correction results achieved with this algorithm and this LHG were compared with commercial programs that also offer text correction: Microsoft Word 2007, Open Office Writer 3.0, and the Google search engine. The text correction results achieved were much better than the corrections made by these commercial tools.
van de Camp, Matje; Christiansen, Henning
It is demonstrated how Constraint Handling Rules can be applied for resolution of indirect and relative time expressions in text as part of a shallow analysis, following a specialized tagging phase. A method is currently under development, optimized for a particular corpus of historical biographies...
van Elfrinkhof, A.M.E.; Maks, I.; Kaal, A.R.; Kaal, A.R.; Maks, I.; van Elfrinkhof, A.M.E.
Abstract: This chapter explores how three methods of political text analysis can complement each other to differentiate parties in detail. A word-frequency method and corpus linguistic techniques are joined by critical discourse analysis in an attempt to assess the ideological relation between
Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh
He, Karen Y.; Wang, Kai
Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M
Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
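To make the classification step concrete, here is a minimal single-machine sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not SparkText's Spark-based implementation; the class name, the whitespace tokenizer and the six toy training documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set standing in for PubMed articles.
docs = [
    "breast tumour mammography screening",
    "brca1 mutation breast carcinoma",
    "prostate psa antigen screening",
    "prostate gland androgen therapy",
    "lung smoking nodule carcinoma",
    "lung bronchial tumour smoking",
]
labels = ["breast", "breast", "prostate", "prostate", "lung", "lung"]

model = NaiveBayes().fit(docs, labels)
```

Calling `model.predict("psa levels in prostate tumour")` picks the class whose training vocabulary best explains the query words; a distributed framework would apply the same idea to far larger collections.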
Margareta Manu Magda
The paper tries to identify the special problems posed by the study of the interjection, based on the examination of a corpus of texts from old Romanian (1600–1780), with reference to texts from modern Romanian. We have observed how certain interjectional formations have acquired, through diachronic expansion, new grammatical, semantic and pragmatic values. The structure of the paper is the following: the introduction (§1) summarizes the author’s position on the status of the interjection category at a morphosyntactic, semantic and pragmatic level (§1.1) and on the relation between different linguistic structures and their grammaticalization / pragmaticalization process (§1.2). The second section (§2) refers to the specific routes followed by the evolution of the various categories of the analysed interjections, from old Romanian to modern Romanian: the presentatives adecă, iată, ni (§2.1), the hortatives haide, ni (§2.2), the addressing particles bre, măi (§2.3), and the connectors with demarcation signal function adevăr, amin (§2.4). The third section (§3) describes a species of delocutive derivation, illustrated in Romanian by the lexicalized semantic variants of the secondary interjection Doamne!. The study concludes with several final considerations regarding the results of the research (§4).
The article presents the status of the PEDANT project with parallel corpora at the Language Bank at Göteborg University. The solutions for access to the corpus data are presented. Access is provided by way of the internet, standard applications and SGML-aware programming tools. The SGML format for encoding translation pairs is also outlined. The methods allow working with everything from plain text to texts densely encoded with linguistic information.
Hasegawa, Daisuke; Tamura, Shinji; Nakamoto, Yuya; Matsuki, Naoaki; Takahashi, Kimimasa; Fujita, Michio; Uchida, Kazuyuki; Yamato, Osamu
Several reports have described magnetic resonance (MR) findings in canine and feline lysosomal storage diseases such as gangliosidoses and neuronal ceroid lipofuscinosis. Although most of those studies described the signal intensities of white matter in the cerebrum, findings of the corpus callosum were not described in detail. A retrospective study was conducted on MR findings of the corpus callosum, as well as the rostral commissure and the fornix, in 18 cases of canine and feline lysosomal storage diseases to determine whether changes of the corpus callosum are an imaging indicator of those diseases. The cases included 6 Shiba Inu dogs and 2 domestic shorthair cats with GM1 gangliosidosis; 2 domestic shorthair cats, 2 familial toy poodles, and a golden retriever with GM2 gangliosidosis; and 2 border collies and 3 chihuahuas with neuronal ceroid lipofuscinoses. The corpus callosum and the rostral commissure were difficult to recognize in all cases of juvenile-onset gangliosidoses (GM1 gangliosidosis in Shiba Inu dogs and domestic shorthair cats and GM2 gangliosidosis in domestic shorthair cats) and GM2 gangliosidosis in toy poodles with late juvenile-onset. In contrast, the corpus callosum and the rostral commissure were confirmed in cases of GM2 gangliosidosis in a golden retriever and canine neuronal ceroid lipofuscinoses with late juvenile- to early adult-onset, but were extremely thin. Abnormal findings of the corpus callosum on midline sagittal images may be a useful imaging indicator for suspecting lysosomal storage diseases, especially hypoplasia (underdevelopment) of the corpus callosum in juvenile-onset gangliosidoses.
Stivaros, Stavros M. [Manchester Academic Health Science Centre, Academic Unit of Paediatric Radiology, Royal Manchester Children's Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester (United Kingdom); University of Manchester, Centre for Imaging Sciences, Institute of Population Health, Manchester (United Kingdom); Radon, Mark R. [The Walton Centre NHS Foundation Trust, Department of Neuroradiology, Liverpool (United Kingdom); Mileva, Reneta; Gledson, Ann; Keane, John A. [University of Manchester, School of Computer Science, Manchester (United Kingdom); Connolly, Daniel J.A.; Batty, Ruth [Sheffield Children's Hospital NHS Foundation Trust, Department of Neuroradiology, Sheffield (United Kingdom); Cowell, Patricia E. [University of Sheffield, Department of Human Communication Sciences, Sheffield (United Kingdom); Hoggard, Nigel; Griffiths, Paul D. [University of Sheffield, Academic Unit of Radiology, Sheffield (United Kingdom); Wright, Neville B.; Tang, Vivian [Manchester Academic Health Science Centre, Academic Unit of Paediatric Radiology, Royal Manchester Children's Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester (United Kingdom)
Birth-related acute profound hypoxic-ischaemic brain injury has specific patterns of damage including the paracentral lobules. To test the hypothesis that there is anatomically coherent regional volume loss of the corpus callosum as a result of this hemispheric abnormality. Study subjects included 13 children with proven acute profound hypoxic-ischaemic brain injury and 13 children with developmental delay but no brain abnormalities. A computerised system divided the corpus callosum into 100 segments, measuring each width. Principal component analysis grouped the widths into contiguous anatomical regions. We conducted analysis of variance of corpus callosum widths as well as support vector machine stratification into patient groups. There was statistically significant narrowing of the mid-posterior body and genu of the corpus callosum in children with hypoxic-ischaemic brain injury. Support vector machine analysis yielded over 95% accuracy in patient group stratification using the corpus callosum centile widths. Focal volume loss is seen in the corpus callosum of children with hypoxic-ischaemic brain injury secondary to loss of commissural fibres arising in the paracentral lobules. Support vector machine stratification into the hypoxic-ischaemic brain injury group or the control group on the basis of corpus callosum width is highly accurate and points towards rapid clinical translation of this technique as a potential biomarker of hypoxic-ischaemic brain injury. (orig.)
Tony Berber Sardinha
One of the main linguistic phenomena in recent Brazilian politics is what the media has called 'President Lula's metaphors'. The starting point for the present investigation is that there must be many metaphors that go unnoticed in the president's discourse and that these may be uncovered by research with electronic corpora. We looked at the presence of conceptual metaphors related to 'development' in a corpus of official speeches delivered over three years by President Luís Inácio Lula da Silva. The results indicated the systematic use of three metaphorical concepts that together define the notion of development for the head of State: JOURNEY, BUILDING and ORGANISM. These three concepts together equate development with a long process that is generated, planned and carried out by the government.
This study presents the results of the author's research project called the Olomouc Corpus of Spoken Czech (OCSC). The paper focuses on the current state and the partial phases of constructing the corpus, its methodology and its annotation. Within the OCSC we use a so-called dual system of transcription, which means (1) an orthographic transcript intended for linguistic (morphological) analysis and tagging and (2) a phonetic transcript consisting of three layers of text: first the actual transcription, and then various types of metatext as the second and third layers, including communication aspects of the texts. The criteria for selecting speakers are also listed, and a statistical analysis of the sociolinguistic categories (gender, age, type of education, types of recordings) is presented as well. This analysis can serve as a basis for partially correcting a possible imbalance among those sociolinguistic parameters. The annotation rules and principles are described at the end of the study.
Okamoto, Kouichirou; Ito, Jusuke; Tokiguchi, Susumu
The size and shape of the corpus callosum of twenty-seven normal young volunteers (age 18-31 years, 17 men and 10 women) were investigated using a superconducting high field (1.5 T) MRI unit. The length of the corpus callosum was 71.1±5.1 mm (mean±S.D.) and the height was 24.9±2.1 mm. The length ratio of the corpus callosum to the brain was 43.9±2.3% with the ratio of the height 25.0±2.3%. The callosal index (height/length) was 35.4±2.9%. The area of the corpus callosum in the midsagittal plane was 681.4±93.6 mm² (min. 563 mm² to max. 902 mm²). We divided the corpus callosum into three segments: rostrum and genu; anterior and posterior trunks; splenium. Each part accounts for one third of the total area of the corpus callosum. The genu and splenium were generally equal in thickness. The minimal thickness of the trunk was 3 mm with the maximal one 9 mm. The posterior trunk was never thicker than the anterior one. The posterior part of the posterior trunk showed thinning and concavity in almost all cases. So-called impressio corporis callosi was observed in 12 cases (44.4%). Thirteen cases (48.1%) showed a shallow concave configuration at the anterior dorsal surface of the corpus callosum. Six cases of these were thought to be due to compression by the pericallosal artery. This finding was not detected in the posterior portion of the corpus callosum. This concavity was also seen in infants. The thinning of the posterior part of the posterior trunk was seen after the development of the splenium, but the concave configuration at the anterior dorsal surface of the corpus callosum may be encountered before the full development of the genu and splenium. (author)
Fuertes-Olivera, Pedro; Bergenholtz, Henning
Dictionaries for Text Production are information tools that are designed and constructed for helping users to produce (i.e. encode) texts, both oral and written texts. These can be broadly divided into two groups: (a) specialized text production dictionaries, i.e., dictionaries that only offer...... a small amount of lexicographic data, most or all of which are typically used in a production situation, e.g. synonym dictionaries, grammar and spelling dictionaries, collocation dictionaries, concept dictionaries such as the Longman Language Activator, which is advertised as the World’s First Production...... Dictionary; (b) general text production dictionaries, i.e., dictionaries that offer all or most of the lexicographic data that are typically used in a production situation. A review of existing production dictionaries reveals that there are many specialized text production dictionaries but only a few general...
Ling, Richard; Bertel, Troels Fibæk; Sundsøy, Pål
Who texts, and with whom do they text? This article examines the use of texting using metered traffic data from a large dataset (nearly 400 million anonymous text messages). We ask 1) How much do different age groups use mobile phone based texting (SMS)? 2) How wide is the circle of texting...
A starter which teaches the basic tasks to be performed with Sublime Text with the necessary practical examples and screenshots. This book requires only basic knowledge of the Internet and basic familiarity with any one of the three major operating systems, Windows, Linux, or Mac OS X. However, as Sublime Text 2 is primarily a text editor for writing software, many of the topics discussed will be specifically relevant to software development. That being said, the Sublime Text 2 Starter is also suitable for someone without a programming background who may be looking to learn one of the tools of
Meystre, Stéphane M; Ferrández, Óscar; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H
As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has only barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems while formatting was only modified by one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between
Gorodilova, V.V.; Yatskovskaya, N.L.
Several immunological indices were studied in patients with cancer of the corpus uteri. An attempt was made to establish whether the immunological indices depend on the extent of disease spread and on the treatment methods. Current methods used to treat cancer of the corpus uteri, with the exception of progestin therapy, reduce the responsiveness of the organism. Radiation therapy delivered at the total therapeutic dose has an especially pronounced immunodepressive effect. Progestin preparations have a differentiating effect on tumours in some patients with cancer of the corpus uteri, which manifests clinically as shrinkage and even complete elimination of the tumour. At the same time, the immunological indices in such patients improve.
Byrd, S.E.; Flannery, A.; Osborn, R.E.; Radkowski, M.A.; Naidich, T.P.; Bohan, T.P.
Absence (agenesis) of the corpus callosum is one of the most common congenital malformations of the brain seen in the pediatric population. The authors used CT, MR imaging, or US to study 70 children with absence of the corpus callosum. Patients were divided into two groups: those with isolated absence of the corpus callosum, and those with other associated brain lesions. The associated brain lesions included interhemispheric arachnoid cyst, Dandy-Walker malformations, encephaloceles, and migrational disorders (heterotopias, schizencephaly, lissencephaly, septo-optic dysplasia, lipoma, Chiari malformations, and holoprosencephaly). The clinical presentations and radiologic findings are described.
Deleger, Louise; Lingren, Todd; Ni, Yizhao; Kaiser, Megan; Stoutenborough, Laura; Marsolo, Keith; Kouril, Michal; Molnar, Katalin; Solti, Imre
The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus with a Data Use Agreement, we would like to encourage other computational linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes: (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; and (3) the tests to show the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development. In a previous study we built an original de-identification gold standard corpus annotated with true Protected Health Information (PHI) from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system, when trained on the new de-identification gold standard corpus, with the performance of the same system, when trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes. Performance of the de-identification system using the new gold standard corpus as a training set was very
Identifying molecular biomarkers has become an important task for scientists who assess, from large-scale biological data, the different phenotypic states of cells or organisms that correlate with disease genotypes. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our text-mining method provides a highly reliable approach to discovering biomarkers in the PubMed database.
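The abstract gives no detail on the dictionary or the automaton, so the following is only a sketch of the general technique: a dictionary of terms is compiled into a token-level trie, which acts as a finite state machine, and the text is scanned for the longest dictionary match at each position. The function names, the sentinel `END` key and the example terms are assumptions for illustration.

```python
END = object()  # sentinel key marking an accepting state


def build_trie(terms):
    """Compile dictionary terms into a token-level trie (the automaton)."""
    root = {}
    for term in terms:
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node[END] = term  # accepting state stores the canonical term
    return root


def find_terms(text, trie):
    """Return (start_token, end_token, term) for the longest match at each position."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    matches = []
    i = 0
    while i < len(tokens):
        node, j, last = trie, i, None
        # Follow transitions as long as the next token is accepted.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                last = (i, j, node[END])
        if last:
            matches.append(last)
            i = last[1]  # resume scanning after the matched span
        else:
            i += 1
    return matches


# Hypothetical dictionary entries and sentence.
trie = build_trie(["PSA", "prostate specific antigen", "CA 125", "HER2"])
hits = find_terms(
    "Elevated prostate specific antigen (PSA) and HER2 were measured.", trie
)
```

Matching on whole tokens keeps the sketch short; a production system would also need to handle spelling variants, hyphenation and ambiguous abbreviations.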
Diabetic erectile dysfunction is associated with penile dorsal nerve bundle neuropathy in the corpus cavernosum, and the mechanism is not well understood. We investigated the neuropathic changes in the corpus cavernosum of rats with streptozotocin-induced diabetes and the effects of Icariside II (ICA II) on improving neuropathy. Thirty-six 8-week-old Sprague-Dawley rats were randomly distributed into a normal control group, a diabetic group and an ICA II-treated group. Diabetes was induced by a one-time intraperitoneal injection of streptozotocin (60 mg/kg). Three days later, the diabetic rats were randomly divided into 2 groups: a saline-treated placebo group and an ICA II-treated group (5 mg/kg/day, by daily intragastric administration). Twelve weeks later, erectile function was measured by cavernous nerve electrostimulation with real-time intracorporal pressure assessment. The penis was harvested for histological examination (immunofluorescence and immunohistochemical staining) and transmission electron microscopy. Diabetic animals exhibited a decreased density of the dorsal nerve bundle in the penis. The neurofilament of the dorsal nerve bundle was fragmented in the diabetic rats. There was decreased expression of nNOS and NGF in the diabetic group. The ICA II group had a higher density of the dorsal nerve bundle and higher expression of NGF and nNOS in the penis. The pathological change of the major pelvic nerve ganglion (including the microstructure by transmission electron microscopy and the neurite outgrowth length of major pelvic nerve ganglion tissue cultured in vitro) was greatly attenuated in the ICA II-treated group (p < 0.01). ICA II treatment attenuates the diabetes-related impairment of the corpus cavernosum and major pelvic ganglion neuropathy in rats with streptozotocin-induced diabetes.
Objective: To investigate the structure of the corpus striatum and the integrity of white matter fibers in patients with Parkinson's disease (PD) and idiopathic rapid eye movement sleep behavior disorder (iRBD). Methods: Twelve patients with iRBD, 12 patients with PD and 10 healthy subjects who were well matched in gender, age and education were enrolled in this study. Head MRI examination was performed on all subjects to observe changes in corpus striatum structure (gray matter volume) and in the integrity of white matter fibers [fractional anisotropy (FA)] by combining voxel-based morphometry (VBM) and diffusion tensor imaging (DTI). Results: Compared with healthy subjects, the gray matter volume of the left caudate nucleus was significantly decreased (P < 0.005), and FA values of the left caudate nucleus (P < 0.005), right caudate nucleus (P < 0.001) and right putamen (P < 0.05) were all significantly reduced in iRBD patients; the FA value of the right putamen was significantly decreased in PD patients (P < 0.05). Compared with PD patients, the gray matter volume of the left caudate nucleus of iRBD patients was significantly reduced (P < 0.001), and FA values of the left caudate nucleus (P < 0.01) and right caudate nucleus (P < 0.005) of iRBD patients were significantly reduced. Conclusions: There is gray matter atrophy and extensive white matter fiber impairment in the corpus striatum of patients with iRBD, and the white matter fiber impairment is similar to that in PD, which provides anatomical evidence for iRBD being a presymptomatic stage of PD. DOI: 10.3969/j.issn.1672-6731.2017.05.008
Acetylcholine (ACh), the first neurotransmitter to be identified, regulates the activities of central and peripheral functions through interactions with muscarinic receptors. Changes in muscarinic acetylcholine receptors (mAChR) have been implicated in the pathophysiology of many major diseases of the central nervous system (CNS). Previous reports from our laboratory on streptozotocin (STZ)-induced diabetic rats showed down-regulation of muscarinic M1 receptors in the brainstem, hypothalamus, cerebral cortex and pancreatic islets. In this study, we investigated the changes in acetylcholine esterase (AChE) enzyme activity, total muscarinic and muscarinic M1 receptor binding, and gene expression in the corpus striatum of STZ-diabetic rats and insulin-treated diabetic rats. The striatum, a neuronal nucleus intimately involved in motor behaviour, is one of the brain regions with the highest acetylcholine content. ACh has complex and clinically important actions in the striatum that are mediated predominantly by muscarinic receptors. We observed that insulin treatment brought the decreased maximal velocity (Vmax) of acetylcholine esterase in the corpus striatum during diabetes back to a near-control state. In diabetic rats there was a decrease in the maximal number (Bmax) and affinity (Kd) of total muscarinic receptors, whereas muscarinic M1 receptors were increased, with a decrease in affinity. We observed that, in all cases, the binding parameters were reversed to near control by the treatment of diabetic rats with insulin. A real-time PCR experiment confirmed the increase in muscarinic M1 receptor gene expression and a similar reversal with insulin treatment. These results suggest diabetes-induced changes of the cholinergic activity in the corpus striatum and a regulatory role of insulin on the binding parameters and gene expression of total and muscarinic M1 receptors.
We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams
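As a rough illustration of a lexicon derived from raw text, the sketch below counts word unigrams in a corpus and corrects a non-word by choosing the most frequent lexicon entry within one edit. This is a generic, context-free edit-distance corrector, not TISC's context-sensitive method; the alphabet, function names and toy corpus are assumptions.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def build_lexicon(raw_text):
    """Derive a unigram frequency lexicon from raw text, without supervision."""
    return Counter(re.findall(r"[a-z]+", raw_text.lower()))


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, lexicon):
    """Return the word itself if known, else the most frequent one-edit neighbour."""
    word = word.lower()
    if word in lexicon:
        return word
    candidates = [w for w in edits1(word) if w in lexicon]
    if not candidates:
        return word  # nothing plausible found; leave the token unchanged
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (lexicon[w], w))


# Hypothetical raw corpus.
lexicon = build_lexicon(
    "the corpus contains text and the text contains words the corpus is large"
)
```

Because the lexicon comes straight from the corpus, frequent misspellings would themselves enter it; a real system needs frequency thresholds or context to guard against that.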
A model for how text interpretation proceeds from what is pronounced, through what is said, to what is communicated, and a definition of the concepts 'presupposition' and 'implicature'.
To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies.
A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…
This is a review of the web resource 'Text 2 Mind Map' (www.Text2MindMap.com). It covers what the resource is and how it might be used in library and education contexts, in particular for school librarians.
Kotler, R. S.
The file comparator program IFCOMP is a text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts two text files as input and produces a listing of differences in pseudo-update form. IFCOMP is very useful for monitoring changes made to software at the source code level.
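For readers who want to try the same kind of source-level comparison today, here is a small sketch using Python's standard `difflib` module. It emits a unified diff, which is a different output format from IFCOMP's pseudo-update listing; the wrapper name `compare_texts` and the sample lines are assumptions.

```python
import difflib


def compare_texts(old_lines, new_lines, old_name="old", new_name="new"):
    """Return the unified diff between two lists of lines (no trailing newlines)."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Hypothetical source files, already split into lines.
old = ["A = 1", "B = 2", "C = 3"]
new = ["A = 1", "B = 20", "C = 3"]

diff = compare_texts(old, new)
for line in diff:
    print(line)
```

Lines prefixed with `-` exist only in the first file and lines prefixed with `+` only in the second, so the listing can be read as the set of changes needed to update one file into the other.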
Ehrensvärd, Martin Gustaf
For two centuries, scholars have pointed to consistent differences in the Hebrew of certain biblical texts and interpreted these differences as reflecting the date of composition of the texts. Until the 1980s, this was quite uncontroversial as the linguistic findings largely confirmed the chronology of the texts established by other means: the Hebrew of Genesis-2 Kings was judged to be early and that of Esther, Daniel, Ezra, Nehemiah, and Chronicles to be late. In the current debate where revisionists have questioned the traditional dating, linguistic arguments in the dating of texts have come more into focus. The study critically examines some linguistic arguments adduced to support the traditional position, and reviewing the arguments it points to weaknesses in the linguistic dating of EBH texts to pre-exilic times. When viewing the linguistic evidence in isolation it will be clear...
Biblical education as a holistic process goes far beyond biblical learning. It must be understood as a lifelong process in which both biblical texts and their readers act, each appropriating its counterpart in a dialogical way. The recipient's horizon of understanding is not an empty room to be filled with the text alone, nor is the text dead material that can only be examined cognitively. The recipient discovers the meaning of the biblical text by recomposing it through existential appropriation, so the text is brought to life in each individual reality. Scientific insights, subjective structures and the community of readers must all be included to avoid potential one-sidedness. Unfortunately, a particular negative association very often obscures the approach to the Bible: biblical work as part of religious education still appears in a cognitively oriented form that respects neither the vitality and sovereignty of the biblical texts nor the students' desire for meaning. Moreover, the Bible is misused for moral instruction or pontification. Such failings can be overcome by biblical didactics understood as empowerment didactics. Respecting the sovereignty of biblical texts, these didactics assist the reader with his or her individuation by opening the texts with a focus on the reader's otherness. Thus both the text and the recipient become subjects in a dialogue. The approach of Biblical-Enabling-Didactics lets the Bible become, ever anew, a book of life. Understood from within their hermeneutics, empowerment didactics could be raised to the principle of biblical didactics in general and grow into an essential element of holistic education.
Clarity and accuracy of reporting are fundamental to the scientific process. Readability formulas can estimate how difficult a text is to read. Here, in a corpus consisting of 709,577 abstracts published between 1881 and 2015 from 123 scientific journals, we show that the readability of science is steadily decreasing. Our analyses show that this trend is indicative of a growing use of general scientific jargon. These results are concerning for scientists and for the wider public, as they impact both the reproducibility and accessibility of research findings. PMID:28873054
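As an illustration of what a readability formula computes, the sketch below implements the Flesch Reading Ease score, one widely used formula: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The abstract does not say which formulas the study applied, so this choice is an assumption, and the vowel-group syllable counter is a crude heuristic introduced here only for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "The cat sat. The dog ran."
    dense = ("Interdisciplinary methodological considerations "
             "necessitate comprehensive evaluation.")
    print(flesch_reading_ease(plain) > flesch_reading_ease(dense))
```

Longer sentences and longer words both lower the score, which is how a formula of this kind can register a shift towards jargon-heavy abstracts.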
van Velzen, Marjolein H; Nanetti, Luca; de Deyn, Peter P
Corpus linguistics allows researchers to process millions of words. However, the more words we analyse, i.e., the more data we acquire, the more urgent the call for correct data interpretation becomes. In recent years, a number of studies saw the light attempting to profile some prolific authors' linguistic decline, linking this decline to pathological conditions such as Alzheimer's Disease (AD). However, in line with the nature of the (literary) work that was analysed, numbers alone do not suffice to 'tell the story'. The one and only objective of using statistical methods for the analysis of research data is to tell a story--what happened, when, and how. In the present study we describe a computerised but individualised approach to linguistic analysis--we propose a unifying approach, with firm grounds in Information Theory, that, independently from the specific parameter being investigated, guarantees to produce a robust model of the temporal dynamics of an author's linguistic richness over his or her lifetime. We applied this methodology to six renowned authors with an active writing life of four decades or more: Iris Murdoch, Gerard Reve, Hugo Claus, Agatha Christie, P.D. James, and Harry Mulisch. The first three were diagnosed with probable Alzheimer Disease, confirmed post-mortem for Iris Murdoch; this same condition was hypothesized for Agatha Christie. Our analysis reveals different evolutive patterns of lexical richness, in turn plausibly correlated with the authors' different conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
From 496 case reports of patients with corpus carcinoma collected between 1970 and 1976, the clinical findings, the division into clinical stages and the various forms of therapy were compiled and evaluated. At a mean age of 62.3 years, 56.9 per cent of patients achieved five-year recurrence-free survival. Metastases occurred in 19.1 per cent of all treated women, vaginal recurrences in 1.8 per cent. Particular attention was given to the side effects and late adverse effects of radiation therapy. In this connection, an increase in complications following treatment with newly introduced radiation qualities was recorded: 21.9 per cent of all radiation-treated patients suffered side effects, and late adverse effects were found in 11.7 per cent of all radiation-treated women. Owing to the experience gained in the meantime in radiotherapy with ultra-hard X-rays, and to the use of computerized tomography to establish the adequate quantity of radiation, complications following radiation treatment are expected to occur less frequently in future. (orig./MG) [de
de Tarso, S G S; Gastal, G D A; Bashir, S T; Gastal, M O; Apgar, G A; Gastal, E L
Colour Doppler ultrasonography was used to compare the ability of preovulatory follicle (POF) blood flow and its dimensions to predict the size, blood flow and progesterone production capability of the subsequent corpus luteum (CL). Cows (n=30) were submitted to a synchronisation protocol. Follicles ≥7mm were measured and follicular wall blood flow evaluated every 12h for approximately 3.5 days until ovulation. After ovulation, cows were scanned daily for 8 days and similar parameters were evaluated for the CL. Blood samples were collected and plasma progesterone concentrations quantified. All parameters were positively correlated. Correlation values ranged from 0.26 to 0.74 on data normalised to ovulation and from 0.31 to 0.74 on data normalised to maximum values. Correlations between calculated ratios of both POF and CL in data normalised to ovulation and to maximum values ranged from moderate (0.57) to strong (0.87). Significant (Pprogesterone concentrations of the resultant CL. These findings indicate that follicle vascularity coordinates CL blood flow and progesterone production in synchronised beef cows.
Kaymakamzade, Bahar; Eker, Amber
Acute ischemia of the corpus callosum (CC) is not a well-known feature in patients with acute hydrocephalus. Herein, we describe a case of acute CC infarction due to another rare entity: transient obstructive hydrocephalus. A 66-year-old male was admitted with sudden-onset right-sided hemiparesis. CT demonstrated a hematoma in the left basal ganglia with extension to all ventricles. The following day, the patient's neurological status progressed to coma and he developed bilateral pyramidal signs. MRI demonstrated obstructive hydrocephalus and acute diffuse infarction accompanied by elevation of the CC. On the same day there was improvement in his neurological status, with a significant decrease in ventricular size and complete resolution of the clot in the third ventricle. The mechanism of the signal abnormalities is probably related to neural compression of the CC against the falx. Presumably, the clot causing obstruction in the third ventricle dissolved or decayed with the help of the fibrinolytic activity of CSF, which was raised after IVH and caused spontaneous improvement of the hydrocephalus. Bilateral neurological symptoms suggest diffuse axonal damage, and normalization of the intracranial pressure should be performed at the early onset of clinical deterioration in order to prevent axonal injury. Copyright © 2016 Polish Neurological Society. Published by Elsevier Urban & Partner Sp. z o.o. All rights reserved.
Yan, Erjia; Williams, Jake; Chen, Zheng
Publication metadata help deliver rich analyses of scholarly communication. However, research concepts and ideas are more effectively expressed through unstructured fields such as full texts. Thus, the goals of this paper are to employ a full-text enabled method to extract terms relevant to disciplinary vocabularies, and through them, to understand the relationships between disciplines. This paper uses an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of terms present in each discipline, indicating a semantic richness potentially sufficient for further study and advanced analysis. The salient relationships amongst these vocabularies become apparent in application of a principal component analysis. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns along with Engineering and Physics; while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns along with the Earth Sciences and Chemistry. These results have implications for studies of scholarly communication as scholars attempt to identify the epistemological cultures of disciplines, and as a full text-based methodology could lead to machine learning applications in the automated classification of scholarly work according to disciplinary vocabularies.
In moving society towards more sustainable forms of consumption and production, social learning must play an important role. Making the assumption that it occurs as a consequence of changes in understanding, this article presents a methodology for mapping meanings in sustainability communication texts. The methodology uses techniques from corpus linguistics and framing theory. Two large databases of text were constructed by copying material from the websites of two different groups of social actors, (i) environmental NGOs and (ii) British green business, and saving it as .txt files. The findings on individual words show that the NGOs and business use them very differently. Focusing on words expressing concern for the natural environment, it is proposed that the two actors also conceptualize their concern differently. Green business's cognitive system of concern has two well-developed frames: good intentions and risk management. However, three frames (concern for the natural environment, perception of the damage, and responsibility) are light on detail. In contrast, within the NGOs' system of concern, the frames of concern for the natural environment, perception of the damage, and responsibility contain words making detailed representations.
Examines chemical engineering students' attitudes to text and other parts of English language textbooks. A questionnaire was administered to a group of undergraduates. Results reveal one way students get around the problem of textbook reading. (Author/VWL)
with literary texts written in indigenous South African languages. The project ... Homi Bhabha uses the words of Salman Rushdie to underline the fact that new .... I could not conceptualise an African-language-to-African-language dictionary. An.
Leroy, Gondy; Endicott, James E
With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, term familiarity, which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.
Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique to improve an LM built using a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine translated (MT) from English to Icelandic on a word-by-word and sentence-by-sentence basis. LM interpolation using the baseline LM and an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as a baseline were very sparse.
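The idea of LM interpolation mentioned above can be sketched as a weighted mixture of two models. The abstract gives neither the model order nor the interpolation weights, so everything concrete below is an assumption for illustration: unigram models, linear interpolation, and the hypothetical names `unigram_model`, `interpolate` and `lam`.

```python
from collections import Counter


def unigram_model(tokens: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(base: dict[str, float], extra: dict[str, float],
                lam: float) -> dict[str, float]:
    """Linear interpolation: P(w) = lam * P_base(w) + (1 - lam) * P_extra(w).

    Assumption: lam is a hand-picked weight in [0, 1]; in practice it would
    be tuned on held-out data.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(base) | set(extra)
    return {w: lam * base.get(w, 0.0) + (1.0 - lam) * extra.get(w, 0.0)
            for w in vocab}


if __name__ == "__main__":
    # Small task-dependent corpus versus a larger machine-translated one
    # (toy English tokens stand in for real data).
    baseline = unigram_model("open the door please".split())
    translated = unigram_model("close the window and open the gate".split())
    mixed = interpolate(baseline, translated, lam=0.7)
    print(sorted(mixed.items()))
```

Words seen only in the machine-translated text receive non-zero probability in the mixture, which is the benefit an interpolated model offers when the baseline corpus is very sparse.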
Marta Eugenia Rojas-Porras
The ethical and social responsibility of citing the sources in a scientific or artistic work is undeniable. This paper explores, in a preliminary way, academic plagiarism in its various forms. It includes findings based on a forensic analysis. The purpose of this paper is to raise awareness of the importance of considering these details when writing and publishing a text. Hopefully, this analysis may put the issue under discussion.