large text corpus: Topics by WorldWideScience.org

Sample records for large text corpus

Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.

Science.gov (United States)

He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan

2017-05-01

To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents annotated with 39,511 entities, their assertions, and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules was constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus

Directory of Open Access Journals (Sweden)

Nikhil Kalathil

2018-01-01

When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows provides a prompt for the human analyst to reassess any preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them on their importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).
Annotated chemical patent corpus: a gold standard for text mining.

Directory of Open Access Journals (Sweden)

Saber A Akhondi

Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

Science.gov (United States)

Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

2015-01-01

Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single
CUILESS2016: a clinical corpus applying compositional normalization of text mentions.

Science.gov (United States)

Osborne, John D; Neu, Matthew B; Danila, Maria I; Solorio, Thamar; Bethard, Steven J

2018-01-10

Traditionally text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts") but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships that we term "compositional concepts" to evaluate their use in clinical text. We annotated 5397 disorder mentions from the ShARe corpus to SNOMED CT that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of the Unified Medical Language System (UMLS) semantic types and allow normalization to occur to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized using this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical agreement that measures the overlap of shared ancestral nodes. Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge we provide the first freely available corpus of compositional concept annotation in clinical text.
A 38 million words Dutch text corpus and its users | Kruyt | Lexikos

African Journals Online (AJOL)

In August 1996, the 38 Million Words Corpus was available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with ...
A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools.

Science.gov (United States)

Verspoor, Karin; Cohen, Kevin Bretonnel; Lanfranchi, Arrick; Warner, Colin; Johnson, Helen L; Roeder, Christophe; Choi, Jinho D; Funk, Christopher; Malenkiy, Yuriy; Eckert, Miriam; Xue, Nianwen; Baumgartner, William A; Bada, Michael; Palmer, Martha; Hunter, Lawrence E

2012-08-17

We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications.
A corpus of images and text in online news

NARCIS (Netherlands)

L. Hollink (Laura); A. Bedjeti (Adriatik); M. van Harmelen; D. Elliott (Desmond)

2016-01-01

In recent years, several datasets have been released that include images and text, giving impulse to new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K
Bollywood Movie Corpus for Text, Images and Videos

OpenAIRE

Madaan, Nishtha; Mehta, Sameep; Saxena, Mayank; Aggarwal, Aditi; Agrawaal, Taneea S; Malhotra, Vrinda

2017-01-01

In the past few years, several data-sets have been released for text and images. We present an approach to create a data-set for use in detecting and removing gender bias from text. We also include a set of challenges we have faced while creating this corpus. In this work, we have worked with movie data from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube which were released from 1...
A new English–Arabic parallel text corpus for lexicographic ...

African Journals Online (AJOL)

The chosen source texts deal with a variety of topics such as the environment, globalization, psychology, history, politics, drama, etc. Their Arabic translations were taken from The World of Knowledge series published by the National Council for Culture, Arts and Letters (NCCAL) in Kuwait. Keywords: parallel corpus ...
The WONP-NURT corpus as nuclear knowledge base for text mining in the INIS database

International Nuclear Information System (INIS)

Guerra Valdes, R.

2011-01-01

In the present work the WONP-NURT corpus is taken as a knowledge base for text mining in the INIS database. Main components of the information processing system, as well as computational methods for content analysis of INIS database record files, are described. Results of the content analysis of the WONP-NURT corpus are reported. Furthermore, results of two comparative text mining studies in the INIS database are also shown. The first one explores 10 research areas in the more familiar nearest range of the WONP-NURT corpus, while the second one surveys 15 regions in the more exotic far range. The results provide new elements to assess the significance of the WONP-NURT corpus in the context of the current state of nuclear science and technology research areas. (Author)
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

Science.gov (United States)

Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E

2017-08-17

Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not
The first Malay language storytelling text-to-speech (TTS) corpus for ...

African Journals Online (AJOL)

speech annotations are described in detail in accordance with baseline work. The stories were recorded in two speaking styles: neutral and storytelling. The first Malay language storytelling corpus is not only necessary for the development of a storytelling text-to-speech (TTS) synthesis. It is also ...
A New English–Arabic Parallel Text Corpus for Lexicographic Applications

Directory of Open Access Journals (Sweden)

Hashan Al-Ajmi

2011-10-01

Abstract: Bilingual lexicographers, translation specialists and English teachers in the Arab world do not have access to computerized corpora of parallel texts for the English–Arabic language pair. This project has been carried out to meet this requirement by establishing the first general parallel corpus of English texts and their Arabic translations. The first phase of the project involved the selection of general source texts having appropriate lexical and stylistic features. The chosen source texts deal with a variety of topics such as the environment, globalization, psychology, history, politics, drama, etc. Their Arabic translations were taken from The World of Knowledge series published by the National Council for Culture, Arts and Letters (NCCAL) in Kuwait.
Keywords: PARALLEL CORPUS, LEXICOGRAPHY, TRANSLATION, BILINGUAL DICTIONARY, COLLOCATIONS, ALIGNMENT, SYNONYMS, DERIVATIVES, ANTONYMS, GLOSSARY, FREQUENCY
TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE

Directory of Open Access Journals (Sweden)

Vijay Krishna Menon

2015-07-01

Tree adjoining grammars (TAGs) are mildly context sensitive formalisms used mainly in modelling natural languages. Usage of and research on these psycholinguistic formalisms have been erratic in the past decade, because they are demanding to construct and difficult to parse. However, they represent a promising future for formalism based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that gives ambiguous deep structures and a par dependency structure known as TAG derivation. We then focus on a model for training this TAG for each language using a large corpus of text through a map reduce frequency count model in Spark, followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing on the trained grammar.
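The map reduce frequency count mentioned in the abstract above can be illustrated with a minimal sketch. It uses plain Python in place of Spark, and the function names and toy tree labels are hypothetical; it is not the authors' implementation, only an illustration of counting per partition (map), merging the counts (reduce), and normalising them into relative frequencies.

```python
from collections import Counter
from functools import reduce


def map_partition(tree_labels):
    """Map step: count elementary-tree labels within one partition."""
    return Counter(tree_labels)


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def relative_frequencies(partitions):
    """Merge per-partition counts, then normalise to relative frequencies."""
    total = reduce(reduce_counts, map(map_partition, partitions), Counter())
    n = sum(total.values())
    if n == 0:
        return {}
    return {label: count / n for label, count in total.items()}


# Hypothetical toy data: each inner list stands for one corpus partition.
partitions = [["alpha_NP", "beta_ADJ", "alpha_NP"], ["alpha_VP", "alpha_NP"]]
freqs = relative_frequencies(partitions)
```

Relative frequencies of this kind are one simple way to obtain probabilistic parameters for grammar trees; the estimation actually used in the paper may differ.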
BC4GO: a full-text corpus for the BioCreative IV GO task.

Science.gov (United States)

Van Auken, Kimberly; Schaeffer, Mary L; McQuilton, Peter; Laulederkind, Stanley J F; Li, Donghui; Wang, Shur-Jen; Hayman, G Thomas; Tweedie, Susan; Arighi, Cecilia N; Done, James; Müller, Hans-Michael; Sternberg, Paul W; Mao, Yuqing; Wei, Chih-Hsuan; Lu, Zhiyong

2014-01-01

Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼ 10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/. Published by Oxford University Press 2014. This work is written by US
Are translations longer than source texts? A corpus-based study of explicitation

OpenAIRE

Frankenberg-Garcia, A

2009-01-01

Explicitation is the process of rendering information which is only implicit in the source text explicit in the target text, and is believed to be one of the universals of translation (Blum-Kulka 1986, Olohan and Baker 2000, Øverås 1998, Séguinot 1988, Vanderauwera 1985). The present study uses corpus technology to attempt to shed some light on the complex relationship between translation, text length and explicitation. An awareness of what makes translations longer (or shorter) and more expl...
CREATING AND PROCESSING A CORPUS

Directory of Open Access Journals (Sweden)

Prihantoro

2015-05-01

This paper seeks to describe the importance of corpora and text processing. A corpus is a projection of how language is used by its speakers. Technology support has made corpora easier to maintain and more space-saving, and allows their data to be structured electronically. The latter offers much freedom for corpus users to access and exploit a corpus for language teaching, analysis or other specified tasks. This paper will demonstrate how to use open-access corpora on the internet such as the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). Besides how to use a corpus, this paper also seeks to describe how to build one. In this paper, the writer will use UNITEX, a corpus (text-based) processing software. This software will demonstrate steps of corpus building, ranging from text collection, annotation, and electronic dictionary application to some natural language based operations ranging from pattern matching and concordance to simple extraction. It will show how graph technology may outperform regular expressions, a retrieval method exploited by other corpus processors, in terms of writing output.
Slovene specialized text corpus of Library and Information Science – An advanced lexicographic tool for library terminology research

OpenAIRE

Kanič, Ivan

2013-01-01

To support research in the field of library and information science terminology and dictionary construction in the Slovene language, a specialized text corpus has been designed and constructed. The corpus has reached 3.6 million words extracted from 625 Slovene technical and scientific texts of the field. It supports a variety of specialized search methods, display of search results, and their statistical computation. The web based application is in open public access.
The Influence of Reference Corpus Size on Wordsmith Tools Keywords Extraction

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2012-05-01

A KeyWords analysis (using WordSmith Tools) enables the discovery of lexical items which reveal the main lexical sets in a text or corpus. Such an analysis requires that a reference corpus be compared to the corpus the researcher intends to describe (the study corpus). This paper presents a mathematical method for finding out the influence of reference corpus size on the number of key words extracted by the program. The results reveal that a reference corpus that is at least five times as large as the study corpus yields a number of key words that is statistically equivalent to that obtained with larger reference corpora, thus suggesting five times (as large as the study corpus) as the minimum order of magnitude for reference corpora.
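The comparison of a study corpus against a reference corpus described above can be sketched as follows. This is a minimal illustration that assumes the log-likelihood statistic as the keyness measure; the abstract does not say which statistic or threshold was used, and the function name and example counts are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word.

    freq_* are the word's counts and size_* the total token counts
    in the study corpus and the reference corpus.
    """
    total = freq_study + freq_ref
    combined = size_study + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: the reference corpus is five times the study corpus.
score = log_likelihood(freq_study=50, size_study=1000,
                       freq_ref=50, size_ref=5000)
```

A word with the same relative frequency in both corpora scores zero, and larger scores indicate stronger keyness. Changing `size_ref` while keeping the study corpus fixed shows how reference corpus size affects the score.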

Towards an integrated corpus stylistics

Directory of Open Access Journals (Sweden)

McIntyre Dan

2015-12-01

Over recent years, the use of corpora in stylistic analysis has grown in popularity. However, questions still remain over the remit of corpus stylistics, its distinction from corpus linguistics generally and its capacity to explain complex stylistic effects. This article argues in favour of an integrated corpus stylistics; that is, an approach to corpus stylistics that integrates it with other stylistic methods and analytical frameworks. I suggest that this approach is needed for two main reasons: (i) it is analytically necessary in order to fully explain stylistic effects in texts, and (ii) integrating corpus methods with other stylistic tools is what will distinguish corpus stylistics from corpus linguistics. My argument is supported by reference to examples from Mark Haddon’s novel The Curious Incident of the Dog in the Night-time and the HBO TV series Deadwood. Both these examples rely for their explanation on a combination of corpus stylistic analytical techniques and other stylistic methods of analysis.
Bayesian stratified sampling to assess corpus utility

Energy Technology Data Exchange (ETDEWEB)

Hochberg, J.; Scovel, C.; Thomas, T.; Hall, S.

1998-12-01

This paper describes a method for asking statistical questions about a large text corpus. The authors exemplify the method by addressing the question, "What percentage of Federal Register documents are real documents, of possible interest to a text researcher or analyst?" They estimate an answer to this question by evaluating 200 documents selected from a corpus of 45,820 Federal Register documents. Bayesian analysis and stratified sampling are used to reduce the sampling uncertainty of the estimate from over 3,100 documents to fewer than 1,000. A possible application of the method is to establish baseline statistics used to estimate recall rates for information retrieval systems.
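The combination of Bayesian analysis and stratified sampling described above can be sketched in a few lines. This sketch assumes a uniform Beta(1, 1) prior per stratum and ignores finite-population corrections; the strata sizes, sample sizes, and hit counts are hypothetical (chosen only so that they sum to 45,820 documents and 200 samples), and the authors' actual model may differ.

```python
import math


def beta_posterior(hits, sampled, prior_a=1.0, prior_b=1.0):
    """Posterior mean and variance of a proportion under a Beta prior."""
    a = prior_a + hits
    b = prior_b + sampled - hits
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def stratified_estimate(strata):
    """Estimate the number of 'real' documents across all strata.

    strata: iterable of (stratum_size, sampled, hits) tuples.
    Returns (estimated_count, standard_deviation), treating strata
    as independent.
    """
    total_mean = 0.0
    total_var = 0.0
    for size, sampled, hits in strata:
        mean, var = beta_posterior(hits, sampled)
        total_mean += size * mean
        total_var += size ** 2 * var
    return total_mean, math.sqrt(total_var)


# Hypothetical strata: (documents in stratum, documents sampled, judged real).
strata = [(30000, 100, 90), (10000, 60, 30), (5820, 40, 4)]
estimate, sd = stratified_estimate(strata)
```

Allocating more of the sample to strata whose proportion is most uncertain lowers the combined standard deviation, which is the kind of reduction in sampling uncertainty the abstract reports.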
Semantic markup of nouns and adjectives for the Electronic corpus of texts in Tuvan language

Directory of Open Access Journals (Sweden)

Bajlak Ch. Oorzhak

2016-12-01

The article examines the progress of semantic markup of the Electronic corpus of texts in Tuvan language (ECTTL), which is another stage of adding Tuvan texts to the database and marking up the corpus. ECTTL is a collaborative project by researchers from Tuvan State University (Research and Education Center of Turkic Studies and Department of Information Technologies). Semantic markup of Tuvan lexis will serve as a search engine and reference system which will help users find text snippets containing words with desired meanings in ECTTL. The first stage of this process is setting up databases of basic lexemes of the Tuvan language. All meaningful lexemes were classified into the following semantic groups: humans, animals, objects, natural objects and phenomena, and abstract concepts. All Tuvan object nouns, as well as both descriptive and relative adjectives, were assigned to one of these lexico-semantic classes. Each class, sub-class and descriptor is tagged in Tuvan, Russian and English; these tags, in turn, will help automate searching. The databases of meaningful lexemes of the Tuvan language will also outline their lexical combinations. The automated system will contain information on semantic combinations of adjectives with nouns, adverbs with verbs, nouns with verbs, as well as on the combinations which are semantically incompatible.
FTA Corpus: a parallel corpus of English and Spanish Free Trade Agreements for the study of specialized collocations

Directory of Open Access Journals (Sweden)

Pedro Patiño García

2013-04-01

This paper describes the Corpus of Free Trade Agreements (henceforth FTA), a specialized parallel corpus in English and Spanish from Europe and America and a smaller subcorpus in English-Norwegian and Spanish-Norwegian that was prepared and then aligned with Translation Corpus Aligner 2 (Hofland & Johansson, 1998). The data was taken from Free Trade Agreements. These agreements are specialized texts officially signed and ratified by several countries and blocs of countries in the last twenty years. Thus, FTAs are a rich repository for terminology and phraseology that is used in different fields of business activity throughout the world. The corpus contains around 1.37 million words in the English section and 1.48 million words in its Spanish counterpart, plus 60,000 words each in the Spanish-Norwegian and English-Norwegian subcorpora. The corpus is being used primarily to study the terms and specialized collocations that include these terms in this kind of specialized text. Keywords: specialized collocation, specialized parallel corpus, corpus linguistics, Free Trade Agreement
Corpus-based critical discourse analysis as a method of exploring underlying ideologies and self-representation strategies in legal texts

DEFF Research Database (Denmark)

Potts, Amanda; Kjær, Anne Lise

Legal language is an integral and foundational part of our social reality, but it is underrepresented in interdisciplinary, critical linguistic analyses. This is perhaps because language is more objective and formulaic than media texts, which can be more subjective and emotive (Kjær and Palsbro..., 2008). In this paper, I demonstrate how a corpus-based critical discourse analysis of legal language can expose hidden traces of the underlying ideologies of text creators, while demonstrating how identity can be performed in legal texts. Research is based on a half-million-word corpus of annual... that legal language can be subjective and emotive. The semantic field of ‘crime’ is an expected key, but concordance analysis shows ideological skew in discursive construction of crimes/victims. For instance, ‘rape’/‘sexual assault’ co-occurs with female victims, whereas ‘torture’/‘outrages upon personal...
Studying text coherence in Czech – a corpus-based analysis

Directory of Open Access Journals (Sweden)

Rysová Magdaléna

2017-12-01

Full Text Available The paper deals with the field of Czech corpus linguistics and represents one of various current studies analysing text coherence through language interactions. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech. It focuses on examining the interaction of these two language phenomena and observes where they meet to participate in text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT) containing 3,165 Czech texts. The results of the analysis are helpful in automatic text annotation: the paper presents how (or to what extent) the annotation of grammatical coreference may be used in automatic (pre-)annotation of sentence information structure in Czech. It demonstrates how accurately we may (automatically) assume the value of contextual boundness for the antecedent and anaphor (as the two participants of a grammatical coreference relation). The results of the paper demonstrate that the anaphor of grammatical coreference is automatically predictable: it is a non-contrastive contextually bound sentence item in 99.18% of cases. On the other hand, the value of contextual boundness of the antecedent is not so easy to estimate (according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% and contrastive contextually bound in 13% of cases).
Text Induced Spelling Correction

NARCIS (Netherlands)

Reynaert, M.W.C.

2004-01-01

We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word
Diffuse corpus callosum infarction - Rare vascular entity with differing etiology.

Science.gov (United States)

Mahale, Rohan; Mehta, Anish; Buddaraju, Kiran; John, Aju Abraham; Javali, Mahendra; Srinivasa, Rangasetty

2016-01-15

Infarctions of the corpus callosum are rare vascular events. The corpus callosum is relatively immune to vascular insult because of its rich vascular supply from the anterior and posterior circulations of the brain. We report 3 patients with largely diffuse acute corpus callosum infarction, each with a different aetiology: cardioembolism, tuberculous arteritis and Takayasu arteritis. Diffuse corpus callosum infarcts are rare events, and this case series describes three different aetiologies of this condition. Copyright © 2015 Elsevier B.V. All rights reserved.
Lancaster Summer School in Corpus Linguistics

Directory of Open Access Journals (Sweden)

Jaka Čibej

2016-11-01

Full Text Available Between 12 and 15 July, the University of Lancaster hosted the corpus linguistics summer school Lancaster Summer Schools in Corpus Linguistics and Other Digital Methods. The summer school was organized by UCREL (University Centre for Computer Corpus Research on Language), ERC (European Research Council), CASS (ESRC Centre for Corpus Approaches to Social Science) and ESRC (Economic and Social Research Council), and was divided into six programmes tailored to different fields: Corpus Linguistics for Language Studies, Corpus Linguistics for Social Science, Corpus Linguistics for Humanities, Statistics for Corpus Linguistics, Geographical Information Systems for the Digital Humanities, and Corpus-based Natural Language Processing.
Large Sphenoethmoidal Encephalocele Associated with Agenesis of Corpus Callosum and Cleft Palate

Directory of Open Access Journals (Sweden)

Basir Hashemi

2010-06-01

Full Text Available Abstract: Basal encephalocele is a rare craniofacial anomaly. In the present paper we report a 10-year-old boy who presented with cleft palate, congenital nystagmus, and hypertelorism. During preoperative evaluation for cleft palate repair, a pulsatile mass was detected in the pharynx. Magnetic resonance imaging showed a sphenoethmoidal type of basal encephalocele and agenesis of the corpus callosum. Neurosurgical consultation was performed for further evaluation and management. Iran J Med Sci 2010; 35(2): 154-156.
The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

Science.gov (United States)

Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike

2017-01-01

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future
A corpus for plant-chemical relationships in the biomedical domain.

Science.gov (United States)

Choi, Wonjun; Kim, Baeksoo; Cho, Hyejin; Lee, Doheon; Lee, Hyunju

2016-09-20

Plants are natural products that humans consume in various ways including food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals in a large-scale way if we apply a text mining approach. A cornerstone of achieving this task is a corpus of relationships between plants and chemicals. In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant-chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively. We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus .
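The agreement figures quoted in the abstract above (simple percent agreement and a kappa score) can be illustrated with a minimal sketch. The abstract does not say which kappa variant was used for its three annotators, so the code below assumes the two-annotator Cohen's kappa, and the label lists are toy data invented for illustration, not values from the corpus.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical): positive/negative relationship labels.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
annotator_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percent agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because part of the raw agreement is expected by chance; with more than two annotators a multi-rater statistic such as Fleiss' kappa would be the usual choice.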
The Ndebele Language Corpus: A Review of Some Factors Influencing the Content of the Corpus*

Directory of Open Access Journals (Sweden)

Samukele Hadebe

2011-10-01

Full Text Available
Abstract: The Ndebele language corpus described here is that compiled by the ALLEX Project (now ALRI) at the University of Zimbabwe. It is intended to reflect as much as possible the Ndebele language as spoken in Zimbabwe. The Ndebele language corpus was built in order to provide much-needed material for the study of the Ndebele language with a special focus on dictionary-making and research. Like most corpora, the Ndebele language corpus may in future be used for other purposes not thought of at the time of its inception. It has been designed to meet generally acceptable standards so that it can be adaptable to various possible uses by various researchers. The article outlines the building process of the Ndebele language corpus with special emphasis on the challenges that faced compilers, and possible solutions. It is assumed that some of these challenges might not be peculiar to Ndebele alone but could also affect related African languages in a more or less similar situation. The main focus of the discussion will be the composition of the Ndebele language corpus, i.e. the type of texts that constitute the corpus. The corpus is composed of published texts, unpublished texts and oral material gathered from Ndebele-speaking districts of Zimbabwe. It will be argued that the use of the corpus and its reliability for research depends among other factors on its contents. It will also be shown that the contents of a corpus depend on a number of factors, some of which include sociolinguistic, political and economic considerations. These considerations have implications for both the content and quality of published and oral texts that constitute the Ndebele language corpus.
Keywords: CORPUS, ORAL MATERIALS, CODE-MIXING, CODE-SWITCHING, MOTHER-TONGUE, NDEBELE
Summary: The Ndebele Language Corpus: A review of some factors that influence the content of the corpus. The Ndebele language corpus described here is the one compiled by the
Statistical analyses of digital collections: Using a large corpus of systematic reviews to study non-citations

DEFF Research Database (Denmark)

Frandsen, Tove Faber; Nicolaisen, Jeppe

2017-01-01

Using statistical methods to analyse digital material for patterns makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim...
Assessing the Lexico-Grammatical Characteristics of a Corpus of College-Level Statistics Textbooks: Implications for Instruction and Practice

Science.gov (United States)

Wagler, Amy E.; Lesser, Lawrence M.; González, Ariel I.; Leal, Luis

2015-01-01

A corpus of current editions of statistics textbooks was assessed to compare aspects and levels of readability for the topics of "measures of center," "line of fit," "regression analysis," and "regression inference." Analysis with lexical software of these text selections revealed that the large corpus can…
The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions.

Science.gov (United States)

Islamaj Dogan, Rezarta; Kim, Sun; Chatr-Aryamontri, Andrew; Chang, Christie S; Oughtred, Rose; Rust, Jennifer; Wilbur, W John; Comeau, Donald C; Dolinski, Kara; Tyers, Mike

2017-01-01

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future
Corpus-Based Investigations of Language Use.

Science.gov (United States)

Biber, Douglas; And Others

1996-01-01

Examines a representative text corpus to gain insights into language structure and use and to open new areas of linguistic inquiry. Various illustrations are presented that provide a glimpse into the value of corpus-based investigations for increasing one's understanding of language use and imparting insights important for designing effective…
Primary diffuse large B cell lymphoma arising from a leiomyoma of the uterine corpus.

Science.gov (United States)

Zhao, Lianhua; Ma, Qiang; Wang, Qiushi; Zeng, Ying; Luo, Qingya; Xiao, Hualiang

2016-01-20

Primary diffuse large B cell lymphoma (DLBCL) of the uterus is rare, and primary DLBCL arising from a uterine leiomyoma (collision tumor) has not been reported in the literature. We describe the clinical, histological, immunohistochemical, and molecular features of primary DLBCL arising from a leiomyoma in the uterine corpus. A 73-year-old female patient had a uterine mass for 23 years. An ultrasound scan revealed marked enlargement of the uterus, measuring 18.2 × 13 × 16.3 cm, with a 17.6 × 10.9 × 11.6 cm hypoechoic mass in the uterine corpus. The tumors consisted of medium- to large-sized cells exhibiting a diffuse pattern of growth with a well-circumscribed leiomyoma. The neoplastic cells strongly expressed CD79α, CD20 and PAX5. Molecular analyses indicated clonal B-cell receptor gene rearrangement. To the best of our knowledge, no previous cases of primary DLBCL arising from a leiomyoma have been reported. It is necessary to differentiate a diagnosis of primary DLBCL arising from a leiomyoma from that of leiomyoma with florid reactive lymphocytic infiltration (lymphoma-like lesion). Careful analysis of clinical, histological, immunophenotypic, and genetic features is required to establish the correct diagnosis.
Exploring theoretical functions of corpus data in teaching translation

Directory of Open Access Journals (Sweden)

Éric Poirier

2016-04-01

Full Text Available http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p177 As language referential data banks, corpora are instrumental in the exploration of translation solutions in bilingual parallel texts or conventional usages of source or target language in monolingual general or specialized texts. These roles are firmly rooted in translation processes, from analysis and interpretation of source text to searching for an acceptable equivalent and integrating it into the production of the target text. Provided the creative and not the conservative way be taken, validation or adaptation of target text in accordance with conventional usages in the target language also benefits from corpora. Translation teaching is not exploiting this way of translating that is common practice in the professional translation markets around the world. Instead of showing what corpus tools can do for translation teaching, we start our analysis with a common issue within translation teaching and show how corpus data can help to resolve it in learning activities in translation courses. We suggest a corpus-driven model for the interpretation of ‘business’ as a term and as an item in complex terms based on source text pattern analysis. This methodology will make it possible for teachers to explain and justify interpretation rules that have been defined theoretically from corpus data. It will also help teachers to conceive and non-subjectively assess practical activities designed for learners of translation. Corpus data selected for the examples of rule-based interpretations provided in this paper have been compiled in a corpus-driven study (Poirier, 2015) on the translation of the noun ‘business’ in the field of specialized translation in business, economics, and finance from English to French. The corpus methodology and rule-based interpretation of senses can be generalized and applied in the definition of interpretation rules for other language pairs and other specialized simple and
Exploring theoretical functions of corpus data in teaching translation

Directory of Open Access Journals (Sweden)

Éric Poirier

2016-06-01

Full Text Available As language referential data banks, corpora are instrumental in the exploration of translation solutions in bilingual parallel texts or conventional usages of source or target language in monolingual general or specialized texts. These roles are firmly rooted in translation processes, from analysis and interpretation of source text to searching for an acceptable equivalent and integrating it into the production of the target text. Provided the creative and not the conservative way be taken, validation or adaptation of target text in accordance with conventional usages in the target language also benefits from corpora. Translation teaching is not exploiting this way of translating that is common practice in the professional translation markets around the world. Instead of showing what corpus tools can do for translation teaching, we start our analysis with a common issue within translation teaching and show how corpus data can help to resolve it in learning activities in translation courses. We suggest a corpus-driven model for the interpretation of ‘business’ as a term and as an item in complex terms based on source text pattern analysis. This methodology will make it possible for teachers to explain and justify interpretation rules that have been defined theoretically from corpus data. It will also help teachers to conceive and non-subjectively assess practical activities designed for learners of translation. Corpus data selected for the examples of rule-based interpretations provided in this paper have been compiled in a corpus-driven study (Poirier, 2015) on the translation of the noun ‘business’ in the field of specialized translation in business, economics, and finance from English to French. The corpus methodology and rule-based interpretation of senses can be generalized and applied in the definition of interpretation rules for other language pairs and other specialized simple and complex terms. These works will encourage the

Orfismo en el Corpus Philostrateum

Directory of Open Access Journals (Sweden)

Susana M. Lizcano Rejano

2003-06-01

Full Text Available We search the Corpus Philostrateum for connections between this literary production and Orphism: its system of beliefs, its peculiar interpretation of traditional Greek mythology, and its proposal for a particular way of life. We also try to determine the relation, as found in this corpus, between the ideology and customs that the Pythagoreans and the Orphics upheld.
Classification of acquired lesions of the corpus callosum with MRI

Energy Technology Data Exchange (ETDEWEB)

Friese, S.A.; Bitzer, M.; Voigt, K.; Kueker, W. [Tuebingen Univ. (Germany). Abt. fuer Neuroradiologie; Freudenstein, D. [Department of Neurosurgery, Eberhard-Karls-University Tuebingen (Germany)

2000-11-01

MRI has facilitated diagnostic assessment of the corpus callosum. Diagnostic classification of solitary or multiple lesions of the corpus callosum has not attracted much attention, although signal abnormalities are not uncommon. Our aim was to identify characteristic imaging features of lesions frequently encountered in practice. We reviewed the case histories of 59 patients with lesions shown on MRI. The nature of the lesions was based on clinical features and/or long term follow-up (ischaemic 20, Virchow-Robin spaces 3, diffuse axonal injury 7, multiple sclerosis 11, hydrocephalus 5, acute disseminated encephalomyelitis 5, Marchiafava-Bignami disease 4, lymphoma 2, glioblastoma and hamartoma 1 each). The location in the sagittal plane, the relationship to the borders of the corpus callosum and midline and the size were documented. The 20 ischaemic lesions were asymmetrical but adjacent to the midline; the latter was involved in new or large lesions. Diffuse axonal injury commonly resulted in large lesions, which tended to be asymmetrical; the midline and borders of the corpus callosum were always involved. Lesions in MS were small, at the lower border of the corpus callosum next to the septum pellucidum, and crossed the midline asymmetrically. Acute disseminated encephalomyelitis and the other perivenous inflammatory diseases caused relatively large, asymmetrical lesions. Hydrocephalus resulted in lesions of the upper part of the corpus callosum, and mostly in its posterior two thirds; they were found in the midline. Lesions in Marchiafava-Bignami disease were large, often symmetrically in the midline in the splenium and did not reach the edge of the corpus callosum. (orig.)
Concept annotation in the CRAFT corpus.

Science.gov (United States)

Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

2012-07-09

Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
The structure of an entry in the National corpus of Tuvan language

Directory of Open Access Journals (Sweden)

Mengi V. Ondar

2016-12-01

Full Text Available Contemporary information technologies and mathematical modelling have made creating corpora of natural languages significantly easier. A corpus is an information and reference system based on a collection of digitally processed texts. A corpus includes various written and oral texts in the given language, a set of dictionaries and markup, that is, information on the properties of the text. It is the presence of the markup which distinguishes a corpus from an electronic library. At the moment, national corpora are being set up for many languages of the Russian Federation, including those of the Turkic peoples. Faculty members, postgraduate and undergraduate students at Tuvan State University and Siberian Federal University are working on the National corpus of Tuvan language. This article describes the structure of a dictionary entry in the National corpus of Tuvan language. The corpus database comprises the following tables: MAIN (the headword table); RUS, ENG and GER (translations of the headword into three languages); and MORPHOLOGY (the table containing morphological data on the headword). The database is built in Microsoft Office Access. Working with the corpus dictionary includes the following functions: adding, editing and removing an entry, entry search (with transcription), and setting and visualizing morphological features of a headword. The project allows us to view the corpus dictionary as a multi-structure entity with a complex hierarchical structure and a dictionary entry as its key component. The corpus dictionary we developed can be used for studying the Tuvan language in its pronunciation, orthography and word analysis, as well as for searching for words and collocations in the texts included in the corpus.
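The table layout named in the abstract above (MAIN, RUS, ENG, GER, MORPHOLOGY) can be sketched as a small relational schema. This is only an illustration under stated assumptions: the original database is built in Microsoft Office Access, whereas the sketch uses Python's built-in sqlite3 module as a stand-in, and every column name, the helper functions `create_schema`, `add_entry` and `find_entry`, and the sample row are hypothetical.

```python
import sqlite3

# Table names come from the abstract; all column names are assumptions.
TRANSLATION_TABLES = ("RUS", "ENG", "GER")


def create_schema(conn):
    """Create the headword table, three translation tables and morphology."""
    conn.execute(
        "CREATE TABLE MAIN ("
        "id INTEGER PRIMARY KEY, "
        "headword TEXT NOT NULL UNIQUE, "
        "transcription TEXT)"
    )
    for table in TRANSLATION_TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            "main_id INTEGER NOT NULL REFERENCES MAIN(id), "
            "translation TEXT NOT NULL)"
        )
    conn.execute(
        "CREATE TABLE MORPHOLOGY ("
        "main_id INTEGER NOT NULL REFERENCES MAIN(id), "
        "feature TEXT NOT NULL, "
        "value TEXT NOT NULL)"
    )


def add_entry(conn, headword, transcription, translations, morphology):
    """Insert one dictionary entry; `translations` maps table name to text."""
    cur = conn.execute(
        "INSERT INTO MAIN (headword, transcription) VALUES (?, ?)",
        (headword, transcription),
    )
    main_id = cur.lastrowid
    for table in TRANSLATION_TABLES:
        conn.execute(
            f"INSERT INTO {table} (main_id, translation) VALUES (?, ?)",
            (main_id, translations[table]),
        )
    conn.executemany(
        "INSERT INTO MORPHOLOGY (main_id, feature, value) VALUES (?, ?, ?)",
        [(main_id, feature, value) for feature, value in morphology.items()],
    )
    conn.commit()
    return main_id


def find_entry(conn, headword):
    """Entry search: return headword, transcription and three translations."""
    return conn.execute(
        "SELECT m.headword, m.transcription, "
        "r.translation, e.translation, g.translation "
        "FROM MAIN m "
        "JOIN RUS r ON r.main_id = m.id "
        "JOIN ENG e ON e.main_id = m.id "
        "JOIN GER g ON g.main_id = m.id "
        "WHERE m.headword = ?",
        (headword,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Illustrative sample row, not taken from the corpus itself.
add_entry(
    conn,
    headword="ном",
    transcription="nom",
    translations={"RUS": "книга", "ENG": "book", "GER": "Buch"},
    morphology={"part_of_speech": "noun"},
)
entry = find_entry(conn, "ном")
print(entry)
```

Keeping each translation language in its own table mirrors the layout described in the abstract; a single translations table with a language column would be an equally plausible design.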
Automatic extraction of property norm-like data from large text corpora.

Science.gov (United States)

Kelly, Colin; Devereux, Barry; Korhonen, Anna

2014-01-01

Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
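The reweighting step mentioned in the abstract above, a linear combination of a triple's frequency and four statistical metrics, can be sketched as follows. The abstract does not name the four metrics or give the weights, so the `metrics` values, the equal weights and the `Triple` class here are hypothetical placeholders rather than the authors' actual configuration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Triple:
    """A concept-relation-feature candidate, e.g. (car, require, petrol)."""
    concept: str
    relation: str
    feature: str
    frequency: float                            # normalised corpus frequency
    metrics: Tuple[float, float, float, float]  # four unnamed statistics


def score(triple: Triple, weights: Sequence[float]) -> float:
    """Linear combination of frequency and the four metric values."""
    values = (triple.frequency, *triple.metrics)
    if len(weights) != len(values):
        raise ValueError("expected one weight per value (5 in total)")
    return sum(w * v for w, v in zip(weights, values))


def rerank(triples, weights):
    """Sort candidate triples from highest to lowest combined score."""
    return sorted(triples, key=lambda t: score(t, weights), reverse=True)


# Toy candidates with invented numbers, for illustration only.
candidates = [
    Triple("car", "require", "petrol", 0.9, (0.8, 0.7, 0.6, 0.9)),
    Triple("car", "be", "fast", 0.7, (0.9, 0.9, 0.8, 0.7)),
    Triple("car", "cause", "pollution", 0.4, (0.5, 0.4, 0.6, 0.3)),
]
weights = (0.2, 0.2, 0.2, 0.2, 0.2)  # hypothetical equal weighting

ranked = rerank(candidates, weights)
for t in ranked:
    print(t.concept, t.relation, t.feature, round(score(t, weights), 3))
```

In practice the weights would be tuned against held-out human property norms rather than fixed by hand.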
Using a Corpus in a 300-Level Spanish Grammar Course

Science.gov (United States)

Benavides, Carlos

2015-01-01

The present study examined the use and effectiveness of a large corpus--the Corpus del Español (Davies, 2002)--in a 300-level Spanish grammar university course. Students conducted hands-on corpus searches with the goal of finding concordances containing particular types of collocations (combinations of words that tend to co-occur) and tokens (any…
Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers

DEFF Research Database (Denmark)

Bingel, Joachim; Haider, Thomas

2014-01-01

We describe a systematic and application-oriented approach to training and evaluating named entity recognition and classification (NERC) systems, the purpose of which is to identify an optimal system and to train an optimal model for named entity tagging DeReKo, a very large general-purpose corpus...... when evaluated on more uniform and less diverse data. We create and manually annotate such a representative sample as evaluation data for three different NERC systems, for each of which various models are learnt on multiple training data. The proposed sampling method can be viewed as a generally...
The Nordic Dialect Corpus – a joint research infrastructure

Directory of Open Access Journals (Sweden)

Janne Bondi Johannessen

2011-06-01

Full Text Available The paper describes the Nordic Dialect Corpus as of June 2010. The corpus is a tool that combines a number of useful features that together make it a unique and very advanced resource for researchers of many fields of language search. The corpus is web-based and features full audio-visual representation linked to transcriptions and translations.
A Lingüística de corpus: história, problemas, legitimidade

Directory of Open Access Journals (Sweden)

Jacqueline Léon

2006-01-01

Full Text Available During the nineties, the accessibility of large corpora and the possibility of manipulating enormous quantities of linguistic data gave rise to a renewed interest in statistical and probabilistic evidence, which served to directly question linguistics about its objectives, methods and foundations. This interest grew steadily and is now known as corpus linguistics, a dominant field of research in language science. In this article we show that the designation corpus linguistics covers considerably heterogeneous theoretical positions and research topics. We show how corpus linguistics, originally of British origin, was later endowed with historical and theoretical legitimacy while at the same time intending to establish itself as a new paradigm in language science. Finally we distinguish two attitudes inside the British tradition: one, intending to build the studies on a corpus and in a new paradigm based on a retrospective construction of the critical works of Chomsky during the years 1959 and 1960, which was intended to legitimize the studies; the other attitude involves the continuity of the tradition of British empirical linguistics.
FREEDOM OF COMBINATION AND HETEROGENEITY: A CORPUS LINGUIST’S LOOK AT TWO SAUSSUREAN INSIGHTS

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2014-06-01

Full Text Available This article offers a reexamination of two of Saussure’s insights from the point of view of corpus linguistics, namely freedom of combination and heterogeneity in language in use. Regarding the first insight, an analysis of word combinations in a corpus of newspaper texts written in Brazilian Portuguese was carried out to determine how many of these combinations were actual collocations, that is, were used frequently enough in a very large reference corpus (the Brazilian corpus) to warrant statistical significance. The results suggested that most word combinations are not free; rather, they follow previously established preferences among speakers. Regarding the second notion, that of heterogeneity, the collocations in the newspaper texts were tracked as they were deployed one after the other along each text, and this flow was visually depicted. The inspection of the charts revealed unique patterns of the distribution of collocation, thereby suggesting that the evidence supports the view of heterogeneity. A cluster analysis was later conducted on the number of collocations in each text, revealing three basic collocation bands onto which all the texts can be fitted. This was interpreted as suggesting that heterogeneity, despite being present and noticeable, is constrained rather than limitless. The article concludes that the methods and techniques afforded by present-day corpus linguistics can shed light on Saussure’s many valuable insights.
GECO, un Gestor de Corpus colaborativo basado en web

Directory of Open Access Journals (Sweden)

Gerardo Sierra

2017-12-01

Full Text Available This article presents GEstor de COrpus (GECO), an online corpus management application that lets users upload collections of documents and turn them into digital corpora. Within the system, corpora can be processed by other applications, which are implemented as modules integrated into the GECO infrastructure. The article describes GECO's features in detail, as well as the functionality of the concordance generator developed around it.
Angular analysis of corpus callosum in 18 patients with frontonasal dysplasia

Directory of Open Access Journals (Sweden)

Giffoni Silvyo David Araújo

2004-01-01

Full Text Available Considering the rarity of frontonasal dysplasia (FD) and the few reports on it in a large case series using magnetic resonance imaging (MRI), we describe the results of the angular analysis of the corpus callosum of 18 individuals with FD (7 male, 11 female), using an easily reproducible method. Group I had 12 individuals with the isolated form and Group II had 6 individuals with syndromic FD of unknown etiology. The results are presented as a set. Compared with the control group, patients with FD presented an increased alpha angle and reduced beta and gamma angles (p<0.05). Alpha and gamma angles express the relationship between the anterior portion of the corpus callosum and the floor of the 4th ventricle. Considering embryonic development, these findings would occur secondarily to a failure during the development of the nasal capsule. Thus, angular anomaly of the corpus callosum would be a usual finding, and not a fortuitous one, in patients with FD.
Automated de-identification of free-text medical records

Directory of Open Access Journals (Sweden)

Long William J

2008-07-01

Full Text Available Abstract Background Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification. Methods We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Results Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either
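The look-up-table and regular-expression approach described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Perl package: the name list `KNOWN_NAMES`, the two patterns, and the replacement tags are hypothetical placeholders chosen for illustration, and they cover only a tiny fraction of the HIPAA PHI categories.

```python
import re

# Hypothetical look-up table; a real system would load large name lexicons.
KNOWN_NAMES = {"smith", "jones"}

# Illustrative patterns only (assumed formats for dates and phone numbers).
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
]


def deidentify(note: str) -> str:
    """Replace pattern matches and known names with surrogate tags."""
    # Step 1: regular expressions for structured PHI such as dates.
    for pattern, tag in PATTERNS:
        note = pattern.sub(tag, note)

    # Step 2: lexical look-up, one alphabetic word at a time.
    def mask(match: re.Match) -> str:
        word = match.group(0)
        return "[NAME]" if word.lower() in KNOWN_NAMES else word

    return re.sub(r"[A-Za-z]+", mask, note)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real PHI instances that were found."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged items that really were PHI."""
    return true_pos / (true_pos + false_pos)


print(deidentify("Pt seen by Dr. Smith on 03/14/2005, call 555-123-4567."))
```

A high-recall, lower-precision profile like the one reported above is what such a design tends to produce: broad patterns and name lists over-flag ordinary words in exchange for missing fewer identifiers.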
The Shona Corpus and the Problem of Tagging?

Directory of Open Access Journals (Sweden)

Emmanuel Chabata

2011-10-01

Full Text Available
Abstract: In this paper the writer examines problems the African Languages Lexical (ALLEX) Project (at present the African Languages Research Institute (ALRI)) encountered while tagging the Shona corpus. The problems to be highlighted include general problems which apply to more than one language as well as problems peculiar to Shona. The paper was inspired by the challenges the writer encountered when he took part in building the Shona corpus. An analysis of the problems that most corpus builders face shows that more problems are likely to be encountered when dealing with spoken corpora than with written corpora. The paper demonstrates that tagging is an important component of corpus building as it makes it easier for a researcher to extract relevant data. To utilise the benefits of a tagged corpus, the tagging should be thorough and accurate. Well-informed decisions form an integral part of the tagging process since the utility of a tagged corpus depends largely on the input of the tagging process. This paper shows the need to take the tagging process seriously.
Keywords: ALLEX PROJECT, COMPUTER, CORPUS, ENCODING, FOREIGN WORD, LEMMATIZATION, LEXICOGRAPHY, MONITOR CORPUS, PART OF SPEECH, SCANNING, SHONA, SLANG, TAGGING, TRANSCRIPTION, WORD
Old Persian corpus [Dataset]

NARCIS (Netherlands)

Bavant, M.

2011-01-01

XML Old Persian corpus. The corpus is based on publicly available data on the Web. Those data can be traced back to the grammar of Old Persian by Kent (1950). The corpus contains those data and is arranged in a way suitable for corpus searches.
A corpus-based study on the translation of “namorar” and “date” in literature texts = Um estudo baseado em corpus sobre a tradução de “namorar” e “date” em textos literários

Directory of Open Access Journals (Sweden)

Fleck, Regina Caballero

2012-01-01

Full Text Available Professionals who work with translation and with languages in general have probably come across “untranslatable” words in texts, such as “namorar” and “date”. The present study seeks to promote the use of corpus-based tools among literary translators. Our research questions are: what translation solutions are found in the corpus? How are these solutions related to extralinguistic factors? The data for this study were taken from Compara, a parallel corpus that is available online and consists of original texts in Portuguese and English aligned with their respective translations. To analyze the examples, we use the definitions in the Houaiss and Oxford dictionaries as our parameters. At the end of the study, we observe a one-way equivalence between “namorar” and “date”, and find that these terms evolved differently in the two languages.
Holistic corpus-based dialectology Dialetologia holística baseada em corpus

Directory of Open Access Journals (Sweden)

Benedikt Szmrecsanyi

2011-01-01

Full Text Available This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects all over Great Britain.
A corpus and a concordancer of academic journal articles

Directory of Open Access Journals (Sweden)

Deny A. Kwary

2018-02-01

Full Text Available This data article presents a corpus (i.e. a selection of a large number of words in electronic form) and a concordancer (i.e. a tool to show a word in its context of use) of academic journal articles. As the title suggests, the data were collected from research articles published in academic journals. The corpus contains 5,686,428 words selected from 895 journal articles published by Elsevier in 2011–2015. The corpus is classified into four subject areas: Health Sciences, Life Sciences, Physical Sciences, and Social Sciences, following the classifications of Scopus, which is the largest abstract and citation database of peer-reviewed scientific journals, books and conference proceedings. To ease access to and use of the corpus, a program to produce the key word in context (KWIC) and word frequency was created and placed on the website: corpus.kwary.net. The corpus is a valuable resource for researchers, teachers, and translators working on academic English.
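To make the two outputs named in this abstract concrete, the following is a minimal Python sketch of a key-word-in-context (KWIC) listing and a word-frequency count. It is not the program hosted on the website above; the tokenizer, the function names, and the default context width are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def word_frequency(tokens: list[str]) -> Counter:
    """Count how often each token occurs."""
    return Counter(tokens)


tokens = tokenize("The corpus is large. A corpus helps teachers.")
for left, key, right in kwic(tokens, "corpus", width=2):
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>12} | {key} | {right}")
print(word_frequency(tokens).most_common(1))
```

Aligning the keyword in a fixed column is what lets a reader scan recurring patterns on either side of it.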
Morphometric changes of the corpus callosum in congenital blindness

DEFF Research Database (Denmark)

Tomaiuolo, Francesco; Campana, Serena; Collins, D Louis

2014-01-01

We examined the effects of visual deprivation at birth on the development of the corpus callosum in a large group of congenitally blind individuals. We acquired high-resolution T1-weighted MRI scans in 28 congenitally blind and 28 normal sighted subjects matched for age and gender....... There was no overall group effect of visual deprivation on the total surface area of the corpus callosum. However, subdividing the corpus callosum into five subdivisions revealed significant regional changes in its three most posterior parts. Compared to the sighted controls, congenitally blind individuals showed a 12......% reduction in the splenium, and a 20% increase in the isthmus and the posterior part of the body. A shape analysis further revealed that the bending angle of the corpus callosum was more convex in congenitally blind compared to the sighted control subjects. The observed morphometric changes in the corpus...
The Yale-Classical Archives Corpus

Directory of Open Access Journals (Sweden)

Christopher William White

2016-07-01

Full Text Available The Yale-Classical Archives Corpus (YCAC) contains harmonic and rhythmic information for a dataset of Western European Classical art music. This corpus is based on data from classicalarchives.com, a repository of thousands of user-generated MIDI representations of pieces from several periods of Western European music history. The YCAC makes available metadata for each MIDI file, as well as a list of pitch simultaneities ("salami slices") in the MIDI file. Metadata include the piece's composer, the composer's country of origin, date of composition, genre (e.g., symphony, piano sonata, nocturne, etc.), instrumentation, meter, and key. The processing step groups the file's pitches into vertical slices each time a pitch is added or subtracted from the texture, recording the slice's offset (measured in the number of quarter notes separating the event from the file's beginning), highest pitch, lowest pitch, prime form, scale-degrees in relation to the global key (as determined by experts), and local key information (as determined by a windowed key-profile analysis). The corpus contains 13,769 MIDI files by 571 composers yielding over 14,051,144 vertical slices. This paper outlines several properties of this corpus, along with a representative study using this dataset.
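The slicing step described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the YCAC processing code: the input format (onset, release, MIDI pitch, with times in quarter notes) and the output field names are assumptions, and prime form, scale degrees, and key information are left out.

```python
def salami_slices(notes):
    """Cut a texture into vertical slices at every note onset or release.

    notes: list of (onset, release, midi_pitch), times in quarter notes.
    Returns one dict per time point at which at least one pitch sounds.
    """
    # Every onset or release is a point where a pitch is added or removed.
    change_points = sorted({t for onset, release, _ in notes for t in (onset, release)})
    slices = []
    for t in change_points:
        sounding = sorted(p for onset, release, p in notes if onset <= t < release)
        if not sounding:
            continue  # silence: nothing to record in this sketch
        slices.append({
            "offset": t,            # quarter notes from the start of the file
            "pitches": sounding,
            "highest": sounding[-1],
            "lowest": sounding[0],
        })
    return slices


# C4 held for two beats; E4 on beat one, replaced by G4 on beat two.
example = [(0, 2, 60), (0, 1, 64), (1, 2, 67)]
for s in salami_slices(example):
    print(s)
```

Because a new slice is emitted at every change point, a single held note can appear in many consecutive slices, which is one reason the slice count in such a corpus far exceeds the note count.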

The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses

Directory of Open Access Journals (Sweden)

Arthur M. Jacobs

2018-04-01

Full Text Available This paper describes a corpus of about 3,000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author or various text-analytic metrics for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music,” concerning, e.g., lexical diversity or sentiment analysis. The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as training and test corpus for stimulus development and control in empirical studies.
An annotated corpus with nanomedicine and pharmacokinetic parameters

Directory of Open Access Journals (Sweden)

Lewinski NA

2017-10-01

Full Text Available Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2); (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA. Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. Keywords: nanotechnology, informatics, natural language processing, text mining, corpora
An annotated corpus with nanomedicine and pharmacokinetic parameters.

Science.gov (United States)

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
Then and now: A reconsideration of the first corpus of scientific English

Directory of Open Access Journals (Sweden)

John M. Swales

2004-10-01

Full Text Available The subtitle of Huddleston (1971) reads A syntactic study based on an analysis of scientific texts; this volume thus represents the first carefully designed and substantial corpus of scientific English. In this paper I re-examine a selection of his findings based on the science and engineering half of Hyland's corpus of 240 research articles. Features selected were variation in the passivization of individual transitive verbs, the paucity of instances of V + V-ing structures like "He continued working", and the meaning of the modal must in research prose. In all three cases, Huddleston's findings were largely confirmed in a database constructed about 35 years later, thus suggesting that English research writing in the sciences is, at least in grammatical terms, fundamentally stable. In the closing section, I contrast this linguistic stability with the rapid technological development of corpus linguistics. I instance a recent co-taught experimental course in which international senior doctoral students from the health and social sciences were able, with relatively little training and guidance, to construct paired corpora of their own research writings and of published articles from their own specialities and then conduct precisely the kinds of analysis that only a highly professional linguist could, with considerably more labour, conduct nearly forty years ago.
Corpus Based Authenicity Analysis of Language Teaching Course Books

Directory of Open Access Journals (Sweden)

Emrah PEKSOY

2017-12-01

Full Text Available In this study, the resemblance of the language learning course books used in Turkey to authentic language spoken by native speakers is explored using a corpus-based approach. For this, the 10-million-word spoken part of the British National Corpus was selected as the reference corpus. After that, all language learning course books used in high schools in Turkey were scanned and transferred to SketchEngine, an online corpus query tool. Lastly, certain grammar points were extracted first from the British National Corpus and then from the course books, and their similarities and differences were compared. At the end of the study, it was found that the language learning course books have little similarity to authentic language in terms of certain grammatical items and the frequency of their collocations. In this way, the points to be revised and changed were identified. In addition, this study emphasized the role of the corpus approach as a material development and analysis tool, and tested the functionality of course books for writers and for the Ministry of National Education.
Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

Science.gov (United States)

Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

2013-01-16

The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
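One way to picture the fingerprinting strategy named in (ii) is the Python sketch below. The abstract does not specify the algorithm, so this substitutes a generic technique: hashed word n-grams (shingles), with a note dropped when too many of its fingerprints were already seen in earlier kept notes. The n-gram size, the threshold, and all function names are assumptions.

```python
import hashlib


def fingerprints(text: str, n: int = 5) -> set[str]:
    """Hash every run of n consecutive words in the note."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        grams = [" ".join(words)]
    else:
        grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {hashlib.sha256(g.encode("utf-8")).hexdigest() for g in grams}


def drop_redundant(notes: list[str], threshold: float = 0.5, n: int = 5) -> list[str]:
    """Keep a note only if at most `threshold` of its fingerprints were seen before."""
    seen: set[str] = set()
    kept = []
    for note in notes:
        fp = fingerprints(note, n)
        overlap = len(fp & seen) / len(fp) if fp else 0.0
        if overlap <= threshold:
            kept.append(note)
            seen |= fp  # only kept notes contribute to the fingerprint index
    return kept


notes = [
    "patient reports chest pain radiating to left arm since morning",
    "patient reports chest pain radiating to left arm since morning today",
    "no acute distress vitals stable discharge planned for tomorrow afternoon",
]
print(drop_redundant(notes, threshold=0.5, n=3))
```

The same overlap ratio, averaged over a patient record, gives a rough measure of redundancy of the kind asked about in question (i), although the study's own measure may be defined differently.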
Web corpus construction

CERN Document Server

Schafer, Roland

2013-01-01

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. This approach has several advantages: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and rem...
Intertext: On Connecting Text in the Building Process

DEFF Research Database (Denmark)

Christensen, Lars Rune

2015-01-01

Actors in the building process are critically dependent on a corpus of written text that draws the distributed work tasks together. This paper introduces, on the basis of a field study, the concepts of corpus, intertext and intertextuality to the analysis of text in cooperative work practice. This paper shows that actors in the building process create intertext (connections) between complementary texts, in a particular situation and for a particular task. This has an integrating effect on the building process. Several types of intertextuality, including the complementary type, the intratextual type and the mediated type, may constitute the intertext of a particular task. By employing the concepts of corpus, intertext and intertextuality with respect to the study of the building process, this paper outlines an approach to the investigation of text in cooperative work.
Acute aortic dissection type A discloses Corpus alienum

Directory of Open Access Journals (Sweden)

Kolat Philipp

2009-01-01

Full Text Available Abstract We report an unusual case of a type A aortic dissection with a corpus alienum (foreign body) compressing the right ventricle. The patient successfully underwent an aortic root replacement in deep hypothermia with re-implantation of the coronary arteries using a modified Bentall procedure, together with resection of the corpus alienum. Intraoperative findings revealed 3 firmly adherent gauze compresses, which had most likely been left behind during an operation 34 years earlier.
What Does Corpus Linguistics Have to Offer to Language Assessment?

Science.gov (United States)

Xi, Xiaoming

2017-01-01

In recent years, continuing advances in technology have increased the capacity to automate the extraction of a range of linguistic features of texts and thus have provided the impetus for the substantial growth of corpus linguistics. While corpus linguistic tools and methods have been used extensively in second language learning research, they…
English Writing Teaching Model Dependent on Computer Network Corpus Drive Model

Directory of Open Access Journals (Sweden)

Shi Lei

2018-03-01

Full Text Available At present, mainstream lexicalized English writing methods consider only the corpus dependence between words and do not take corpus collocation and related issues into account. “Drive” is a relatively essential feature of words: once the drive structure of a word is determined, it becomes relatively clear which kinds of words it collocates with, so the structure of the sentence can be derived fairly directly. This paper puts forward an English writing model that relies on the computer network corpus drive model. In this model, a rich English corpus is introduced into the decomposition of the rules and the calculation of the probabilities, covering not only corpus dependence information but also the drive structure and other corpus collocation information. The improved computer network corpus drive model is used to carry out an English writing teaching experiment. The experimental results show a precision of 88.76% and a recall of 87.43%. The F value of the comprehensive index improves by 6.65% compared with the Collins headword-driven English modes of writing.
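The abstract does not say how its F value is defined. Assuming it is the usual F-measure, i.e. the weighted harmonic mean of precision and recall, the reported figures can be combined as in the Python sketch below; the function name and the `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score; beta > 1 weights recall more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract, in percent.
print(round(f_measure(88.76, 87.43), 2))
```

Under that assumption, the reported precision and recall correspond to a balanced F value of roughly 88.09%.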
Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

Science.gov (United States)

Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael

2015-01-01

We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
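For readers unfamiliar with the term, a document-term matrix of the kind described here can be built as in the following Python sketch. It is not the authors' pipeline: whitespace tokenization, lower-casing, and the toy documents are assumptions made purely for illustration.

```python
from collections import Counter


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    # One column per unique term in the whole collection, in sorted order.
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Toy stand-ins for annual reports; not drawn from the actual corpus.
docs = ["torture reported reported", "no torture"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row can then be fed to standard statistical or machine learning routines, which is what makes this representation convenient for the analyses the abstract mentions.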
Impact of Metadata on Full-text Information Retrieval Performance: An Experimental Research on a Small Scale Turkish Corpus

Directory of Open Access Journals (Sweden)

Çağdaş Çapkın

2016-12-01

Full Text Available Information institutions use text-based information retrieval systems to store, index and retrieve metadata, full-text, or both metadata and full-text (hybrid) contents. The aim of this research was to evaluate the impact of these contents on information retrieval performance. For this purpose, metadata (MIR), full-text (FIR) and hybrid (HIR) content information retrieval systems were developed with the default Lucene information retrieval model for a small-scale Turkish corpus. In order to evaluate the performance of these three systems, “precision - recall” and “normalized recall” tests were conducted. Experimental findings showed that there were no significant differences between MIR and FIR in mean average precision (MAP) performance. On the other hand, the MAP performance of HIR was significantly higher than that of MIR and FIR. When information retrieval performance was evaluated from a user-centered perspective, the “normalized recall” performances of MIR and HIR were significantly higher than that of FIR. Additionally, there were no significant differences between the systems in the mean number of relevant documents retrieved. Processing different types of contents such as metadata and full-text had some advantages and disadvantages for information retrieval systems in terms of term management. These advantages were brought together in hybrid content processing (HIR), and information retrieval performance improved.
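Mean average precision (MAP), the main measure in this comparison, can be computed as in the Python sketch below. It uses the common textbook definition, which may differ in detail from the study's evaluation setup; the document identifiers and the two toy queries are hypothetical.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # query 1: hits at ranks 1 and 3
    (["d2", "d1"], {"d1"}),              # query 2: hit at rank 2
]
print(mean_average_precision(runs))
```

Because the measure rewards relevant documents near the top of the ranking, two systems that retrieve the same number of relevant documents can still differ in MAP, which is consistent with the pattern of findings reported above.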
THE CASE FOR VERB-ADJECTIVE COLLOCATIONS: CORPUS-BASED ANALYSIS AND LEXICOGRAPHICAL TREATMENT

Directory of Open Access Journals (Sweden)

Moisés Almela

2011-10-01

Full Text Available This article explores a type of co-occurrence pattern which cannot be adequately described by existing models of collocation, and for which combinatory dictionaries have so far failed to provide sufficient information. The phenomenon of “oblique inter-collocation”, as I propose to call it, is characterised by a concatenation of syntagmatic preferences which partially contravenes the habitual grammatical order of semantic selection. In particular, I will examine some of the effects which the verb cause exerts on the distribution of attributive adjectives in the context of specific noun classes. The procedure for detecting and describing patterns of oblique inter-collocation is illustrated by means of SketchEngine corpus query tools. Based on the data extracted from a large-scale corpus, this paper carries out a critical analysis of the micro-structure of the Oxford Collocations Dictionary.
Metaphor and Corpus Linguistics

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2011-01-01

Full Text Available In this paper, I look at four different aspects of metaphor research from a corpus linguistic perspective, namely: (1) the lexicogrammar of metaphors, which refers to the patterning of linguistic metaphor revealed by corpus analysis; (2) metaphor probabilities, which is a facet of metaphor that emerges from frequency-based studies of metaphor; (3) dimensions of metaphor variation, or the search for systematic parameters of variation in metaphor use across different registers; and (4) automated metaphor retrieval, which relates to the development of software to help identify metaphors in corpora. I argue that these four aspects are interrelated, and that advances in one of them can drive changes in the others.
ANR Corpus architecturae religiosae europeae [CARE]saec. IV-X

Directory of Open Access Journals (Sweden)

Christian Sapin

2010-10-01

Full Text Available The ANR project “Corpus of religious monuments predating the year 1000” [Corpus architecturae religiosae europeae/CARE – IV-X saec.] began in January 2008. It represents France's contribution to an international programme initiated in 2002 by IRCLAMA in Zagreb (Croatia). The corpus aims to inventory the religious buildings of Europe from the 4th century to the very beginning of the 11th century. It already covers Italy, Spain, Croatia, central Europe and, probably soon, Ireland...
Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

Directory of Open Access Journals (Sweden)

Danuta Roszko

2015-06-01

Full Text Available In this article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT), built to support Polish-Lithuanian theoretical contrastive studies and a Polish-Lithuanian electronic dictionary, and to assist sworn translators. The semantic annotation being introduced into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.
Experimental model of human corpus cavernosum smooth muscle relaxation

Directory of Open Access Journals (Sweden)

Rommel P. Regadas

2010-08-01

Full Text Available PURPOSE: To describe a technique for en bloc harvesting of the corpus cavernosum, cavernous artery and urethra from transplant organ donors and contraction-relaxation experiments with corpus cavernosum smooth muscle. MATERIALS AND METHODS: The corpus cavernosum was dissected to the point of attachment with the crus penis. A 3 cm segment (corpus cavernosum and urethra) was isolated and placed in ice-cold sterile transportation buffer. Under magnification, the cavernous artery was dissected. Thus, 2 cm fragments of cavernous artery and corpus cavernosum were obtained. Strips measuring 3 x 3 x 8 mm were then mounted vertically in an isolated organ bath device. Contractions were measured isometrically with a Narco-Biosystems force displacement transducer (model F-60, Narco-Biosystems, Houston, TX, USA) and recorded on a 4-channel Narco-Biosystems desk model polygraph. RESULTS: Phenylephrine (1 µM) was used to induce tonic contractions in the corpus cavernosum (3 - 5 g tension) and cavernous artery (0.5 - 1 g tension) until reaching a plateau. After precontraction, smooth muscle relaxants were used to produce relaxation-response curves (10^-12 M to 10^-4 M). Sodium nitroprusside was used as a relaxation control. CONCLUSION: The harvesting technique and the smooth muscle contraction-relaxation model described in this study were shown to be useful instruments in the search for new drugs for the treatment of human erectile dysfunction.
LocText

DEFF Research Database (Denmark)

Cejuela, Juan Miguel; Vinchurkar, Shrikant; Goldberg, Tatyana

2018-01-01

trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast...
Textual, Genre and Social Features of Spoken Grammar: A Corpus-Based Approach

Directory of Open Access Journals (Sweden)

Carmen Pérez-Llantada

2009-02-01

Full Text Available This paper describes a corpus-based approach to teaching and learning spoken grammar for English for Academic Purposes with reference to Bhatia’s (2002) multi-perspective model for discourse analysis: a textual perspective, a genre perspective and a social perspective. From a textual perspective, corpus-informed instruction helps students identify grammar items through statistical frequencies, collocational patterns, context-sensitive meanings and discoursal uses of words. From a genre perspective, corpus observation provides students with exposure to recurrent lexico-grammatical patterns across different academic text types (genres). From a social perspective, corpus models can be used to raise learners’ awareness of how speakers’ different discourse roles, discourse privileges and power statuses are enacted in their grammar choices. The paper describes corpus-based instructional procedures, gives samples of learners’ linguistic output, and provides comments on the students’ response to this method of instruction. Data resulting from the assessment process and student production suggest that corpus-informed instruction grounded in Bhatia’s multi-perspective model can constitute a pedagogical approach in order to (i) obtain positive student responses from input and authentic samples of grammar use, (ii) help students identify and understand the textual, genre and social aspects of grammar in real contexts of use, and therefore (iii) help develop students’ ability to use grammar accurately and appropriately.

Schweizer Text Korpus – Theoretische Grundlagen, Korpusdesign und Abfragemöglichkeiten

Directory of Open Access Journals (Sweden)

Bickel, Hans

2009-01-01

Full Text Available The SWISS TEXT CORPUS (CHTK) has made it its goal to extensively document the German language of the 20th century in Switzerland. In this way, and in its parallel function as a sub-corpus of the Corpus C4, which will consist of 20 million text words (tokens) each from Germany, Austria, Italy/South Tirol and, as already said, Switzerland, it represents a classical reference corpus both for the standard German language in Switzerland and for the entire German-speaking area of Western Europe. A reference corpus should meet the requirement of comprehensively depicting the central repertoire of a language, i.e. the generally used vocabulary of this language, which is why questions of corpus structure and general planning (corpus design) play a decisive role (cf. Lemnitzer/Zinsmeister (2006: 106), where the type of the reference corpus is contrasted with the special corpus). Four and a half years after the start of the project, the SWISS TEXT CORPUS was made available to the general public in April 2009 as a research instrument. The following article outlines in brief the history of this research project and deals with fundamental and specific decisions that had to be made in the design of such a reference corpus, and with how the CHTK is compiled. Together with a concluding overview of some retrieval and analysis options offered by the CHTK, this article also provides an overview of the potential of this new research instrument and supplies the background knowledge required to work with the CHTK. For reasons of space, the methods of working, the corpus-driven approaches, cannot be discussed here (cf. Bubenhofer 2008, 2006).
An Edition of the Corpus areopagiticum slavicum

Directory of Open Access Journals (Sweden)

Dieter Fahl

2005-12-01

Full Text Available In the fourteenth century, the monk Isaiah of the holy Mount Athos translated the writings of pseudo-Dionysius the Areopagite (c. end of the 5th century), core texts for Eastern and Western European theological and philosophical thought, from Greek into Church Slavonic. This first Slavic translation of Dionysius’ oeuvre (“De Coelesti Hierarchia,” “De Ecclesiastica Hierarchia,” “De Divinis Nominibus,” “De Mystica Theologia,” the epistles and scholia), which played a significant role in the development of Slavic culture and of Orthodox Slavic socio-political theory and praxis, is still central to the study of Slavia Orthodoxa. A working group of German and Russian scholars has completed an edition of the translator’s Church Slavonic autograph with an en face reconstruction of the Greek text used by the translator and philological commentary. A Church Slavonic-Greek and Greek-Church Slavonic dictionary of this edition, currently in preparation, is intended to make the terminology used in this influential translation accessible to interdisciplinary researchers. For the first time, the Church Slavonic lexis of this corpus, a substantial part of which was coined by the translator, will be registered in an index of words and forms.
Rheumatic diseases in the Corpus Hippocraticum

Directory of Open Access Journals (Sweden)

G. Squillace

2011-09-01

Full Text Available The medicine of the 5th and 4th centuries B.C. attested in the Corpus Hippocraticum ascribes all diseases to the rheuma, i.e. the flux of humours into the body. This flux produces not only colds, hoarseness, cough, reddening and dropsy, but also arthritis, sciatica and gout.
Identifying issue frames in text.

Directory of Open Access Journals (Sweden)

Eyal Sagi

Full Text Available Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S.
Deep Belief Networks Based Toponym Recognition for Chinese Text

Directory of Open Access Journals (Sweden)

Shu Wang

2018-06-01

Full Text Available In Geographical Information Systems, geo-coding is used for the task of mapping from implicitly geo-referenced data to explicitly geo-referenced coordinates. At present, an enormous amount of implicitly geo-referenced information is hidden in unstructured text, e.g., Wikipedia, social data and news. Toponym recognition is the foundation of mining this useful geo-referenced information by identifying words as toponyms in text. In this paper, we propose an adapted toponym recognition approach based on a deep belief network (DBN) by exploring two key issues: word representation and model interpretation. A Skip-Gram model is used in the word representation process to represent words with contextual information that is ignored by current word representation models. We then determine the core hyper-parameters of the DBN model by illustrating the relationship between the performance and the hyper-parameters, e.g., vector dimensionality, DBN structures and probability thresholds. The experiments evaluate the performance of the Skip-Gram model implemented by the Word2Vec open-source tool, determine stable hyper-parameters and compare our approach with a conditional random field (CRF) based approach. The experimental results show that the DBN model outperforms the CRF model with a smaller corpus. When the corpus size is large enough, their statistical metrics converge. However, their recognition results show differences and complementarity on different kinds of toponyms. More importantly, combining their results can directly improve the performance of toponym recognition relative to their individual performances. It seems that the scale of the corpus has an obvious effect on the performance of toponym recognition. Generally, there is no adequate tagged corpus for specific toponym recognition tasks, especially in the era of Big Data. In conclusion, we believe that the DBN-based approach is a promising and powerful method to extract geo
Corpus-based Studies on Nursing Textbooks

Directory of Open Access Journals (Sweden)

Alif Fairus Nor Mohamad

2013-07-01

Full Text Available English for Specific Purposes (ESP) educators often face a dilemma in deciding which lexical items to teach their students. The field of English for Nursing Purposes (ENP) is no exception. Only an analysis of a nursing corpus made up of essential core textbooks can provide better insights and guidance for both nursing students and educators. This research aims to highlight the 2,000 most frequently used nursing words across the core nursing textbooks and to profile the types of ‘low frequency’ lexis that make up the nursing corpus in terms of General Service List (GSL) and Academic Word List (AWL) lexis coverage. Knowing the frequently used nursing words would further reduce students’ reading deficiency if the students use the 2,000-word list.
Tone realisation in a Yoruba speech recognition corpus

CSIR Research Space (South Africa)

Van Niekerk, D

2012-05-01

Full Text Available development. Extracted contours are processed and analysed statistically to describe acoustic properties in different tonal contexts. The authors demonstrate how features useful for tone recognition or synthesis can be successfully extracted from a corpus...
Corpus Linguistics, Network Analysis and Co-occurrence Matrices

Directory of Open Access Journals (Sweden)

Keith Stuart

2009-12-01

Full Text Available This article describes research undertaken in order to design a methodology for the reticular representation of knowledge of a specific discourse community. To achieve this goal, a representative corpus of the scientific production of the members of this discourse community (Universidad Politécnica de Valencia, UPV) was created. The article presents the practical analysis (frequency, keyword, collocation and cluster analysis) that was carried out in the initial phases of the study, aimed at establishing the theoretical and practical background and framework for our matrix and network analysis of the scientific discourse of the UPV. In the methodology section, the processes that have allowed us to extract from the corpus the linguistic elements needed to develop co-occurrence matrices, as well as the computer tools used in the research, are described. From these co-occurrence matrices, semantic networks of subject and discipline knowledge were generated. Finally, based on the results obtained, we suggest that it may be viable to extract and to represent the intellectual capital of an academic institution using corpus linguistics methods in combination with the formulations of network theory.
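A co-occurrence matrix of the kind this abstract mentions can be sketched as symmetric pair counts within a sliding window. The window size, the whitespace tokenisation and the function name `cooccurrence_counts` are assumptions; the abstract does not specify how the authors defined co-occurrence.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: naive tokenisation
        for i, left in enumerate(tokens):
            # Look ahead at most `window` tokens to the right.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pair = tuple(sorted((left, right)))
                    counts[pair] += 1
    return dict(counts)


# Toy sentences, invented for this example only.
pairs = cooccurrence_counts(
    ["corpus network analysis", "network analysis corpus"], window=1
)
```

Each counted pair can be read as a weighted edge between two words, which is one plausible way to move from the matrix to a network representation.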
Corpus methods and their reflection in linguistic theories of the 20th century

Directory of Open Access Journals (Sweden)

Simon Krek

2013-05-01

Full Text Available In the 20th century structuralism established itself as the central linguistic theory, in the first half mainly through its originator Ferdinand de Saussure, and in the second half with the figure of Noam Chomsky. The latter consistently refused to acknowledge the analysis of large quantities of text as a valuable method, and favoured the linguistic intuition of a native speaker instead. In parallel with structuralism, other trends in linguistics emerged which pointed to the inadequacy of the prevailing linguistic paradigm and to theoretical insights which were only possible after the systematic analysis of large quantities of text. The paper discusses some of the dilemmas stemming from this dichotomy and places corpus linguistics in a broader linguistic context.
Form of the male and female corpus callosum internal organization at the mature age

Directory of Open Access Journals (Sweden)

Юрий Петрович Костиленко

2016-04-01

Full Text Available Aim: to study the special features of the internal organization of the male and female corpus callosum at mature age. Materials and methods: whole preparations of the male and female corpus callosum (10 preparations of each sex, aged 45–60 years) were used as the material. From these preparations, 2 mm thick slices were cut in two mutually perpendicular planes. The resulting tissue slices of the corpus callosum were then plastinated in epoxy resin. The preparations were then removed from the unpolymerized epoxy and placed on a polyethylene film, which was covered with a second film of the same size. This layered block was placed between two glass plates of equal size, which were pressed together under a small load. After complete polymerization, the resulting epoxy plates containing the corpus callosum tissue were gently ground and carefully polished, exposing the tissue structures at the surface, which were then stained with a 1% solution of methylene blue in 1% borax solution. Results: study of the plastinated sections of the corpus callosum in the sagittal plane revealed that the transverse ridge-like elevations of its upper surface are cord-like taeniae that protrude from within and pass through the corpus callosum. In transverse section, two types of adult corpus callosum could be distinguished by density: a dense type and a disperse type. At high magnification under a binocular loupe (MBS-9 microscope), gaps are visible between adjacent commissural cords, and blood vessels can be detected within them. In transverse sections of the commissural cords, very thin streaks are revealed in their depth, consisting of alternating dark and light lines that form a layered striation. Among the series of light lines, interlayers are visible that separate the whole depth of
A Corpus-based Study of English Vocabulary in Art Research Articles

Directory of Open Access Journals (Sweden)

Ping Wang

2017-09-01

Full Text Available The learning of English as a foreign language is an additional burden for art majors. This study aimed to examine high frequency words in art research articles to improve the efficiency of art majors’ English learning, especially their academic reading and writing. For this aim, the study built a corpus, analyzed data from art research articles and compared data with three base word lists. We found that the General Service List (GSL and the Academic Word List (AWL had a high coverage in our corpus, and there was a different high frequency word order in the Art Research Article Corpus (ARAC. These findings provide some implications for teaching English for art majors.
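The coverage of a base word list such as the GSL or AWL in a corpus can be sketched as the share of running tokens that appear in the list. This minimal version matches surface forms only; real word-list profiling usually works with word families or lemmas, which is not attempted here. The function name and the toy data are assumptions.

```python
def coverage(tokens, word_list):
    """Fraction of running tokens whose lowercase form is in word_list."""
    if not tokens:
        return 0.0
    words = {w.lower() for w in word_list}
    covered = sum(1 for tok in tokens if tok.lower() in words)
    return covered / len(tokens)


# Toy corpus and toy word list, invented for this example only.
tokens = "The artist paints the canvas".split()
share = coverage(tokens, ["the", "artist"])
```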
76 FR 18395 - Safety Zone; Naval Air Station Corpus Christi Air Show, Oso Bay, Corpus Christi, TX

Science.gov (United States)

2011-04-04

...-AA00 Safety Zone; Naval Air Station Corpus Christi Air Show, Oso Bay, Corpus Christi, TX AGENCY: Coast... zone on the navigable waters of Oso Bay in Corpus Christi, Texas in support of the 2011 Naval Air... entities and very few recreational fisherman utilize this section of Oso Bay, the restriction of vessel...
Using Edit Distance to Analyse Errors in a Natural Language to Logic Translation Corpus

Science.gov (United States)

Barker-Plummer, Dave; Dale, Robert; Cox, Richard; Romanczuk, Alex

2012-01-01

We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorized as being in error. We want to understand the nature of…
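The truncated abstract does not say which edit-distance variant the authors use. As a point of reference, the sketch below implements classic Levenshtein distance (unit-cost insertion, deletion and substitution) over any two sequences, so it works on characters or on lists of formula tokens. The function name and the toy formulas are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(
                min(
                    previous[j] + 1,         # delete x
                    current[j - 1] + 1,      # insert y
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Toy student answer versus reference answer, as token lists.
distance = edit_distance(["P", "&", "Q"], ["P", "|", "Q"])
```

A small distance between a submission and the correct translation suggests a local slip, such as a wrong connective, rather than a structurally different answer.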
Cell line name recognition in support of the identification of synthetic lethality in cancer from text

Science.gov (United States)

Kaewphan, Suwisa; Van Landeghem, Sofie; Ohta, Tomoko; Van de Peer, Yves; Ginter, Filip; Pyysalo, Sampo

2016-01-01

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/. Contact: sukaew@utu.fi PMID:26428294
Jointly learning word embeddings using a corpus and a knowledge base

Science.gov (United States)

Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi

2018-01-01

Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually-created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052
Developing a corpus of spoken language variability

Science.gov (United States)

Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford

2003-10-01

We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and “motherese” (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
Identifying biological concepts from a protein-related corpus with a probabilistic topic model

Directory of Open Access Journals (Sweden)

Lu Xinghua

2006-02-01

Full Text Available Background: Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE© titles and abstracts by applying a probabilistic topic model. Results: The latent Dirichlet allocation (LDA) model was applied to the corpus. Based on the Bayesian model selection, 300 major topics were extracted from the corpus. The majority of identified topics/concepts were found to be semantically coherent and most represented biological objects or concepts. The identified topics/concepts were further mapped to the controlled vocabulary of the Gene Ontology (GO) terms based on mutual information. Conclusion: The major and recurring biological concepts within a collection of MEDLINE documents can be extracted by the LDA model. The identified topics/concepts provide a parsimonious and semantically-enriched representation of the texts in a semantic space with reduced dimensionality and can be used to index text.
Biopsy of the corpus vitreum, retina and choroid

DEFF Research Database (Denmark)

Scherfig, Erik Christian Høegh

2002-01-01

ophthalmology, biopsy, choroid, corpus vitreum, retina, malignant melanoma, biopsy technique, retinoblastoma
An odd couple – Corpus frequency and look-up frequency: what relationship?

Directory of Open Access Journals (Sweden)

Lars Trap-Jensen

2014-12-01

Full Text Available In this paper, we investigate the relationship between log file records and corpus frequency. The study was motivated by practical considerations of how best to keep an already existing corpus-based dictionary updated. Should the next word in the dictionary be the one that follows next on a list of declining corpus frequency? Or the one that users most frequently look up but don’t find? In order to establish manageable criteria, we analysed log files for The Danish Dictionary from 2009 to 2012 and compared the list of most popular words looked up by the users with the frequency of the same words in the corpus underlying The Danish Dictionary. The users’ actual search behaviour was analysed in order to find answers to questions such as these: Are there words which are never looked up? If so, can we say something meaningful about their corpus frequency patterns – do they belong to particular parts of speech, are they particularly frequent or infrequent, could it even be that the pattern is cumulative, in such a way that a particular threshold can be identified? Ultimately, the question is whether it makes sense to use corpus frequency as a criterion for lemma selection.
Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution.

Science.gov (United States)

Pechenick, Eitan Adam; Danforth, Christopher M; Dodds, Peter Sheridan

2015-01-01

It is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical of academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800-2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
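The abstract refers only to "a divergence measure" between decades. One common choice for comparing two word-frequency distributions is the Jensen-Shannon divergence, sketched below; treating it as the measure in question is an assumption, as are the function name and the toy counts.

```python
import math


def jensen_shannon_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (in bits) between two word-count dictionaries."""
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    divergence = 0.0
    for word in set(counts_p) | set(counts_q):
        p = counts_p.get(word, 0) / total_p
        q = counts_q.get(word, 0) / total_q
        m = 0.5 * (p + q)  # mixture distribution
        # Each word's term is its contribution to the total divergence.
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Toy decade counts, invented for this example only.
decade_a = {"time": 5, "figure": 1}
decade_b = {"time": 2, "figure": 4}
jsd = jensen_shannon_divergence(decade_a, decade_b)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared vocabulary), and the per-word terms show which words drive the difference between two decades.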

How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs

Directory of Open Access Journals (Sweden)

Hayo Reinders

2013-02-01

Full Text Available Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g. could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context reported on in this paper, there is no direct equivalent in the students' L1. In other words, they are a good example of a structure for which successful acquisition depends very much on the quality of the input and instruction students receive. This paper reports on analysis of a 230,000 word corpus of Malaysian English textbooks, in which it was found that the relative frequency of the modals did not match that found in native speaker corpora such as the BNC. We compared the textbook corpus with a learner corpus of Malaysian form 4 learners and found no direct relationship between frequency of presentation of target forms in the textbooks and their use by students in their writing. We also found a very large percentage of errors in students' writing. We suggest a number of possible reasons for these findings and discuss the implications for materials developers and teachers.
Polyethylene glycol restores axonal conduction after corpus callosum transection

Directory of Open Access Journals (Sweden)

Ravinder Bamba

2017-01-01

Full Text Available Polyethylene glycol (PEG) has been shown to restore axonal continuity after peripheral nerve transection in animal models. We hypothesized that PEG can also restore axonal continuity in the central nervous system. In this current experiment, coronal sectioning of the brains of Sprague-Dawley rats was performed after animal sacrifice. 3Brain high-resolution microelectrode arrays (MEA) were used to measure mean firing rate (MFR) and peak amplitude across the corpus callosum of the ex-vivo brain slices. The corpus callosum was subsequently transected and repeated measurements were performed. The cut ends of the corpus callosum were still apposed at this time. A PEG solution was applied to the injury site and repeated measurements were performed. MEA measurements showed that PEG was capable of restoring electrophysiology signaling after transection of central nerves. Before injury, the average MFRs at the ipsilateral, midline, and contralateral corpus callosum were 0.76, 0.66, and 0.65 spikes/second, respectively, and the average peak amplitudes were 69.79, 58.68, and 49.60 μV, respectively. After injury, the average MFRs were 0.71, 0.14, and 0.25 spikes/second, respectively, and peak amplitudes were 52.11, 8.98, and 16.09 μV, respectively. After application of PEG, there were spikes in MFR and peak amplitude at the injury site and contralaterally. The average MFRs were 0.75, 0.55, and 0.47 spikes/second at the ipsilateral, midline, and contralateral corpus callosum, respectively, and peak amplitudes were 59.44, 45.33, and 40.02 μV, respectively. There were statistically significant differences in the average MFRs and peak amplitudes between the midline and non-midline corpus callosum groups (P < 0.01, P < 0.05). These findings suggest that PEG restores axonal conduction between severed central nerves, potentially representing axonal fusion.
Chinese legal texts – Quantitative Description

Directory of Open Access Journals (Sweden)

Ľuboš GAJDOŠ

2017-06-01

Full Text Available The aim of the paper is to provide a quantitative description of legal Chinese. This study adopts the approach of corpus-based analysis and shows basic statistical parameters of legal texts in Chinese, namely the length of a sentence, the proportion of parts of speech, etc. The research is conducted on the Chinese monolingual corpus Hanku. The paper also discusses the issues of statistical data processing from various corpora, e.g. tokenisation and part-of-speech tagging, and their relevance to the study of register variation.
MORPHOLOGICAL POS TAGGING IN ORAL LANGUAGE CORPUS: CHALLENGES FOR AELIUS

Directory of Open Access Journals (Sweden)

Gabriel de Ávila Othero

2014-12-01

Full Text Available In this paper, we present the results of our work with automatic morphological annotation of excerpts from a corpus of spoken language – belonging to the VARSUL project – using the free morphosyntactic tagger Aelius. We present 20 texts containing 154,530 words, annotated automatically and corrected manually. This paper presents the tagger Aelius and our work of manual review of the texts, as well as our suggestions for improvements of the tool, concerning aspects of oral texts. We verify the performance of morphosyntactic tagging of a spoken language corpus, an unprecedented challenge for the tagger. Based on the errors of the tagger, we try to infer certain patterns of annotation to overcome limitations presented by the program, and we propose suggestions for implementations in order to allow Aelius to tag spoken language corpora in a more effective way, especially treating cases such as interjections, apheresis, onomatopoeia and conversational markers.
One hundred million years of interhemispheric communication: the history of the corpus callosum

Directory of Open Access Journals (Sweden)

Aboitiz F.

2003-01-01

Full Text Available Analysis of regional corpus callosum fiber composition reveals that callosal regions connecting primary and secondary sensory areas tend to have higher proportions of coarse-diameter, highly myelinated fibers than callosal regions connecting so-called higher-order areas. This suggests that in primary/secondary sensory areas there are strong timing constraints for interhemispheric communication, which may be related to the process of midline fusion of the two sensory hemifields across the hemispheres. We postulate that the evolutionary origin of the corpus callosum in placental mammals is related to the mechanism of midline fusion in the sensory cortices, which only in mammals receive a topographically organized representation of the sensory surfaces. The early corpus callosum may have also served as a substrate for growth of fibers connecting higher-order areas, which possibly participated in the propagation of neuronal ensembles of synchronized activity between the hemispheres. However, as brains became much larger, the increasingly longer interhemispheric distance may have worked as a constraint for efficient callosal transmission. Callosal fiber composition tends to be quite uniform across species with different brain sizes, suggesting that the delay in callosal transmission is longer in bigger brains. There is only a small subset of large-diameter callosal fibers whose size increases with increasing interhemispheric distance. These limitations in interhemispheric connectivity may have favored the development of brain lateralization in some species like humans. "...if the currently received statements are correct, the appearance of the corpus callosum in the placental mammals is the greatest and most sudden modification exhibited by the brain in the whole series of vertebrated animals..." T.H. Huxley (1.
Visualizing the semantic content of large text databases using text maps

Science.gov (United States)

Combs, Nathan

1993-01-01

A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
Towards proper name generation : A corpus analysis

NARCIS (Netherlands)

Castro Ferreira, Thiago; Wubben, Sander; Krahmer, Emiel

We introduce a corpus for the study of proper name generation. The corpus consists of proper name references to people in webpages, extracted from the Wikilinks corpus. In our analyses, we aim to identify the different ways, in terms of length and form, in which proper names are produced
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations.

Science.gov (United States)

Alvaro, Nestor; Miyao, Yusuke; Collier, Nigel

2017-05-03

Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. ©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017.
Vocabulary Practice and Media Representation: A Corpus-Assisted Study of Macroeconomic News

Directory of Open Access Journals (Sweden)

Win-Ping Kuo

2015-11-01

Full Text Available This paper introduces corpus methods and their application to media text analysis. The researcher collected 1,363 macroeconomic reports from three major Taiwanese newspapers, Apple Daily, The Liberty Times, and The United Daily, as the corpus. Research shows that corpus-assisted media text analysis enables researchers to calculate the frequency of vocabulary and analyze the lexical structure of the text via concordance and collocation. Using macroeconomic news as the case study, this paper also found that news reports tend to simplify the GDP figure as a mission, prefer attributing local economic performance to systemic problems of the global economy, and treat the economy as a manageable task by attributing it to the government. All these ideologies and values are reflected in the vocabulary and discursive practice of the media.
KoralQuery -- A General Corpus Query Protocol

DEFF Research Database (Denmark)

Bingel, Joachim; Diewald, Nils

2015-01-01

In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized...
Cholesterol transport and steroidogenesis by the corpus luteum

Directory of Open Access Journals (Sweden)

Christenson Lane K

2003-11-01

Full Text Available The synthesis of progesterone by the corpus luteum is essential for the establishment and maintenance of early pregnancy. Regulation of luteal steroidogenesis can be broken down into three major events: luteinization (i.e., conversion of an ovulatory follicle), luteal regression, and pregnancy-induced luteal maintenance/rescue. While the factors that control these events and dictate the final steroid end products are widely varied among different species, the composition of the corpus luteum (luteinized thecal and granulosa cells) and the enzymes and proteins involved in the steroidogenic pathway are relatively similar among all species. The key factors involved in luteal steroidogenesis and several new exciting observations regarding regulation of luteal steroidogenic function are discussed in this review.
Estrogen and oxytocin receptors in the canine corpus luteum during pregnancy and parturition

Directory of Open Access Journals (Sweden)

Gisele Almeida Lima Veiga

2015-02-01

Full Text Available The expression of genes encoding the receptors for estrogen (ERα mRNA) and oxytocin (OTR mRNA) was studied in the corpus luteum during pregnancy and parturition in dogs. Real-time PCR was performed to quantify the levels of ERα mRNA and OTR mRNA in the corpus luteum of bitches during Early (up to 20 days of gestation), Mid (20 to 40 days) and Late Pregnancy (40 to 60 days), and Parturition (first stage of labor). The corpus luteum expressed mRNA for OTR; however, ERα mRNA was not detected. There was a reduction of OTR mRNA expression in the corpus luteum from gestational Day 20 onward, which suggests an important role of OTR mRNA in the mechanism of pregnancy recognition in dogs. We concluded that the expression of OTR mRNA in the canine corpus luteum varies over time, which supports the idea that the sensitivity and response to hormone therapy can vary along the course of pregnancy and labor. Moreover, the canine CL lacks ERα mRNA expression during pregnancy.
Corpus Approaches to Language Ideology

Science.gov (United States)

Vessey, Rachelle

2017-01-01

This paper outlines how corpus linguistics--and more specifically the corpus-assisted discourse studies approach--can add useful dimensions to studies of language ideology. First, it is argued that the identification of words of high, low, and statistically significant frequency can help in the identification and exploration of language ideologies…
Contrast radiographic study of venous drainage of the corpus cavernosum and the corpus spongiosum of the cat penis.

Science.gov (United States)

Amiri, Ali Akbar; Gilanpour, Hassan; Veshkini, Abbas

2014-01-01

The aim of this study was to determine the drainage routes of the corpus cavernosum penis and the corpus spongiosum penis in the cat using contrast cavernosography. Five male cats, 1.5-2.5 years old, weighing between 4.5 and 5.5 kg were investigated. The cats were anesthetized and the root and the proximal part of the penis were exposed by an incision on the perineum reaching the scrotum. Each cat was radiographed in lateral and dorsal recumbency before and during injection of contrast medium into the erectile bodies. The corpus spongiosum penis was injected at the bulb of the penis and the corpus cavernosum penis at the root. Injection of contrast media into the cavernous bodies showed that both the external and internal iliac veins drain the erectile bodies into the caudal vena cava. Drainage from the corpus spongiosum penis was from the bulb for the proximal part and from the glans for the distal part. The corpus cavernosum penis was drained only proximally, from the crura. There was a network of veins above the pelvic symphysis, and the drainage of the erectile bodies was through various routes into the internal and external iliac veins.
Stemming of Slovenian library science texts

Directory of Open Access Journals (Sweden)

Polona Vilar

2002-01-01

Full Text Available The theme of the article is the preparation of a stemming algorithm for Slovenian library science texts. The procedure consisted of three phases: learning, testing and evaluation. The preparation of the optimal stemmer for Slovenian texts from the field of library science is presented, along with its testing and comparison with two other stemmers for the Slovenian language: the Popovič stemmer and the Generic stemmer. A corpus of 790.000 words from the field of library science was used for learning. Lists of stems, word endings and stop-words were built. In the testing phase, the component parts of the algorithm were tested on an additional corpus of 167.000 words. In the evaluation phase, a comparison of the three stemmers processing the same word corpus was made. The results of each stemmer were compared with an intellectually prepared control result of the stemming of the corpus. It consisted of groups of semantically connected words with no errors. Understemming was especially monitored – the number of stems for semantically connected words, produced by an algorithm. The results were statistically processed with the Kruskal-Wallis test. The Optimal stemmer produced the best results. It matched best with the reference results and also gave the smallest number of stems for one semantic meaning. The Popovič stemmer followed closely. The Generic stemmer proved to be the least accurate. The procedures described in the thesis can represent a platform for the development of tools for automatic indexing and retrieval for library science texts in the Slovenian language.
Measurement of normal corpus callosum with MRI in Korean adults and morphological change of corpus callosum by grade of hydrocephalus

International Nuclear Information System (INIS)

Song, Dong Hoon; Chang, Seung Kuk; Kim, Jong Deok; Eun, Tchoong Kie; Park, Dong Woo

1995-01-01

To measure the size of each portion of the normal corpus callosum using an objective and reproducible MRI method, and to evaluate the morphological change of the corpus callosum by grade of hydrocephalus. Midsagittal T1-weighted MR images of the corpus callosum were investigated in 41 normal Korean adult volunteers and 19 patients with hydrocephalus. The corpus callosum was measured for anteroposterior length (A), height (B), and the thickness of the genu (C), body (D), splenium (E), and the narrowest portion of the body (F). The morphology and signal intensity of the corpus callosum were also evaluated. Hydrocephalus was graded as mild, moderate, or severe, and the thickness of each portion was compared with that of the normal corpus callosum. The mean length and height were 72.3 mm and 28.6 mm in males, and 70.7 mm and 28.9 mm in females. The mean dimensions for C, D, E and F were 13.1 mm, 8 mm, 13.2 mm and 5.2 mm in males, and 12.8 mm, 7.5 mm, 12.3 mm and 5 mm in females. The normal corpus callosum was 'hook' shaped on the midline sagittal T1-weighted image. Narrowing at the posterior third of the body was present in 30 cases (73.2%), and the body was even in thickness in 11 cases (26.8%). The signal intensity of the corpus callosum on midsagittal T1-weighted spin echo images of normal cases was homogeneously hyperintense compared with cerebral gray matter. In hydrocephalus, A and B were increased and the other portions were decreased in thickness. The genu and the narrowest portion of the body showed significant differences in thickness according to the grade of hydrocephalus. The mean dimensions of all portions of the corpus callosum were larger in males than in females except for callosal height, but the differences were not statistically significant with the exception of the splenium. Hydrocephalus leads to morphological change of the corpus callosum. Among the portions of the corpus callosum, the genu and the narrowest portion of the body were thought to be the most sensitive indicators of the degree of hydrocephalus.
The NCHLT speech corpus of the South African languages

CSIR Research Space (South Africa)

Barnard, E

2014-05-01

Full Text Available The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop...
New challenges for text mining: mapping between text and manually curated pathways

Science.gov (United States)

Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi

2008-01-01

Background Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550
Corpus linguistics, systemic functional grammar and literary meaning: a critical analysis of Harry Potter and the Philosopher’s Stone

Directory of Open Access Journals (Sweden)

Andrew Goatly

2008-04-01

Full Text Available The research reported in this paper has two aims. First, to show how corpus linguistics, using word frequency and concordance data, which is then analysed according to transitivity systems of systemic functional grammar (SFG), can be useful to the enterprise of critical linguistics. Second, to investigate to what extent this critical corpus linguistics (CCL) gives a valid representation of the meanings and ideologies of a literary text. The hypothesis tested is that semiotic models of communication, in this case of popular children’s literature, with their emphasis on the encoding and decoding of meanings, lend themselves to a corpus linguistics approach. But that, in fact, these mutually reinforcing approaches (SFG and CCL) with their reliance on what is encoded as text cannot entirely succeed in accounting for how literature, in particular, is understood and interpreted, and how ideology works within it and behind it. For a richer critical discourse analysis we need a pragmatic account, for example an analysis of presupposition, inference and propositional attitude. The issues here will be discussed in the light of recent debate between Michael Stubbs and Henry Widdowson on the strengths and limitations of corpus linguistics in critical discourse analysis.
[Behavioral and cognitive profile of corpus callosum agenesia - Review].

Science.gov (United States)

Lábadi, Beatrix; Beke, Anna Maria

2016-11-30

Agenesis of the corpus callosum is a relatively frequent congenital cerebral malformation that includes dysplasia and total or partial absence of the corpus callosum. It can occur in isolated form, without accompanying somatic or central nervous system abnormalities, or it can be associated with other central nervous system malformations. The behavioral and cognitive outcome is more favorable for patients with isolated agenesis of the corpus callosum than for those with the syndromic form. The aim of this study is to review recent research on behavioral and social-cognitive functions in individuals with agenesis of the corpus callosum. Developmental delay is common, especially in higher-order cognitive and social functions. An internet database search was performed to identify publications on the subject. Fifty-five publications in English corresponded to the criteria. These studies reported deficits in language, social cognition and emotions in individuals with agenesis of the corpus callosum, which is known as primary corpus callosum syndrome. The results indicate that individuals with agenesis of the corpus callosum have deficiencies in the social-cognitive domain (recognition of emotions, weakness in paralinguistic aspects of language, and mentalizing abilities). The impaired social cognition can be manifested in behavioral problems such as autism and attention deficit hyperactivity disorder.

Representativeness in corpora of literary texts: introducing the C18P project

Directory of Open Access Journals (Sweden)

Gemeinböck, Iris

2016-07-01

Full Text Available Currently there are very few specialised corpora of literary texts that are tailored to the needs of literary critics who are interested in corpus stylistic analyses of prose fiction. Many existing corpora including literary texts were compiled for linguistic research interests and are often unsuitable for corpus stylistic purposes. The paper addresses three of the main problems: the absence of labelling of the texts for literary genre, the use of extracts, and the prevalence of linguistic periodisation schemes. C18P is a corpus of prose fiction designed specifically to address these issues. It traces the early development of the novel from 1700 up until the Victorian era. It can, for instance, be used for an analysis of the characteristic linguistic features of individual literary genres and forms. The following paper introduces the design of the corpus as well as some of its potential uses.
MR measurement of normal corpus callosum in children

International Nuclear Information System (INIS)

Kim, Hyoung Sub; Kim, Jong Chul; Kang, Yong Soo; Lee, Young Hwan; Kim, Young Wol

1997-01-01

To measure the mean size of the various portions of the corpus callosum in normal Korean children, using MR imaging. Our subjects were 166 children (male : female = 100 : 66) aged under 15 whose findings on MR imaging and neurologic examination were normal. Using midsagittal T1-weighted imaging, we measured the length of the brain and corpus callosum, the height of the latter, and the thickness of its genu, body, transitional zone and splenium. The measurements were statistically analysed according to age and sex. Brain length and the size of the various portions of the corpus callosum tended to increase relatively rapidly during the first three years of life, but the rate of growth tended to decrease with age. The mean length of the brain and corpus callosum and the mean thickness of the splenium of the corpus callosum did not differ according to sex. The mean thickness of the genu, body and transitional zone of the corpus callosum was greater in males than in females. The ratio of the length of the corpus callosum to the anteroposterior diameter of the brain was significantly greater in females than in males (alpha=0.05). Using MR imaging, we measured the mean sizes of the various portions of the corpus callosum in normal children; these values may provide a useful basis for determining changes occurring in its structure.
Challenges for automatically extracting molecular interactions from full-text articles.

Science.gov (United States)

McIntosh, Tara; Curran, James R

2009-09-24

The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
Lexical bundles in an advanced INTOCSU writing class and engineering texts: A functional analysis

Science.gov (United States)

Alquraishi, Mohammed Abdulrahman

The purpose of this study is to investigate the functions of lexical bundles in two corpora: a corpus of engineering academic texts and a corpus of IEP advanced writing class texts. This study is concerned with the nature of formulaic language in Pathway IEPs and engineering texts, and whether those types of texts show similar or distinctive formulaic functions. Moreover, the study looked into lexical bundles found in an engineering 1.26 million-word corpus and an ESL 65000-word corpus using a concordancing program. The study then analyzed the functions of those lexical bundles and compared them statistically using chi-square tests. Additionally, the results of this investigation showed 236 unique frequent lexical bundles in the engineering corpus and 37 bundles in the pathway corpus. Also, the study identified several differences between the density and functions of lexical bundles in the two corpora. These differences were evident in the distribution of functions of lexical bundles and the minimal overlap of lexical bundles found in the two corpora. The results of this study call for more attention to formulaic language at ESP and EAP programs.
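
As a rough illustration of the bundle-extraction step this abstract describes, the sketch below counts recurring word sequences in a token list. The bundle length of four words, the frequency cut-off, and the function name are assumptions made for the example; the record does not state the thresholds or the concordancing program used.

```python
from collections import Counter


def lexical_bundles(tokens, n=4, min_count=2):
    """Return every n-word sequence that occurs at least min_count times."""
    # zip over n shifted views of the token list yields consecutive n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {bundle: count for bundle, count in counts.items() if count >= min_count}


# Hypothetical toy text; a real study would tokenize a full corpus.
sample = "as a result of the test and as a result of the load".split()
bundles = lexical_bundles(sample, n=4, min_count=2)
```

On the toy input this keeps "as a result of" and "a result of the", each seen twice. In practice the cut-off is usually normalised per million words so that corpora of different sizes can be compared.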
Mining consumer health vocabulary from community-generated text.

Science.gov (United States)

Vydiswaran, V G Vinod; Mei, Qiaozhu; Hanauer, David A; Zheng, Kai

2014-01-01

Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text.
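
The frequency-ratio idea in this abstract can be sketched as follows. This is not the authors' measure, whose exact form the record does not give; the add-one smoothing, the function names, and the toy counts are all assumptions for illustration.

```python
def frequency_ratio(term, consumer_counts, professional_counts):
    """Ratio of a term's smoothed relative frequency in consumer vs professional text."""
    consumer_total = sum(consumer_counts.values())
    professional_total = sum(professional_counts.values())
    # Add-one smoothing (an assumption) avoids division by zero for unseen terms.
    consumer_rate = (consumer_counts.get(term, 0) + 1) / (consumer_total + 1)
    professional_rate = (professional_counts.get(term, 0) + 1) / (professional_total + 1)
    return consumer_rate / professional_rate


def label_pair(term_a, term_b, consumer_counts, professional_counts):
    """Label the term with the higher ratio 'consumer' and the other 'professional'."""
    ratio_a = frequency_ratio(term_a, consumer_counts, professional_counts)
    ratio_b = frequency_ratio(term_b, consumer_counts, professional_counts)
    if ratio_a >= ratio_b:  # ties arbitrarily favour the first term
        return {term_a: "consumer", term_b: "professional"}
    return {term_a: "professional", term_b: "consumer"}


# Hypothetical counts standing in for forum posts and journal abstracts.
forum_counts = {"heart attack": 90, "myocardial infarction": 10}
abstract_counts = {"heart attack": 5, "myocardial infarction": 95}
labels = label_pair("heart attack", "myocardial infarction", forum_counts, abstract_counts)
```

With these toy counts "heart attack" is labelled the consumer term, since it is far more frequent in the forum-like counts than in the abstract-like counts.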
Corpus multimedia VEIGA inglés-galego de subtitulación cinematográfica

Directory of Open Access Journals (Sweden)

Patricia Sotelo Dios

2012-01-01

Full Text Available In this article I present a research project consisting of the compilation and exploitation of the Veiga corpus, a multimedia corpus of subtitles in English and Galician. The project, which is still under development, is intended to serve as a tool for studying and researching certain aspects of the practice of intralingual subtitling in English and interlingual subtitling from English into Galician. Although Veiga forms part of the CLUVI parallel corpus, it goes beyond the textual plane characteristic of the other CLUVI subcorpora and makes it possible to observe subtitles in their natural state, that is, as part of an audiovisual product. In addition to issues related to the construction of the corpus and the search system, I will mention some of the possible uses of this corpus for subtitling practice, research, and training.
Lexicon and teaching: an analysis of the use of less frequent prepositions in a Spanish learners corpus

Directory of Open Access Journals (Sweden)

Jéssyca Camargo Cruz

2017-08-01

Full Text Available This article presents quantitative research on, and an analysis of, the use of less frequent (underused) prepositions in a corpus of learners of Spanish as a foreign language. We have observed the use of contra, hacia, enfrente de, excepto and tras through Corpus Linguistics by contrasting this lexical set with a supplementary corpus, composed of normative and descriptive Spanish grammars and an online reference corpus of Spanish (CREA). We present analyses made on a corpus of 276 writings (85,729 words), gathered from two groups of freshman Language/Letras students, from 2011 to 2013. The data were collected with the aid of the WordSmith Tools (version 6) software; its tools WordList and Concord enabled us to extract the frequency list of the prepositions in the corpus of study, as well as to observe and analyse their respective uses based on the concordance lines.
AUTOMATIC RETRIEVAL AND THE FORMALIZATION OF MULTI WORDS EXPRESSIONS WITH F-WORDS IN THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH

Directory of Open Access Journals (Sweden)

Prihantoro Prihantoro

2016-01-01

Full Text Available The research problems addressed here are (1) how lexicogrammar plays a role in determining the polarity of F-words and (2) how to formalize it for corpus processing. The data are obtained from the Corpus of Contemporary American English (COCA). In this corpus, the F-word is shown to be highest in frequency as compared to its distribution across corpora. Corpus methodology is applied by sending queries to the COCA interface to retrieve F-words. The token combinations surrounding F-words yield the phrase and clause units accompanying F-words, which are significant cues for determining F-word polarity. The polarity is later shown to be not necessarily negative. I also designed a computational resource that allows F-words to be retrieved offline, so that users can apply it to any digital text collection.
Mind-modelling with corpus stylistics in David Copperfield.

Science.gov (United States)

Stockwell, Peter; Mahlberg, Michaela

2015-05-01

We suggest an innovative approach to literary discourse by using corpus linguistic methods to address research questions from cognitive poetics. In this article, we focus on the way that readers engage in mind-modelling in the process of characterisation. The article sets out our cognitive poetic model of characterisation that emphasises the continuity between literary characterisation and real-life human relationships. The model also aims to deal with the modelling of the author's mind in line with the modelling of the minds of fictional characters. Crucially, our approach to mind-modelling is text-driven. Therefore we are able to employ corpus linguistic techniques systematically to identify textual patterns that function as cues triggering character information. In this article, we explore our understanding of mind-modelling through the characterisation of Mr. Dick from David Copperfield by Charles Dickens. Using the CLiC tool (Corpus Linguistics in Cheshire) developed for the exploration of 19th-century fiction, we investigate the textual traces in non-quotations around this character, in order to draw out the techniques of characterisation other than speech presentation. We show that Mr. Dick is a thematically and authorially significant character in the novel, and we move towards a rigorous account of the reader's modelling of authorial intention.
Divergent approaches to corpus processing: the need for ...

African Journals Online (AJOL)

With a good corpus, data can be provided giving an authoritative body of linguistic evidence which can support generalisations and against which hypotheses can be tested. As this proves the invaluable status of a corpus, the article assesses the processing of the Shona corpus and discusses how some aspects of the ...
A human language corpus for interstellar message construction

Science.gov (United States)

Elliott, John

2011-02-01

The aim of HuLCC (the human language chorus corpus) is to provide a resource of sufficient size to facilitate inter-language analysis by incorporating languages from all the major language families: for the first time all aspects of typology will be incorporated within a single corpus, adhering to a consistent grammatical classification and granularity, where historically a plethora of disparate schemes has been adopted. An added feature will be the inclusion of a common text element, which will be translated across all languages, to provide a precise comparable thread for detailed linguistic analysis of translation strategies and a mechanism by which these mappings can be explicitly achieved. Methods developed to solve unambiguous mappings across these languages can then be adopted for any subsequent message authored by the SETI community. Initially, it is planned to provide at least 20,000 words for each chosen language, as this amount of text exceeds the point where randomly generated text can be disambiguated from natural language and is of a size useful for message transmission [1] (Elliot, 2002). This paper details the design of this resource, which ultimately will be made available to SETI upon its completion, and discusses issues 'core' to any message construction.
Sexual dimorphism of the human corpus callosum: Digital morphometric study

Directory of Open Access Journals (Sweden)

Spasojević Goran

2006-01-01

Full Text Available Background/Aim. Changes in the morphology and size of the corpus callosum are related to various pathological conditions. An analysis of these changes requires data about sexual dimorphism of the corpus callosum, which we tried to obtain in our study. We also investigated the method of digital morphometry and compared the obtained results with the results of other authors obtained by magnetic resonance imaging or by planimetry. Methods. The morphological study included 34 human brains (cadavers of both sexes: 19 female and 15 male, aged 26−72 years). By digital morphometry using AutoCAD software we measured the following in the corpus callosum: the length (L), the width at half of its length (WW’), the length of its cortical margin (LCM), the area and perimeter of the anterior and posterior callosal segments, as well as the area and perimeter of the corpus callosum section area. The investigated parameters were analyzed and compared between the females and males. Results. There was no statistically significant difference between the males and females in the investigated parameters of the corpus callosum (t test; p > 0.05), including the mean values of the two most important parameters, the surface of its midsagittal section area (males 654.11 mm²; females 677.40 mm²) and its perimeter (males 19.61 cm; females 19.72 cm). The results obtained by digital morphometry were in the range of the results of other authors obtained by magnetic resonance and by planimetry. However, the Pearson coefficient of linear correlation between the section surface area and perimeter of the corpus callosum was highly significant in the males (rxy = 0.6943, p < 0.01), while in the females it was statistically insignificant. Conclusion. Digital morphometry is an accurate method in encephalometric investigations. Our results suggest that the problem of sexual dimorphism of the corpus callosum is very complex, because the identical variables (section
Euphemism vs explicitness: A corpus-based analysis of translated ...

African Journals Online (AJOL)

This article examines the governing initial norms, namely explicitness and euphemism in English source texts and Ndebele translations, focusing on how these norms influenced the strategies chosen by the Ndebele translators in the translation of taboo terms. In the article, a corpus-based approach is used to identify head ...
Dictionary Writing System (DWS) + Corpus Query Package (CQP ...

African Journals Online (AJOL)

In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed ...
Functions of Expressions of Futurality in Professional Economic Texts

Directory of Open Access Journals (Sweden)

Mikuláš Martin

2016-07-01

Full Text Available The aim of this corpus-based study is to identify the functions that selected expressions of futurality can express in professional economic texts. The classification of functions is established on the corpus of seven economic books. Excerpted instances of futural constructions are analysed with respect to textual and interpersonal functions as defined by Halliday. Futurality is interpreted broadly to include all lexical and grammatical means referring to the future. This approach makes it also possible to analyse futurality as a means of text coherence. Hence the core grammatical means are interpreted along with co-occurring lexical means under the two categories of functions to provide a comprehensive model of text coherence with regard to futurality. Frequency analysis shows that core futural expressions are not distributed equally throughout the corpus. While some expressions (e.g., will and the present simple tense) dominate, others prove to be rather insignificant (e.g., be on the point/verge of, the present progressive tense). In addition, both lexical and grammatical constructions regularly co-occur in clusters, contributing to the coherence of the economic texts.
Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of TshwaneLex

Directory of Open Access Journals (Sweden)

Gilles-Maurice de Schryver

2011-10-01

Full Text Available
Abstract: In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.
Keywords: LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYSTEM (DWS), CORPUS QUERY PACKAGE (CQP), TSHWANELEX, CORPUS, CORPUS ANNOTATION, PART-OF-SPEECH TAGGER (POS-TAGGER), MACHINE LEARNING, NORTHERN SOTHO (SESOTHO SA LEBOA)
The Dependency Structure of Coordinate Phrases: A Corpus Approach

Science.gov (United States)

Temperley, David

2005-01-01

Hudson (1990) proposes that each conjunct in a coordinate phrase forms dependency relations with heads or dependents outside the coordinate phrase (the "multi-head" view). This proposal is tested through corpus analysis of Wall Street Journal text. For right-branching constituents (such as direct-object NPs), a short-long preference for conjunct…
Corpus web 2.0 : quelques enjeux méthodologiques et épistémologiques

Directory of Open Access Journals (Sweden)

Sabrina Bevilacqua

2016-12-01

Full Text Available Many methodological and epistemological challenges now confront scientific research oriented toward work on digital corpora. Each virtual platform has a specific ecology (Paveau, 2013a, 2013b) that shapes a different approach to both the object and the corpus. Thus the Facebook (FBK) environment, an essentially multiform surface, calls for a perspective able to grasp its semiotic and enunciative heterogeneity. In this work we aim, first, to redefine the notion of corpus as a "matrix of meaning" (Mayaffre, 2011: 11), making it possible to bring into focus the scientific issues raised by the design of digital corpora drawn from Web 2.0, and from FBK in particular; and second, to describe certain fundamental methodological and epistemological concepts (linearity, technodiscourse, seriality, reticularity) that help in building and managing FBK corpora.
Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis

Directory of Open Access Journals (Sweden)

Simaki Vasiliki

2018-03-01

Full Text Available This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features, such as punctuation, words and grammatical categories that occur in the sentences, in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine stance similarities in the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.
JaSlo: Integration of a Japanese-Slovene Bilingual Dictionary with a Corpus Search System

Directory of Open Access Journals (Sweden)

Kristina HMELJAK SANGAWA

2012-12-01

Full Text Available The paper presents a set of integrated on-line language resources targeted at Japanese language learners, primarily those whose mother tongue is Slovene. The resources consist of the on-line Japanese-Slovene learners’ dictionary jaSlo and two corpora, a 1-million-word Japanese-Slovene parallel corpus and a 300-million-word corpus of web pages, where each word and sentence is marked by its difficulty level; this corpus is furthermore available as a set of five distinct corpora, each one containing sentences of a particular level. The corpora are available for exploration through NoSketch Engine, the open source version of the commercial state-of-the-art corpus analysis software Sketch Engine. The dictionary is available for Web searching, and dictionary entries have direct links to examples from the corpora, thus offering (a) a wider picture of possible translations in concrete contextualised examples, and (b) monolingual Japanese usage examples of different difficulty levels to support language learning.

CORPUS CHRISTI E A FOME NO MUNDO

Directory of Open Access Journals (Sweden)

Carlos Alberto dos Santos Dutra

1995-01-01

Full Text Available The religious feast of Corpus Christi, which marks the institution of the Eucharist, was celebrated this year on 15 June. In this sacrament, as the Catholic Church understands it, Christ himself communicates himself in order to nourish and save humankind. An expression and synthesis of Christianity, it is the identification of Christ's sacrifice with the sacrifice of man.
Un corpus DIY pour l’étude du roumain en diachronie. Stratégies de constitution et stratégies de recherche

Directory of Open Access Journals (Sweden)

Ana Zisman

2017-12-01

Full Text Available The present paper aims to provide an overview of some of the advantages of creating and working with a DIY corpus, i.e. a corpus compiled by the linguist, as groundwork for a PhD thesis. Collected in order to investigate the grammatical and pragmatic behavior in historical Romanian of some so-called parenthetical verbs (a zice/a spune ‘to say’, a crede ‘to think’, a şti ‘to know’) within 5 types of texts from the 16th/17th to the 20th centuries, this DIY corpus represents a necessary alternative as a database of Romanian texts. Although its creation demanded some additional steps (e.g. the selection of the texts, which is determined by various diachronic factors), such a corpus proves to be relevant for investigating parenthetical verbs in literary, historical and law texts, as well as in formal and informal letters. In order to do so, the paradigm of the aforementioned verbs has to be systematized in relation to a precise word frequency per text type.
A unified approach for development of Urdu Corpus for OCR and demographic purpose

Science.gov (United States)

Choudhary, Prakash; Nain, Neeta; Ahmed, Mushtaq

2015-02-01

This paper presents a methodology for the development of an Urdu handwritten text image corpus and the application of corpus linguistics in the field of OCR and information retrieval from handwritten documents. Compared to other language scripts, Urdu script is somewhat complicated for data entry: entering a single character requires a combination of multiple keystrokes. Here, a mixed approach is proposed and demonstrated for building an Urdu corpus for OCR and demographic data collection. The demographic part of the database could be used to train a system to fetch the data automatically, which would help simplify the existing manual data-processing tasks involved in data collection, such as input forms for Passport, Ration Card, Voting Card, AADHAR, Driving licence, Indian Railway Reservation, Census data etc. This would increase the participation of the Urdu language community in understanding and benefiting from Government schemes. To make the database available and applicable across a wide area of corpus linguistics, we propose a methodology for data collection, mark-up, digital transcription, and XML metadata information for benchmarking.
Data for lexicography The central role of the corpus

Directory of Open Access Journals (Sweden)

Allan F. Lauder

2010-10-01

Full Text Available This paper looks at the nature of data for lexicography and in particular at the central role that electronic corpora can play in providing it. Data has traditionally come from existing dictionaries, citations, and the lexicographer’s own knowledge of words, through introspection. Each of these is examined and evaluated. Then the electronic corpus is considered. Different kinds of corpora are described and key design criteria are explained, in particular the size of corpus needed for lexicography as well as the issue of representativeness and sampling. The advantages and disadvantages of corpora are weighed and compared against the other types of data. While each of these has benefits, it is argued that corpora are a requirement, not an option, as data for dictionary making.
Magnetic resonance findings of the corpus callosum in canine and feline lysosomal storage diseases.

Directory of Open Access Journals (Sweden)

Daisuke Hasegawa

Full Text Available Several reports have described magnetic resonance (MR) findings in canine and feline lysosomal storage diseases such as gangliosidoses and neuronal ceroid lipofuscinosis. Although most of those studies described the signal intensities of white matter in the cerebrum, findings of the corpus callosum were not described in detail. A retrospective study was conducted on MR findings of the corpus callosum as well as the rostral commissure and the fornix in 18 cases of canine and feline lysosomal storage diseases, to determine whether changes of the corpus callosum are an imaging indicator of those diseases. The cases included 6 Shiba Inu dogs and 2 domestic shorthair cats with GM1 gangliosidosis; 2 domestic shorthair cats, 2 familial toy poodles, and a golden retriever with GM2 gangliosidosis; and 2 border collies and 3 chihuahuas with neuronal ceroid lipofuscinoses. The corpus callosum and the rostral commissure were difficult to recognize in all cases of juvenile-onset gangliosidoses (GM1 gangliosidosis in Shiba Inu dogs and domestic shorthair cats and GM2 gangliosidosis in domestic shorthair cats) and of GM2 gangliosidosis in toy poodles with late juvenile onset. In contrast, the corpus callosum and the rostral commissure were confirmed in cases of GM2 gangliosidosis in a golden retriever and canine neuronal ceroid lipofuscinoses with late juvenile to early adult onset, but were extremely thin. Abnormal findings of the corpus callosum on midline sagittal images may be a useful imaging indicator for suspecting lysosomal storage diseases, especially hypoplasia (underdevelopment) of the corpus callosum in juvenile-onset gangliosidoses.
Corpus linguistics and statistics with R introduction to quantitative methods in linguistics

CERN Document Server

Desagulier, Guillaume

2017-01-01

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and t...
The VGLC: The Video Game Level Corpus

OpenAIRE

Summerville, Adam James; Snodgrass, Sam; Mateas, Michael; Ontañón, Santiago

2016-01-01

Levels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, machine learning techniques have been applied to video game level generation with the aim of automatically generating levels that have the properties of the training corpus. To that end we have made available a corpus of video game levels in an easy-to-parse format ideal for different machine learning and other game AI researc...
Macroscopic morphometry of the corpus luteum of pregnant and non-pregnant zebu cows in the Colombian tropics

Directory of Open Access Journals (Sweden)

Marco González T

2017-07-01

Full Text Available The objective of the study was to determine the volume, weight, measurements, ovarian location and shape of the corpus luteum of pregnant and non-pregnant zebu cows of the Colombian tropics. A total of 528 reproductive tracts were collected, 264 pregnant and 264 non-pregnant, from cows slaughtered at the local slaughterhouse in Monteria, Córdoba, Colombia. Samples were collected over a period of three months. After collection of each reproductive tract, the ovaries were separated, identified as right and left, weighed and measured. The location of the corpus luteum on the ovary was then drawn on the corresponding form according to previously established anatomical planes. Subsequently the corpus luteum was removed so that it could be measured and weighed and its shape examined. There were statistical differences between the locations of the corpus luteum in the ovary: anterior pole, posterior pole, free edge, upper face and lower face (p≤0.05). The weight and volume of the gestational corpus luteum were greater by 30% and 27.9%, respectively, than those of the corpus luteum of non-pregnant cows. The predominant shape of the corpus luteum in both pregnant and non-pregnant cows was oval, then pyramidal and finally rounded. No gestation was observed contralateral to the location of the corpus luteum.
A corpus and a concordancer of academic journal articles.

Science.gov (United States)

Kwary, Deny A

2018-02-01

This data article presents a corpus (i.e. a large collection of words in electronic form) and a concordancer (i.e. a tool that shows a word in its context of use) of academic journal articles. As the title suggests, the data were collected from research articles published in academic journals. The corpus contains 5,686,428 words selected from 895 journal articles published by Elsevier in 2011-2015. The corpus is classified into four subject areas: Health Sciences, Life Sciences, Physical Sciences, and Social Sciences, following the classifications of Scopus, which is the largest abstract and citation database of peer-reviewed scientific journals, books and conference proceedings. To ease access to and use of the corpus, a program that produces key word in context (KWIC) lines and word frequencies was created and placed on the website: corpus.kwary.net. The corpus is a valuable resource for researchers, teachers, and translators working on academic English.
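To make the two outputs named in this abstract concrete, here is a minimal sketch of a key-word-in-context (KWIC) listing and a word-frequency count. It is not the program hosted on the website; the tokenizer, the function names `tokenize` and `kwic`, and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


sample = ("The corpus contains journal articles. "
          "Each corpus is classified into subject areas.")
tokens = tokenize(sample)

# Concordance lines for "corpus" with two words of context on each side.
concordance = kwic(tokens, "corpus", window=2)
for left, key, right in concordance:
    print(f"{left:>20} | {key} | {right}")

# Word-frequency list, most frequent first.
frequencies = Counter(tokens)
print(frequencies.most_common(3))
```

A production concordancer would typically add sorting by left or right context and proper handling of punctuation and multi-word queries.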
Text Fabric: What, How, and Why

NARCIS (Netherlands)

Erwich, C.M.; Kingham, Cody

Text-Fabric (TF) is a promising new framework for the Eep Talstra Center for Bible and Computer corpus plus (linguistic) annotations. TF is a Python 3.x software package that provides scientific, accessible and reproducible ways of processing Biblical Hebrew text data. It also allows sharing the
Revisiting corpus creation and analysis tools for translation tasks

Directory of Open Access Journals (Sweden)

Claudio Fantinuoli

2016-06-01

Full Text Available Many translation scholars have proposed the use of corpora to allow professional translators to produce high quality texts which read like originals. Yet, the diffusion of this methodology has been modest, one reason being the fact that software for corpus analysis has been developed with the linguist in mind, which means that such tools are generally complex and cumbersome, offering many advanced features, but lacking the level of usability and the specific features that meet translators’ needs. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual corpora from the web; it includes a concordancer with a query system similar to a search engine; it uses basic statistical measures to indicate the reliability of results; it accesses the original documents directly for more contextual information; it includes a statistical and linguistic terminology extraction utility to extract the relevant terminology of the domain and the typical collocations of a given term. Designed to be easy and intuitive to use, the tool may help translation students as well as professionals to increase their translation quality by adhering to the specific linguistic variety of the target text corpus.
Corpus callosum agenesis: Role of fetal magnetic resonance imaging

Directory of Open Access Journals (Sweden)

Achour Radhouane

2016-05-01

Full Text Available Corpus callosum agenesis (CCA) has been evaluated by ultrasound examination and magnetic resonance imaging (MRI) in many studies. Ultrasonography was able to raise suspicion of CCA through indirect signs, but a definitive diagnosis of CCA was achieved only in rare cases. MRI was able to diagnose complete CCA in the majority of cases. Additional neurological abnormalities including heterotopia, gyration anomaly, asymmetry of the cerebral hemispheres, and Dandy-Walker variant, as well as an ocular anomaly, were documented by MRI examination. Prenatal counseling for fetal agenesis of the corpus callosum is difficult as the prognosis is uncertain. The association with other cerebral abnormalities increases the likelihood of a poor outcome, and ultrasonographic assessment of the fetal brain is limited. We found MRI to be a safe and useful additional procedure to complement ultrasonographic diagnosis or suspicion of CCA.
Hispanismos y canarismos en un corpus de textos ingleses sobre Canarias

Directory of Open Access Journals (Sweden)

María-Isabel González-Cruz

2013-12-01

Full Text Available Abstract: The Canary Islands (Spain) have always been in close contact with the Anglo-Saxon world, which has had important consequences for the economy but also at the socio-cultural, linguistic and literary levels. A review of the English bibliography on the Canaries reveals, among other aspects, a tendency in most authors to use hispanicisms and canarianisms in their texts. This article offers a record of those words which appear in a corpus of fourteen works taken from this extensive bibliography. Apart from providing an overview of the studies on hispanicisms in English, this paper’s main aim is to highlight the contribution of Canarian Spanish to the enrichment of the vocabulary of English by checking which of the hispanicisms in our corpus that are actually canarianisms have been included in the lexical repertoire of the Shorter Oxford English Dictionary on Historical Principles (2007).
Large-corpus phoneme and word recognition and the generality of lexical context in CVC word perception.

Science.gov (United States)

Gelfand, Jessica T; Christie, Robert E; Gelfand, Stanley A

2014-02-01

Speech recognition may be analyzed in terms of recognition probabilities for perceptual wholes (e.g., words) and parts (e.g., phonemes), where j or the j-factor reveals the number of independent perceptual units required for recognition of the whole (Boothroyd, 1968b; Boothroyd & Nittrouer, 1988; Nittrouer & Boothroyd, 1990). For consonant-vowel-consonant (CVC) nonsense syllables, j ∼ 3 because all 3 phonemes are needed to identify the syllable, but j ∼ 2.5 for real-word CVCs (revealing ∼2.5 independent perceptual units) because higher level contributions such as lexical knowledge enable word recognition even if less than 3 phonemes are accurately received. These findings were almost exclusively determined with the 120-word corpus of the isophonemic word lists (Boothroyd, 1968a; Boothroyd & Nittrouer, 1988), presented one word at a time. It is therefore possible that its generality or applicability may be limited. This study thus determined j by using a much larger and less restricted corpus of real-word CVCs presented in 3-word groups as well as whether j is influenced by test size. The j-factor for real-word CVCs was derived from the recognition performance of 223 individuals with a broad range of hearing sensitivity by using the Tri-Word Test (Gelfand, 1998), which involves 50 three-word presentations and a corpus of 450 words. The influence of test size was determined from a subsample of 96 participants with separate scores for the first 10, 20, and 25 (and all 50) presentation sets of the full test. The mean value of j was 2.48 with a 95% confidence interval of 2.44-2.53, which is in good agreement with values obtained with isophonemic word lists, although its value varies among individuals. A significant correlation was found between percent-correct scores and j, but it was small and accounted for only 12.4% of the variance in j for phoneme scores ≥60%. Mean j-factors for the 10-, 20-, 25-, and 50-set test sizes were between 2.49 and 2.53 and were not
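The j-factor relation cited in this abstract (Boothroyd & Nittrouer, 1988) states that the probability of recognizing the whole equals the probability of recognizing a part raised to the power j. The following Python sketch shows how j could be computed from a phoneme score and a word score under that relation. The function name and the example scores are illustrative assumptions, not values from the study.

```python
import math


def j_factor(p_part: float, p_whole: float) -> float:
    """Estimate j from p_whole = p_part ** j, i.e. j = log(p_whole) / log(p_part)."""
    if not (0.0 < p_part < 1.0 and 0.0 < p_whole < 1.0):
        raise ValueError("scores must lie strictly between 0 and 1")
    return math.log(p_whole) / math.log(p_part)


# Hypothetical listener: 80% of phonemes and 57% of whole words correct.
example_j = j_factor(0.80, 0.57)
```

Scores of exactly 0 or 1 are rejected because the logarithm ratio is undefined there, so j cannot be estimated from floor or ceiling performance.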
Sentence‐Chain Based Seq2seq Model for Corpus Expansion

Directory of Open Access Journals (Sweden)

Euisok Chung

2017-08-01

Full Text Available This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence‐chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4‐times the number of n‐grams with superior performance for English text.
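The triple construction described in this abstract can be sketched roughly as follows. This is a minimal illustration that assumes the chains are formed from three consecutive sentences and that the two encoder sentences are joined with a separator token. The sliding window, the `<sep>` token and the function names are assumptions of this sketch, not details given in the abstract.

```python
from typing import List, Tuple


def sentence_chain_triples(sentences: List[str]) -> List[Tuple[str, ...]]:
    """Slide a window of three consecutive sentences over a document (assumed chaining rule)."""
    return [tuple(sentences[i:i + 3]) for i in range(len(sentences) - 2)]


def to_seq2seq_pairs(triples, sep: str = " <sep> ") -> List[Tuple[str, str]]:
    """First two sentences form the encoder input; the third is the decoder target."""
    return [(first + sep + second, third) for first, second, third in triples]


# Toy document; the sentences are placeholders.
doc = ["A b.", "C d.", "E f.", "G h."]
pairs = to_seq2seq_pairs(sentence_chain_triples(doc))
```

Each resulting pair can be fed to any encoder-decoder trainer. Documents with fewer than three sentences yield no training pairs.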
Towards a corpus of South African English: corralling the sub-varieties

African Journals Online (AJOL)

Riette Ruthven

important step towards the creation of a truly representative large corpus of SAE and ... Census data which elicit information about home language do not tell .... ISAE has absorbed lexical items such as robot (traffic light), dagha (mud), baba- ..... used their access to existing social networks to identify other contributors to.
Spontal-N: A Corpus of Interactional Spoken Norwegian

OpenAIRE

Sikveland, A.; Öttl, A.; Amdal, I.; Ernestus, M.; Svendsen, T.; Edlund, J.

2010-01-01

Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On basis of t...
Perceptual evaluation of corpus-based speech synthesis techniques in under-resourced environments

CSIR Research Space (South Africa)

Van Niekerk, DR

2009-11-01

Full Text Available With the increasing prominence and maturity of corpus-based techniques for speech synthesis, the process of system development has in some ways been simplified considerably. However, the dependence on sufficient amounts of relevant speech data...
ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus.

Science.gov (United States)

Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A

2014-11-29

In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document specific enhancements, such as negation rules for general practitioners' entries and a regular expression based temporality module. The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. The ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
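The trigger-based idea behind ConText-style negation detection can be illustrated with a small Python sketch. It assumes that a trigger negates any term occurring within a fixed number of following tokens. The trigger list is an illustrative English sample, and the scope size and function name are hypothetical; none of these are the 41 Dutch triggers or the rules used by ContextD.

```python
import re

# Illustrative English triggers only; the study's Dutch trigger list is not reproduced here.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}
SCOPE_TOKENS = 5  # assumed scope: a trigger affects up to five following tokens


def is_negated(sentence: str, term: str) -> bool:
    """Return True if `term` occurs inside the forward scope of a negation trigger."""
    tokens = re.findall(r"\w+", sentence.lower())
    term_tokens = re.findall(r"\w+", term.lower())
    n = len(term_tokens)
    if n == 0:
        return False
    for i, token in enumerate(tokens):
        if token not in NEGATION_TRIGGERS:
            continue
        window = tokens[i + 1:i + 1 + SCOPE_TOKENS]
        # Look for the (possibly multi-word) term anywhere inside the scope window.
        for j in range(len(window) - n + 1):
            if window[j:j + n] == term_tokens:
                return True
    return False
```

For instance, `is_negated("The patient has no signs of pneumonia.", "pneumonia")` is true, while a term that precedes the trigger is left unnegated. A fuller implementation would also need scope-terminating terms and backward-looking triggers.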
Techniques and Rules of Ineffability in the Dionysian Corpus

Directory of Open Access Journals (Sweden)

Knepper Timothy D.

2014-06-01

Full Text Available Is the Dionysian God, or an experience of the Dionysian God, absolutely ineffable? Does the Dionysian corpus assert or perform such ineffability? This paper will argue that the answer to each of these questions is no. The Dionysian God is known hyper-nous as the hyper-ousia cause of all. And the Dionysian corpus unambiguously refers to, asserts of, and metaphorizes about this God just so. In arguing these points, this paper will call upon both the speech act theory of John Searle and the metaphor theory of George Lakoff and Mark Johnson. More particularly, it will look to Searle’s rules of reference and predication and conditions of illocutionary acts, as well as Lakoff and Johnson’s schematization of metaphor gestalt and entailment, to show how Dionysian expressions of inexpressibility are rule-governed and the Dionysian God is thereby (relatively) effable.

arTenTen: Arabic Corpus and Word Sketches

Directory of Open Access Journals (Sweden)

Tressy Arts

2014-12-01

The article also presents the ‘sketch grammar’ (the basis for the word sketches) in detail, describes the process of building and processing the corpus, and considers the role of the corpus in additional research on Arabic.
NCBI disease corpus: a resource for disease name recognition and concept normalization.

Science.gov (United States)

Doğan, Rezarta Islamaj; Leaman, Robert; Lu, Zhiyong

2014-02-01

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information; however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed the use of pre-annotations as a pre-step to manual annotations. Fourteen annotators were randomly paired, and differing annotations were discussed to reach a consensus in two annotation phases. In this setting, a high inter-annotator agreement was observed. Finally, all results were checked against annotations of the rest of the corpus to assure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. In order to help researchers use the corpus to design and test disease identification methods, we have prepared the corpus as training, testing and development sets. To demonstrate its utility, we conducted a benchmarking experiment where we compared three different
Short Message Service (SMS) Texting Symbols: A Functional Analysis of 10,000 Cellular Phone Text Messages

Science.gov (United States)

Beasley, Robert E.

2009-01-01

The purpose of this study was to investigate the use of symbolic expressions (e.g., "BTW," "LOL," "UR") in an SMS text messaging corpus consisting of over 10,000 text messages. More specifically, the purpose was to determine, not only how frequently these symbolic expressions are used, but how they are utilized in terms of the language functions…
Network Analysis with the Enron Email Corpus

Science.gov (United States)

Hardin, J. S.; Sarkis, G.; URC, P.

2015-01-01

We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article's focus on centrality, students can explore…
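The abstract does not name the six centrality measures it applies. As one common example, degree centrality on an email network can be sketched in plain Python as below. The edge list, the names, and the choice to treat the network as undirected are assumptions of this illustration rather than details of the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected graph given as (sender, recipient) pairs."""
    neighbours = defaultdict(set)
    for sender, recipient in edges:
        if sender != recipient:  # ignore self-addressed messages
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Divide by the maximum possible number of distinct contacts, n - 1.
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical message log, not taken from the Enron data.
emails = [("alice", "bob"), ("alice", "carol"), ("bob", "alice"), ("carol", "dave")]
scores = degree_centrality(emails)
```

Repeated messages between the same pair count once here. A weighted variant would instead count messages per pair.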
Hemoperitoneum from corpus luteum rupture in patients with aplastic anemia.

Science.gov (United States)

Wang, Huaquan; Guo, Lifang; Shao, Zonghong

2015-01-01

Aplastic anemia is a rare hematopoietic stem-cell disorder that results in pancytopenia and hypocellular bone marrow. Women with aplastic anemia usually are at increased risk of corpus luteum rupture due to thrombocytopenia and infection. Here we report two cases of hemoperitoneum from corpus luteum rupture in patients with aplastic anemia at our center. Case 1 involved two episodes of hemoperitoneum resulting from rupture of the corpus luteum in a 23-year-old unmarried female with severe aplastic anemia. This patient was managed conservatively with platelet and packed red cell transfusion. Case 2 involved two episodes of hemoperitoneum resulting from rupture of the corpus luteum in a 33-year-old married patient with aplastic anemia. Emergency laparoscopy revealed massive hemoperitoneum. Bilateral salpingo-oophorectomy was performed successively with platelet and packed red cell transfusion. Hemoperitoneum resulting from a ruptured corpus luteum is a life-threatening condition in patients with aplastic anemia. Prompt and appropriate evaluation of corpus luteum rupture and emergent therapy are needed.
Assessment of disease named entity recognition on a corpus of annotated sentences.

Science.gov (United States)

Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich

2008-04-11

In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found
Text de-identification for privacy protection: a study of its impact on clinical text information content.

Science.gov (United States)

Meystre, Stéphane M; Ferrández, Óscar; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H

2014-08-01

As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has only barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems while formatting was only modified by one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between
Divergent Approaches to Corpus Processing: The Need for ...

African Journals Online (AJOL)

Riette Ruthven

McEnery and Wilson (1996: 32) stress the importance of a corpus: 'As a stan- ... close to five million running words, and the Ndebele corpus at around three ... since their introduction and reinforcement through the second form of contact.
Seimo posėdžių stenogramų tekstynas autorystės nustatymo bei autoriaus profilio sudarymo tyrimams | Corpus of transcribed parliamentary speeches for authorship attribution and author profiling tasks

Directory of Open Access Journals (Sweden)

Jurgita Kapočiūtė-Dzikienė

2014-12-01

Full Text Available In our paper we present a corpus of transcribed Lithuanian parliamentary speeches. The corpus is prepared in a specific format, appropriate for different authorship identification tasks. The corpus consists of approximately 111 thousand texts (24 million words). Each text matches one parliamentary speech produced during an ordinary session from the period of 7 parliamentary terms starting on March 10, 1990 and ending on December 23, 2013. The texts are grouped into 147 categories corresponding to individual authors, so they can be used for authorship attribution tasks; the texts are also grouped according to age, gender and political views, so they are also suitable for author profiling tasks. Because short texts complicate recognition of an author's speaking style and are ambiguous in relation to the style of other authors, we incorporated only texts containing at least 100 words into the corpus. In order to make each category as comprehensive and representative as possible, we included only those authors who produced speeches at least 200 times. All the texts are lemmatized, morphologically and syntactically annotated, and tokenized into character n-grams. The statistical information of the corpus is also available. We have also demonstrated that the created corpus can be effectively used in authorship attribution and author profiling tasks with supervised machine learning methods. The corpus structure also allows it to be used with unsupervised machine learning methods, for the creation of rule-based methods, and in different linguistic analyses.
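The two inclusion thresholds and the character n-gram tokenization mentioned in this abstract could look roughly like the Python sketch below. It assumes whitespace word counting and that an author's speeches are counted after the length filter is applied. Both choices, along with the function names, are assumptions of this illustration and are not specified in the abstract.

```python
from collections import Counter
from typing import List, Tuple

MIN_WORDS = 100      # minimum text length stated in the abstract
MIN_SPEECHES = 200   # minimum number of speeches per author stated in the abstract


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def filter_corpus(speeches: List[Tuple[str, str]],
                  min_words: int = MIN_WORDS,
                  min_speeches: int = MIN_SPEECHES) -> List[Tuple[str, str]]:
    """Keep (author, text) pairs with enough words, from authors with enough such texts."""
    long_enough = [(author, text) for author, text in speeches
                   if len(text.split()) >= min_words]
    per_author = Counter(author for author, _ in long_enough)
    return [(author, text) for author, text in long_enough
            if per_author[author] >= min_speeches]
```

Lower thresholds can be passed for small experiments, for example `filter_corpus(sample, min_words=3, min_speeches=2)`.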
Arabic Text Categorization Using Improved k-Nearest neighbour Algorithm

Directory of Open Access Journals (Sweden)

Wail Hamood KHALED

2014-10-01

Full Text Available The quantity of Arabic-language text published on the web calls for effective techniques for extracting and classifying the relevant information contained in large text corpora. In this paper we present an implementation of an enhanced k-NN Arabic text classifier. We apply the traditional k-NN and Naive Bayes classifiers from the Weka Toolkit for comparison. Our proposed modified k-NN algorithm features an improved decision rule that skips the classes that are less similar and identifies the right class from the k nearest neighbours, which increases accuracy. The study evaluates the improved decision rule using recall, precision and F-measure as the basis of comparison. We conclude that the proposed classifier is promising and outperforms the classical k-NN classifier.
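For orientation, a classical k-NN text classifier of the kind used as the baseline here can be sketched in a few lines of Python. This sketch uses bag-of-words vectors, cosine similarity and a plain majority vote. It does not implement the improved decision rule of the paper, whose details are not given in the abstract, and the toy training data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text: str, training: List[Tuple[str, str]], k: int = 3) -> str:
    """Label `text` by majority vote among its k most similar training documents."""
    query = Counter(text.lower().split())
    ranked = sorted(
        training,
        key=lambda item: cosine(query, Counter(item[0].lower().split())),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set of (document, label) pairs.
training = [
    ("football match goal", "sport"),
    ("team wins match", "sport"),
    ("election vote parliament", "politics"),
    ("minister vote law", "politics"),
]
```

Arabic text would additionally need language-specific tokenization and normalization before the term counts are built.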
Clinical significance of the corpus callosum in cerebral palsy

International Nuclear Information System (INIS)

Lee, Eun Ja; Kim, Ji Chang; Kim, Jong Chul; And Others

2000-01-01

To evaluate, using magnetic resonance (MR) imaging, the clinical significance of the corpus callosum by measuring the size of various portions of the corpus callosum in children with cerebral palsy, and in paired controls. Fifty-two children (30 boys and 22 girls aged between six and 96 (median, 19) months) in whom cerebral palsy was clinically diagnosed underwent MR imaging. There were 23 term patients and 29 preterm, and the control group was selected by age and sex matching. Clinical subtypes of cerebral palsy were classified as hemiplegia (n=14), spastic diplegia (n=22), or spastic quadriplegia (n=16), and according to the severity of motor palsy, the condition was also classified as mild (n=26), moderate (n=13), or severe (n=13). In addition to the length and height of the corpus callosum, the thickness of its genu, body, transitional zone and splenium, as seen on midsagittal T1-weighted MR images, was also measured. Differences in the measured values of the two groups were statistically analysed, and differences in the size of the corpus callosum according to the clinical severity and subtypes of cerebral palsy, and gestational age, were also assessed. Except for height, the measured values of the corpus callosum in patients with cerebral palsy were significantly less than those of the control group (p < 0.05). Its size decreased according to the severity of motor palsy. Compared with term patients, the corpus callosum in preterm patients was considerably smaller (p < 0.05). There was a statistically significant correlation between the severity of motor palsy and the size of the corpus callosum. Quantitative evaluation of the corpus callosum might be a good indicator of neurologic prognosis, and a sensitive marker for assessing the extent of brain injury.
Clinical significance of the corpus callosum in cerebral palsy

Energy Technology Data Exchange (ETDEWEB)

Lee, Eun Ja; Kim, Ji Chang [The Catholic University of Korea, Seoul (Korea, Republic of); Kim, Jong Chul [School of Medicine, Chungnam National University, Taejon (Korea, Republic of); And Others

2000-10-01

To evaluate, using magnetic resonance (MR) imaging, the clinical significance of the corpus callosum by measuring the size of various portions of the corpus callosum in children with cerebral palsy, and in paired controls. Fifty-two children (30 boys and 22 girls aged between six and 96 (median, 19) months) in whom cerebral palsy was clinically diagnosed underwent MR imaging. There were 23 term patients and 29 preterm, and the control group was selected by age and sex matching. Clinical subtypes of cerebral palsy were classified as hemiplegia (n=14), spastic diplegia (n=22), or spastic quadriplegia (n=16), and according to the severity of motor palsy, the condition was also classified as mild (n=26), moderate (n=13), or severe (n=13). In addition to the length and height of the corpus callosum, the thickness of its genu, body, transitional zone and splenium, as seen on midsagittal T1-weighted MR images, was also measured. Differences in the measured values of the two groups were statistically analysed, and differences in the size of the corpus callosum according to the clinical severity and subtypes of cerebral palsy, and gestational age, were also assessed. Except for height, the measured values of the corpus callosum in patients with cerebral palsy were significantly less than those of the control group (p < 0.05). Its size decreased according to the severity of motor palsy. Compared with term patients, the corpus callosum in preterm patients was considerably smaller (p < 0.05). There was a statistically significant correlation between the severity of motor palsy and the size of the corpus callosum. Quantitative evaluation of the corpus callosum might be a good indicator of neurologic prognosis, and a sensitive marker for assessing the extent of brain injury.
Comparative study on corpus development for Malay investment ...

African Journals Online (AJOL)

Comparative study on corpus development for Malay investment fraud detection in website. ... Journal of Fundamental and Applied Sciences ... The aim of this research is to develop a corpus for Malay investment fraud so that it can be used in ...
Text collections for evaluation of Russian morphological taggers

Directory of Open Access Journals (Sweden)

Lyashevskaya Olga

2017-12-01

Full Text Available The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging to a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.
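The CoNLL-U target format stores one token per line in ten tab-separated columns, with blank lines between sentences and `#` lines for comments. A minimal reader can be sketched in Python as below. The sample sentence is an invented English example rather than data from the collections, and the sketch keeps multiword-token and empty-node lines as ordinary rows.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment or metadata
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Invented two-token sentence for illustration only.
sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
parsed = parse_conllu(sample)
```

Morphological tagging evaluations typically compare the `lemma`, `upos` and `feats` fields of such rows against a gold standard.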
A tm Plug-In for Distributed Text Mining in R

Directory of Open Access Journals (Sweden)

Stefan Theussl

2012-11-01

Full Text Available R has gained explicit text mining support with the tm package, enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed on a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data to be analyzed, the higher the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing data sets beyond what would fit into memory by using a distributed file system possibly spanning several machines, e.g., in a cluster of workstations. In this paper we present a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which can take advantage of the Hadoop MapReduce library for large scale text mining tasks. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
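The MapReduce programming model mentioned in this abstract can be illustrated with a single-process Python sketch of a term-frequency count. This only mirrors the map, shuffle and reduce stages conceptually. It does not use Hadoop or the tm.plugin.dc API, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: str, text: str) -> List[Tuple[str, int]]:
    """Emit a (term, 1) pair for every token of one document."""
    return [(term, 1) for term in text.lower().split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the partial counts for one term."""
    return term, sum(counts)


def term_frequencies(corpus: Dict[str, str]) -> Dict[str, int]:
    """Run the three stages in sequence over an in-memory corpus."""
    mapped = chain.from_iterable(map_phase(doc_id, text) for doc_id, text in corpus.items())
    return dict(reduce_phase(term, counts) for term, counts in shuffle(mapped).items())
```

Because each `map_phase` call touches only one document and each `reduce_phase` call only one term, both stages can be distributed across machines, which is what lets the model process corpora larger than one machine's memory.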
A Balanced and Representative Corpus: The Effects of Strict Corpus ...

African Journals Online (AJOL)

Theoretically the Northern Sotho language is made up of almost 30 dialects while practically it is not so, because the standard language was formed from very few of its dialects. As a result, even today the language has no corpus which is balanced or representative owing to the fact that almost all of the available corpora ...
Learner corpus profiles the case of Romanian learner English

CERN Document Server

Chitez, Madalina

2014-01-01

The first three chapters of the book offer relevant information on the new methodological approach, learner corpus profiling, and the exemplifying case, Romanian Learner English. The description of the Romanian Corpus of Learner English is also given special attention. The following three chapters include corpus-based frequency analyses of selected grammatical categories (articles, prepositions, genitives), combined with error analyses. In the concluding discussion, the book summarizes the features compiled as lexico-grammatical profiles.
Corpus callosum demyelination associated with acquired stuttering.

Science.gov (United States)

Decker, Barbara McElwee; Guitar, Barry; Solomon, Andrew

2018-04-21

Compared with developmental stuttering, adult onset acquired stuttering is rare. However, several case reports describe acquired stuttering and an association with callosal pathology. Interestingly, these cases share a neuroanatomical localisation also demonstrated in developmental stuttering. We present a case of adult onset acquired stuttering associated with inflammatory demyelination within the corpus callosum. This patient's disfluency improved after the initiation of immunomodulatory therapy. © BMJ Publishing Group Ltd (unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Disregarding the Corpus: Head-word and Sense Treatment in Shona Monolingual Lexicography*

Directory of Open Access Journals (Sweden)

Webster M. Mavhu

2011-10-01

Full Text Available
Abstract: With specific reference to Shona monolingual lexicography, this article discusses how corpus-based lexicographers might, in some instances, decide not strictly to adhere to the corpus when it comes to headword and sense treatment. The writer is a member of the African Languages Research Institute (ALRI), formerly known as the African Languages Lexical (ALLEX) Project. ALRI is a non-faculty interdisciplinary unit dedicated to research on and the development of African languages in Zimbabwe. The writer is part of the six-member team that compiled the now published Shona monolingual, synchronic, medium-sized and general-purpose dictionary Duramazwi Guru ReChiShona (2001). The article originates from the writer's experience of working on this dictionary. The article highlights the fact that being corpus-based does not necessarily imply being corpus-bound.
Keywords: CORPUS, CORPUS-BASED, FREQUENCY, HEADWORD, LEXICOGRAPHY, SENSE, SHONA, SLANG, SYNONYMS
Translating children’s literature: some insights from corpus stylistics

Directory of Open Access Journals (Sweden)

Anna Čermáková

2018-01-01

Full Text Available In this paper I explore the potential of a corpus stylistic approach to the study of literary translation. The study focuses on the translation of children’s literature with its specific constraints, and illustrates the approach with two corpus linguistic techniques, keyword analysis and cluster analysis, both of which capture specific cases of repetition. In a broader sense, the paper thus discusses the phenomenon of repetition in different literary (stylistic) traditions. These are illustrated by examples from two children’s classics aimed at two different age groups, the Harry Potter and the Winnie the Pooh books, and their translations into Czech. Various shifts in translation, especially in the translation of children’s literature, are often explained by the operation of so-called ‘translation universals’. Though ‘repetition’ as such does not belong to the commonly discussed set of translation universals, the stylistic norms opposing repetition seem to be a strong explanation for the translation shifts identified.

The open corpus challenge in eLearning

Directory of Open Access Journals (Sweden)

Mahantesh K. Pattanshetti

2018-03-01

Full Text Available Learning has become a life-long endeavor in the information age. It is no longer restricted to the confines of formal classrooms. Consequently, a student is not restricted to traditional learning resources like teachers, textbooks or printed content. Digital resources available on the Internet form a very significant component of self-learning. Copious volumes of learning resources without legal barriers to self-learning reside in digital repositories, educational institution portals and on numerous websites. Learners wishing to use the web for personalized learning face a daunting array of content to wade through in order to select the items that suit their learning objectives. Therefore, it is not a question of availability; it is one of relevance and suitability. Typically, in addition to time constraints, learners lack the expertise to screen content for effective eLearning. Adaptive hypermedia systems (AHSs) offer a path to harnessing this large volume of learning resources for personalized learning. This review paper provides a concise and coherent discussion of the evolution of AHSs along with the challenges that need to be addressed to effectively harness openly available educational resources, referred to as open corpus resources (OCRs).
Insights from a Learner Corpus as Opposed to a Native Corpus about Cohesive Devices in an Academic Writing Context

Science.gov (United States)

Ersanli, Ceylan Yangin

2015-01-01

This study reports on the insights from an EFL learner corpus (a total of 151 essays and 49,690 words) generated from essays collected over the years in a Turkish state university from freshmen students enrolling in the Advanced Writing course. The comparison of cohesive devices in the non-native corpus (NNC) with those in a native corpus (NC)…
Infarction of the entire corpus callosum as a complication in subarachnoid hemorrhage: A case report

Directory of Open Access Journals (Sweden)

Satoru Takahashi, M.D.

2017-03-01

Full Text Available The corpus callosum is the major commissural pathway connecting the cerebral hemispheres. This pathway receives its blood supply from the anterior communicating artery, the pericallosal artery, and the posterior pericallosal artery. However, in some cases, the entire corpus callosum is supplied by the median callosal artery; thus, occlusion of this artery can lead to infarction of the entire corpus callosum. Few reports have described this type of infarction, and no reports of it after subarachnoid hemorrhage (SAH) exist. Here, we report on a 42-year-old female who was diagnosed with SAH after two aneurysms were discovered at the bifurcation of the left anterior cerebral artery (A1-A2). After successful clipping was performed, the patient was alert and had no neurological deficits; moreover, the computed tomography images that were acquired after the operation showed no evidence of infarction. Nine days after admission to the hospital, drowsiness and weakness of the left limbs with brain swelling appeared, and decompressive hemicraniectomy was performed. Diagnostic cerebral angiography revealed vasospasms in both the anterior and middle cerebral arteries; thus, fasudil hydrochloride was administered intra-arterially. While blood flow in all arteries improved, diffusion-weighted magnetic resonance imaging detected infarction along the entire length of the corpus callosum and in the medial region of the right frontal lobe. We believe this infarction was due to secondary ischemia of the median callosal artery. This case reminded us of the anatomical variation wherein the median callosal artery is the sole blood supply for the corpus callosum and demonstrated that infarction of the entire corpus callosum is possible.
Computerized tomography of the traumatic hematoma in the corpus callosum

International Nuclear Information System (INIS)

Ogura, Koichiro; Yamamoto, Isao; Hara, Makoto; Suzuki, Yoshio; Nakane, Toshichi; Watanabe, Masao.

1982-01-01

The value of computerized tomography (CT) in the diagnosis of intracerebral hematoma has been well documented. However, there are few reports on the CT findings of hematoma of the corpus callosum. This report presents two cases of traumatic hematoma in the corpus callosum and discusses their CT findings. The two patients, a 52-year-old male and a 40-year-old male, had blunt mechanical head trauma accompanied by neither skull fracture nor scalp injury. In both cases, surgery verified that the hematoma extended from the genu to the body of the corpus callosum, and axial CT revealed the following two similar findings. First, the hematoma in the genu of the corpus callosum appeared as a crescent-shaped high-density mass. This finding seems to be due to the anatomy: the genu of the corpus callosum lies just in front of the anterior horns of the lateral ventricles, with a shape that is convex posteriorly. Second, because the midportion of the body of the corpus callosum tends to appear narrow between the two lateral ventricles, the hematoma extending from the genu towards the body of the corpus callosum appeared as a dumbbell-shaped high-density mass. (author)
Corpus callosum dysgenesis and lipoma: embryologic and magnetic resonance imaging aspects

International Nuclear Information System (INIS)

Abreu Junior, Luiz de; Borri, Maria Lucia; Wolosker, Angela Maria Borri; Hartmann, Luiz Guilherme de Carvalho; Galvao Filho, Mario de Melo; D'Ippolito, Giuseppe

2005-01-01

The corpus callosum is the major system of association fibers that permits communication between the two cerebral hemispheres. Magnetic resonance imaging has improved the study of brain malformations, including corpus callosum dysgenesis. Lipoma is a common finding in the spectrum of corpus callosum dysgenesis. The purpose of this study was to review the embryologic events and the magnetic resonance imaging aspects related to corpus callosum dysgenesis and to the formation of the related lipoma. (author)
Transpositions Within User-Posted YouTube Lyric Videos: A Corpus Study

Directory of Open Access Journals (Sweden)

Joseph Plazak

2016-07-01

Full Text Available There are many practical reasons why a given musical work, and especially a recording of it, tends to be heard repeatedly at the same pitch transposition level. Yet here, a corpus study is presented that challenges this very basic assumption of music perception. In 2011, an initial corpus of 100 user-posted YouTube videos was collected in order to investigate the prevalence of transposition and tempo alterations within these videos. Results showed that 42% of these videos contained nominal changes of pitch (36%) and/or tempo (22%). Using the same methodology, a follow-up study was performed in 2015 and found that only 24% of user-posted videos contained these same alterations. Implications of these observations are discussed in light of musical communication models, YouTubeology, and absolute pitch memory.
Revisiting corpus creation and analysis tools for translation tasks

Directory of Open Access Journals (Sweden)

Claudio Fantinuoli

2016-04-01

Many translation scholars have proposed the use of corpora to allow professional translators to produce high quality texts which read like originals. Yet, the diffusion of this methodology has been modest, one reason being that software for corpus analysis has been developed with the linguist in mind, which means that it is generally complex and cumbersome, offering many advanced features, but lacking the level of usability and the specific features that meet translators’ needs. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual corpora from the web; it includes a concordancer with a query system similar to a search engine; it uses basic statistical measures to indicate the reliability of results; it accesses the original documents directly for more contextual information; it includes a statistical and linguistic terminology extraction utility to extract the relevant terminology of the domain and the typical collocations of a given term. Designed to be easy and intuitive to use, the tool may help translation students as well as professionals to increase their translation quality by adhering to the specific linguistic variety of the target text corpus.
Effect of an Ethanol Extract of Scutellaria baicalensis on Relaxation in Corpus Cavernosum Smooth Muscle

Directory of Open Access Journals (Sweden)

Xiang Li

2012-01-01

Full Text Available Aims of study. The aim of the present study was to investigate whether an ethanol extract of Scutellaria baicalensis (ESB) relaxes penile corpus cavernosum muscle in organ bath experiments. Materials and methods. Changes in tension of cavernous smooth muscle strips were determined in a penile strip chamber model and in a penile perfusion model. Isolated endothelium-intact rabbit corpus cavernosum was precontracted with phenylephrine (PE) and then treated with ESB. Results. ESB relaxed penile smooth muscle in a dose-dependent manner, and this was inhibited by pre-treatment with NG-nitro-l-arginine methyl ester (l-NAME), a nitric oxide (NO) synthase inhibitor, and 1H-[1, 2, 4]-oxadiazolo-[4,3-α]-quinoxalin-1-one (ODQ), a soluble guanylyl cyclase (sGC) inhibitor. ESB-induced relaxation was significantly attenuated by pretreatment with tetraethylammonium (TEA), a nonselective K+ channel blocker, and charybdotoxin, a selective Ca2+-dependent K+ channel inhibitor. ESB increased the cGMP levels of rabbit corpus cavernosum in a concentration-dependent manner without changes in cAMP levels. In a perfusion model of penile tissue, ESB also relaxed penile corpus cavernosum smooth muscle in a dose-dependent manner. Conclusion. Taken together, these results suggest that ESB relaxed rabbit cavernous smooth muscle via the NO/cGMP system and Ca2+-sensitive K+ channels in the corpus cavernosum.
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Directory of Open Access Journals (Sweden)

Anika Oellrich

Full Text Available Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO Annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently of the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content
Dans un corpus hybride : les messages twittés, l’intertextualité et la formule

Directory of Open Access Journals (Sweden)

Virone Daniela

2015-01-01

Full Text Available The article offers a practical and methodological reflection on the exploitation of a corpus of tweets, considered a complex corpus because of its particular characteristics (including the presence of metadata and the possibility of relating it to more traditional corpora). The model of quantitative and qualitative analysis, tested on the debate around same-sex marriage in France in 2013 and in particular on the formula « mariage pour tous » (marriage for all), here both hashtag and formula, aims to lay the foundations for new methods of data exploitation in discourse analysis.
Corpus-Based Research and Pedagogy in EAP: From Lexis to Genre

Science.gov (United States)

Flowerdew, Lynne

2015-01-01

This plenary paper showcases current corpus-based research on written academic English, illustrating the tight links that exist between corpus research and pedagogic applications. I first explicate Sinclair's concept of the "lexical approach", which underpins much corpus research and pedagogy. I then discuss studies which focus on…
Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Directory of Open Access Journals (Sweden)

Sadaoki Furui

2009-01-01

Full Text Available Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique to improve an LM built using a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine translated (MT) from English to Icelandic on a word-by-word and sentence-by-sentence basis. LM interpolation using the baseline LM and an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when manually obtained utterances used as a baseline were very sparse.
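The LM interpolation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes plain linear interpolation of two add-alpha smoothed unigram models; the abstract does not state the model order, the smoothing, or the interpolation weight, so the functions `unigram_lm` and `interpolate`, the weight `lam`, and the toy sentences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary (illustrative only)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1.0 - lam) * p_mt[w] for w in p_base}


# Toy data, hypothetical: a sparse in-domain sample and a larger
# machine-translated sample standing in for the MT corpus.
base_tokens = "the weather is cold".split()
mt_tokens = "the weather is warm and the sea is calm".split()
vocab = sorted(set(base_tokens) | set(mt_tokens))

p_base = unigram_lm(base_tokens, vocab)
p_mt = unigram_lm(mt_tokens, vocab)
p_mix = interpolate(p_base, p_mt, lam=0.5)
```

In this sketch, words seen only in the machine-translated sample (such as "sea") receive more probability mass in the interpolated model than in the sparse baseline, which is the intuition behind adapting a small in-domain model with translated text.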
Abnormal white matter integrity in the corpus callosum among smokers: tract-based spatial statistics.

Directory of Open Access Journals (Sweden)

Wakako Umene-Nakano

Full Text Available In the present study, we aimed to investigate the difference in white matter between smokers and nonsmokers. In addition, we examined relationships between white matter integrity and nicotine dependence parameters in smoking subjects. Nineteen male smokers were enrolled in this study. Eighteen age-matched non-smokers with no current or past psychiatric history were included as controls. Diffusion tensor imaging scans were performed, and the analysis was conducted using a tract-based spatial statistics approach. Compared with nonsmokers, smokers exhibited a significant decrease in fractional anisotropy (FA) throughout the whole corpus callosum. There were no significant differences in radial diffusivity or axial diffusivity between the two groups. There was a significant negative correlation between FA in the whole corpus callosum and the amount of tobacco use (cigarettes/day; R = -0.580, p = 0.023). These results suggest that the corpus callosum may be one of the key areas influenced by chronic smoking.
El discurso de la ciencia y la tecnología en la prensa escrita chilena: aproximación al corpus DICIPE-2004 O Discurso da ciência e da tecnologia da imprensa escrita chilena: aproximação ao corpus DICIPE-2004 The discourse of science and technology in the chilean press: an approximation to the DICIPE-2004 corpus

Directory of Open Access Journals (Sweden)

Giovanni Parodi

2007-01-01

Full Text Available The communication of science and technology (S&T) has gained great relevance in recent years, initially through scientific articles and currently through the mass media. In this context, the objectives of this article are: (a) to determine and quantify, in comparative terms, the space that a group of five Chilean newspapers devote to the popularization of S&T topics; (b) to determine the types of journalistic texts through which S&T is disseminated in the written press; (c) to identify the macro-topics, subtopics and disciplines present in the corpus. The corpus was collected over three months and comprises 411 texts. The occurrence of texts and words, the text types, and the macro-topics, topics and disciplines to which each text belongs were calculated and normalized. The findings show, among other things, that the dissemination of S&T occupies on average 1% of what is published in these five newspapers and that texts related to medical sciences, astronomy and astrophysics, and life sciences predominate.
UNITS OF MEASUREMENT: ORAL TRADITION, TRANSLATION STUDIES AND CORPUS LINGUISTICS

Directory of Open Access Journals (Sweden)

John ZEMKE

2017-06-01

Full Text Available The study of the world’s verbal arts offers an opportunity to consider ways that computational analysis and modeling of narratives may lead to new understandings of how they are constructed, their dynamics and relationships. Similarly, as corpus linguistics operations must define metrics, it offers an occasion to review basic interpretive concepts such as “units of analysis, context, and genre.” My essay begins with an admittedly cursory overview from a novice perspective of what capabilities corpus linguistics currently possesses for the analysis and modeling of narratives. Consideration is given to the epistemological issue in the social sciences with the positivistic prescription or empiricist description of units of analysis and the potential pitfalls or advantages corpus linguistics encounters in searching for adequate equivalent terms. This review leads naturally to reflection on the crucial determinative action of context on meaning and the extent to which current computational interfaces are able to account for and integrate into global analysis of linguistic and performance dimensions such as performer, intonation, gesture, diction, idioms and figurative language, setting, audience, time, and occasion. As a tentative conclusion from this review, it can be stated that artificial intelligence for modeling narratives or devising narrative algorithms must develop capacities to account for performance dimensions in order to fulfill their analytical potential.
Big Data, Big Questions: A Closer Look at the Yale–Classical Archives Corpus (c. 2015)

Directory of Open Access Journals (Sweden)

Trevor de Clercq

2016-07-01

Full Text Available This paper responds to the article by Christopher White and Ian Quinn, in which these authors introduce the Yale-Classical Archives Corpus (YCAC). I begin by making some general observations about the corpus, especially with regard to ramifications of the keyboard-performance origins of many pieces in the original MIDI collection. I then assess the accuracy of the scale-degree and local-key fields in the database, which were generated by the Bellman-Budge key-finding algorithm. I point out that some of the inaccuracies from the key-finding algorithm's output may influence the results we obtain from statistical studies of this corpus. I also offer an alternative analysis to the authors' finding that the ratio of V7 to V chords increases over time in common-practice music. Specifically, I conjecture that this finding may be the result of (or related to) increasing instrumental resources over time. I close with some recommendations for future versions of the corpus, such as enabling end users to help repair transcription errors as well as offer ground truths for harmonic analyses and key area information.
BrAgriNews: Um Corpus Temporal-Causal (Português-Brasileiro) para a Agricultura

Directory of Open Access Journals (Sweden)

Brett Drury

2017-07-01

Full Text Available Recently there has been an increase in interest, both in academia and in industry, in applications of machine learning and artificial intelligence techniques to agricultural problems. Text mining and techniques related to natural language processing have rarely been used to solve agricultural problems, and even less so for the Portuguese language. One factor that may contribute to the scarce use of text mining techniques to analyze Portuguese texts and solve agricultural problems is the lack of a freely available annotated corpus. To address the lack of an agricultural corpus in Portuguese, we are releasing a Brazilian Portuguese resource for agriculture, described in this article. The corpus covers a partially continuous period of time between 1996 and 2016 and consists of news items in Brazilian Portuguese that were annotated with the following types of information: causal, sentiment, and named entities, which include temporal expressions. The corpus has additional resources such as a treebank, lists of frequent terms (without stop words): unigrams, bigrams and trigrams, as well as words or phrases that were identified by journalists as domain-specific. It is hoped that the release of the corpus will stimulate the adoption of text mining in agriculture in the Portuguese-speaking research community.
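The frequent-term lists described in this abstract (unigrams, bigrams and trigrams without stop words) can be approximated with a short Python sketch. The stop-word set, the whitespace tokenizer, the function names and the two example headlines are hypothetical; the abstract does not say how the authors tokenized the news texts or whether stop words were removed before or after forming n-grams. Here they are removed first, which is an assumption.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "de", "da", "do", "e", "o", "em", "para"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(texts, n, top=10):
    """Count n-grams over texts after dropping stop words (assumed order of steps)."""
    counts = Counter()
    for text in texts:
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        counts.update(ngrams(tokens, n))
    return counts.most_common(top)


# Invented example headlines, not drawn from the corpus.
docs = ["o preço da soja subiu", "o preço da soja caiu em maio"]
top_unigrams = frequent_ngrams(docs, 1)
top_bigrams = frequent_ngrams(docs, 2)
```

Removing stop words before building n-grams lets a bigram such as ("preço", "soja") bridge the dropped word "da"; a resource that wants only literally adjacent words would filter after n-gram extraction instead.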
Google and beyond : web-as-corpus methodologies for translators

OpenAIRE

Ferraresi, Adriano

2009-01-01

This article reviews current approaches to the use of the web as a linguistic corpus and emphasizes the advantages (as well as the inevitable risks) that these can bring to the translator's work. To illustrate this point, an example is shown of the different ways in which a web-derived corpus can be profitably applied to a specialized translation task.
Chinese students' writing in English implications from a corpus-driven study

CERN Document Server

Leedham, Maria

2014-01-01

Chinese students are the largest international student group in UK universities today, yet little is known about their undergraduate writing and the challenges they face. Drawing on the British Academic Written English corpus - a large corpus of proficient undergraduate student writing collected in the UK in the early 2000s - this study explores Chinese students' written assignments in English in a range of university disciplines, contrasting these with assignments from British students. The study is supplemented by questionnaire and interview datasets with discipline lecturers, writing tutors and students, and provides a comprehensive picture of the Chinese student writer today. Theoretically framed through work within academic literacies and lexical priming, the author seeks to explore what we know about Chinese students' writing and to extend these findings to undergraduate writing more generally. In a globalized educational environment, it is important for educators to understand differences in writing st...
Intergeneric Derivation: on the Genealogy of an LSP text

DEFF Research Database (Denmark)

Askehave, Inger; Kastberg, Peter

2001-01-01

is derived from another text or to establish what aspects of the text have been derived, one must gain control over external variables that are not easily controllable. In our approach, we suggest a method that - while controlling external variables - is designed to isolate a suitable text corpus. Contrary...

Gender-based differences in the shape of the human corpus callosum are associated with allometric variations

Science.gov (United States)

Bruner, Emiliano; de la Cuétara, José Manuel; Colom, Roberto; Martin-Loeches, Manuel

2012-01-01

The corpus callosum displays considerable morphological variability between individuals. Although some characteristics are thought to differ between male and female brains, there is no agreement regarding the source of this variation. Biomedical imaging and geometric morphometrics have provided tools to investigate shape and size variation in terms of integration and correlation. Here we analyze variations at the midsagittal outline of the corpus callosum in a sample of 102 young adults in order to describe and quantify the pattern of covariation associated with its morphology. Our results suggest that the shape of the corpus callosum is characterized by low levels of morphological integration, which explains the large variability. In larger brains, a minor allometric component involves a relative reduction of the splenium. Small differences between males and females are associated with this allometric pattern, induced primarily by size variation rather than gender-specific characteristics. PMID:22296183
DEEP LEARNING MODEL FOR BILINGUAL SENTIMENT CLASSIFICATION OF SHORT TEXTS

Directory of Open Access Journals (Sweden)

Y. B. Abdullin

2017-01-01

Full Text Available Sentiment analysis of short texts such as Twitter messages and comments in news portals is challenging due to the lack of contextual information. We propose a deep neural network model that uses bilingual word embeddings to effectively solve the sentiment classification problem for a given pair of languages. We apply our approach to two corpora of two different language pairs: English-Russian and Russian-Kazakh. We show how to train a classifier in one language and predict in another. Our approach achieves 73% accuracy for English and 74% accuracy for Russian. For Kazakh sentiment analysis, we propose a baseline method that achieves 60% accuracy, and a method to learn bilingual embeddings from a large unlabeled corpus using bilingual word pairs.
On Intertext in Chemotherapy: An Ethnography of Text in Medical Practice

DEFF Research Database (Denmark)

Christensen, Lars Rune

2016-01-01

Building on literary theory and data from a field study of text in chemotherapy, this article introduces the concept of intertext and the associated concepts of corpus and intertextuality to CSCW. It shows that the ensemble of documents used and produced in practice can be said to form a corpus......, including the complementary type, the intratextual type and the mediated type. In this manner the article aims to systematically conceptualise cooperative actors’ engagement with text in text-laden practices. The approach is arguably novel and beneficial to CSCW. The article also contributes...... with a discussion of computer enabling the activity of creating intertext. This is a key concern for cooperative work as intertext is central to text-centric work practices such as healthcare....
Use of "Google Scholar" in Corpus-Driven EAP Research

Science.gov (United States)

Brezina, Vaclav

2012-01-01

This primarily methodological article makes a proposition for linguistic exploration of textual resources available through the "Google Scholar" search engine. These resources ("Google Scholar virtual corpus") are significantly larger than any existing corpus of academic writing. "Google Scholar", however, was not designed for linguistic searches…
Using Semantic Linking to Understand Persons’ Networks Extracted from Text

Directory of Open Access Journals (Sweden)

Alessio Palmero Aprosio

2017-11-01

Full Text Available In this work, we describe a methodology to interpret large persons’ networks extracted from text by classifying cliques using the DBpedia ontology. The approach relies on a combination of NLP, Semantic web technologies, and network analysis. The classification methodology that first starts from single nodes and then generalizes to cliques is effective in terms of performance and is able to deal also with nodes that are not linked to Wikipedia. The gold standard manually developed for evaluation shows that groups of co-occurring entities share in most of the cases a category that can be automatically assigned. This holds for both languages considered in this study. The outcome of this work may be of interest to enhance the readability of large networks and to provide an additional semantic layer on top of cliques. This would greatly help humanities scholars when dealing with large amounts of textual data that need to be interpreted or categorized. Furthermore, it represents an unsupervised approach to automatically extend DBpedia starting from a corpus.
DutchParl: A corpus of parliamentary documents in Dutch

NARCIS (Netherlands)

Marx, M.; Schuth, A.

2010-01-01

A corpus called DutchParl is created which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains documents from the parliaments of The Netherlands, Flanders and Belgium. The corpus is divided along three dimensions: per
Designing a Lexical Database for a Combined Use of Corpus Annotation and Dictionary Editing

DEFF Research Database (Denmark)

Kristoffersen, Jette Hedegaard; Troelsgård, Thomas; Langer, Gabriele

2016-01-01

In a combined corpus-dictionary project, you would need one lexical database that could serve as a shared “backbone” for both corpus annotation and dictionary editing, but it is not that easy to define a database structure that applies satisfactorily to both these purposes. In this paper, we...... will exemplify the problem and present ideas on how to model structures in a lexical database that facilitate corpus annotation as well as dictionary editing. The paper is a joint work between the DGS Corpus Project and the DTS Dictionary Project. The two projects come from opposite sides of the spectrum (one...... adjusting a lexical database grown from dictionary making for corpus annotating, one building a lexical database in parallel with corpus annotation and editing a corpus-based dictionary), and we will consider requirements and feasible structures for a database that can serve both corpus and dictionary....
ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life.

Science.gov (United States)

Pafilis, Evangelos; Frankild, Sune P; Schnetzer, Julia; Fanini, Lucia; Faulwetter, Sarah; Pavloudi, Christina; Vasileiadou, Katerina; Leary, Patrick; Hammock, Jennifer; Schulz, Katja; Parr, Cynthia Sims; Arvanitidis, Christos; Jensen, Lars Juhl

2015-06-01

The association of organisms to their environments is a key issue in exploring biodiversity patterns. This knowledge has traditionally been scattered, but textual descriptions of taxa and their habitats are now being consolidated in centralized resources. However, structured annotations are needed to facilitate large-scale analyses. Therefore, we developed ENVIRONMENTS, a fast dictionary-based tagger capable of identifying Environment Ontology (ENVO) terms in text. We evaluate the accuracy of the tagger on a new manually curated corpus of 600 Encyclopedia of Life (EOL) species pages. We use the tagger to associate taxa with environments by tagging EOL text content monthly, and integrate the results into the EOL to disseminate them to a broad audience of users. The software and the corpus are available under the open-source BSD and the CC-BY-NC-SA 3.0 licenses, respectively, at http://environments.hcmr.gr. © The Author 2015. Published by Oxford University Press.
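The idea of a dictionary-based tagger, as described in this abstract, can be sketched in Python as a greedy longest-match lookup. This is not the ENVIRONMENTS implementation: the dictionary entries, the `TERM:` identifiers (which are placeholders, not real ENVO identifiers), the function name `tag` and the `max_len` parameter are all hypothetical.

```python
import re

# Hypothetical dictionary: surface form -> placeholder identifier.
# These are NOT real Environment Ontology (ENVO) identifiers.
DICTIONARY = {
    "coral reef": "TERM:0001",
    "reef": "TERM:0002",
    "sandy beach": "TERM:0003",
}


def tag(text, dictionary=DICTIONARY, max_len=3):
    """Greedy longest-match tagging; returns (start, end, matched_text, identifier)."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(t[0] for t in tokens[i:i + n])
            if candidate in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, text[start:end], dictionary[candidate]))
                i += n
                break
        else:
            # No dictionary entry starts at this token; move on.
            i += 1
    return matches


hits = tag("This species lives on a coral reef near a sandy beach.")
```

Preferring the longest match means "coral reef" is tagged as one term rather than as the shorter entry "reef"; whether the real tagger resolves overlaps this way is not stated in the abstract.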
L'analyse des corpus multimodaux en ligne : état des lieux et perspectives

Directory of Open Access Journals (Sweden)

Develotte Christine

2012-07-01

Starting from Jacques Anis's book "Texte et ordinateur. L’écriture réinventée ?" (1998), we seek to trace the trajectory of French research in the language sciences on technology-mediated corpora up to the present day. Online communication takes various forms depending on whether it involves the production of fixed texts (for example, websites, emails) or forms centred more on processes of interaction and communication (for example, chat, videoconferencing), and it can therefore be studied from the standpoint of both discourse analysis and conversation analysis. In this article, we set out to show to what extent the two traditions in the language sciences have found material to exploit in these online corpora, each sometimes encroaching on the other's "territory". With this in view, we first bring out the contribution of researchers who claim affiliation with discourse analysis, then that of researchers working in interaction analysis, and we show the zones of overlap between the two currents. In the final part, we turn to the legal, technical and epistemological challenges facing the linguist who seeks to study online multimodal corpora, which are taking increasingly sophisticated and complex forms.
The MR findings of the corpus callosum of normal young volunteers

International Nuclear Information System (INIS)

Okamoto, Kouichirou; Ito, Jusuke; Tokiguchi, Susumu

1990-01-01

The size and shape of the corpus callosum of twenty-seven normal young volunteers (age 18-31 years, 17 men and 10 women) were investigated using a superconducting high field (1.5 T) MRI unit. The length of the corpus callosum was 71.1±5.1 mm (mean±S.D.) and the height was 24.9±2.1 mm. The length ratio of the corpus callosum to the brain was 43.9±2.3% with the ratio of the height 25.0±2.3%. The callosal index (height/length) was 35.4±2.9%. The area of the corpus callosum in the midsagittal plane was 681.4±93.6 mm² (min. 563 mm² to max. 902 mm²). We divided the corpus callosum into three segments: rostrum and genu; anterior and posterior trunks; splenium. Each part accounts for one third of the total area of the corpus callosum. The genu and splenium were generally equal in thickness. The minimal thickness of the trunk was 3 mm with the maximal one 9 mm. The posterior trunk was never thicker than the anterior one. The posterior part of the posterior trunk showed thinning and concavity in almost all cases. So-called impressio corporis callosi was observed in 12 cases (44.4%). Thirteen cases (48.1%) showed a shallow concave configuration at the anterior dorsal surface of the corpus callosum. Six cases of these were thought to be due to compression by the pericallosal artery. This finding was not detected in the posterior portion of the corpus callosum. This concavity was also seen in infants. The thinning of the posterior part of the posterior trunk was seen after the development of the splenium, but the concave configuration at the anterior dorsal surface of the corpus callosum may be encountered before the full development of the genu and splenium. (author)
Language Planning: Corpus Planning.

Science.gov (United States)

Baldauf, Richard B., Jr.

1989-01-01

Focuses on the historical and sociolinguistic studies that illuminate corpus planning processes. These processes are broken down and discussed under two categories: those related to the establishment of norms, referred to as codification, and those related to the extension of the linguistic functions of language, referred to as elaboration. (60…
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Science.gov (United States)

Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

2015-01-01

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh
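The precision, recall and F1 figures quoted in the abstract above follow the standard definitions, sketched below. The abstract does not say how the silver standard was assembled, so the vote-based `majority_vote` function (and its `min_votes` parameter) is purely an assumed illustration of one common strategy, not the authors' method.

```python
from collections import Counter


def prf(predicted, gold):
    """Precision, recall and F1 of a predicted annotation set against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # F1 is the harmonic mean, so a low recall drags it down
    # even when precision is high.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def majority_vote(system_outputs, min_votes=2):
    """Assumed silver-standard rule: keep an annotation if at least
    min_votes systems proposed it."""
    votes = Counter(a for output in system_outputs for a in set(output))
    return {a for a, n in votes.items() if n >= min_votes}
```

Each annotation here is a hashable tuple such as `(start, end, concept_id)`. Raising `min_votes` tends to trade recall for precision, which is the kind of trade-off the abstract reports when systems are combined.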
Polyethylene glycol restores axonal conduction after corpus callosum transection.

Science.gov (United States)

Bamba, Ravinder; Riley, D Colton; Boyer, Richard B; Pollins, Alonda C; Shack, R Bruce; Thayer, Wesley P

2017-05-01

Polyethylene glycol (PEG) has been shown to restore axonal continuity after peripheral nerve transection in animal models. We hypothesized that PEG can also restore axonal continuity in the central nervous system. In this current experiment, coronal sectioning of the brains of Sprague-Dawley rats was performed after animal sacrifice. 3Brain high-resolution microelectrode arrays (MEA) were used to measure mean firing rate (MFR) and peak amplitude across the corpus callosum of the ex-vivo brain slices. The corpus callosum was subsequently transected and repeated measurements were performed. The cut ends of the corpus callosum were still apposite at this time. A PEG solution was applied to the injury site and repeated measurements were performed. MEA measurements showed that PEG was capable of restoring electrophysiology signaling after transection of central nerves. Before injury, the average MFRs at the ipsilateral, midline, and contralateral corpus callosum were 0.76, 0.66, and 0.65 spikes/second, respectively, and the average peak amplitudes were 69.79, 58.68, and 49.60 μV, respectively. After injury, the average MFRs were 0.71, 0.14, and 0.25 spikes/second, respectively, and peak amplitudes were 52.11, 8.98, and 16.09 μV, respectively. After application of PEG, there were spikes in MFR and peak amplitude at the injury site and contralaterally. The average MFRs were 0.75, 0.55, and 0.47 spikes/second at the ipsilateral, midline, and contralateral corpus callosum, respectively, and peak amplitudes were 59.44, 45.33, and 40.02 μV, respectively. There were statistically significant differences in the average MFRs and peak amplitudes between the midline and non-midline corpus callosum groups (P < 0.01, P < 0.05). These findings suggest that PEG restores axonal conduction between severed central nerves, potentially representing axonal fusion.
Parenting, corpus callosum, and executive function in preschool children.

Science.gov (United States)

Kok, Rianne; Lucassen, Nicole; Bakermans-Kranenburg, Marian J; van IJzendoorn, Marinus H; Ghassabian, Akhgar; Roza, Sabine J; Govaert, Paul; Jaddoe, Vincent W; Hofman, Albert; Verhulst, Frank C; Tiemeier, Henning

2014-01-01

In this longitudinal population-based study (N = 544), we investigated whether early parenting and corpus callosum length predict child executive function abilities at 4 years of age. The length of the corpus callosum in infancy was measured using postnatal cranial ultrasounds at 6 weeks of age. At 3 years, two aspects of parenting were observed: maternal sensitivity during a teaching task and maternal discipline style during a discipline task. Parents rated executive function problems at 4 years of age in five domains of inhibition, shifting, emotional control, working memory, and planning/organizing, using the Behavior Rating Inventory of Executive Function-Preschool Version. Maternal sensitivity predicted fewer executive function problems at preschool age. A significant interaction was found between corpus callosum length in infancy and maternal use of positive discipline to determine child inhibition problems: the association between a relatively shorter corpus callosum in infancy and child inhibition problems was reduced in children who experienced more positive discipline. Our results point to the buffering potential of positive parenting for children with biological vulnerability.
From Business Corpus to Business Lexicon*

Directory of Open Access Journals (Sweden)

Li Lan

2011-10-01

Abstract: Language corpora are now indispensable to dictionary compilation. They help broaden the role of the dictionary from standardizing the vocabulary to recording a language. The trilingual corpus generated by the Hong Kong Polytechnic University gives a record of business languages used in Hong Kong. It differs from other corpora in that (1) it includes English, Chinese and Japanese; (2) it shows local characteristics; and (3) it focuses on a specific area (financial services, including banking, accounting, auditing, insurance and investment). The paper discusses various issues of setting up a tricorpus, and how to make full use of the data to generate a trilingual lexicon.
Keywords: MULTILINGUAL, SPECIAL PURPOSE, CORPUS, LEXICON
Opsomming: Van sakekorpus tot sakeleksikon. Taalkorpora is tans onontbeerlik vir die samestelling van woordeboeke. Hulle help om die rol van die woordeboek uit te brei vanaf die standaardisering van die woordeskat tot die optekening van 'n taal. Die drietalige korpus wat deur die Hongkongse Politegniese Universiteit ontwikkel is, verskaf 'n opgawe van die saketale wat in Hongkong gebruik word. Dit verskil van ander korpora deurdat (1) dit Engels, Chinees and Japanees insluit; (2) dit plaaslike eienskappe vertoon; en (3) dit op 'n spesifieke gebied (finansiële dienste, insluitende bankwese, rekeningkunde, ouditering, versekering en belegging) fokus. Die artikel bespreek verskillende aspekte van die totstandbrenging van 'n drietalige korpus, en hoe om volle gebruik te maak van die data om 'n drietalige leksikon te genereer.
Sleutelwoorde: MEERTALIG, SPESIALE DOEL, KORPUS, LEKSIKON
Sobre la natura dels estats. Una revisió basada en corpus

Directory of Open Access Journals (Sweden)

Marta Coll-florit

2008-01-01

This paper aims to offer a new approach to the aspectual category of states based on Catalan data extracted from a corpus. The goal is twofold: firstly, to point out that states constitute a gradual category; and secondly, to highlight that syntactic variability within the category of stative predicates receives a clearer explanation if the different possibilities of situation conceptualization are taken into account.
Corpus callosum thickness in children: an MR pattern-recognition approach on the midsagittal image

Energy Technology Data Exchange (ETDEWEB)

Andronikou, Savvas; Pillay, Tanyia; Gabuza, Lungile; Mahomed, Nasreen; Naidoo, Jaishree; Tebogo Hlabangana, Linda [University of the Witwatersrand, Radiology Department, Faculty of Health Sciences, Johannesburg (South Africa); Du Plessis, Vicci [University of KwaZulu-Natal, Radiology Department, Faculty of Health Sciences, Durban (South Africa); Prabhu, Sanjay P. [Harvard Medical School, Department of Radiology, Boston Children's Hospital, Boston, MA (United States)

2014-08-31

Thickening of the corpus callosum is an important feature of development, whereas thinning of the corpus callosum can be the result of a number of diseases that affect development or cause destruction of the corpus callosum. Corpus callosum thickness reflects the volume of the hemispheres and responds to changes through direct effects or through Wallerian degeneration. It is therefore not only important to evaluate the morphology of the corpus callosum for congenital anomalies but also to evaluate the thickness of specific components or the whole corpus callosum in association with other findings. The goal of this pictorial review is to raise awareness that the thickness of the corpus callosum can be a useful feature of pathology in pediatric central nervous system disease and must be considered in the context of the stage of development of a child. Thinning of the corpus callosum can be primary or secondary, and generalized or focal. Primary thinning is caused by abnormal or failed myelination related to the hypomyelinating leukoencephalopathies, metabolic disorders affecting white matter, and microcephaly. Secondary thinning of the corpus callosum can be caused by diffuse injury such as hypoxic-ischemic encephalopathy, human immunodeficiency virus (HIV) encephalopathy, hydrocephalus, dysmyelinating conditions and demyelinating conditions. Focal disturbance of formation or focal injury also causes localized thinning, e.g., callosal dysgenesis, metabolic disorders with localized effects, hypoglycemia, white matter injury of prematurity, HIV-related atrophy, infarction and vasculitis, trauma and toxins. The corpus callosum might be too thick because of a primary disorder in which the corpus callosum finding is essential to diagnosis; abnormal thickening can also be secondary to inflammation, infection and trauma. (orig.)
Corpus callosum thickness in children: an MR pattern-recognition approach on the midsagittal image

International Nuclear Information System (INIS)

Andronikou, Savvas; Pillay, Tanyia; Gabuza, Lungile; Mahomed, Nasreen; Naidoo, Jaishree; Tebogo Hlabangana, Linda; Du Plessis, Vicci; Prabhu, Sanjay P.

2015-01-01

Thickening of the corpus callosum is an important feature of development, whereas thinning of the corpus callosum can be the result of a number of diseases that affect development or cause destruction of the corpus callosum. Corpus callosum thickness reflects the volume of the hemispheres and responds to changes through direct effects or through Wallerian degeneration. It is therefore not only important to evaluate the morphology of the corpus callosum for congenital anomalies but also to evaluate the thickness of specific components or the whole corpus callosum in association with other findings. The goal of this pictorial review is to raise awareness that the thickness of the corpus callosum can be a useful feature of pathology in pediatric central nervous system disease and must be considered in the context of the stage of development of a child. Thinning of the corpus callosum can be primary or secondary, and generalized or focal. Primary thinning is caused by abnormal or failed myelination related to the hypomyelinating leukoencephalopathies, metabolic disorders affecting white matter, and microcephaly. Secondary thinning of the corpus callosum can be caused by diffuse injury such as hypoxic-ischemic encephalopathy, human immunodeficiency virus (HIV) encephalopathy, hydrocephalus, dysmyelinating conditions and demyelinating conditions. Focal disturbance of formation or focal injury also causes localized thinning, e.g., callosal dysgenesis, metabolic disorders with localized effects, hypoglycemia, white matter injury of prematurity, HIV-related atrophy, infarction and vasculitis, trauma and toxins. The corpus callosum might be too thick because of a primary disorder in which the corpus callosum finding is essential to diagnosis; abnormal thickening can also be secondary to inflammation, infection and trauma. (orig.)
Diffusion tensor analysis of corpus callosum in progressive supranuclear palsy

International Nuclear Information System (INIS)

Ito, Shoichi; Makino, Takahiro; Shirai, Wakako; Hattori, Takamichi

2008-01-01

Progressive supranuclear palsy (PSP) is a neurodegenerative disease featuring parkinsonism, supranuclear ophthalmoplegia, dysphagia, and frontal lobe dysfunction. The corpus callosum, which consists of many commissure fibers, probably reflects cerebral cortical function. Several previous reports showed atrophy or diffusion abnormalities of the anterior corpus callosum in PSP patients, but the partitioning method used in these studies was based on data obtained in nonhuman primates. In this study, we performed a diffusion tensor analysis using a new partitioning method for the human corpus callosum. Seven consecutive patients with PSP were compared with 29 age-matched patients with Parkinson's Disease (PD) and 19 age-matched healthy control subjects. All subjects underwent diffusion tensor magnetic resonance imaging, and the corpus callosum was partitioned into five areas on the mid-sagittal plane according to a recently established topography of human corpus callosum (CC1-prefrontal area, CC2-premotor and supplementary motor area, CC3-motor area, CC4-sensory area, CC5-parietal, temporal, and occipital area). Fractional anisotropy (FA) and apparent diffusion coefficient (ADC) were measured in each area and differences between groups were analyzed. In the PSP group, FA values were significantly decreased in CC1 and CC2, and ADC values were significantly increased in CC1 and CC2. Receiver operating characteristic analysis showed excellent reliability of FA and ADC analyses of CC1 for differentiating PSP from PD. The anterior corpus callosum corresponding to the prefrontal, premotor, and supplementary motor cortices is affected in PSP patients. This analysis can be an additional test for further confirmation of the diagnosis of PSP.
Diffusion tensor analysis of corpus callosum in progressive supranuclear palsy

Energy Technology Data Exchange (ETDEWEB)

Ito, Shoichi; Makino, Takahiro; Shirai, Wakako; Hattori, Takamichi [Department of Neurology, Graduate School of Medicine, Chiba University (Japan)

2008-11-15

Progressive supranuclear palsy (PSP) is a neurodegenerative disease featuring parkinsonism, supranuclear ophthalmoplegia, dysphagia, and frontal lobe dysfunction. The corpus callosum, which consists of many commissure fibers, probably reflects cerebral cortical function. Several previous reports showed atrophy or diffusion abnormalities of the anterior corpus callosum in PSP patients, but the partitioning method used in these studies was based on data obtained in nonhuman primates. In this study, we performed a diffusion tensor analysis using a new partitioning method for the human corpus callosum. Seven consecutive patients with PSP were compared with 29 age-matched patients with Parkinson's Disease (PD) and 19 age-matched healthy control subjects. All subjects underwent diffusion tensor magnetic resonance imaging, and the corpus callosum was partitioned into five areas on the mid-sagittal plane according to a recently established topography of human corpus callosum (CC1-prefrontal area, CC2-premotor and supplementary motor area, CC3-motor area, CC4-sensory area, CC5-parietal, temporal, and occipital area). Fractional anisotropy (FA) and apparent diffusion coefficient (ADC) were measured in each area and differences between groups were analyzed. In the PSP group, FA values were significantly decreased in CC1 and CC2, and ADC values were significantly increased in CC1 and CC2. Receiver operating characteristic analysis showed excellent reliability of FA and ADC analyses of CC1 for differentiating PSP from PD. The anterior corpus callosum corresponding to the prefrontal, premotor, and supplementary motor cortices is affected in PSP patients. This analysis can be an additional test for further confirmation of the diagnosis of PSP.

MRI Findings of Coexistence of Ectopic Neurohypophysis, Corpus Callosum Dysgenesis, and Periventricular Neuronal Heterotopia

Directory of Open Access Journals (Sweden)

Harun Arslan

2014-01-01

Ectopic neurohypophysis is a pituitary gland abnormality, which can accompany growth hormone deficiency associated with dwarfism. Here we present the magnetic resonance imaging (MRI) findings of a rare case in which ectopic neurohypophysis, corpus callosum dysgenesis, and periventricular neuronal heterotopia coexist, together with a review of the literature.
New Advances in Corpus-based Lexicography*

Directory of Open Access Journals (Sweden)

Arvi Hurskainen

2011-10-01

Abstract: This article presents various approaches used in corpus-based computational lexicography. A claim is made that in order for computational lexicography to be efficient, precise and comprehensive, it should utilize the method where the corpus text is first analysed, and the results of this analysis are then processed further to meet the needs of a dictionary. This method has several advantages, including high precision and recall, as well as the possibility to automate the process much further than with more traditional computational methods. The frequency list obtained by using the lemma (the equivalent of the headword) as basis helps in selecting the words to be included in the dictionary. The approach is demonstrated through various phases by applying SALAMA (the Swahili Language Manager) to the process. Manual work will be needed in the phase when examples of use are selected from the corpus, and possibly modified. However, the list of examples of use, arranged alphabetically according to the corresponding headword, can also be produced automatically. Thus the alphabetical list of headwords with examples of use is the material on which the lexicographer works manually. The article deals with problems encountered in compiling traditional printed dictionaries, and it excludes electronic dictionaries and thesauri.
Keywords: LEXICOGRAPHY, DICTIONARY, LANGUAGE TECHNOLOGY, COMPUTATIONAL LINGUISTICS, AUTOMATIC COMPILATION, DICTIONARY TESTING, INFORMATION RETRIEVAL, MORPHOLOGICAL ANALYSIS, SEMANTIC ANALYSIS, DISAMBIGUATION, HEURISTICS
Opsomming: Nuwe ontwikkelinge in korpusgebaseerde leksikografie. Hierdie artikel beskryf verskillende benaderings wat in korpusgebaseerde rekenaarleksikografie gebruik word. Daar word aangevoer dat vir rekenaarleksikografie om doelmatig, noukeurig en omvattend te wees, dit die metode behoort te gebruik waarby die korpusteks eers ontleed word, en die resultaat van hierdie ontleding dan verder
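The lemma-based frequency list mentioned in the abstract above can be sketched as follows. This is not SALAMA's pipeline; the toy English lemma table stands in for the output of a real morphological analyser, and the names `lemma_frequencies`, `headword_candidates` and `min_count` are assumptions made for illustration.

```python
from collections import Counter

# Toy lookup standing in for a morphological analyser (hypothetical data).
LEMMAS = {
    "walks": "walk",
    "walked": "walk",
    "walking": "walk",
    "books": "book",
}


def lemma_frequencies(tokens, lemma_of=LEMMAS):
    """Count corpus tokens by lemma rather than by surface form."""
    return Counter(lemma_of.get(t.lower(), t.lower()) for t in tokens)


def headword_candidates(tokens, min_count=2):
    """Alphabetical list of lemmas frequent enough to be considered
    as dictionary headwords."""
    freq = lemma_frequencies(tokens)
    return sorted(lemma for lemma, n in freq.items() if n >= min_count)
```

Counting by lemma means the three inflected forms of "walk" pool into a single entry, so the frequency threshold is applied to the headword rather than to each surface form separately.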
33 CFR 165.808 - Corpus Christi Ship Channel, Corpus Christi, TX, safety zone.

Science.gov (United States)

2010-07-01

... Petroleum Gas, the waters within a 500 yard radius of the LPG carrier while the vessel transits the Corpus Christi Ship Channel to the LPG receiving facility. The safety zone remains in effect until the LPG vessel is moored at the LPG receiving facility. (2) For outgoing tank vessels loaded with LPG, the waters...
Light-controlled relaxation of the rat penile corpus cavernosum using NOBL-1, a novel nitric oxide releaser

Directory of Open Access Journals (Sweden)

Yuji Hotta

2016-05-01

Purpose: To investigate whether relaxation of the rat penile corpus cavernosum could be controlled with NOBL-1, a novel, light-controllable nitric oxide (NO) releaser. Materials and Methods: Fifteen-week-old male Wistar-ST rats were used. The penile corpus cavernosum was prepared and used in an isometric tension study. After noradrenaline (10⁻⁵ M) achieved precontraction, the penile corpus cavernosum was irradiated by light (470–500 nm) with and without NOBL-1 (10⁻⁶ M). In addition, we noted rats’ responses to light with vardenafil (10⁻⁶ M), a phosphodiesterase-5 (PDE-5) inhibitor. Next, responses to light in the presence of a guanylate cyclase inhibitor, ODQ (1H-[1,2,4]oxadiazolo[4,3-a]quinoxalin-1-one; 10⁻⁵ M), were measured. All measurements were performed under L-NAME (10⁻⁴ M) pretreatment to inhibit endogenous NO production. Results: Corpus cavernosal smooth muscle, precontracted with noradrenaline, was unchanged by light irradiation in the absence of NOBL-1. However, in the presence of NOBL-1, corpus cavernosal smooth muscle, precontracted with noradrenaline, relaxed in response to light irradiation. After blue light irradiation ceased, tension returned. In addition, the light response was obviously enhanced in the presence of a PDE-5 inhibitor. Conclusions: This study showed that rat corpus cavernosal smooth muscle relaxation can be light-controlled using NOBL-1, a novel, light-sensitive NO releaser. Though further in vivo studies are needed to investigate possible usefulness, NOBL-1 may prove to be a useful tool for erectile dysfunction therapy, specifically in the field of penile rehabilitation.
Revelando sentidos na prática docente: a abordagem de corpus na análise do discurso Uncovering meanings in pedagogical practice: the corpus approach in discourse analysis

Directory of Open Access Journals (Sweden)

Vander Viana

2011-01-01

Este artigo discute a viabilidade da utilização de ferramentas da Linguística de Corpus na análise do discurso pedagógico. Para tanto, são apresentados dois estudos de caso. O primeiro focaliza o discurso de professores de língua inglesa de um renomado curso de idiomas do Rio de Janeiro acerca da implementação de recursos tecnológicos na sala de aula. O segundo estudo, por sua vez, busca perceber qual é o posicionamento de professores universitários de literaturas em língua inglesa sobre literatura e seu ensino. Os resultados apontam para a riqueza dos dados contextuais que podem ser depreendidos a partir de uma análise linguística de base empírica. Em última análise, o artigo revela a importância e a flexibilidade da abordagem de corpus na análise do discurso, que pode ser aplicada a inúmeros contextos. This paper discusses the feasibility of using Corpus Linguistics tools in the analysis of pedagogic discourse. To this end, two case studies are presented. The first one focuses on the discourse of English language teachers at a well-known language school in Rio de Janeiro about the implementation of technological resources in the classroom. The second study, in turn, seeks to identify the position held by university professors of literatures in English with regard to literature and its teaching. The results point to the richness of contextual data which can be inferred from a linguistic analysis with an empirical basis. All in all, the paper uncovers the importance and flexibility of the corpus approach in discourse analysis, which may be applied to numerous contexts.
Compiling a corpus-based dictionary grammar: an example for ...

African Journals Online (AJOL)

In this article it is shown how a corpus-based dictionary grammar may be compiled: that is, a mini-grammar fully based on corpus data and specifically written for use in and integrated with a dictionary. Such an effort is, to the best of our knowledge, a world's first. We exemplify our approach for a Northern Sotho ...
The Corpus of Czech Verse

Czech Academy of Sciences Publication Activity Database

Plecháč, Petr; Kolár, Robert

2015-01-01

Roč. 2, č. 1 (2015), s. 107-118 ISSN 2346-6901 R&D Projects: GA ČR GAP406/11/1825 Institutional support: RVO:68378068 Keywords : Czech poetry * versification * corpus linguistics * theory of verse Subject RIV: AJ - Letters, Mass-media, Audiovision
Le corpus lexicographique dans les langues à tradition orale: le cas du dialecte fang-mekè*

Directory of Open Access Journals (Sweden)

Nzang-Bié Yolande

2011-10-01

Résumé: Les corpus sont à la base de la plupart des recherches en linguistique et particulièrement lexicographique. La compilation d'un corpus est une activité spécialisée dont dépend le résultat de la recherche en question. Le sujet de cet article est la compilation du corpus lexicographique dans les langues à tradition orale, et exige une démarche différente de celle ayant une longue tradition écrite. De ce fait, ces dernières disposent d'une importante documentation pouvant servir comme base pour de nombreux sujets de recherche. L'auteur propose comme approche une analyse qui permettrait de mieux rendre compte des spécificités lexicales et sémantiques des langues à tradition orale. Par le truchement de la production orale libre, l'auteur base ses hypothèses de recherche sur une expérience en dialecte fang-mekè, une variante linguistique localisée au Gabon. Les résultats permettent de mettre l'accent sur deux données essentielles du processus de compilation dans les langues à tradition orale: les informateurs et la représentativité du corpus. Cette dernière, qui doit s'exprimer à travers des champs lexicaux diversifiés mais également équilibrés, permettrait d'élaborer des dictionnaires dans lesquels les locuteurs, qui en sont les premiers utilisateurs, doivent se reconnaître.
Mots-clés: CORPUS, LEXICOGRAPHIE, LANGUES À TRADITION ORALE, LANGUES À TRADITION ÉCRITE, INFORMATEURS, EXHAUSTIVITÉ, REPRÉSENTATIVITÉ, CHAMPS LEXICAUX, ORALITÉ, ÉCRITURE, MÉTHODE, DIALECTE FANG-MEKÈ, CORPUS ÉQUILIBRÉ.
Abstract: The Lexicographic Corpus in Languages with an Oral Tradition: The Case of the Dialect Fang-Mekè. Corpora form the basis of most linguistic and especially lexicographic research. The compilation of a corpus is a specialised activity on which depends the result of the research to be undertaken. The subject of this article is the compilation of a lexicographic corpus in languages with an oral tradition
Agenesis of the Corpus Callosum

Science.gov (United States)

... callosum, the structure that connects the two hemispheres (left and right) of the brain. In ACC the corpus callosum is partially or completely absent. It is caused by a disruption of brain cell migration during fetal development. ACC can occur as an isolated condition or ...
Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures

Directory of Open Access Journals (Sweden)

Ross Spencer

2013-06-01

To preserve digital information it is vital that the format of that information can be identified, in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to help further develop tools to aid this purpose. Following that call, an attempt has been made to develop the suite. This paper initially outlines a methodology to generate a skeleton corpus using simple user-generated digital objects. It then explores the lessons learnt in the generation of a corpus using scripting language techniques from the file format signatures described in The National Archives PRONOM technical registry. It will also discuss the use of the digital signature for this purpose and the benefits of developing a test corpus using this technique. Finally, this paper will outline a methodology for future research before exploring how the community can best make use of the output of this project and how this project needs to be taken forward to completion.
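The idea of generating skeleton files from format signatures, as described in the abstract above, can be sketched in a few lines. This is an assumed illustration rather than the paper's scripts: the three signatures are well-known leading byte sequences written by hand, not values read from the PRONOM registry, and real PRONOM signatures can also carry offsets, wildcards and end-of-file sequences that this sketch ignores.

```python
from pathlib import Path

# Well-known leading byte sequences, given as hex strings.
# Hand-written for illustration; not extracted from PRONOM.
SIGNATURES = {
    "pdf": "255044462D",        # "%PDF-"
    "png": "89504E470D0A1A0A",  # PNG file signature
    "gif": "474946383961",      # "GIF89a"
}


def write_skeleton_files(out_dir, signatures=SIGNATURES):
    """Write one minimal file per format containing only its signature."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for ext, hex_sig in signatures.items():
        path = out / f"skeleton.{ext}"
        path.write_bytes(bytes.fromhex(hex_sig))
        paths[ext] = path
    return paths


def identify(path, signatures=SIGNATURES):
    """Return the first format whose signature the file starts with."""
    data = Path(path).read_bytes()
    for ext, hex_sig in signatures.items():
        if data.startswith(bytes.fromhex(hex_sig)):
            return ext
    return None
```

Skeleton files like these are not valid documents, but they are enough to check that an identification tool matches each signature, which is the validation use the abstract describes.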
A massively parallel corpus: the Bible in 100 languages.

Science.gov (United States)

Christodouloupoulos, Christos; Steedman, Mark

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.
Lexical Properties of Slovene Sign Language: A Corpus-Based Study

Science.gov (United States)

Vintar, Špela

2015-01-01

Slovene Sign Language (SZJ) has as yet received little attention from linguists. This article presents some basic facts about SZJ, its history, current status, and a description of the Slovene Sign Language Corpus and Pilot Grammar (SIGNOR) project, which compiled and annotated a representative corpus of SZJ. Finally, selected quantitative data…
INFECTIOUS DISEASES ARE SLEEPING MONSTERS: Conventional and culturally adapted new metaphors in a corpus of abstracts on immunology

Directory of Open Access Journals (Sweden)

Laura Hidalgo Downing

2009-04-01

Full Text Available In this paper we examine the role played by metaphor in a corpus of sixty abstracts on immunology from Scientific American. We focus on the distinction between conventional metaphors and culturally adapted new metaphors and discuss the role played by metaphor choice in the communicative purposes of the abstracts and their register features. We argue that one of the main strategies used to attract the reader's attention is the combination of highly conventionalized metaphors, which occur more frequently in the corpus, together with what we call “culturally adapted new metaphors”, which display different degrees of creativity and are less frequent in the corpus. Conventional metaphors typically reinforce the world view shared by the scientific community and introduce basic ideas on the subject of immunology. Culturally adapted new metaphors include a cline from slightly new perspectives of conventional models, to highly creative uses of metaphor. Culturally adapted new metaphors appeal primarily to a general readership and not to the scientific community, as they tap human emotions and mythic constructions. These play a crucial role in the abstracts, as they contribute to persuasive and didactic communicative functions in the text.
Segmenting corpora of texts

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2002-01-01

Full Text Available The aim of the research presented here is to report on a corpus-based method for discourse analysis that is based on the notion of segmentation, or the division of texts into cohesive portions. For the purposes of this investigation, a segment is defined as a contiguous portion of written text consisting of at least two sentences. The segmentation procedure developed for the study is called LSM (link set median), which is based on the identification of lexical repetition in text. The data analysed in this investigation were three corpora of 100 texts each. Each corpus was composed of texts of one particular genre: research articles, annual business reports, and encyclopaedia entries. The total number of words in the three corpora was 1,262,710 words. The segments inserted in the texts by the LSM procedure were compared to the internal section divisions in the texts. Afterwards, the results obtained through the LSM procedure were compared to segmentation carried out at random. The results indicated that the LSM procedure worked better than random, suggesting that lexical repetition accounts in part for the way texts are segmented into sections.
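To make the idea of segmenting by lexical repetition concrete, here is a minimal Python sketch. It is not the LSM procedure, whose details the abstract does not give; it simply starts a new segment wherever two adjacent sentences share too few word types. The function names and the `min_shared` threshold are assumptions made for illustration, and the sketch does not enforce the two-sentence minimum from the abstract's definition of a segment.

```python
import re

def word_types(sentence):
    """Return the set of lower-cased word types in one sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def segment(sentences, min_shared=1):
    """Group sentences into segments (hypothetical helper, not LSM).

    A new segment starts whenever two adjacent sentences share fewer
    than `min_shared` word types, a crude lexical-repetition cue.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if len(word_types(prev) & word_types(cur)) < min_shared:
            segments.append([cur])      # weak repetition: open a new segment
        else:
            segments[-1].append(cur)    # enough repetition: extend the segment
    return segments

example = [
    "The corpus contains texts.",
    "The texts are reports.",
    "Sharks swim fast.",
    "Fast sharks hunt.",
]
for seg in segment(example):
    print(seg)
```

In practice function words such as "the" would inflate the overlap, so a real implementation would probably filter them out or weight content words more heavily.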
TEXT DEIXIS IN NARRATIVE SEQUENCES

Directory of Open Access Journals (Sweden)

Josep Rivera

2007-06-01

Full Text Available This study looks at demonstrative descriptions, regarding them as text-deictic procedures which contribute to weave discourse reference. Text deixis is thought of as a metaphorical referential device which maps the ground of utterance onto the text itself. Demonstrative expressions with textual antecedent-triggers, considered as the most important text-deictic units, are identified in a narrative corpus consisting of J. M. Barrie’s Peter Pan and its translation into Catalan. Some linguistic and discourse variables related to DemNPs are analysed to characterise text deixis adequately. It is shown that this referential device is usually combined with abstract nouns, thus categorising and encapsulating (non-nominal) complex discourse entities as nouns, while performing a referential cohesive function by means of the text deixis + general noun type of lexical cohesion.
Learning From Short Text Streams With Topic Drifts.

Science.gov (United States)

Li, Peipei; He, Lu; Wang, Haiyan; Hu, Xuegang; Zhang, Yuhong; Li, Lei; Wu, Xindong

2017-09-18

Short text streams such as search snippets and micro blogs have been popular on the Web with the emergence of social media. Unlike traditional normal text streams, these data present the characteristics of short length, weak signal, high volume, high velocity, topic drift, etc. Short text stream classification is hence a very challenging and significant task. However, this challenge has received little attention from the research community. Therefore, a new feature extension approach is proposed for short text stream classification with the help of a large-scale semantic network obtained from a Web corpus. It is built on an incremental ensemble classification model for efficiency. First, more semantic contexts based on the senses of terms in short texts are introduced to make up for the data sparsity using the open semantic network, in which all terms are disambiguated by their semantics to reduce the noise impact. Second, a concept cluster-based topic drifting detection method is proposed to effectively track hidden topic drifts. Finally, extensive studies demonstrate that, compared to several well-known concept drifting detection methods for data streams, our approach can detect topic drifts effectively, and that it handles short text streams effectively while maintaining efficiency compared to several state-of-the-art short text classification approaches.
Lingüística de Corpus: histórico e problemática

Directory of Open Access Journals (Sweden)

SARDINHA Tony Berber

2000-01-01

Full Text Available This paper offers a retrospective of Corpus Linguistics, a research area that has grown at a dizzying pace in recent years and has had a considerable impact on linguistics. The retrospective includes both a historical overview and a position on the current debates and future developments in the area. The main concepts currently in vogue in the area are presented and discussed. The paper also comments on the most notable developments in Corpus Linguistics with respect to theory and practice, listing the main corpora in existence as well as the most important contributions in the field of computer programs for analysing and exploring these corpora.
[Medicine and astrology in Arnau's corpus].

Science.gov (United States)

Giralt, Sebastià

2006-01-01

The role of astrology in Arnau de Vilanova's medical work is revisited with special attention to the problems of authorship posed by the astrological writings of Arnau's corpus and to their hypothetical chronology.
LINGUISTIC TEMPORALITY IN THE DIACHRONIC PERSPECTIVE: CORPUS ASPECT

Directory of Open Access Journals (Sweden)

Konnova Mariya Nikolaevna

2014-06-01

Full Text Available In the scope of complex cognitive and linguoculturological approach, aiming at investigating the triple unity of language, mind and culture, the author analyzes cognitive mechanisms of change in the meaning of New Testament saying "Dovleet dnevi zloba yego" (Mf. 6: 34) / "Sufficient for the day is the evil thereof" (St. Matthew 6: 34). This approach provides deeper insight into the essence of mental schemes underlying the process of lexicalisation of biblical micro-texts both as fixed phrases (quotations) and idioms. The semantic shifts of microdiachronic character, which touched upon the semantic structure of biblical idiomatic expressions in 19-20th centuries and led to substantial restructuring of axiological and temporal components of meaning, are analyzed on the data of Russian National Corpus. The author proves that the use of biblical quotations outside their original context leads to their complete semantic transformation. The loss of original meaning is connected with the loss of key axiological and temporal characteristics typical for New Testament texts.
The Shona Corpus and the Problem of Tagging | Chabata | Lexikos

African Journals Online (AJOL)

An analysis of the problems that most corpus builders face shows that more problems are likely to be encountered when dealing with spoken corpora than with written corpora. The paper demonstrates that tagging is an important component of corpus building as it makes it easier for a researcher to extract relevant data.

The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text

DEFF Research Database (Denmark)

Pafilis, Evangelos; Pletscher-Frankild, Sune; Fanini, Lucia

2013-01-01

The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster and as accurate as existing tools. The precision and recall were assessed both on an existing gold-standard corpus and on a new corpus...
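A dictionary-based approach to named entity recognition can be sketched in a few lines of Python. This is only an illustration of the general technique, not the SPECIES implementation: the function name `tag_names`, the whitespace tokenisation, the `max_len` limit, and the two-entry dictionary are all assumptions made for the example.

```python
def tag_names(text, dictionary, max_len=3):
    """Return (start_token, end_token, name, identifier) for dictionary hits.

    Greedy longest match: at each token position, try the longest
    candidate phrase first and skip past it when it is found.
    """
    words = text.split()
    hits, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).strip(".,;()").lower()
            if candidate in dictionary:
                hits.append((i, i + n, candidate, dictionary[candidate]))
                i += n              # continue after the matched phrase
                break
        else:
            i += 1                  # no phrase starts at this token
    return hits

# Illustrative dictionary mapping lower-cased names to taxon identifiers.
taxa = {"escherichia coli": 562, "homo sapiens": 9606}
print(tag_names("Strains of Escherichia coli colonise Homo sapiens.", taxa))
```

A hash lookup per candidate phrase keeps the cost roughly linear in the text length, which is one reason dictionary methods can be fast; handling abbreviations such as "E. coli" would need extra dictionary entries or rules.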
Radiographic evaluation of 70 patients with absence of the corpus callosum

International Nuclear Information System (INIS)

Byrd, S.E.; Flannery, A.; Osborn, R.E.; Radkowski, M.A.; Naidich, T.P.; Bohan, T.P.

1987-01-01

Absence (agenesis) of the corpus callosum is one of the most common congenital malformations of the brain seen in the pediatric population. The authors used CT, MR imaging, or US to study 70 children with absence of the corpus callosum. Patients were divided into two groups: those with isolated absence of the corpus callosum, and those with other associated brain lesions. The associated brain lesions included interhemispheric arachnoid cyst, Dandy-Walker malformations, encephaloceles, and migrational disorders (heterotopias, schizencephaly, lissencephaly, septo-optic dysplasia, lipoma, Chiari malformations, and holoprosencephaly). The clinical presentations and radiologic findings are described.
Neuromyelitis optica with linear enhancement of corpus callosum in brain magnetic resonance imaging with contrast: a case report.

Science.gov (United States)

Sahraian, Mohammad Ali; Moghadasi, Abdorreza Naser; Owji, Mahsa; Naghshineh, Hoda; Minagar, Alireza

2015-06-10

Neuromyelitis optica is a demyelinating disease of the central nervous system with various patterns of brain lesions. Corpus callosum may be involved in both multiple sclerosis and neuromyelitis optica. Previous case reports have demonstrated that callosal lesions in neuromyelitis optica are usually large and edematous and have a heterogeneous intensity showing a "marbled pattern" in the acute phase. Their size and intensity may reduce with time or disappear in the chronic stages. In this report, we describe a case of a 25-year-old Caucasian man with neuromyelitis optica who presented clinically with optic neuritis and myelitis. His brain magnetic resonance imaging demonstrated linear enhancement of the corpus callosum. Brain images with contrast agent added also showed linear ependymal layer enhancement of the lateral ventricles, which has been reported in this disease previously. Linear enhancement of corpus callosum in magnetic resonance imaging with contrast agent could help in diagnosing neuromyelitis optica and differentiating it from other demyelinating diseases, especially multiple sclerosis.
Corpus callosum lipoma with frontal encephalocele

International Nuclear Information System (INIS)

Srinivasa Rao, A.; Rao, V.R.K.; Ravi Mandalam, K.; Gupta, A.K.; Kumar, S.; Joseph, S.; Unni, M.

1990-01-01

Computed tomographic and plain X-ray observations in a patient with corpus callosum lipoma associated with frontal encephalocele are reported. The rarity of the lesion and the specific diagnostic criteria on CT are emphasised. (orig.)
La arquitectura del pleno barroco en Granada: el hospital del Corpus Christi

Directory of Open Access Journals (Sweden)

Barrios Rozúa, Juan Manuel

2011-03-01

Full Text Available The Corpus Christi hospital of Granada was a victim of prejudices against the baroque on the part of influential historians. Nevertheless, the building is an interesting example of hospital architecture with a frankly original temple. Thanks to the exhaustive analysis of the institution’s very complete archive, it can be determined that some thirty artists worked there, including Alonso Cano and his disciple Juan Luis de Ortega, whose architectural works are evaluated here.

Annotated corpus and the empirical evaluation of probability estimates of grammatical forms

Directory of Open Access Journals (Sweden)

Ševa Nada

2003-01-01

Full Text Available The aim of the present study is to demonstrate the usage of an annotated corpus in the field of experimental psycholinguistics. Specifically, we demonstrate how the manually annotated Corpus of Serbian Language (Kostić, Đ. 2001) can be used for probability estimates of grammatical forms, which allow the control of independent variables in psycholinguistic experiments. We address the issue of processing Serbian inflected forms within two subparadigms of feminine nouns. In regression analysis, almost all processing variability of inflected forms has been accounted for by the amount of information (i.e. bits) carried by the presented forms. In spite of the fact that probability distributions of inflected forms for the two paradigms differ, it was shown that the best prediction of processing variability is obtained by the probabilities derived from the predominant subparadigm which encompasses about 80% of feminine nouns. The relevance of annotated corpora in experimental psycholinguistics is discussed in more detail.
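As a rough illustration of how corpus counts turn into an "amount of information" in bits, the Python sketch below uses the standard Shannon definition, I = -log2(p), with p estimated as a relative frequency. The abstract does not say exactly which measure the study used, so treat this as an assumption; the form labels and counts are invented for the example and are not data from the corpus.

```python
import math

def information_bits(probability):
    """Shannon information of an event with the given probability, in bits."""
    return -math.log2(probability)

def form_information(counts):
    """Map each form to its information in bits, estimating p from counts."""
    total = sum(counts.values())
    return {form: information_bits(n / total) for form, n in counts.items()}

# Invented counts for three hypothetical inflected forms.
toy_counts = {"form_a": 2, "form_b": 1, "form_c": 1}
for form, bits in form_information(toy_counts).items():
    print(form, bits)
```

Rarer forms carry more bits, which is the quantity that would then enter a regression as a predictor.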
Boomerang sign: Clinical significance of transient lesion in splenium of corpus callosum

Directory of Open Access Journals (Sweden)

Hardeep Singh Malhotra

2012-01-01

Full Text Available Transient signal abnormality in the splenium of corpus callosum on magnetic resonance imaging (MRI) is occasionally encountered in clinical practice. It has been reported in various clinical conditions apart from patients with epilepsy. We describe 4 patients with different etiologies presenting with signal changes in the splenium of corpus callosum. They were diagnosed as having progressive myoclonic epilepsy (case 1), localization-related epilepsy (case 2), hemicrania continua (case 3), and postinfectious parkinsonism (case 4). While three patients had complete involvement of the splenium on diffusion-weighted image ("boomerang sign"), the patient having hemicrania continua showed semilunar involvement ("mini-boomerang") on T2-weighted and FLAIR image. All the cases had noncontiguous involvement of the splenium. We herein discuss these cases with transient splenial involvement and stress that such patients do not need aggressive diagnostic and therapeutic interventions. An attempt has been made to review the literature regarding the pathophysiology, etiology, and outcome of such lesions.
A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition

Directory of Open Access Journals (Sweden)

Zou Cairong

2016-01-01

Full Text Available Feature fusion from separate sources is a current technical difficulty in cross-corpus speech emotion recognition. The purpose of this paper is to use Deep Belief Nets (DBN) from deep learning to treat the emotional information hidden in the speech spectrum diagram (spectrogram) as image features and then fuse them with the traditional emotion features. First, based on spectrogram analysis with the STB/Itti model, new spectrogram features are extracted from the color, the brightness, and the orientation, respectively; then two alternative DBN models fuse the traditional and the spectrogram features, which increases the scale of the feature subset and its ability to characterize emotion. In experiments on the ABC database and Chinese corpora, the new feature subset distinctly improves the cross-corpus recognition result, by 8.8%, compared with traditional speech emotion features. The proposed method provides a new idea for feature fusion in emotion recognition.
Corpus-aided language pedagogy : the use of concordance lines in vocabulary instruction

OpenAIRE

Kazaz, İlknur

2015-01-01

Ankara : The Program of Teaching English as a Foreign Language Bilkent University, 2015. Thesis (Master's) -- Bilkent University, 2015. Includes bibliographical references leaves 83-91. This study investigated the effectiveness of the use of a concordance software and concordance lines as a pedagogical tool to learn the target vocabulary of a text book. The purpose of the study was to compare the effects of corpus-aided vocabulary instruction with traditional vocabulary teac...
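For readers unfamiliar with concordance lines, the following Python sketch shows the keyword-in-context (KWIC) idea in its simplest form. It is not the concordance software used in the thesis, which the record does not name; the function name, the `width` parameter, and the sample sentence are assumptions for illustration.

```python
def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

sample = "the corpus is large and the corpus is annotated".split()
for left, key, right in concordance(sample, "corpus", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12}  {key}  {right}")
```

Aligning the keyword in a central column is what lets learners scan many contexts of the same word at a glance.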
Applying corpus linguistics methodology to psycholinguistics research

Directory of Open Access Journals (Sweden)

Luciane Corrêa Ferreira

2010-01-01

Full Text Available This study concerns the use of corpus linguistics methodology in psycholinguistics research. Ten linguistic metaphors were selected from English and American newspapers. After that, we identified the underlying conceptual metaphor based on the conceptual metaphor inventory by Lakoff and Johnson (1980, 1999). We seek to investigate what sort of knowledge EFL-learners use when trying to understand a linguistic metaphor. We examined how EFL-learners comprehend linguistic metaphors, firstly without using the context and then using the context. The sample comprised 221 Brazilian students and 16 American students at UCSC. We have also carried out empirical research using WebCorp.
Linking Video and Text via Representations of Narrative

OpenAIRE

Salway, Andrew; Graham, Mike; Tomadaki, Eleftheria; Xu, Yan

2003-01-01

The ongoing TIWO project is investigating the synthesis of language technologies, like information extraction and corpus-based text analysis, video data modeling and knowledge representation. The aim is to develop a computational account of how video and text can be integrated by representations of narrative in multimedia systems. The multimedia domain is that of film and audio description – an emerging text type that is produced specifically to be informative about the events and objects dep...
How Can We Use Corpus Wordlists for Language Learning? Interfaces between Computer Corpora and Expert Intervention

Science.gov (United States)

Chen, Yu-Hua; Bruncak, Radovan

2015-01-01

With the advances in technology, wordlists retrieved from computer corpora have become increasingly popular in recent years. The lexical items in those wordlists are usually selected, according to a set of robust frequency and dispersion criteria, from large corpora of authentic and naturally occurring language. Corpus wordlists are of great value…
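Selecting words by frequency and dispersion can be sketched as follows in Python. The abstract does not state the actual criteria, so the thresholds `min_freq` and `min_range`, and the use of "range" (the number of texts a word occurs in) as the dispersion measure, are assumptions chosen for illustration.

```python
from collections import Counter

def wordlist(texts, min_freq=3, min_range=2):
    """Return words meeting both a frequency and a dispersion threshold.

    Frequency is the total count across all texts; dispersion is
    approximated by range, the number of texts containing the word.
    """
    freq, doc_range = Counter(), Counter()
    for text in texts:
        words = text.lower().split()
        freq.update(words)             # total occurrences
        doc_range.update(set(words))   # one count per text containing the word
    keep = [w for w in freq if freq[w] >= min_freq and doc_range[w] >= min_range]
    return sorted(keep, key=lambda w: (-freq[w], w))

toy_texts = ["data data model", "data model", "model rare rare rare"]
print(wordlist(toy_texts))
```

The dispersion check is what excludes a word that is frequent overall but concentrated in a single text, such as "rare" in the toy input.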
Vertebral morphology, dentition, age, growth, and ecology of the large lamniform shark Cardabiodon ricki

Directory of Open Access Journals (Sweden)

Michael G. Newbrey

2015-12-01

Full Text Available Cardabiodon ricki and Cardabiodon venator were large lamniform sharks with a patchy but global distribution in the Cenomanian and Turonian. Their teeth are generally rare and skeletal elements are less common. The centra of Cardabiodon ricki can be distinguished from those of other lamniforms by their unique combination of characteristics: medium length, round articulating outline with a very thick corpus calcareum, a corpus calcareum with a laterally flat rim, robust radial lamellae, thick radial lamellae that occur in low density, concentric lamellae absent, small circular or subovate pores concentrated next to each corpus calcareum, and papillose circular ridges on the surface of the corpus calcareum. The large diameter and robustness of the centra of two examined specimens suggest that Cardabiodon was large, had a rigid vertebral column, and was a fast swimmer. The sectioned corpora calcarea show both individuals deposited 13 bands (assumed to represent annual increments) after the birth ring. The identification of the birth ring is supported in the holotype of Cardabiodon ricki as the back-calculated tooth size at age 0 is nearly equal to the size of the smallest known isolated tooth of this species. The birth ring size (5–6.6 mm radial distance [RD]) overlaps with that of Archaeolamna kopingensis (5.4 mm RD) and the range of variation of Cretoxyrhina mantelli (6–11.6 mm RD) from the Smoky Hill Chalk, Niobrara Formation. The revised, reconstructed lower jaw dentition of the holotype of Cardabiodon ricki contains four anterior and 12 lateroposterior files. Total body length is estimated at 5.5 m based on 746 mm lower jaw bite circumference reconstructed from associated teeth of the holotype.
Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

Science.gov (United States)

Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

2015-01-01

Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
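Since the evaluation above is reported as a micro-averaged F-score, a short Python sketch of that metric may help. Micro-averaging pools true positives, false positives, and false negatives over all categories before computing precision and recall. The function name and the counts in the example are invented for illustration and are unrelated to the reported 45.70%.

```python
def micro_f1(counts):
    """Micro-averaged F-score from per-category (tp, fp, fn) tuples."""
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented counts for two annotation categories: (tp, fp, fn).
print(micro_f1([(8, 2, 4), (2, 3, 1)]))
```

Because the counts are pooled, frequent categories dominate the score, in contrast to macro-averaging, which averages per-category F-scores.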
Copy Number Variations Found in Patients with a Corpus Callosum Abnormality and Intellectual Disability.

Science.gov (United States)

Heide, Solveig; Keren, Boris; Billette de Villemeur, Thierry; Chantot-Bastaraud, Sandra; Depienne, Christel; Nava, Caroline; Mignot, Cyril; Jacquette, Aurélia; Fonteneau, Eric; Lejeune, Elodie; Mach, Corinne; Marey, Isabelle; Whalen, Sandra; Lacombe, Didier; Naudion, Sophie; Rooryck, Caroline; Toutain, Annick; Caignec, Cédric Le; Haye, Damien; Olivier-Faivre, Laurence; Masurel-Paulet, Alice; Thauvin-Robinet, Christel; Lesne, Fabien; Faudet, Anne; Ville, Dorothée; des Portes, Vincent; Sanlaville, Damien; Siffroi, Jean-Pierre; Moutard, Marie-Laure; Héron, Delphine

2017-06-01

To evaluate the role that chromosomal micro-rearrangements play in patients with both corpus callosum abnormality and intellectual disability, we analyzed copy number variations (CNVs) in patients with corpus callosum abnormality/intellectual disability. STUDY DESIGN: We screened 149 patients with corpus callosum abnormality/intellectual disability using Illumina SNP arrays. In 20 patients (13%), we have identified at least 1 CNV that likely contributes to corpus callosum abnormality/intellectual disability phenotype. We confirmed that the most common rearrangement in corpus callosum abnormality/intellectual disability is inverted duplication with terminal deletion of the 8p chromosome (3.2%). In addition to the identification of known recurrent CNVs, such as deletions 6qter, 18q21 (including TCF4), 1q43q44, 17p13.3, 14q12, 3q13, 3p26, and 3q26 (including SOX2), our analysis allowed us to refine the 2 known critical regions associated with 8q21.1 deletion and 19p13.1 duplication relevant for corpus callosum abnormality; report a novel 10p12 deletion including ZEB1 recently implicated in corpus callosum abnormality with corneal dystrophy; and report a novel pathogenic 7q36 duplication encompassing SHH. In addition, 66 variants of unknown significance were identified in 57 patients encompassed candidate genes. Our results confirm the relevance of using microarray analysis as first line test in patients with corpus callosum abnormality/intellectual disability. Copyright © 2017 Elsevier Inc. All rights reserved.
Effect of Vestibulo-Proprioceptive Stimulations in a Child with Agenesis of the Corpus Callosum

Directory of Open Access Journals (Sweden)

Hamid Dalvand

2010-06-01

Full Text Available Background and Aim: The purpose of the present study was to investigate the effect of vestibulo-proprioceptive stimulations of sensory integration theory on the development of gross and fine motor, language and personal-social functions in a child with agenesis of the corpus callosum. Case: We report a 10.5 month old boy with agenesis of the corpus callosum. The intervention was administered based on sensory integration theory an hour a week for 20 weeks. The exercise intervention consisted of proprioceptive and linear, sustained and low frequency vestibular stimulations on suspension device and physio roll. A Denver Developmental Screening- II and milestones skill testing was completed pre-intervention and monthly. Post-intervention, age of gross motor, fine motor adaptive, language, and personal-social functions significantly improved. Based on milestones skills, maintenance of gross motor functions (e.g. sitting and quadruped position) improved. The child could roll from side to side and released objects voluntarily. The reaction time to auditory stimulations became less than 2 seconds. Conclusion: Vestibulo-proprioceptive stimulations using the neuroplasticity ability of the central nervous system are effective for development of gross and fine motor, language, and personal-social functions. These exercises can be administered for a child with agenesis of the corpus callosum.
Sirenomelia with agenesis of corpus callosum.

Science.gov (United States)

Shirani, Shapour; Rekabi, Vahab; Kamalian, Naser

2006-07-01

Sirenomelia is a very rare anomaly presented with fusion of the lower limbs. Genitourinary, neural tube, and vertebral anomalies are found in most cases. We report a case of sirenomelia with agenesis of corpus callosum, which has not been reported previously.
Temporal analysis of text data using latent variable models

DEFF Research Database (Denmark)

Mølgaard, Lasse Lohilahti; Larsen, Jan; Goutte, Cyril

2009-01-01

Detecting and tracking of temporal data is an important task in multiple applications. In this paper we study temporal text mining methods for Music Information Retrieval. We compare two ways of detecting the temporal latent semantics of a corpus extracted from Wikipedia, using a stepwise...
‘Not an ogre’: adult music learners and their teachers, a corpus-based discourse analysis

OpenAIRE

Shirley, Rachel

2015-01-01

Adult learners are an under-researched group in music education. Although music education research often uses texts (interviews, autobiographical accounts, survey responses), linguistic analysis has not yet been used in this area. Meanwhile, the internet has become a source of support and expression for adult music learners, through blogs and forums. This presentation describes part of the research undertaken for my MA in English Language, which uses a corpus of online texts to investigate di...
75 FR 31677 - Amendment of Class E Airspace; Corpus Christi, TX

Science.gov (United States)

2010-06-04

... Jose Island Airport, TX (Lat. 27°56'40'' N., long. 96°59'06'' W.) Rockport, Aransas County... Meacham Blvd., Fort Worth, TX 76137; telephone (817) 321-7716. SUPPLEMENTARY INFORMATION: History On... Corpus Christi, TX [Amended] Corpus Christi International Airport, TX (Lat. 27°46'13'' N., long. 97...

The Corpus of English as Lingua Franca in Academic Settings.

Science.gov (United States)

Mauranen, Anna

2003-01-01

Describes a project to make a corpus of English spoken as a lingua franca in university settings in Finland. This corpus is one of the first to address the need for corpora that show the target for English-as-a-Foreign-Language learners whose goal is not to speak with native speakers but to interact in communities where English is a lingua franca.…
Segmentation of corpus callosum using diffusion tensor imaging: validation in patients with glioblastoma

International Nuclear Information System (INIS)

Nazem-Zadeh, Mohammad-Reza; Saksena, Sona; Babajani-Fermi, Abbas; Jiang, Quan; Soltanian-Zadeh, Hamid; Rosenblum, Mark; Mikkelsen, Tom; Jain, Rajan

2012-01-01

This paper presents a three-dimensional (3D) method for segmenting corpus callosum in normal subjects and brain cancer patients with glioblastoma. Nineteen patients with histologically confirmed treatment naïve glioblastoma and eleven normal control subjects underwent DTI on a 3T scanner. Based on the information inherent in diffusion tensors, a similarity measure was proposed and used in the proposed algorithm. In this algorithm, diffusion pattern of corpus callosum was used as prior information. Subsequently, corpus callosum was automatically divided into Witelson subdivisions. We simulated the potential rotation of corpus callosum under tumor pressure and studied the reproducibility of the proposed segmentation method in such cases. Dice coefficients, estimated to compare automatic and manual segmentation results for Witelson subdivisions, ranged from 94% to 98% for control subjects and from 81% to 95% for tumor patients, illustrating closeness of automatic and manual segmentations. Studying the effect of corpus callosum rotation by different Euler angles showed that although segmentation results were more sensitive to azimuth and elevation than skew, rotations caused by brain tumors do not have major effects on the segmentation results. The proposed method and similarity measure segment corpus callosum by propagating a hyper-surface inside the structure (resulting in high sensitivity), without penetrating into neighboring fiber bundles (resulting in high specificity)
A new universality class in corpus of texts; A statistical physics study

Science.gov (United States)

Najafi, Elham; Darooneh, Amir H.

2018-05-01

Text can be regarded as a complex system. There are some methods in statistical physics which can be used to study this system. In this work, by means of statistical physics methods, we reveal new universal behaviors of texts associating with the fractality values of words in a text. The fractality measure indicates the importance of words in a text by considering distribution pattern of words throughout the text. We observed a power law relation between fractality of text and vocabulary size for texts and corpora. We also observed this behavior in studying biological data.
A Corpus-Based View of Lexical Gender in Written Business English

Science.gov (United States)

Fuertes-Olivera, Pedro A.

2007-01-01

This article investigates lexical gender in specialized communication. The key method of analysis is that of forms of address, professional titles, and "generic man" in a 10 million word corpus of written Business English. After a brief introduction and literature review on both gender in specialized communication and similar corpus-based views of…
Microstructural changes in thickened corpus callosum in children: contribution of magnetic resonance diffusion tensor imaging

Energy Technology Data Exchange (ETDEWEB)

Merlini, Laura; Anooshiravani, Mehrak; Kanavaki, Aikaterini; Hanquinet, Sylviane [University of Geneva Children's Hospital, Pediatric Radiology Unit, Geneva (Switzerland)

2015-06-15

Thickened corpus callosum is a rare finding and its pathophysiology is not well known. An anomalous supracallosal bundle has been depicted by fiber tracking in some cases but no diffusion tensor imaging metrics of thickened corpus callosum have been reported. To use diffusion tensor imaging (DTI) in cases of thickened corpus callosum to help in understanding its clinical significance. During a 7-year period five children (ages 6 months to 15 years) with thickened corpus callosum were studied. We determined DTI metrics of fractional anisotropy (FA), mean diffusivity, and axial (λ1) and radial (λ2, λ3) diffusivity and performed 3-D fiber tracking reconstruction of the thickened corpus callosum. We compared our results with data from the literature and 24 age-matched controls. Brain abnormalities were seen in all cases. All children had at least three measurements of corpus callosum thickness above the 97th percentile according to age. In all children 3-D fiber tracking showed an anomalous supracallosal bundle and statistically significant decrease in FA (P = 0.003) and λ1 (P = 0.001) of the corpus callosum compared with controls, but no significant difference in mean diffusivity and radial diffusivity. Thickened corpus callosum was associated with abnormal bundles, suggesting underlying axonal guidance abnormality. DTI metrics suggested abnormal fiber compactness and density, which may be associated with alterations in cognition. (orig.)
Determination of indices of the corpus callosum associated with normal aging in Japanese individuals

International Nuclear Information System (INIS)

Takeda, S.; Hirashima, Y.; Ikeda, H.; Yamamoto, H.; Endo, S.; Sugino, M.

2003-01-01

Indices of the corpus callosum with normal aging and their sex differences were elucidated using quantitative MRI. We studied 94 Japanese men (mean±SD 57.3±20.8 years, range 6-90 years) and 111 Japanese women (mean±SD 61.2±17.6 years, range 9-86 years) who had no intracranial lesions on MRI and no history of neurological illness. The widths of the rostrum, body and splenium, the anterior to posterior length, and the maximum height in the midsagittal image were selected for measurement. The Evans index, which is the relative ratio of lateral ventricle expansion, and the maximum width of the third ventricle in the axial image were also estimated for comparison. The widths of rostrum, body and splenium of the corpus callosum became thinner with age. Conversely, the anterior to posterior length and the maximum height of the corpus callosum increased with age. The ratio of the width of the body to the length of the corpus callosum and the ratio of the width of the body to the height of the corpus callosum are best correlated with age. No sex differences in regional size of corpus callosum, including these two ratios, were observed in any raw measures, although ventricular indices were larger in men than women. Evaluation of the ratio of the width of the body to its length and the ratio of the width of the body to its height may enable accurate estimation of normal or pathological changes of the corpus callosum. Aging and pathological atrophy of corpus callosum can be evaluated without any adjustment for gender. (orig.)
Determination of indices of the corpus callosum associated with normal aging in Japanese individuals

Energy Technology Data Exchange (ETDEWEB)

Takeda, S.; Hirashima, Y.; Ikeda, H.; Yamamoto, H.; Endo, S. [Department of Neurosurgery, Toyama Medical and Pharmaceutical University, Sugitani 2630, Toyama-shi, 930-0194, Toyama (Japan); Sugino, M. [Department of Neurosurgery, Sugino Hospital, Sengoku-cho 6-3-3, 930-0066, Toyama (Japan)

2003-08-01

Indices of the corpus callosum with normal aging and their sex differences were elucidated using quantitative MRI. We studied 94 Japanese men (mean±SD 57.3±20.8 years, range 6-90 years) and 111 Japanese women (mean±SD 61.2±17.6 years, range 9-86 years) who had no intracranial lesions on MRI and no history of neurological illness. The widths of the rostrum, body and splenium, the anterior to posterior length, and the maximum height in the midsagittal image were selected for measurement. The Evans index, which is the relative ratio of lateral ventricle expansion, and the maximum width of the third ventricle in the axial image were also estimated for comparison. The widths of rostrum, body and splenium of the corpus callosum became thinner with age. Conversely, the anterior to posterior length and the maximum height of the corpus callosum increased with age. The ratio of the width of the body to the length of the corpus callosum and the ratio of the width of the body to the height of the corpus callosum are best correlated with age. No sex differences in regional size of corpus callosum, including these two ratios, were observed in any raw measures, although ventricular indices were larger in men than women. Evaluation of the ratio of the width of the body to its length and the ratio of the width of the body to its height may enable accurate estimation of normal or pathological changes of the corpus callosum. Aging and pathological atrophy of corpus callosum can be evaluated without any adjustment for gender. (orig.)
IL NOME IN LIS NEL SEGNATO DI ADULTI UDENTI: UNA INDAGINE PRELIMINARE SUL CORPUS LISAU

Directory of Open Access Journals (Sweden)

Matteo La Grassa

2016-09-01

L’indagine presenta i primi risultati emersi dall’analisi di una parte del corpus LISAU (LIS di Adulti Udenti) sulla produzione segnata del sintagma nominale in LIS da parte di informanti udenti che hanno appreso la LIS come L2 in età adulta. Scopo dell’indagine è cominciare a tracciare una linea di ricerca nell’ambito della linguistica acquisizionale con riferimento all’acquisizione della LIS come L2 da parte di udenti. Il corpus LISAU include il segnato di 7 informanti udenti con livello di competenza omogenea che hanno terminato un corso di terzo livello presso la sede Ente Nazionale Sordi di Prato e di 2 informanti sordi segnanti nativi considerati come gruppo di controllo. L’analisi si è incentrata sulla realizzazione dei nomi di prima e di seconda classe rilevando anche forme non citazionali, sulla realizzazione di forme plurali e sulle modalità di accordo tra nomi e aggettivi. Dalla maggior parte dei dati analizzati si rileva la piena competenza degli informanti nella realizzazione del sintagma nominale. Nouns signed by hearing adults in LIS: a preliminary survey on the LISAU corpus. The results of an analysis concerning part of the LISAU (LIS of Hearing Adults) corpus related to the production of the noun phrase in LIS by hearing informants who learned LIS as an L2 in adulthood are presented. The purpose of the investigation was to outline the process with regard to the acquisition of LIS as an L2 by hearing adults. The LISAU corpus is composed of the sign language of 7 hearing informants with a homogeneous level of competence who completed a third-level course at the Ente Nazionale Sordi in Prato. LISAU also includes the sign language of 2 deaf native signers, considered the control group. The analysis focuses on the first and second-class nouns, including non-citation forms, plural forms and noun-adjective agreement. Most of the analyzed data reveals the informants’ full competence in creating noun phrases.
Corpus callosum tissue loss and development of motor and global cognitive impairment

DEFF Research Database (Denmark)

Frederiksen, Kristian S; Garde, Ellen; Skimminge, Arnold

2011-01-01

To examine the impact of corpus callosum (CC) tissue loss on the development of global cognitive and motor impairment in the elderly....
Corpus Juris ja Eesti : [bakalaureusetöö] / Artur Kink ; Tartu Ülikool, õigusteaduskond ; juhendaja: Eerik Kergandberg

Index Scriptorium Estoniae

Kink, Artur

1999-01-01

Background and development of the Corpus Juris: development of the protection of financial interests, history of the Corpus Juris; legal basis of the Corpus Juris (Treaty of Amsterdam), its organization and structure (the principle of European territoriality, the principle of judicial control, the principle of "contradictory" proceedings, the principle of subsidiarity of national law)
On immune responsiveness of the organism of patients with corpus uteri cancer

International Nuclear Information System (INIS)

Gorodilova, V.V.; Yatskovskaya, N.L.

1978-01-01

Studied were some immunological indices in patients with cancer of corpus uteri. An attempt was made to elucidate a possible dependence of immunological indices on the process propagation rate and treatment methods. Updated methods used for uteri corpus cancer treatment except for progestinotherapeutics promote the decrease of organism responsiveness. Radiation therapy applied with total therapeutic dose has especially pronounced immunodepressing effect. Progestine series preparations result in the differentiation effect on tumours in some patients with cancer of corpus uteri, which clinically manifests in decreasing the tumour and even complete elimination. Simultaneously immunological indices in such patients are improved
Translation as a Paradigm Shift: A Corpus Study of Academic Writing

Directory of Open Access Journals (Sweden)

Agnes Pisanski Peterlin

2013-05-01

In recent decades the increasing reliance on computer technology and the emergence of electronic publishing have precipitated changes in both the production and reception of academic writing. At the same time, the dominance of English as the medium of academic communication has been asserted in all fields of study. While many scholars write their own texts in English, it is not exceptional for others to have their papers translated into English. It is interesting, however, that translation of academic discourse has received relatively little research attention so far. In the study presented here, the question of how translated academic texts differ from comparable original English academic texts is addressed. To explore this question, a 700,000-word corpus comprising 104 research articles (Slovene-English translations and comparable English originals) is analyzed in terms of references to the entire text itself. The results show considerable differences between the translated texts and the comparable English-language originals.
Topics in Corpus-Based Dutch Syntax

NARCIS (Netherlands)

Beek, Leonoor Johanneke van der

2005-01-01

In this dissertation, corpus data is applied in various kinds of linguistic analyses. The data serves as a source of examples and counterexamples in a theoretical linguistic analysis of the Dutch cleft construction, as the source of quantitative data in a probabilistic account of the dative
Language Functions and Medical Communication: The Human Body as Text

Science.gov (United States)

Kantz, Deirdre; Marenzi, Ivana

2016-01-01

This article presents the findings of a field experiment in medical English with first-year medical students at the University of Pavia, Northern Italy. Working in groups of 8-10, the students were asked to produce a corpus of medical texts in English demonstrating how the human body is itself a meaningful text (Baldry and Thibault 2006: Ch. 1).…
Inner change in the Corpus Paulinum: pointers for pastoral counselling

Directory of Open Access Journals (Sweden)

Y. Campbell-Lane

2007-07-01

The aim of this article is to establish what perspectives exist on inner change within the “Corpus Paulinum” and how it should be applied in pastoral counselling. The Scriptural guidelines of change that will be examined for the purposes of this article are found in the following references: Ephesians 4:22-24, Colossians 3:8-10, and Romans 12:1-2. The work of the Holy Spirit as “Agent of change” will also be discussed and finally some pointers on inner change and the implications for pastoral counselling will be proposed.
Mechanosensitive enteric neurons in the guinea pig gastric corpus

Directory of Open Access Journals (Sweden)

Gemma Mazzuoli-Weber

2015-11-01

For long it was believed that a particular population of enteric neurons, referred to as intrinsic primary afferent neurons (IPANs), encodes mechanical stimulation. We recently proposed a new concept suggesting that there are in addition mechanosensitive enteric neurons (MEN) that are multifunctional. Based on firing pattern MEN behaved as rapidly, slowly or ultra-slowly adapting RAMEN, SAMEN or USAMEN, respectively. We aimed to validate this concept in the myenteric plexus of the gastric corpus, a region where IPANs were not identified and existence of enteric sensory neurons was even questioned. The gastric corpus is characterized by a particularly dense extrinsic sensory innervation. Neuronal activity was recorded with voltage sensitive dye imaging after deformation of ganglia by compression (intraganglionic volume injection or von Frey hair) or tension (ganglionic stretch). We demonstrated that 27% of the gastric neurons were MEN and responded to intraganglionic volume injection. Of these 73% were RAMEN, 25% SAMEN and 2% USAMEN with a firing frequency of 1.7 (1.1/2.2) Hz, 5.1 (2.2/7.7) Hz and 5.4 (5.0/15.5) Hz, respectively. The responses were reproducible and stronger with increased stimulus strength. Even after adaptation another deformation evoked spike discharge again, suggesting a resetting mode of the mechanoreceptors. All MEN received fast synaptic input. 55% of all MEN were cholinergic and 45% nitrergic. Responses in some MEN significantly decreased after perfusion of TTX, low Ca++/high Mg++ Krebs solution, capsaicin induced nerve defunctionalization and capsazepine, indicating the involvement of TRPV1 expressing extrinsic mechanosensitive nerves. Half of gastric MEN responded to intraganglionic volume injection as well as to ganglionic stretch and 23% responded to stretch only. Tension-sensitive MEN were to a large proportion USAMEN (44%). In summary, we demonstrated for the first time compression and tension-sensitive MEN in the stomach.
Measures of speech rhythm and the role of corpus-based word frequency: a multifactorial comparison of Spanish(-English) speakers

Directory of Open Access Journals (Sweden)

Michael J. Harris

2011-12-01

In this study, we address various measures that have been employed to distinguish between syllable- and stress-timed languages. This study differs from all previous ones by (i) exploring and comparing multiple metrics within a quantitative and multifactorial perspective and by (ii) also documenting the impact of corpus-based word frequency. We begin with the basic distinctions of speech rhythms, dealing with the differences between syllable-timed languages and stress-timed languages and several methods that have been used to attempt to distinguish between the two. We then describe how these metrics were used in the current study comparing the speech rhythms of Mexican Spanish speakers and bilingual English/Spanish speakers (speakers born to Mexican parents in California). More specifically, we evaluate how well various metrics of vowel duration variability, as well as the so far understudied factor of corpus-based frequency, allow speakers to be classified as monolingual or bilingual. A binary logistic regression identifies several main effects and interactions. Most importantly, our results call the utility of a particular rhythm metric, the PVI, into question and indicate that corpus data in the form of lemma frequencies interact with two metrics of durational variability, suggesting that durational variability metrics should ideally be studied in conjunction with corpus-based frequency data.
Language configurations of degree-related denotations in the spoken production of a group of Colombian EFL university students: A corpus-based study

Directory of Open Access Journals (Sweden)

Wilder Yesid Escobar

2015-05-01

Recognizing that developing the competences needed to appropriately use linguistic resources according to contextual characteristics (pragmatics) is as important as the culturally embedded linguistic knowledge itself (semantics) and that both are equally essential to form competent speakers of English in foreign language contexts, we feel this research relies on corpus linguistics to analyze both the scope and the limitations of the sociolinguistic knowledge and the communicative skills of English students at the university level. To such end, a linguistic corpus was assembled, compared to an existing corpus of native speakers, and analyzed in terms of the frequency, overuse, underuse, misuse, ambiguity, success, and failure of the linguistic parameters used in speech acts. The findings herein describe the linguistic configurations employed to modify levels and degrees of descriptions (salient semantic theme) exhibited in the EFL learners' corpus, appealing to the sociolinguistic principles governing meaning making and language use which are constructed under the social conditions of the environments where the language is naturally spoken for sociocultural exchange.
La enseñanza de la atenuación en E/LE a partir del análisis de un corpus real

Directory of Open Access Journals (Sweden)

Daniel Secchi

2017-11-01

Resumen: El presente trabajo busca evidenciar la potencialidad de los corpus discursivos orales reales como herramienta para la enseñanza/aprendizaje de la atenuación en E/LE, y también pretende demostrar cómo los estudiantes pueden mejorar sus habilidades pragmáticas relacionadas con la atenuación a través de un aprendizaje consciente que les permita desenvolverse en los diferentes contextos comunicativos, igual que los nativos. Palabras clave: corpus oral real, atenuación, español lengua extranjera (E/LE). Abstract: The aim of the present investigation is to point out the potential of using real oral discourse corpora as a didactic resource in order to teach/learn mitigation in S/SL classes. Also, we want to highlight how students can improve their pragmatic and mitigation skills, thanks to a conscious learning of those strategies that help them to communicate in different communicative contexts as well as native speakers do. Keywords: real oral corpus, mitigation, Spanish as second language (S/SL)
Polish Phoneme Statistics Obtained On Large Set Of Written Texts

Directory of Open Access Journals (Sweden)

Bartosz Ziółko

2009-01-01

The phonetical statistics were collected from several Polish corpora. The paper is a summary of the data which are phoneme n-grams and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.

English Collocation Learning through Corpus Data: On-Line Concordance and Statistical Information

Science.gov (United States)

Ohtake, Hiroshi; Fujita, Nobuyuki; Kawamoto, Takeshi; Morren, Brian; Ugawa, Yoshihiro; Kaneko, Shuji

2012-01-01

We developed an English Collocations On Demand system offering on-line corpus and concordance information to help Japanese researchers acquire a better command of English collocation patterns. The Life Science Dictionary Corpus consists of approximately 90,000,000 words collected from life science related research papers published in academic…
A case of total agenesis of the corpus callosum

International Nuclear Information System (INIS)

Sakamoto, Masanobu; Takeda, Katsuhiko; Bandou, Mitsuaki; Murayama, Shigeo; Sakuta, Manabu

1985-01-01

We have reported a case of agenesis of the corpus callosum, in which NMR-CT revealed a complete defect of it, and have examined the localization of the speech center of this patient. The patient is a right-handed 26-year-old man who has complained of headache on the parietal region. His neurological examination revealed only a mild mental difficulty (IQ 77). X-ray CT showed the lateral ventricles to be separated widely and the posterior horns dilated, which were compatible with the agenesis of the corpus callosum. Further, NMR-CT has revealed a total agenesis of the corpus callosum. NMR-CT seems to be highly useful for the detection of the degree of the callosal defect. We have carried out the intracarotid amobarbital injection (Wada's test) for the determination of the lateralization of cerebral speech dominance. It had been reported by some authors that when it comes to the cerebral speech dominance, acallosal patients had no difference between each hemisphere. However, our results have demonstrated a left sided dominance. (author)
Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

Science.gov (United States)

Deleger, Louise; Lingren, Todd; Ni, Yizhao; Kaiser, Megan; Stoutenborough, Laura; Marsolo, Keith; Kouril, Michal; Molnar, Katalin; Solti, Imre

2014-08-01

The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus with Data Use Agreement we would like to encourage other Computational Linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes: (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; (3) and the tests to show the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development. In a previous study we built an original de-identification gold standard corpus annotated with true Protected Health Information (PHI) from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system, when trained on the new de-identification gold standard corpus, with the performance of the same system, when trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes. Performance of the de-identification system using the new gold standard corpus as a training set was very
Magnetic resonance findings of the corpus callosum in canine and feline lysosomal storage diseases.

Science.gov (United States)

Hasegawa, Daisuke; Tamura, Shinji; Nakamoto, Yuya; Matsuki, Naoaki; Takahashi, Kimimasa; Fujita, Michio; Uchida, Kazuyuki; Yamato, Osamu

2013-01-01

Several reports have described magnetic resonance (MR) findings in canine and feline lysosomal storage diseases such as gangliosidoses and neuronal ceroid lipofuscinosis. Although most of those studies described the signal intensities of white matter in the cerebrum, findings of the corpus callosum were not described in detail. A retrospective study was conducted on MR findings of the corpus callosum as well as the rostral commissure and the fornix in 18 cases of canine and feline lysosomal storage diseases. This included 6 Shiba Inu dogs and 2 domestic shorthair cats with GM1 gangliosidosis; 2 domestic shorthair cats, 2 familial toy poodles, and a golden retriever with GM2 gangliosidosis; and 2 border collies and 3 chihuahuas with neuronal ceroid lipofuscinoses, to determine whether changes of the corpus callosum is an imaging indicator of those diseases. The corpus callosum and the rostral commissure were difficult to recognize in all cases of juvenile-onset gangliosidoses (GM1 gangliosidosis in Shiba Inu dogs and domestic shorthair cats and GM2 gangliosidosis in domestic shorthair cats) and GM2 gangliosidosis in toy poodles with late juvenile-onset. In contrast, the corpus callosum and the rostral commissure were confirmed in cases of GM2 gangliosidosis in a golden retriever and canine neuronal ceroid lipofuscinoses with late juvenile- to early adult-onset, but were extremely thin. Abnormal findings of the corpus callosum on midline sagittal images may be a useful imaging indicator for suspecting lysosomal storage diseases, especially hypoplasia (underdevelopment) of the corpus callosum in juvenile-onset gangliosidoses.
Magnetic Resonance Findings of the Corpus Callosum in Canine and Feline Lysosomal Storage Diseases

Science.gov (United States)

Hasegawa, Daisuke; Tamura, Shinji; Nakamoto, Yuya; Matsuki, Naoaki; Takahashi, Kimimasa; Fujita, Michio; Uchida, Kazuyuki; Yamato, Osamu

2013-01-01

Several reports have described magnetic resonance (MR) findings in canine and feline lysosomal storage diseases such as gangliosidoses and neuronal ceroid lipofuscinosis. Although most of those studies described the signal intensities of white matter in the cerebrum, findings of the corpus callosum were not described in detail. A retrospective study was conducted on MR findings of the corpus callosum as well as the rostral commissure and the fornix in 18 cases of canine and feline lysosomal storage diseases. This included 6 Shiba Inu dogs and 2 domestic shorthair cats with GM1 gangliosidosis; 2 domestic shorthair cats, 2 familial toy poodles, and a golden retriever with GM2 gangliosidosis; and 2 border collies and 3 chihuahuas with neuronal ceroid lipofuscinoses, to determine whether changes of the corpus callosum is an imaging indicator of those diseases. The corpus callosum and the rostral commissure were difficult to recognize in all cases of juvenile-onset gangliosidoses (GM1 gangliosidosis in Shiba Inu dogs and domestic shorthair cats and GM2 gangliosidosis in domestic shorthair cats) and GM2 gangliosidosis in toy poodles with late juvenile-onset. In contrast, the corpus callosum and the rostral commissure were confirmed in cases of GM2 gangliosidosis in a golden retriever and canine neuronal ceroid lipofuscinoses with late juvenile- to early adult-onset, but were extremely thin. Abnormal findings of the corpus callosum on midline sagittal images may be a useful imaging indicator for suspecting lysosomal storage diseases, especially hypoplasia (underdevelopment) of the corpus callosum in juvenile-onset gangliosidoses. PMID:24386203
Handedness and corpus callosal morphology in Williams syndrome.

Science.gov (United States)

Martens, Marilee A; Wilson, Sarah J; Chen, Jian; Wood, Amanda G; Reutens, David C

2013-02-01

Williams syndrome is a neurodevelopmental genetic disorder caused by a hemizygous deletion on chromosome 7q11.23, resulting in atypical brain structure and function, including abnormal morphology of the corpus callosum. An influence of handedness on the size of the corpus callosum has been observed in studies of typical individuals, but handedness has not been taken into account in studies of callosal morphology in Williams syndrome. We hypothesized that callosal area is smaller and the size of the splenium and isthmus is reduced in individuals with Williams syndrome compared to healthy controls, and examined age, sex, and handedness effects on corpus callosal area. Structural magnetic resonance imaging scans were obtained on 25 individuals with Williams syndrome (18 right-handed, 7 left-handed) and 25 matched controls. We found that callosal thickness was significantly reduced in the splenium of Williams syndrome individuals compared to controls. We also found novel evidence that the callosal area was smaller in left-handed participants with Williams syndrome than their right-handed counterparts, with opposite findings observed in the control group. This novel finding may be associated with LIM-kinase hemizygosity, a characteristic of Williams syndrome. The findings may have significant clinical implications in future explorations of the Williams syndrome cognitive phenotype.
Working Together: Contributions of Corpus Analyses and Experimental Psycholinguistics to Understanding Conversation.

Science.gov (United States)

Meyer, Antje S; Alday, Phillip M; Decuyper, Caitlin; Knudsen, Birgit

2018-01-01

As conversation is the most important way of using language, linguists and psychologists should combine forces to investigate how interlocutors deal with the cognitive demands arising during conversation. Linguistic analyses of corpora of conversation are needed to understand the structure of conversations, and experimental work is indispensable for understanding the underlying cognitive processes. We argue that joint consideration of corpus and experimental data is most informative when the utterances elicited in a lab experiment match those extracted from a corpus in relevant ways. This requirement to compare like with like seems obvious but is not trivial to achieve. To illustrate this approach, we report two experiments where responses to polar (yes/no) questions were elicited in the lab and the response latencies were compared to gaps between polar questions and answers in a corpus of conversational speech. We found, as expected, that responses were given faster when they were easy to plan and planning could be initiated earlier than when they were harder to plan and planning was initiated later. Overall, in all but one condition, the latencies were longer than one would expect based on the analyses of corpus data. We discuss the implication of this partial match between the data sets and more generally how corpus and experimental data can best be combined in studies of conversation.
Quantification of structural changes in the corpus callosum in children with profound hypoxic-ischaemic brain injury

Energy Technology Data Exchange (ETDEWEB)

Stivaros, Stavros M. [Manchester Academic Health Science Centre, Academic Unit of Paediatric Radiology, Royal Manchester Children' s Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester (United Kingdom); University of Manchester, Centre for Imaging Sciences, Institute of Population Health, Manchester (United Kingdom); Radon, Mark R. [The Walton Centre NHS Foundation Trust, Department of Neuroradiology, Liverpool (United Kingdom); Mileva, Reneta; Gledson, Ann; Keane, John A. [University of Manchester, School of Computer Science, Manchester (United Kingdom); Connolly, Daniel J.A.; Batty, Ruth [Sheffield Children' s Hospital NHS Foundation Trust, Department of Neuroradiology, Sheffield (United Kingdom); Cowell, Patricia E. [University of Sheffield, Department of Human Communication Sciences, Sheffield (United Kingdom); Hoggard, Nigel; Griffiths, Paul D. [University of Sheffield, Academic Unit of Radiology, Sheffield (United Kingdom); Wright, Neville B.; Tang, Vivian [Manchester Academic Health Science Centre, Academic Unit of Paediatric Radiology, Royal Manchester Children' s Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester (United Kingdom)

2016-01-15

Birth-related acute profound hypoxic-ischaemic brain injury has specific patterns of damage including the paracentral lobules. To test the hypothesis that there is anatomically coherent regional volume loss of the corpus callosum as a result of this hemispheric abnormality. Study subjects included 13 children with proven acute profound hypoxic-ischaemic brain injury and 13 children with developmental delay but no brain abnormalities. A computerised system divided the corpus callosum into 100 segments, measuring each width. Principal component analysis grouped the widths into contiguous anatomical regions. We conducted analysis of variance of corpus callosum widths as well as support vector machine stratification into patient groups. There was statistically significant narrowing of the mid-posterior body and genu of the corpus callosum in children with hypoxic-ischaemic brain injury. Support vector machine analysis yielded over 95% accuracy in patient group stratification using the corpus callosum centile widths. Focal volume loss is seen in the corpus callosum of children with hypoxic-ischaemic brain injury secondary to loss of commissural fibres arising in the paracentral lobules. Support vector machine stratification into the hypoxic-ischaemic brain injury group or the control group on the basis of corpus callosum width is highly accurate and points towards rapid clinical translation of this technique as a potential biomarker of hypoxic-ischaemic brain injury. (orig.)
Metaphors and Corpus Linguistics: a method for finding metaphors in a business genre

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2011-01-01

This paper reports the development of a method for metaphor identification in computer corpora. The method was tested on a particular corpus, namely one of investment conference calls, and comprises procedures that work from the bottom up and rely on marked frequency (keywords), collocation and semantic similarity as signalling devices for metaphor. As such, the method is an example of corpus-driven research into metaphor. The application of these procedures yields a number of metaphor candidates, which are then checked manually in their co-text through concordances.
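The keyword step of a corpus-driven procedure like this one can be illustrated with a short sketch. The abstract does not say which keyness statistic was used; the log-likelihood measure below is an assumption, chosen because it is widely used in corpus linguistics, and the function names and toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word.

    a, b: frequency of the word in the study / reference corpus.
    c, d: total number of tokens in the study / reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=5):
    """Return the top_n words that are unusually frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


# Hypothetical toy data standing in for a conference-call corpus and a reference corpus.
study = "growth growth growth the the market".split()
reference = "the the the the a of market house".split()
print(keywords(study, reference, top_n=2))
```

In a fuller pipeline the resulting keywords would then be inspected for collocates and grouped by semantic similarity before manual checking, as the abstract describes.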
Chemokines in the corpus luteum: Implications of leukocyte chemotaxis

Directory of Open Access Journals (Sweden)

Liptak Amy R

2003-11-01

Chemokines are small molecular weight peptides responsible for adhesion, activation, and recruitment of leukocytes into tissues. Leukocytes are thought to influence follicular atresia, ovulation, and luteal function. Many studies in recent years have focused attention on the characterization of leukocyte populations within the ovary, the importance of leukocyte-ovarian cell interactions, and more recently, the mechanisms of ovarian leukocyte recruitment. Information about the role of chemokines and leukocyte trafficking (chemotaxis) during ovarian function is important to understanding paracrine-autocrine relationships shared between reproductive and immune systems. Recent advances regarding chemokine expression and leukocyte accumulation within the ovulatory follicle and the corpus luteum are the subject of this mini-review.
ANR Corpus architecturae religiosae europeae [CARE], saec. IV-X

Directory of Open Access Journals (Sweden)

Christian Sapin

2008-07-01

At the end of 2007, the project submitted to the French National Research Agency (ANR) and devoted to building a corpus of religious monuments (CARE) predating the year 1000 was selected. It corresponds to the French component of the undertaking. Several countries, including Italy, Spain, the Czech Republic, Slovakia, Poland and Croatia, began preparatory work for this ambitious enterprise two years ago; Greece has since expressed interest, as has Al...
Partial segmental thrombosis of the corpus cavernosum: imaging findings.

Science.gov (United States)

Moya-Sánchez, E; Medina-Benítez, A; Medina-Salas, V; Fernández-Navarro, L

2018-03-05

Partial segmental thrombosis of the corpus cavernosum is an unusual clinical condition of unknown origin that mainly affects young males and characteristically presents as unexplained perineal pain associated with a palpable perineal mass. The entity consists of thrombosis in the perineal portion of the corpus cavernosum, usually unilateral, and is associated with underlying malignant pathologies and predisposing factors such as microtrauma. With adequate adherence to conservative treatment, complications such as erectile dysfunction are very uncommon. Copyright © 2018 SERAM. Publicado por Elsevier España, S.L.U. All rights reserved.
Analysing Culture and Interculture in Saudi EFL Textbooks: A Corpus Linguistic Approach

Science.gov (United States)

Almujaiwel, Sultan

2018-01-01

This paper combines corpus processing tools to investigate the cultural elements of Saudi education of English as a foreign language (EFL). The latest Saudi EFL textbooks (2016 onwards) are available in researchable PDF formats. This helps process them through corpus search software tools. The method adopted is based on analysing 20 cultural…
Identification of histone modifications in biomedical text for supporting epigenomic research.

Science.gov (United States)

Kolárik, Corinna; Klinger, Roman; Hofmann-Apitius, Martin

2009-01-30

Posttranslational modifications of histones influence the structure of chromatin and in this way take part in the regulation of gene expression. Certain histone modification patterns, distributed over the genome, are connected to cell and tissue differentiation and to the adaptation of organisms to their environment. Abnormal changes, by contrast, influence the development of disease states such as cancer. The regulation mechanisms for modifying histones and their functions are the subject of epigenomics research and are still not completely understood. Text provides a rich resource of knowledge on epigenomics and on modifications of histones in particular. It contains information about experimental studies, the conditions used, and results. To our knowledge, no approach has been published so far for identifying histone modifications in text. We have developed an approach for identifying histone modifications in biomedical literature with Conditional Random Fields (CRF) and for resolving the recognized histone modification term variants by term standardization. For term identification, F1 measures of 0.84 by 10-fold cross-validation on the training corpus and 0.81 on an independent test corpus were obtained. The standardization enabled the correct transformation of 96% of the terms from the training corpus and 98% from the test corpus. Due to the lack of terminologies exhaustively covering specific histone modification types, we developed a histone modification term hierarchy for use in a semantic text retrieval system. The developed approach greatly improves the retrieval of articles describing histone modifications. Since text contains context information about performed studies and experiments, the identification of histone modifications is the basis for supporting literature-based knowledge discovery and hypothesis generation to accelerate epigenomic research.
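The F1 figures quoted above are the harmonic mean of precision and recall. As a minimal sketch of how such a score is computed for term identification, the function below compares predicted term spans with gold-standard spans. Exact span matching is an assumption (the abstract does not state the matching criterion), and the spans and function name are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted term spans against gold spans using exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (start, end) character offsets of histone modification terms.
gold_spans = {(0, 7), (12, 20), (30, 41), (50, 58)}
predicted_spans = {(0, 7), (12, 20), (31, 41)}  # last span has a boundary error

p, r, f1 = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under exact matching a boundary error counts as both a false positive and a false negative, which is why the third predicted span lowers precision and recall at once.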
Term Familiarity to indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries.

Science.gov (United States)

Leroy, Gondy; Endicott, James E

2011-10-01

With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas, but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, term familiarity, which is based on term frequency in the Google corpus. We evaluated its feasibility as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences in perceived and actual difficulty.
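A frequency-based familiarity tag of this kind can be sketched in a few lines. The abstract gives no formula, so the scoring below (mean log10 corpus frequency over the words of a text) is only one plausible assumption, and the frequency table is invented for illustration rather than taken from any real corpus.

```python
import math
import re


def term_familiarity(text, corpus_freq, floor=1):
    """Mean log10 corpus frequency of the words in a text.

    Words missing from corpus_freq are treated as having frequency `floor`,
    so unknown words pull the score down rather than raising an error.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    logs = [math.log10(max(corpus_freq.get(w, 0), floor)) for w in words]
    return sum(logs) / len(logs)


# Hypothetical frequency counts; real values would come from a large reference corpus.
freq = {"heart": 1_000_000, "attack": 100_000, "myocardial": 1_000, "infarction": 100}

print(term_familiarity("heart attack", freq))
print(term_familiarity("myocardial infarction", freq))
```

With these invented counts the lay phrase scores higher than its clinical synonym, which is the kind of contrast such a tag is meant to capture.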
MR imaging of spastic diplegia. The importance of corpus callosum

International Nuclear Information System (INIS)

Hayakawa, K.; Kanda, T.; Hashimoto, K.; Okuno, Y.; Yamori, Y.; Yuge, M.; Ando, R.; Ozaki, N.; Tamamoto, A.

1996-01-01

Purpose: The MR findings in patients with spastic diplegia were investigated and the role of MR imaging in assessing the extent of brain injury was evaluated. Material and Methods: 39 male and 24 female patients (preterm/term 43/20) were imaged using a 0.5 T MR system. Results: The MR findings in term patients were quite different from those in preterm patients; 55% of the term patients showed normal and minimal changes on MR, whereas 90.7% of the 43 preterm children had periventricular leucomalacia. The deep cerebral white matter was the most frequently involved site. Objective measurements revealed significant reductions of the entire sagittal area of corpus callosum in diplegic patients in comparison with normal controls. The motor palsy severity correlated well with the extent of corpus callosum involvement. Conclusion: The corpus callosum appears to be a sensitive marker site for the assessment of the extent of white matter injury. (orig.)
MR measurement of normal corpus callosum: Age and sex differentiation

Energy Technology Data Exchange (ETDEWEB)

Lee, Myung Seob; Kim, Myung Soon; Park, Hyun Ju [Wonju College of Medicine, Yonsei University, Wonju (Korea, Republic of)

1992-07-15

Measurement of various portions of the corpus callosum was performed on magnetic resonance (MR) images of 114 subjects with no known or suspected corpus callosal disorders. Midsagittal T1-weighted images were used for measurements, and mean diameters of various portions in each age and sex group were obtained. Measures of five portions were made: (A) the anteroposterior length, (B) the diameter of the genu, (C) the diameter of the splenium, (D) the diameter of the mid-body portion, (E) the diameter of a narrow portion at the body of the corpus callosum. The mean values for A, B, C, D and E were 68.8 mm, 12.1 mm, 12.3 mm, 6.9 mm and 4.1 mm in males and 69.9 mm, 12.0 mm, 12.1 mm, 6.4 mm and 4.1 mm in females, respectively. The groups aged 0-9 years of both genders showed the minimum mean value in each portion.
Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus.

Science.gov (United States)

Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

pubmed.mineR: An R package with text-mining algorithms to ...

Indian Academy of Sciences (India)

2016-08-26

Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...
Impact of in utero exposure to EtOH on corpus callosum development and paw preference in rats: protective effects of silymarin

Directory of Open Access Journals (Sweden)

Montoya Rebecca

2002-11-01

Background: Using a rat model we have found that the bioflavonoid silymarin (SY) ameliorates some of the negative consequences of in utero exposure to ethanol (EtOH). In the current study our aim was to determine if laterality preference and corpus callosum development were altered in rat offspring whose mothers were given SY concomitantly with EtOH throughout gestation. Methods: We provided pregnant Fisher/344 rats with liquid diets containing 35% ethanol-derived calories (EDC) throughout the gestational period. A silymarin/phospholipid compound containing 29.8% silybin was co-administered with EtOH to a separate experimental group. We tested the offspring for laterality preference at age 12 weeks. After testing, the rats were sacrificed and their brains perfused for later corpus callosum extraction. Results: We observed incomplete development of the splenium in the EtOH-only offspring. Callosal development was complete in all other treatment groups. Rats from the EtOH-only group displayed a left paw preference, whereas control rats were evenly divided between right and left paw preference. Inexplicably, both SY groups were largely right paw preferring. Conclusions: The addition of SY to the EtOH liquid diet did confer some ameliorative effects upon the developing fetal rat brain.
A Corpus-Based Lexical Study on Frequency and Distribution of Coxhead's Awl Word Families in Medical Research Articles (RAs)

Science.gov (United States)

Chen, Qi; Guang-Chun, Ge

2007-01-01

We conducted a lexical study on the word frequency and the text coverage of the 570 word families from Coxhead's Academic Word List (AWL) in medical research articles (RAs) based on a corpus of 50 medical RAs written in English with 190425 running words. By computer analysis, we found that the text coverage of the AWL words accounted for around…
ABSTRACT NOUNS IN THE SPEECH OF THE ENGLISHMEN (BASED ON FICTION WORKS AND BRITISH NATIONAL CORPUS)

Directory of Open Access Journals (Sweden)

Natalia Veniaminovna Khokhlova

2015-01-01

The research aimed at studying the use of abstract nouns in Englishmen's speech from the standpoint of sociolinguistics. The article introduces a new, sociolinguistic approach to research on abstract nouns; it is also the first time they have been studied in a language corpus. The first stage of the research was based on works of fiction: abstract nouns were extracted for analysis from the statements of characters belonging to opposite social classes. Later, these data were compared with the results of original corpus research based on the British National Corpus (BNC): sentences with nouns were selected from the conversational subcorpus of the BNC and were further sorted into abstract nouns, concrete nouns and words denoting people. Then, their frequency and vocabulary were studied with regard to speakers' age, gender and social standing. The results revealed that abstract words are used more often than concrete ones regardless of the speaker's social characteristics; however, the size and content of the vocabulary differ (it is generally more substantial in the speech of women and representatives of higher social classes). The results of this research can be used in developing an English language course or in teaching general linguistics, sociolinguistics and country studies.
Penile erection responses of Nigella sativa seed extract on isolated rat corpus cavernosum

Science.gov (United States)

Aminyoto, M.; Ismail, S.

2018-04-01

Nigella sativa L. (NS), from the Ranunculaceae family, is known as black cumin in Indonesia. The seed has been used as an aphrodisiac in ethnobotanical studies and is reported to have pharmacological activities, such as an antihypertensive effect through relaxation of vascular smooth muscle, but the direct effect on the blood vessels of the corpus cavernosum is still unknown. The purpose of this study was to examine the response of NS seed extract on penile erection in vitro. NS seeds were macerated in ethanol solvent for three days at room temperature, and the maceration was repeated twice. Penile erection responses were assessed using isolated rat corpus cavernosum in Krebs-Henseleit solution, temperature 37°C, pH 7.4, aerated with carbogen gas. After acclimation, the corpus cavernosum was contracted with a phenylephrine solution. Ethanolic extract of NS seeds or control solution was given after reaching the plateau phase of the highest contraction. This study showed that the contraction response of the corpus cavernosum decreased after addition of NS extract and that this action increased with the extract concentration. This study concluded that NS seed ethanol extract affects the penile erection response directly through the relaxation of blood vessels in the corpus cavernosum.
Cytokines and Angiogenesis in the Corpus Luteum

Directory of Open Access Journals (Sweden)

António M. Galvão

2013-01-01

In adults, physiological angiogenesis is a rare event, with few exceptions such as the vasculogenesis needed for tissue growth and function in female reproductive organs. Particularly in the corpus luteum (CL), regulation of the angiogenic process seems to be tightly controlled by opposite actions resulting from the balance between pro- and antiangiogenic factors. It is the extremely rapid sequence of events that determines the dramatic changes in vascular and nonvascular structures, qualifying the CL as a great model for angiogenesis studies. Using the mare CL as a model, reports on locally produced cytokines, such as tumor necrosis factor α (TNF), interferon gamma (IFNG), or Fas ligand (FASL), pointed out their role in angiogenic activity modulation throughout the luteal phase. Thus, the main purpose of this review is to highlight the interaction between immune, endothelial, and luteal steroidogenic cells, regarding vascular dynamics/changes during establishment and regression of the equine CL.
Subluxation and semantics: a corpus linguistics study.

Science.gov (United States)

Budgell, Brian

2016-06-01

The purpose of this study was to analyze the curriculum of one chiropractic college in order to discover if there were any implicit consensus definitions of the term subluxation. Using the software WordSmith Tools, the corpus of an undergraduate chiropractic curriculum was analyzed by reviewing collocated terms and through discourse analysis of text blocks containing words based on the root 'sublux.' It was possible to identify 3 distinct concepts which were each referred to as 'subluxation:' i) an acute or instantaneous injurious event; ii) a clinical syndrome which manifested post-injury; iii) a physical lesion, i.e. an anatomical or physiological derangement which in most instances acted as a pain generator. In fact, coherent implicit definitions of subluxation exist and may enjoy broad but subconscious acceptance. However, confusion likely arises from failure to distinguish which concept an author or speaker is referring to when they employ the term subluxation.
The corpus-driven revolution in Polish Sign Language: the interview with Dr. Paweł Rutkowski

Directory of Open Access Journals (Sweden)

Iztok Kosem

2018-02-01

Dr. Paweł Rutkowski is head of the Section for Sign Linguistics at the University of Warsaw. He is a general linguist and a specialist in the field of syntax of natural languages, carrying out research on Polish Sign Language (polski język migowy, PJM). He has been awarded a number of prizes, grants and scholarships by such institutions as the Foundation for Polish Science, Polish Ministry of Science and Higher Education, National Science Centre, Poland, Polish–U.S. Fulbright Commission, Kosciuszko Foundation and DAAD. Dr. Rutkowski leads the team developing the Corpus of Polish Sign Language and the Corpus-based Dictionary of Polish Sign Language, the first dictionary of this language prepared in compliance with modern lexicographical standards. The dictionary is an open-access publication, available freely at the following address: http://www.slownikpjm.uw.edu.pl/en/. This interview took place at eLex 2017, a biennial conference on electronic lexicography, where Dr. Rutkowski was awarded the Adam Kilgarriff Prize and gave a keynote address entitled Sign language as a challenge to electronic lexicography: The Corpus-based Dictionary of Polish Sign Language and beyond. The interview was conducted by Dr. Victoria Nyst from Leiden University, Faculty of Humanities, and Dr. Iztok Kosem from the University of Ljubljana, Faculty of Arts.
Compiling an OPEC Word List: A Corpus-Informed Lexical Analysis

Directory of Open Access Journals (Sweden)

Ebtisam Saleh Aluthman

2017-01-01

The present study is conducted within the borders of lexicographic research, where corpora have increasingly become all-pervasive. The overall goal of this study is to compile an open-source OPEC (Organisation of Petroleum Exporting Countries) Word List (OWL) that is available for lexicographic research and vocabulary learning related to English language learning for the purpose of oil marketing and oil industries. To achieve this goal, an OPEC Monthly Reports Corpus (OMRC) comprising 1,004,542 words was compiled. The OMRC consists of 40 OPEC monthly reports released between 2003 and 2015. Consideration was given to both range and frequency criteria when compiling the OWL, which consists of 255 word types. Along with this basic goal, this study aims to investigate the coverage of the most well-recognised word lists, the General Service List of English Words (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000), in the OMRC. The 255 word types included in the OWL do not overlap with either the AWL or the GSL. Results suggest the necessity of making this discipline-specific word list for ESL students of oil marketing industries. The availability of the OWL has significant pedagogical contributions to curriculum design, learning activities and the overall process of vocabulary learning in the context of teaching English for specific purposes (ESP).
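Selecting a word list by range and frequency, and then measuring its coverage, can be sketched as follows. The abstract does not state the thresholds used, so `min_freq` and `min_range` below are hypothetical parameters, and the tiny tokenised "reports" and the one-word exclusion list are invented stand-ins for the OMRC and the GSL/AWL.

```python
from collections import Counter


def build_word_list(documents, exclude, min_freq, min_range):
    """Select word types by overall frequency and by range (number of documents)."""
    total = Counter()      # occurrences across the whole corpus
    doc_range = Counter()  # number of documents each word appears in
    for tokens in documents:
        total.update(tokens)
        doc_range.update(set(tokens))
    return sorted(
        w for w in total
        if total[w] >= min_freq and doc_range[w] >= min_range and w not in exclude
    )


def coverage(documents, word_list):
    """Share of running words in the corpus that belong to the word list."""
    words = set(word_list)
    running = sum(len(tokens) for tokens in documents)
    hits = sum(1 for tokens in documents for w in tokens if w in words)
    return hits / running if running else 0.0


# Hypothetical tokenised reports and a hypothetical general-vocabulary list to exclude.
reports = [
    ["crude", "oil", "the", "barrel"],
    ["crude", "the", "the", "refinery"],
    ["crude", "barrel", "oil", "the"],
]
general_list = {"the"}

owl = build_word_list(reports, general_list, min_freq=2, min_range=2)
print(owl, round(coverage(reports, owl), 3))
```

The range criterion keeps out words that are frequent only because a single report repeats them, which is the usual reason for combining the two criteria.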
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

Science.gov (United States)

Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

2015-09-01

To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
The usage of amount, quantity and body in a corpus of biology

Directory of Open Access Journals (Sweden)

Purificación Sánchez Hernández

2002-04-01

Grammars and dictionaries usually offer relevant and accurate information to students of a second language. However, the meaning of a textual element is often dynamic and that information is not always based on real usage patterns. New occurrences on the object level in new contexts can introduce novel semantic potentials, so that existing interpretations may be superseded by new ones. Concordancing has been shown to be one of the most important tools to facilitate the understanding of the usage patterns of a language. In this paper we examine the differences between amount, quantity and body as terms expressing magnitude, sum and size in a corpus of Biology. According to some popular dictionaries and grammars, the terms amount and quantity have always been considered synonymous terms for expressing magnitude, size and sum. We demonstrate that, according to our records, they cannot be always used as synonymous terms since they have different patterns of usage. On the other hand there are other forms, such as body, that appear in our Corpus, implying magnitude, size and sum, that are not usually described as having such meanings in dictionaries.
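Concordancing, as mentioned above, lists every occurrence of a search word together with its immediate context so that usage patterns can be compared. The following is a minimal keyword-in-context (KWIC) sketch; the tokenisation, the window size and the sample sentence are all assumptions made for illustration and are not drawn from the authors' biology corpus.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text.
sample = (
    "A large amount of water was added. The body of evidence grew, "
    "and a small quantity of salt remained."
)

for word in ("amount", "quantity", "body"):
    for left, kw, right in concordance(sample, word, width=2):
        print(f"{left:>15} [{kw}] {right}")
```

Aligning the keyword in a column makes recurring left and right collocates easy to spot, which is how differences between near-synonyms such as amount and quantity are typically examined.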
Hemoperitoneum from Corpus Luteal Cyst Rupture: A Practical Approach in Emergency Room

Directory of Open Access Journals (Sweden)

Valeria Fiaschetti

2014-01-01

Full Text Available Corpus luteum cyst rupture with consequent hemoperitoneum is a common disorder in women of reproductive age. This condition should be promptly recognized and treated because a delayed diagnosis may significantly reduce women’s fertility and intra-abdominal bleeding may be life-threatening. Many imaging modalities play a key role in the diagnosis of acute pelvic pain from gynecological causes. Ultrasound study (USS) is usually the first imaging technique for initial evaluation. USS is used to confirm or to exclude the presence of intraperitoneal fluid but it has some limitations in the identification of the bleeding source. Contrast-enhanced computed tomography (CT) is the imaging modality which could be used in the acute setting in order to recognize gynecological emergencies and to establish a correct management. Magnetic resonance imaging (MRI) nowadays is the most useful technique for studying the pelvis but its low availability and the long acquisition time of the images limit its usefulness in characterization of acute gynecological complications. We report a case of a young patient with hemoperitoneum from hemorrhagic corpus luteum correctly identified by transabdominal USS and contrast-enhanced CT.
Análisis jurídico a la ley estatutaria 1095 de 2006 de Habeas Corpus [Legal analysis of Statutory Law 1095 of 2006 on Habeas Corpus]

Directory of Open Access Journals (Sweden)

María Cristina Patiño-González

2010-03-01

Full Text Available After almost three and a half years without any statutory development of habeas corpus in Colombia, Statutory Law 1095 was enacted on 2 November 2006, regulating Article 30 of the Political Constitution. This body of law established that habeas corpus has the legal nature of a fundamental right and of a constitutional action that protects personal liberty when someone is deprived of that liberty in violation of constitutional and legal guarantees. However, in application of the constitutional block (bloque de constitucionalidad), the provisions of the Statutory Law itself and the case law of the Constitutional Court, habeas corpus also stands as the fundamental guarantee that protects the collateral fundamental rights of detainees and has the nature of an amparo remedy. The article offers a study of the provisions of the Statutory Law on Habeas Corpus regarding definition, jurisdiction, guarantees for exercising the action, content of the petition, its processing, the decision and the available means of challenge, and critically analyzes Constitutional Court Judgment C-187/06, which carried out the prior constitutionality review; it also offers a series of contributions toward a more rights-protective interpretation of the institution and makes observations de lege ferenda
Corpus gastritis in patients with endoscopic diagnosis of reflux oesophagitis and Barrett's oesophagus.

NARCIS (Netherlands)

Laheij, R.J.F.; Rossum, L.G.M. van; Boer, W.A. de; Jansen, J.B.M.J.

2002-01-01

BACKGROUND: A high level of gastric acid secretion is considered to be a risk factor for reflux oesophagitis or Barrett's oesophagus. Corpus gastritis may have a protective effect on the oesophagus, because of decreased gastric acid output. AIM: To determine if corpus gastritis is associated with
Algorithm of Syntactic Idioms Recognition in the Text: Attempt of Construction

Directory of Open Access Journals (Sweden)

Sytar Hanna

2016-12-01

Full Text Available Background: The attention of national and foreign researchers has so far focused on the structural and semantic features of syntactic idioms. Automatic analysis of these peculiar units, which lie on the border between syntax and phraseology, has not yet been carried out in the scientific literature. This issue requires theoretical understanding and practical implementation. Purpose: To create an algorithm for recognizing syntactic idioms with a one- or two-term core component in a corpus of texts. Results: Based on the results of previous theoretical studies, we identified a number of formal and statistical criteria that make it possible to distinguish syntactic idioms from other language units in a corpus of Ukrainian-language texts. The author developed a block diagram of syntactic idiom recognition, incorporating two branches constructed for sentences with a one-term and sentences with a two-term core component, respectively. The first branch is based on the presence of word repeats (full coincidence of words or the presence of other word forms of the word) and the list of core components determined at previous stages of the study (є, це, то, не, так; як; з/із/зі, між, над, серед; а, але, зате, однак, проте). The second branch was created for another type of syntactic idiom, one with a two-term core component. It takes into account the following properties of the analyzed units: the presence of combinations of function words (service parts of speech), function words with a pronoun or adverb, or a pronoun and an adverb; compliance of word combinations with the register of syntactic idiom core components, currently comprising 92 structures; a mutual information association measure ≥9, etc. Discussion: The proposed algorithm enables automatic identification of syntactic idioms in a corpus of texts and removal of the contexts of their use; it can be used to improve the procedure of automatic text processing and creation of automated translation
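The mutual information criterion mentioned in the abstract can be illustrated with a short sketch. It assumes the standard pointwise mutual information with a base-2 logarithm, computed from raw corpus counts; the abstract does not give the exact formula, and the function names and the counts below are hypothetical.

```python
import math


def pointwise_mutual_information(pair_count, count_x, count_y, corpus_size):
    """PMI = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts."""
    if min(pair_count, count_x, count_y, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * corpus_size / (count_x * count_y))


def passes_association_threshold(pair_count, count_x, count_y, corpus_size,
                                 threshold=9.0):
    """Keep a candidate two-term core component only if PMI >= threshold."""
    return pointwise_mutual_information(
        pair_count, count_x, count_y, corpus_size) >= threshold


# Hypothetical counts: the pair occurs 8 times, its parts 16 and 32 times,
# in a corpus of 65,536 tokens.
pmi = pointwise_mutual_information(8, 16, 32, 65536)
```

In this toy case the pair co-occurs 1,024 times more often than chance would predict, so its PMI is 10 and it passes the threshold of 9.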
Resolving relative time expressions in Dutch text with Constraint Handling Rules

DEFF Research Database (Denmark)

van de Camp, Matje; Christiansen, Henning

2012-01-01

It is demonstrated how Constraint Handling Rules can be applied for resolution of indirect and relative time expressions in text as part of a shallow analysis, following a specialized tagging phase. A method is currently under development, optimized for a particular corpus of historical biographies...
A survey of text clustering techniques used for web mining

Directory of Open Access Journals (Sweden)

Dan MUNTEANU

2005-12-01

Full Text Available This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.
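The bottom-up paradigm described above can be sketched in a few lines. This is an illustrative single-link variant using Jaccard similarity over token sets, with a hypothetical stopping threshold; the survey itself does not prescribe a similarity measure or linkage rule, and the top-down partitioning paradigm is not shown.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token sets."""
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def agglomerative_clusters(documents, threshold):
    """Merge the two most similar clusters until no pair reaches the threshold."""
    token_sets = [set(doc.lower().split()) for doc in documents]
    clusters = [[index] for index in range(len(documents))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: similarity of the closest pair of members.
                score = max(jaccard(token_sets[a], token_sets[b])
                            for a in clusters[i] for b in clusters[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


docs = [
    "corpus annotation of clinical text",
    "annotation of clinical corpus text",
    "rupture of corpus luteum cyst",
    "corpus luteum cyst rupture imaging",
]
groups = agglomerative_clusters(docs, threshold=0.5)
```

The result lists document indices per cluster; here the two annotation titles and the two corpus luteum titles end up in separate groups.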
Analysis of high signal intensities of nontumorous conditions of corpus callosum on magnetic resonance T2-weighted images

International Nuclear Information System (INIS)

Kang, Moo Song; Kim, Chul Min; Chung, Chun Phil

1995-01-01

To evaluate high signal intensity of nontumorous conditions of the corpus callosum on T2-weighted MR images. Forty-nine patients with nontumorous high signal intensities involving the corpus callosum on sagittal T2-weighted images were retrospectively analyzed. The nontumorous conditions of the corpus callosum were diffuse axonal injury (DAI, 19 cases), cerebral infarction (16 cases), multiple sclerosis (MS, 5 cases), Wilson's disease (2 cases) and hydrocephalus (7 cases), diagnosed by clinical and MR findings. The number, configuration, involved thickness and sites of the high signal intensities of the corpus callosum were analyzed. DAI and infarctions showed either single or multiple lesions. MS and hydrocephalus showed multiple lesions, but Wilson's disease showed a single lesion. In DAI, infarctions and MS the lesions involved any part of the corpus callosum; in Wilson's disease, the splenium; and in hydrocephalus, all parts of the corpus callosum. Wilson's disease showed only partial thickness involvement, while the others involved partial or full thickness of the corpus callosum. The configuration of high signal intensity was linear in most cases of hydrocephalus, oval in Wilson's disease, oval and confluent in MS, and variable in DAI and infarctions. High signal intensities of nontumorous conditions of the corpus callosum showed variable findings; therefore, nontumorous high signal intensities of the corpus callosum should be analyzed not by MR findings alone but in conjunction with clinical aspects
Conjunctions in ELF academic discourse: a corpus-based analysis

Directory of Open Access Journals (Sweden)

Laura Centonze

2014-03-01

Full Text Available Abstract – Conjunctions as fundamental elements in the construction of discourse cohesion represent a relatively neglected research area, due to their complexity and the bewildering number of “conjunctive relations” (Halliday and Hasan 1976: 226) that they may express in context, as also highlighted in Christiansen (2011). In addition to this, there does not seem to be a shared view as far as the classification and denomination of the different kinds of conjunctions are concerned (cf. Halliday and Hasan 1976; Vande Kopple 1985; Martin and Rose 2003; Hyland 2005b). The selection of a specific type of conjunction acquires more importance because they are typically open to so many different interpretations, especially when the participants in the speech event come from diverse lingua-cultural backgrounds (cf. Guido 2007; Guido 2008; Cogo et al. 2011). Following the taxonomy provided by Halliday and Hasan (1976) for conjunctions, our study attempts to shed light on the usage of conjunctions by ELF speakers in specific contexts. We shall consider ten transcripts taken from the VOICE Corpus (Seidlhofer et al. 2013), namely five interviews and five conversations in multicultural academic contexts (approximately 4,000 words each), and analyze the number of instances for each type of conjunction (additive, adversative, clausal, temporal) as well as continuatives in depth, by adopting a quantitative as well as a qualitative method and by using TextSTAT 2.9 (Huning 2012). We shall then move on to the analysis of conjunctions with respect to their internal properties/collocates and eventually see the occurrence of conjunctions by comparing them with the two different speech events which are chosen as the subject of our study, i.e. interviews and conversations. We shall see the extent to which certain conjunctions are more restricted than others in terms of usage (cf. Leung 2005) in both types of speech events, despite the great number of options available to the
A Chinese text classification system based on Naive Bayes algorithm

Directory of Open Access Journals (Sweden)

Cui Wei

2016-01-01

Full Text Available This paper addresses the characteristics of Chinese text classification. ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences) is used for document segmentation, the data are cleaned and stop words are filtered, and information gain and document frequency feature selection algorithms are used to select document features. On this basis, a text classifier is implemented with the Naive Bayes algorithm, and the system is tested and analyzed on the Fudan University Chinese corpus.
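The classification step can be illustrated with a minimal multinomial Naive Bayes sketch. It assumes the documents have already been segmented into tokens (for example by a segmenter such as ICTCLAS, which is not called here), uses Laplace smoothing, and omits the information gain and document frequency feature selection; the class name, tokens and labels are hypothetical placeholders, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over pre-segmented token lists."""

    def __init__(self, stop_words=()):
        self.stop_words = set(stop_words)
        self.class_doc_counts = Counter()
        self.class_token_counts = defaultdict(Counter)
        self.vocabulary = set()

    def _filter(self, tokens):
        # Drop stop words before counting or scoring.
        return [token for token in tokens if token not in self.stop_words]

    def fit(self, token_lists, labels):
        for tokens, label in zip(token_lists, labels):
            kept = self._filter(tokens)
            self.class_doc_counts[label] += 1
            self.class_token_counts[label].update(kept)
            self.vocabulary.update(kept)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            counts = self.class_token_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in self._filter(tokens):
                # Laplace-smoothed log likelihood of the token in this class.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Placeholder tokens standing in for segmenter output.
training_tokens = [
    ["stock", "market", "rises"],
    ["bank", "raises", "interest", "rate"],
    ["team", "wins", "match"],
    ["player", "scores", "goal", "match"],
]
training_labels = ["finance", "finance", "sports", "sports"]
classifier = NaiveBayesTextClassifier(stop_words={"the"}).fit(
    training_tokens, training_labels)
```

Working in log space avoids numeric underflow when documents contain many tokens.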

Como encontrar as palavras-chave mais importantes de um corpus com WordSmith tools How to find the most important keywords in a corpus with WordSmith tools

Directory of Open Access Journals (Sweden)

Tony Berber-Sardinha

2005-12-01

Full Text Available One of the most sensitive issues surrounding a keywords analysis with WordSmith Tools KeyWords is the selection of a subset of words in a corpus that deserve being looked at in greater detail. This selection is normally needed because the key word list of a study corpus is generally very large, typically around 1,500 words or more. One way to extract a selection consists of pulling out 'exclusive key words'. This key lexis is made up of keywords that occur only in the study corpus in question, in comparison with the keywords of other study corpora. Nevertheless, comparing a keyword list with several others is a costly and complicated procedure, which most users of WordSmith Tools KeyWords cannot be expected to carry out. An alternative for this scenario would be to apply a generalized cut-off point based on keyword return trends observed by applying the existing keyword bank. Such a cut-off point would indicate the region of the keyword list in which exclusive key lexis is most likely to occur. The results obtained here indicate a cut-off point between 31% and 53% of the words in the list, counting from the first word of a list ordered by keyness.
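The two selection strategies discussed in this abstract can be sketched as follows. This is an illustrative reading only: it assumes keyword lists are plain Python lists already ordered by keyness, and the function names are hypothetical and unrelated to the WordSmith Tools interface.

```python
def exclusive_keywords(study_keywords, other_keyword_lists):
    """Keep keywords that appear in no other keyword list, preserving rank order."""
    seen_elsewhere = set().union(*other_keyword_lists) if other_keyword_lists else set()
    return [word for word in study_keywords if word not in seen_elsewhere]


def apply_cutoff(ranked_keywords, percent):
    """Keep the top `percent` of a list ordered by keyness (integer percentage)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return ranked_keywords[: len(ranked_keywords) * percent // 100]


study = ["corpus", "keyness", "the", "lexis"]
bank = [["the", "of"], ["corpus"]]
exclusive = exclusive_keywords(study, bank)

# With a 1,500-word list, cut-offs of 31% and 53% keep 465 and 795 words.
ranked = [f"word{rank}" for rank in range(1500)]
shortest = apply_cutoff(ranked, 31)
longest = apply_cutoff(ranked, 53)
```

The cut-off avoids the comparison against a bank of lists entirely, which is the practical point the abstract makes.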
The importance of the corpus callosum in the diagnosis of multiple sclerosis

International Nuclear Information System (INIS)

Goossens-Merkt, H.; Mueller-Jensen, M.; Zanella, F.D.

1991-01-01

Besides MS there are many diseases with lesions of the white matter, especially vascular diseases. In search of a specific MRI pattern for MS, especially for early diagnosis, the corpus callosum was analyzed in patients with MS and with other diseases. The progressive atrophy of the corpus callosum in the course of multiple sclerosis is well known. A good correlation between atrophy of the corpus callosum on T1-weighted MRI and the severity of organic mental disorder has been demonstrated. Since atrophy, however, is a nonspecific sign, while demyelinating lesions are much more specific for MS, a brain region in which vascular lesions are rare but demyelinating lesions are more frequent has been studied. (author). 10 refs.; 2 figs.; 1 tab
Using Google as a Super Corpus to Drive Written Language Learning: A Comparison with the British National Corpus

Science.gov (United States)

Sha, Guoquan

2010-01-01

Data-driven learning (DDL), or corpus-based language learning, involves the learner in an exploratory task to discover appropriate expressions or collocates regarding his writing. However, the problematic units of meaning in each learner's writing are so diverse that conventional corpora often prove futile. The search engine Google with the…
The Brazilian Theory of Habeas Corpus for Great Apes

Directory of Open Access Journals (Sweden)

Heron José de Santana Gordilho

2016-06-01

Full Text Available This essay presents a comparison between human evolution and legal developments, trying to demonstrate how the Darwinian theory of evolution by natural selection has caused changes in the legal world, to the point that today some lawyers use recent discoveries about the genetic similarity between humans and the great primates to claim the extension of human rights to chimpanzees, bonobos, gorillas and orangutans. It also shows that many animal rights activists have considered litigation an important strategy, whether to give new meanings to legal institutions such as habeas corpus, hitherto used only to ensure human freedom, or to strengthen the movement and raise the general population's awareness of the importance of recognizing animals as holders of basic rights.
Embedding epistemic modals in English: A corpus-based study

Directory of Open Access Journals (Sweden)

Valentine Hacquard

2012-07-01

Full Text Available The question of whether epistemic modals contribute to the truth conditions of the sentences they appear in is a matter of active debate in the literature. Fueling this debate is the lack of consensus about the extent to which epistemics can appear in the scope of other operators. This corpus study investigates the distribution of epistemics in naturalistic data. Our results indicate that they do embed, supporting the view that they contribute semantic content. However, their distribution is limited, compared to that of other modals. This limited distribution seems to call for a nuanced account: while epistemics are semantically contentful, they may require special licensing conditions. http://dx.doi.org/10.3765/sp.5.4
A Corpus-based Study on the Use of Contractions by EFL Learners in Argumentative Essays

Directory of Open Access Journals (Sweden)

M. Pınar Babanoğlu

2017-01-01

Full Text Available Contraction forms in English mostly occur in speech and informal writing and are generally avoided in formal writing types such as academic prose, business reports and journal articles; therefore, most teachers discourage their use in academic essays (Biber, Johansson, Leech, Conrad and Finegan 1999). Contractions in English are of two types: negative contractions (isn’t, haven’t, doesn’t) and verb contractions (I’m, they’ve, that’s). This corpus-based study investigates contraction usage in learner and native English speaker essays. The major goal is to examine whether learners observe the essay writing rules regarding contractions, which are considered inappropriate for academic prose style. Five corpora, three learner and two native English, were used to analyze verb and not-contraction forms. Frequencies of contraction forms in each corpus were compared via log-likelihood measurement for statistical significance. Results revealed that learners use considerably more contraction forms, especially negative ones, than native English students in their argumentative essays.
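The log-likelihood comparison mentioned above can be sketched as follows, assuming the two-corpus form of the statistic commonly used in corpus linguistics; the abstract does not state which formula or tool was used, and the frequencies below are invented for illustration.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    freq_* are observed counts of the item; size_* are corpus sizes in tokens.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    statistic = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            statistic += observed * math.log(observed / expected)
    return 2 * statistic


# Hypothetical counts: 120 contractions in 100,000 learner tokens
# versus 40 in 100,000 native-speaker tokens.
ll = log_likelihood(120, 100_000, 40, 100_000)
significant = ll > 3.84  # chi-square critical value, 1 d.f., p < 0.05
```

A value of zero means the item has the same relative frequency in both corpora; larger values indicate a larger difference relative to corpus size.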
Human corpus luteum: presence of epidermal growth factor receptors and binding characteristics

International Nuclear Information System (INIS)

Ayyagari, R.R.; Khan-Dawood, F.S.

1987-01-01

Epidermal growth factor receptors are present in many reproductive tissues but have not been demonstrated in the human corpus luteum. To determine the presence of epidermal growth factor receptors and its binding characteristics, we carried out studies on the plasma cell membrane fraction of seven human corpora lutea (days 16 to 25) of the menstrual cycle. Specific epidermal growth factor receptors were present in human corpus luteum. Insulin, nerve growth factor, and human chorionic gonadotropin did not competitively displace epidermal growth factor binding. The optimal conditions for corpus luteum-epidermal growth factor receptor binding were found to be incubation for 2 hours at 4 degrees C with 500 micrograms plasma membrane protein and 140 femtomol 125 I-epidermal growth factor per incubate. The number (mean +/- SEM) of epidermal growth factor binding sites was 12.34 +/- 2.99 X 10(-19) mol/micrograms protein; the dissociation constant was 2.26 +/- 0.56 X 10(-9) mol/L; the association constant was 0.59 +/- 0.12 X 10(9) L/mol. In two regressing corpora lutea obtained on days 2 and 3 of the menstrual cycle, there was no detectable specific epidermal growth factor receptor binding activity. Similarly no epidermal growth factor receptor binding activity could be detected in ovarian stromal tissue. Our findings demonstrate that specific receptors for epidermal growth factor are present in the human corpus luteum. The physiologic significance of epidermal growth factor receptors in human corpus luteum is unknown, but epidermal growth factor may be involved in intragonadal regulation of luteal function
Wireless: Some Facts and Figures from a Corpus-driven Study

Directory of Open Access Journals (Sweden)

Camino Rea Rizzo

2009-12-01

Full Text Available Wireless is the word selected to illustrate a model of analysis designed to determine the specialized character of a lexical unit. Wireless belongs to the repertoire of specialized vocabulary automatically extracted from a corpus of telecommunication engineering English (TEC). This paper describes the procedure followed in the analysis, which is intended to fulfil a twofold purpose: first, to validate the automatic classification; and second, to gain a better insight into the lexical profile of telecommunication English. The statistical information provided by the variables of frequency, distribution and keyness is combined with the data extracted from the exploration of the surrounding co-text, in order to describe the syntagmatic relations established.

Applying Corpus-Based Findings to Form-Focused Instruction: The Case of Reported Speech

Science.gov (United States)

Barbieri, Federica; Eckhardt, Suzanne E. B.

2007-01-01

Arguing that the introduction of corpus linguistics in teaching materials and the language classroom should be informed by theories and principles of SLA, this paper presents a case study illustrating how corpus-based findings on reported speech can be integrated into a form-focused model of instruction. After overviewing previous work which…
A Methodology for Mapping Meanings in Text-Based Sustainability Communication

Directory of Open Access Journals (Sweden)

Mark Brown

2013-06-01

Full Text Available In moving society towards more sustainable forms of consumption and production, social learning must play an important role. Making the assumption that it occurs as a consequence of changes in understanding, this article presents a methodology for mapping meanings in sustainability communication texts. The methodology uses techniques from corpus linguistics and framing theory. Two large databases of text were constructed by copying material down from the websites of two different groups of social actors, (i) environmental NGOs and (ii) British green business, and saving it as .txt files. The findings on individual words show that the NGOs and business use them very differently. Focusing on words expressing concern for the natural environment, it is proposed that the two actors also conceptualize their concern differently. Green business’s cognitive system of concern has two well-developed frames: good intentions and risk management. However, three frames (concern for the natural environment, perception of the damage, and responsibility) are light on detail. In contrast, within the NGOs’ system of concern, the frames of concern for the natural environment, perception of the damage and responsibility contain words making detailed representations.
DESIGNING EAP MATERIALS BASED ON INTERCULTURAL CORPUS ANALYSES: THE CASE OF LOGICAL MARKERS IN RESEARCH ARTICLES

Directory of Open Access Journals (Sweden)

Pilar Mur Dueñas

2009-10-01

Full Text Available The ultimate aim of intercultural analyses in English for Academic Purposes is to help non-native scholars function successfully in the international disciplinary community in English. The aim of this paper is to show how corpus-based intercultural analyses can be useful to design EAP materials on a particular metadiscourse category, logical markers, in research article writing. The paper first describes the analysis carried out of additive, contrastive and consecutive logical markers in a corpus of research articles in English and in Spanish in a particular discipline, Business Management. Differences were found in their frequency and also in the use of each of the sub-categories. Then, five activities designed on the basis of these results are presented. They are aimed at raising Spanish Business scholars' awareness of the specific uses and pragmatic function of frequent logical markers in international research articles in English.
Cholinergic neurotransmission in human corpus cavernosum. II. Acetylcholine synthesis

International Nuclear Information System (INIS)

Blanco, R.; De Tejada, S.; Goldstein, I.; Krane, R.J.; Wotiz, H.H.; Cohen, R.A.

1988-01-01

Physiological and histochemical evidence indicates that cholinergic nerves may participate in mediating penile erection. Acetylcholine synthesis and release was studied in isolated human corporal tissue. Human corpus cavernosum incubated with [3H]choline accumulated [3H]choline and synthesized [3H]acetylcholine in a concentration-dependent manner. [3H]Acetylcholine accumulation by the tissue was inhibited by hemicholinium-3, a specific antagonist of the high-affinity choline transport in cholinergic nerves. Transmural electrical field stimulation caused release of [3H]acetylcholine which was significantly diminished by inhibiting neurotransmission with calcium-free physiological salt solution or tetrodotoxin. These observations provide biochemical and physiological evidence for the existence of cholinergic innervation in human corpus cavernosum
Murine Models of Gastric Corpus Preneoplasia

Directory of Open Access Journals (Sweden)

Christine P. Petersen

2017-01-01

Full Text Available Intestinal-type gastric adenocarcinoma evolves in a field of pre-existing metaplasia. Over the past 20 years, a number of murine models have been developed to address aspects of the physiology and pathophysiology of metaplasia induction. Although none of these models has achieved true recapitulation of the induction of adenocarcinoma, they have led to important insights into the factors that influence the induction and progression of metaplasia. Here, we review the pathologic definitions relevant to alterations in gastric corpus lineages and classification of metaplasia by specific lineage markers. In addition, we review present murine models of the induction and progression of spasmolytic polypeptide (TFF2)-expressing metaplasia, the predominant metaplastic lineage observed in murine models. These models provide a basis for the development of a broader understanding of the physiological and pathophysiological roles of metaplasia in the stomach. Keywords: SPEM, Intestinal Metaplasia, Gastric Cancer, TFF2, Chief Cell, Hyperplasia
Üstverinin Tam-Metin Bilgi Erişim Performansı Üzerindeki Etkisi: Küçük Ölçekli Türkçe Külliyat Üzerinde Deneysel Bir Araştırma / Impact of Metadata on Full-text Information Retrieval Performance: An Experimental Research on a Small Scale Turkish Corpus

OpenAIRE

Çapkın, Çağdaş

2016-01-01

Information institutions use text-based information retrieval systems to store, index and retrieve metadata, full-text, or both metadata and full-text (hybrid) contents. The aim of this research was to evaluate impact of these contents on information retrieval performance. For this purpose, metadata (MIR), full-text (FIR) and hybrid (HIR) content information retrieval systems were developed with default Lucene information retrieval model for a small scale Turkish corpus. In order to evaluate ...
A critical re-examination of sexual dimorphism in the corpus callosum microstructure

DEFF Research Database (Denmark)

Westerhausen, René; Kompus, Kristiina; Dramsdahl, Margaretha

2011-01-01

... The objective of the present DTI study was to re-examine microstructural sex differences in the corpus callosum, while controlling for corpus callosum size differences between sexes. We compared 41 female and 34 male participants using regional tract-based spatial statistics (TBSS) analysis. Clusters of significantly higher fractional anisotropy (FA) and lower diffusion strength in males compared to females were detected in the genu and truncus of the corpus callosum. However, only the sex difference located in the anterior genu subregions could be unequivocally interpreted. This was the only cluster where the diffusion parameters did not correlate with regional callosal size. The present results indicate a stronger inter-hemispheric connectivity between the frontal lobes in males than females, which might be related to sex differences in hemispheric asymmetry and brain size...
Neural analysis of bovine ovaries ultrasound images in the identification process of the corpus luteum

Science.gov (United States)

Górna, K.; Jaśkowski, B. M.; Okoń, P.; Czechlowski, M.; Koszela, K.; Zaborowicz, M.; Idziaszek, P.

2017-07-01

The aim of the paper is to present neural image analysis as a method useful for identifying the development stage of the domestic bovine corpus luteum on digital USG (ultrasonography) images. The corpus luteum (CL) is a transient endocrine gland that develops after ovulation from the follicle secretory cells. The function of the CL is the production of progesterone, which regulates many reproductive functions. In the presented studies, identification of the corpus luteum was carried out on the basis of information contained in digital ultrasound images. The development stage of the corpus luteum was considered in two aspects: just before and in the middle of the domination phase, and the luteolysis and degradation phase. Prior to classification, the ultrasound images were processed using a GLCM (Gray Level Co-occurrence Matrix). To generate a classification model, the Neural Networks module implemented in STATISTICA was used. Five representative parameters describing the ultrasound image were used as learning variables. The output of the artificial neural network was information about the development stage of the corpus luteum. The results of this study indicate that neural image analysis combined with GLCM texture analysis may be a useful tool for identifying the bovine corpus luteum in the context of its development phase. The best artificial neural network model generated had the MLP (Multi Layer Perceptron) structure 5:5-17-1:1.
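The GLCM preprocessing step can be illustrated with a small pure-Python sketch. It assumes an image already quantized to a few gray levels and a single pixel offset, and it computes contrast as one example texture descriptor; the abstract does not name the five parameters actually used, so the choice of contrast and the toy image are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy); not symmetrized."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast: squared gray-level differences weighted by pair probabilities."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total


# Toy 4x4 image with 4 gray levels (0..3).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
horizontal = glcm(toy_image, levels=4)
texture_contrast = contrast(horizontal)
```

Descriptors such as this one would then serve as input variables to a classifier, in the same role as the five parameters fed to the network in the study.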
Efficient extraction of protein-protein interactions from full-text articles.

Science.gov (United States)

Hakenberg, Jörg; Leaman, Robert; Vo, Nguyen Ha; Jonnalagadda, Siddhartha; Sullivan, Ryan; Miller, Christopher; Tari, Luis; Baral, Chitta; Gonzalez, Graciela

2010-01-01

Proteins and their interactions govern virtually all cellular processes, such as regulation, signaling, metabolism, and structure. Most experimental findings pertaining to such interactions are discussed in research papers, which, in turn, get curated by protein interaction databases. Authors, editors, and publishers benefit from efforts to alleviate the tasks of searching for relevant papers, evidence for physical interactions, and proper identifiers for each protein involved. The BioCreative II.5 community challenge addressed these tasks in a competition-style assessment to evaluate and compare different methodologies, to make aware of the increasing accuracy of automated methods, and to guide future implementations. In this paper, we present our approaches for protein-named entity recognition, including normalization, and for extraction of protein-protein interactions from full text. Our overall goal is to identify efficient individual components, and we compare various compositions to handle a single full-text article in between 10 seconds and 2 minutes. We propose strategies to transfer document-level annotations to the sentence-level, which allows for the creation of a more fine-grained training corpus; we use this corpus to automatically derive around 5,000 patterns. We rank sentences by relevance to the task of finding novel interactions with physical evidence, using a sentence classifier built from this training corpus. Heuristics for paraphrasing sentences help to further remove unnecessary information that might interfere with patterns, such as additional adjectives, clauses, or bracketed expressions. In BioCreative II.5, we achieved an f-score of 22 percent for finding protein interactions, and 43 percent for mapping proteins to UniProt IDs; disregarding species, f-scores are 30 percent and 55 percent, respectively. On average, our best-performing setup required around 2 minutes per full text. All data and pattern sets as well as Java classes that
Avoid violence, rioting, and outrage; approach celebration, delight, and strength: Using large text corpora to compute valence, arousal, and the basic emotions.

Science.gov (United States)

Westbury, Chris; Keith, Jeff; Briesemeister, Benny B; Hofmann, Markus J; Jacobs, Arthur M

2015-01-01

Ever since Aristotle discussed the issue in Book II of his Rhetoric, humans have attempted to identify a set of "basic emotion labels". In this paper we propose an algorithmic method for evaluating sets of basic emotion labels that relies upon computed co-occurrence distances between words in a 12.7-billion-word corpus of unselected text from USENET discussion groups. Our method uses the relationship between human arousal and valence ratings collected for a large list of words, and the co-occurrence similarity between each word and emotion labels. We assess how well the words in each of 12 emotion label sets (proposed by various researchers over the past 118 years) predict the arousal and valence ratings on a test and validation dataset, each consisting of over 5970 items. We also assess how well these emotion labels predict lexical decision residuals (LDRTs), after co-varying out the effects attributable to basic lexical predictors. We then demonstrate a generalization of our method to determine the most predictive "basic" emotion labels from among all of the putative models of basic emotion that we considered. As well as contributing empirical data towards the development of a more rigorous definition of basic emotions, our method makes it possible to derive principled computational estimates of emotionality (specifically, of arousal and valence) for all words in the language.
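The co-occurrence similarity between a word and an emotion label, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not specify the authors' co-occurrence model, so everything here is an assumption made for illustration: the window size, the raw neighbour counts, the cosine measure, and the function names `cooccurrence_vectors` and `cosine`.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical): how close is each word to the label "delight"?
tokens = "riot brings outrage party brings delight".split()
vectors = cooccurrence_vectors(tokens, window=1)
for word in ("riot", "outrage"):
    print(word, round(cosine(vectors[word], vectors["delight"]), 3))
```

In a setting like the one the abstract describes, similarities of this kind between a word and each emotion label would serve as predictors of human valence and arousal ratings; the regression step is omitted here.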
Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research

OpenAIRE

Rezvan, Mohammadreza; Shekarpour, Saeedeh; Balasuriya, Lakshika; Thirunarayan, Krishnaprasad; Shalin, Valerie; Sheth, Amit

2018-01-01

Having a quality annotated corpus is essential, especially for applied research. Despite the recent focus of the Web science community on researching cyberbullying, the community still does not have standard benchmarks. In this paper, we publish first, a quality annotated corpus and second, an offensive words lexicon capturing different types of harassment: (i) sexual harassment, (ii) racial harassment, (iii) appearance-related harassment, (iv) intellectual harassment, and (v) politic...
Morphometry of the corpus callosum in Chinese children: relationship with gender and academic performance

International Nuclear Information System (INIS)

Ng, Wing Hung Alex; Chan, Yu.Lung; Au, Kit Sum Agnes; Yeung, Ka Wai David; Kwan, Ting Fai; To, Cho Yee

2005-01-01

The corpus callosum has been widely studied, but no study has demonstrated whether its size and shape have any relationship with language and calculation performance. To examine the morphometry of the corpus callosum of normal Chinese children and its relationship with gender and academic performance. One hundred primary school children (63 boys, 37 girls; age 6.5-10 years) were randomly selected and the standardized academic performance for each was ascertained. On the mid-sagittal section of a brain MRI, the length, height and total area of the corpus callosum and its thickness at different sites were measured. These were correlated with sex and academic performance. Apart from the normal average dimension of the different parts of the corpus callosum, thickness at the body-splenium junction in the average-to-good performance group was significantly greater than the below-average performance group in Chinese language (P=0.005), English language (P=0.02) and mathematics (P=0.01). The remainder of the callosal thickness showed no significant relationship with academic performance. There was no significant sex difference in the thickness of any part of the corpus callosum. These findings raise the suggestion that language and mathematics proficiency may be related to the morphometry of the fibre connections in the posterior parietal lobes. (orig.)

Morphometry of the corpus callosum in Chinese children: relationship with gender and academic performance

Energy Technology Data Exchange (ETDEWEB)

Ng, Wing Hung Alex; Chan, Yu.Lung [Prince of Wales Hospital, Department of Diagnostic Radiology and Organ Imaging, Shatin, Hong Kong (Hong Kong); Au, Kit Sum Agnes [James Cook University, Department of Psychology, Townsville, Queensland (Australia); Yeung, Ka Wai David; Kwan, Ting Fai; To, Cho Yee

2005-06-01

The corpus callosum has been widely studied, but no study has demonstrated whether its size and shape have any relationship with language and calculation performance. To examine the morphometry of the corpus callosum of normal Chinese children and its relationship with gender and academic performance. One hundred primary school children (63 boys, 37 girls; age 6.5-10 years) were randomly selected and the standardized academic performance for each was ascertained. On the mid-sagittal section of a brain MRI, the length, height and total area of the corpus callosum and its thickness at different sites were measured. These were correlated with sex and academic performance. Apart from the normal average dimension of the different parts of the corpus callosum, thickness at the body-splenium junction in the average-to-good performance group was significantly greater than the below-average performance group in Chinese language (P=0.005), English language (P=0.02) and mathematics (P=0.01). The remainder of the callosal thickness showed no significant relationship with academic performance. There was no significant sex difference in the thickness of any part of the corpus callosum. These findings raise the suggestion that language and mathematics proficiency may be related to the morphometry of the fibre connections in the posterior parietal lobes. (orig.)
EuroGOV: Engineering a Multilingual Web Corpus

NARCIS (Netherlands)

Sigurbjörnsson, B.; Kamps, J.; de Rijke, M.

2005-01-01

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites.
Translation, patterns and nuance: a study based on Corpus Linguistics about different semantic prosodies found in the source and target languages

Directory of Open Access Journals (Sweden)

Maria Cecília Lopes

2011-01-01

This article aims to discuss the importance of semantic prosody in translation. To that end, we present the analysis of seven lexical items previously studied in English by Coterril (2001), which also occurred in the 162 journalistic texts of our parallel corpus, made up of data originally written in English (source language) and translated into Portuguese (target language). The tools Alinhador, Concordanciador Paralelo (CEPRIL, LAEL, PUC-SP) and WordSmith Tools 4 (SCOTT, 2004) were used to organize and process the data. Corpus do Português (DAVIES; FERREIRA, 2008) was used as a reference corpus. The results showed different semantic prosodies for the items used in the source and target languages, suggesting that the translation choices do not convey the same meaning as the original. We conclude by pointing to the importance of bilingual studies that describe lexicogrammatical patterns, such as semantic prosody, in order to contribute to the study and practice of translation.
Amphetamine-enhanced accumulation of [3H]-spiperone in mouse corpus striatum in vivo: Modification by other drugs

International Nuclear Information System (INIS)

Dorris, R.L.

1989-01-01

Other investigators have reported that amphetamine administered to rodents results in an increase in the in vivo accumulation of either the tritiated dopamine receptor ligand, spiperone or pimozide in the dopaminergic corpus striatum (specific binding), while not altering that in the sparsely dopaminergically innervated cerebellum (non-specific binding). Experiments were undertaken to determine if the results could be replicated and if some other drugs would modify the effect. Male mice were injected with [3H]-spiperone (20 μCi/kg, 0.0003 mg/kg) s.c. and killed 2 hrs later for determination of radioactivity in corpus striatum and cerebellum. Amphetamine (20 mg/kg, i.p.) given 15 min before [3H]-spiperone increased accumulation in striatum but not cerebellum. The increase was inhibited by α-methyltyrosine (α-MT), haloperidol, reserpine or amantadine. It is suggested that the amphetamine-induced increase in accumulation of [3H]-spiperone in corpus striatum (specific binding) depends on release of large amounts of dopamine, which then must be able to interact with the dopamine receptor. The antagonism of the effect by α-MT or reserpine can be explained by dopamine depletion, that of haloperidol by antagonism for binding at the receptor site. It is suggested that amantadine acts by a dual mechanism: (1) as a low efficacy agonist, it competes for binding to the receptor and (2) it has some ability to block dopamine release.
Architecture of the Corpus Spongiosum : An Anatomical Study

NARCIS (Netherlands)

Ottenhof, Sarah R; de Graaf, Petra; Soeterik, Timo F W; Neeter, Lidewij M F H; Zilverschoon, Marijn; Spinder, Matty; Bosch, J L H Ruud; Bleys, Ronald L A W; Heck-de Kort, Laetitia

PURPOSE: Urethral reconstruction is performed for urethral stricture or hypospadias correction. Research on urethral tissue engineering is increasing. Because the corpus spongiosum is important to support the urethra, urethral tissue engineering should ideally be combined with reconstruction of a
Inflation Metaphor in the TIME Magazine Corpus

Science.gov (United States)

Hu, Chunyu; Liu, Huijie

2016-01-01

A historical perspective on economy metaphor can shed new light on economic thoughts. Based on the TIME Magazine Corpus (TMC), this paper investigates inflation metaphor over 83 years and compares findings against the economic data over the relatively corresponding period. The results show how inflation, an abstract concept and a normal economic…
MORPHOMETRIC ANALYSIS OF CORPUS CALLOSUM- A STUDY IN CADAVER AND MRI

Directory of Open Access Journals (Sweden)

Ambili Puthanveetil

2017-07-01

BACKGROUND The Corpus Callosum (CC) can best be seen in the mid-sagittal section of the brain, both in cadaver and MRI. Its morphometric measurements will be of use in neurosurgical procedures. Sexual dimorphism and age-related changes in its measurements remain controversial. To date, no studies have been done on the corpus callosum in Kerala. MATERIALS AND METHODS Measurements of the CC were taken and studied in detail in 24 formalin-fixed brains from the Department of Anatomy and 48 MR images from the Department of Radiology. The changes according to age and sex were analysed. RESULTS The mean length of the CC in the cadaver was 7.24 cm, which was 3.38 cm posterior to the frontal pole and 5.73 cm anterior to the occipital pole. In MR images, the mean length was 7.10 in males and 6.76 in females. The difference was not statistically significant. The length increased with age. Thickness of the genu and body decreased as age advanced, but the splenial thickness was found to increase with age. There was significant correlation between the thicknesses of various parts of the CC. CONCLUSION The values were almost similar to those in previous studies. Morphometrically, a significant gender difference was not identified in the present study. There were changes according to age both in males and females.
Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

Directory of Open Access Journals (Sweden)

Lev Guzmán-Vargas

2015-11-01

We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, $P(k) \sim k^{-\gamma}$, with two regimes, which are characterized by the exponents $\gamma_s \approx 1.7$ (at short degree scales) and $\gamma_l \approx 1.3$ (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
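The pipeline in the abstract above (word lengths, integration, natural visibility graph, degree distribution) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: "integrated" is taken here to mean the cumulative sum of mean-subtracted word lengths, words are split on whitespace, and the function names are hypothetical. The visibility test is the standard one: points a and b are linked when every intermediate point lies strictly below the straight line joining them.

```python
from collections import Counter
from itertools import accumulate


def integrated_word_lengths(text):
    """Cumulative sum of mean-subtracted word lengths (assumed meaning of 'integrated')."""
    lengths = [len(word) for word in text.split()]
    if not lengths:
        raise ValueError("text must contain at least one word")
    mean = sum(lengths) / len(lengths)
    return list(accumulate(length - mean for length in lengths))


def natural_visibility_graph(series):
    """Return the set of edges (a, b), a < b, of the natural visibility graph.

    This brute-force version is O(n^3) in the worst case; it is meant for
    short series only.
    """
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Every point c strictly between a and b must lie below the line a-b.
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


sample = "the quick brown fox jumps over the lazy dog again and again"
graph = natural_visibility_graph(integrated_word_lengths(sample))
print(sorted(degree_distribution(graph).items()))
```

Estimating the exponents of the power law would require fitting the degree distribution on a logarithmic scale over a much longer text; that step is left out of the sketch.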
n-Gram-Based Text Compression

Directory of Open Access Journals (Sweden)

Vu H. Nguyen

2016-01-01

We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigram to five grams to obtain the best encoding stream. Each n-gram is encoded by two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from some Vietnamese news agencies to build n-gram dictionaries from unigram to five grams, and obtained dictionaries with a total size of 12 GB. In order to evaluate our method, we collected a testing set of 10 different text files with different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
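The idea of encoding text against n-gram dictionaries can be illustrated with a small sketch. It is a simplification built on assumptions, not the method of the paper: it uses a greedy longest-match search from five-grams down to unigrams, represents each code as an `(n, index)` pair instead of the two-to-four-byte format the abstract mentions, and emits unknown tokens as literals. The names `encode`, `decode` and the toy dictionaries are hypothetical.

```python
def encode(tokens, dictionaries, max_n=5):
    """Greedily replace the longest known n-gram at each position with (n, index).

    `dictionaries` maps n to a dict from n-gram tuples to integer indexes.
    Tokens that match no dictionary entry are emitted as (0, token) literals.
    """
    codes = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            index = dictionaries.get(n, {}).get(gram)
            if index is not None:
                codes.append((n, index))
                i += n
                break
        else:
            # No dictionary matched, not even a unigram: keep the token as-is.
            codes.append((0, tokens[i]))
            i += 1
    return codes


def decode(codes, dictionaries):
    """Invert `encode` using the same dictionaries."""
    reverse = {
        n: {index: gram for gram, index in entries.items()}
        for n, entries in dictionaries.items()
    }
    tokens = []
    for n, payload in codes:
        if n == 0:
            tokens.append(payload)
        else:
            tokens.extend(reverse[n][payload])
    return tokens


# Toy dictionaries (hypothetical); real ones would be built from a large corpus.
toy = {2: {("good", "morning"): 0}, 1: {("my",): 0}}
message = ["good", "morning", "my", "friend"]
codes = encode(message, toy)
print(codes)
print(decode(codes, toy) == message)
```

A greedy match is not guaranteed to give the shortest encoding; a search over all window sizes at each position, as the abstract's "best encoding stream" suggests, could do better at higher cost.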
Corpus-Based Websites to Promote Learner Autonomy in Correcting Writing Collocation Errors

Directory of Open Access Journals (Sweden)

Pham Thuy Dung

2016-12-01

The recent yet powerful emergence of e-learning and the use of online resources in learning EFL (English as a Foreign Language) has helped promote learner autonomy in language acquisition, including learners self-correcting their mistakes. This pilot study, although conducted on a modest sample of 25 second-year students majoring in Business English at Hanoi Foreign Trade University, is an initial attempt to investigate the feasibility of using corpus-based websites to promote learner autonomy in correcting collocation errors in EFL writing. The data were collected using a pre-questionnaire and a post-interview aiming to find out the participants' change in belief and attitude toward learner autonomy in correcting collocation errors in writing, the extent of their success in using the corpus-based websites to self-correct the errors, and the change in their confidence in self-correcting the errors using the websites. The findings show that a significant majority of students shifted their belief and attitude toward a more autonomous mode of learning, enjoyed fair success in using the websites to self-correct the errors, and became more confident. The study also implies that face-to-face training in how to use these online tools is vital to the later confidence and success of the learners.
Text Mining to inform construction of Earth and Environmental Science Ontologies

Science.gov (United States)

Schildhauer, M.; Adams, B.; Rebich Hespanha, S.

2013-12-01

There is a clear need for better semantic representation of Earth and environmental concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. In order to develop general-purpose Earth and environmental science ontologies, however, it is necessary to represent concepts and relationships that span usage across multiple disciplines and scientific specialties. Traditional knowledge modeling through ontologies utilizes expert knowledge but inevitably favors the particular perspectives of the ontology engineers, as well as the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy, while also missing important relationships among concepts that can be extremely useful for working scientists to be aware of. In this presentation we will discuss methods we have developed that utilize statistical topic modeling on a large corpus of Earth and environmental science articles, to expand coverage and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse, to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics. Because we base our methods directly on what working scientists are communicating about their
A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set

Directory of Open Access Journals (Sweden)

Abdul Wahab Muzaffar

2015-01-01

Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, it is harder to manage biomedical data extraction manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area under biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches showing better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research is in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.
How to build and process corpora drawn from social media using CAQDAS? NVivo: an instrument for the analysis of digital discourse

Directory of Open Access Journals (Sweden)

Ferrari Giovannipaolo

2015-01-01

With the emergence of social media, researchers in the humanities and social sciences have gained new opportunities to build corpora from data available online. The risk of building corpora in this way is that of creating large databases that are hard to manage with traditional tools, especially in qualitative research. For this reason, it seems important to draw on digital tools that support research. Such software is called CAQDAS. This contribution shows how to use it in an online or digital field study in order to organize and analyze a corpus. The corpus is built for an analysis of the discourse produced in the professional context of radio using new media.
ECPC: European parliamentary discourse from the perspective of corpus-based translation studies

Directory of Open Access Journals (Sweden)

José Manuel Martínez Martínez

2012-12-01

This article presents the research work of the ECPC group, which has designed and built an Archive of European parliamentary speeches in order to study this genre and the hypothetical influence of translation on the construction of European identity. The research has been restricted to the European Parliament (through the construction of a parallel corpus, EN and ES, with the English and Spanish versions) and to two national parliaments, the British House of Commons (HC) and the Spanish Congreso de los Diputados (CD), which each constitute comparable corpora. The Archive contains the speeches recorded in the minutes of the plenary sittings held throughout the sixth term of the European Parliament (2004-2009) in each of the aforementioned chambers.
Corpus-Based Rhythmic Pattern Analysis of Ragtime Syncopation

NARCIS (Netherlands)

Koops, Hendrik Vincent; Volk, A.; de Haas, W.B.

2015-01-01

This paper presents a corpus-based study on rhythmic patterns in the RAG-collection of approximately 11,000 symbolically encoded ragtime pieces. While characteristic musical features that define ragtime as a genre have been debated since its inception, musicologists argue that specific syncopation
I will proclaim myself what I am : corpus stylistics and the language of Shakespeare’s soliloquies

OpenAIRE

Murphy, Sean Edward

2015-01-01

This article reports on a corpus stylistic study of the language of soliloquies in Shakespeare’s plays. Literary corpus stylistics can use corpus linguistic methods to test claims made by literary critics and identify hitherto unnoticed features. Existing literary studies of soliloquies tend to define and classify them, to trace the history of the form or to offer literary appreciation; yet they pay surprisingly little attention to the language which characterises soliloquies. By creating a s...
Atrophy and magnetization transfer ratio of the corpus callosum in patients with Alzheimer's disease

International Nuclear Information System (INIS)

Imon, Yukari; Hanyu, Haruo; Iwamoto, Toshihiko; Takasaki, Masaru; Abe, Kimihiko

1998-01-01

We compared atrophy and magnetization transfer ratio (MTR) in the corpus callosum in patients with Alzheimer's disease and age-matched normal subjects. Fifteen patients with Alzheimer's disease and fourteen normal subjects underwent MRI. The corpus callosum was divided into three parts (anterior, middle, and posterior portions) on the midsagittal slice, and the area of each portion on T2-weighted reversed images and its MTR on magnetization transfer contrast images were measured. The area and MTR decreased significantly in the posterior portion in patients with Alzheimer's disease. In the anterior portion, MTR decreased significantly, although the area showed no significant change. In the middle portion, the area and MTR showed no significant change. MTR and the area were correlated in each portion in patients with Alzheimer's disease. The score of the Hasegawa dementia scale-revised (HDS-R) and the areas of the middle portion, the posterior portion and the total corpus callosum were significantly related. The score of HDS-R and MTR in the anterior portion of the corpus callosum were significantly related. The present study revealed decreases in MTR in the anterior portion of the corpus callosum of patients with Alzheimer's disease although the area showed no significant change, and this change suggests an increase in free water and/or a decrease in bound water in tissues, probably due to demyelination and axonal degeneration. (author)
From text to political positions: The convergence of political, linguistic and discourse analysis

NARCIS (Netherlands)

van Elfrinkhof, A.M.E.; Maks, I.; Kaal, A.R.; Kaal, A.R.; Maks, I.; van Elfrinkhof, A.M.E.

2014-01-01

Abstract: This chapter explores how three methods of political text analysis can complement each other to differentiate parties in detail. A word-frequency method and corpus linguistic techniques are joined by critical discourse analysis in an attempt to assess the ideological relation between
It’s about This and That: A Description of Anaphoric Expressions in Clinical Text

Science.gov (United States)

Wang, Yan; Melton, Genevieve B.; Pakhomov, Serguei

2011-01-01

Although anaphoric expressions are very common in biomedical and clinical documents, little work has been done to systematically characterize their use in clinical text. Samples of ‘it’, ‘this’, and ‘that’ expressions occurring in inpatient clinical notes from four metropolitan hospitals were analyzed using a combination of semi-automated and manual annotation techniques. We developed a rule-based approach to filter potential non-referential expressions. A physician then manually annotated 1000 potential referential instances to determine referent status and the antecedent of each referent expression. A distributional analysis of the three referring expressions in the entire corpus of notes demonstrates a high prevalence of anaphora and large variance in distributions of referential expressions with different notes. Our results confirm that anaphoric expressions are common in clinical texts. Effective co-reference resolution with anaphoric expressions remains an important challenge in medical natural language processing research. PMID:22195211
Automatic Contextual Text Correction Using The Linguistic Habits Graph Lhg

Directory of Open Access Journals (Sweden)

Marcin Gadamer

2009-01-01

Automatic text correction is an essential problem for today's text processors and editors. This paper introduces a novel algorithm for automated contextual text correction using a Linguistic Habit Graph (LHG), also introduced in this paper. A specialist internet crawler has been constructed to search through web sites in order to build a Linguistic Habit Graph from text corpora gathered from Polish web sites. The correction results achieved by this algorithm using this LHG were compared with commercial programs that also offer text correction: Microsoft Word 2007, Open Office Writer 3.0 and the search engine Google. The text correction results achieved were much better than the corrections made by these commercial tools.

The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text.

Science.gov (United States)

Altszyler, Edgar; Ribeiro, Sidarta; Sigman, Mariano; Fernández Slezak, Diego

2017-11-01

Computer-based dream content analysis relies on word frequencies within predefined categories in order to identify different elements in text. As a complementary approach, we explored the capabilities and limitations of word-embedding techniques to identify word usage patterns among dream reports. These tools allow us to quantify word associations in text and to identify the meaning of target words. Word embeddings have been extensively studied in large datasets, but only a few studies analyze semantic representations in small corpora. To fill this gap, we compared the capabilities of Skip-gram and Latent Semantic Analysis (LSA) to extract semantic associations from dream reports. LSA showed better performance than Skip-gram in small corpora in two tests. Furthermore, LSA captured relevant word associations in the dream collection, even in cases with low-frequency words or small numbers of dreams. Word associations in dream reports can thus be quantified by LSA, which opens new avenues for dream interpretation and decoding. Copyright © 2017 Elsevier Inc. All rights reserved.
How Do Skilled and Less-Skilled Spellers Write Text Messages? A Longitudinal Study

Science.gov (United States)

Bernicot, J.; Goumi, A.; Bert-Erboul, A.; Volckaert-Legrier, O.

2014-01-01

The link between students' spelling level and their text-messaging practice gives rise to numerous questions from teachers, parents and the media. A corpus of 4524 text messages produced in daily-life situations by students in sixth and seventh grade (n = 19, 11-12 years of age) was compiled. None of the participants had ever owned or used a…
An Analysis of Stative Verbs Used with the Progressive Aspect in Corpus-Informed Textbooks

Science.gov (United States)

Belli, Serap Atasever

2018-01-01

This study was designed to investigate whether contemporary corpus-informed grammar textbooks written for English language learners and teachers presented the progressive use of stative verbs and if yes, which stative verbs were presented to occur with the progressive aspect and for which functions they took this aspect. A corpus of six electronic…
US News Media Portrayal of Islam and Muslims: A Corpus-Assisted Critical Discourse Analysis

Science.gov (United States)

Samaie, Mahmoud; Malmir, Bahareh

2017-01-01

This article exploits the synergy of critical discourse studies and Corpus Linguistics to study the pervasive representation of Islam and Muslims in an approximate 670,000-word corpus of US news media stories published between 2001 and 2015. Following collocation and concordance analysis of the most frequent topics or categories which revolve…
Investigation of ground-water contamination at a drainage ditch, Installation Restoration Site 4, Naval Air Station Corpus Christi, Corpus Christi, Texas, 2005–06

Science.gov (United States)

Vroblesky, Don A.; Casey, Clifton C.

2007-01-01

The U.S. Geological Survey, in cooperation with the Naval Facilities Engineering Command Southeast, used newly developed sampling methods to investigate ground-water contamination by chlorobenzenes beneath a drainage ditch on the southwestern side of Installation Restoration Site 4, Naval Air Station Corpus Christi, Corpus Christi, Texas, during 2005-06. The drainage ditch, which is a potential receptor for ground-water contaminants from Installation Restoration Site 4, intermittently discharges water to Corpus Christi Bay. This report uses data from a new type of pore-water sampler developed for this investigation and other methods to examine the subsurface contamination beneath the drainage ditch. Analysis of ground water from the samplers indicated that chlorobenzenes (maximum detected concentration of 160 micrograms per liter) are present in the ground water beneath the ditch. The concentrations of dissolved oxygen in the samples (less than 0.05-0.4 milligram per liter) showed that the ground water beneath and near the ditch is anaerobic, indicating that substantial chlorobenzene biodegradation in the aquifer beneath the ditch is unlikely. Probable alternative mechanisms of chlorobenzene removal in the ground water beneath the drainage ditch include sorption onto the organic-rich sediment and contaminant depletion by cattails through uptake, sorption, and localized soil aeration.
CROATIAN ADULT SPOKEN LANGUAGE CORPUS (HrAL)

Directory of Open Access Journals (Sweden)

Jelena Kuvač Kraljević

2016-01-01

Interest in spoken-language corpora has increased over the past two decades, leading to the development of new corpora and the discovery of new facets of spoken language. These types of corpora represent the most comprehensive data source about the language of ordinary speakers. Such corpora are based on spontaneous, unscripted speech defined by a variety of styles, registers and dialects. The aim of this paper is to present the Croatian Adult Spoken Language Corpus (HrAL), its structure and its possible applications in different linguistic subfields. HrAL was built by sampling spontaneous conversations among 617 speakers from all Croatian counties, and it comprises more than 250,000 tokens and more than 100,000 types. Data were collected during three time slots: from 2010 to 2012, from 2014 to 2015 and during 2016. HrAL is today available within TalkBank, a large database of spoken-language corpora covering different languages (https://talkbank.org), in the Conversational Analyses corpora within the subsection titled Conversational Banks. Data were transcribed, coded and segmented using the transcription format Codes for Human Analysis of Transcripts (CHAT) and the Computerised Language Analysis (CLAN) suite of programmes within the TalkBank toolkit. Speech streams were segmented into communication units (C-units) based on syntactic criteria. Most transcripts were linked to their source audios. TalkBank is freely available to the public, i.e. all data stored in it can be shared by the wider community in accordance with the basic rules of TalkBank. HrAL provides information about spoken grammar and lexicon, discourse skills, error production and productivity in general. It may be useful for sociolinguistic research and studies of synchronic language changes in Croatian.
Interhemispheric functional disconnection because of abnormal corpus callosum integrity in bipolar disorder type II.

Science.gov (United States)

Yasuno, Fumihiko; Kudo, Takashi; Matsuoka, Kiwamu; Yamamoto, Akihide; Takahashi, Masato; Nakagawara, Jyoji; Nagatsuka, Kazuyuki; Iida, Hidehiro; Kishimoto, Toshifumi

2016-11-01

A significantly lower fractional anisotropy (FA) value has been shown in anterior parts of the corpus callosum in patients with bipolar disorder. We investigated the association between abnormal corpus callosum integrity and interhemispheric functional connectivity (IFC) in patients with bipolar disorder. We examined the association between FA values in the corpus callosum (CC-FA) and the IFC between homotopic regions in the anterior cortical structures of bipolar disorder (n = 16) and major depressive disorder (n = 22) patients with depressed or euthymic states. We found a positive correlation between the CC-FA and IFC values between homotopic regions of the ventral prefrontal cortex and insula cortex, and significantly lower IFC between these regions in bipolar disorder patients. The abnormal corpus callosum integrity in bipolar disorder patients is relevant to the IFC between homotopic regions, possibly disturbing the exchange of emotional information between the cerebral hemispheres and resulting in emotional dysregulation. © The Royal College of Psychiatrists 2016. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) license.
As metáforas do presidente Lula na perspectiva da linguística de corpus: o caso do desenvolvimento / President Lula's metaphors in a corpus linguistic perspective: The case of 'development'

Directory of Open Access Journals (Sweden)

Tony Berber Sardinha

2010-01-01

One of the main linguistic phenomena in recent Brazilian politics is what the media has called 'President Lula's metaphors'. The starting point for the present investigation is that there must be many metaphors that go unnoticed in the president's discourse and that these may be uncovered by corpus-based research. We looked at the presence of conceptual metaphors related to 'development' in a corpus of three years of official speeches by President Luís Inácio Lula da Silva. The results indicated the systematic use of three metaphorical concepts that together define the notion of development for the head of State: JOURNEY, BUILDING and ORGANISM. These three concepts together equate development with a long process that is generated, planned and carried out by the government.
Open Corpus Adaptation++ in GALE: friend or foe?

NARCIS (Netherlands)

De Bra, P.M.E.; Smits, D.; Pechenizkiy, M.; Knutov, E.; Yudelson, M.; Abel, F.; Houben, G.J.P.M.; Herder, E.

2012-01-01

"Open" has quickly become the hottest topic in any field related to information, including open government data, open learning resources, open user models, … Open Corpus Adaptation has been defined as the ability to perform adaptation to resources located anywhere on the Web. This leaves the
Text-mining as a methodology to assess eating disorder-relevant factors: Comparing mentions of fitness tracking technology across online communities.

Science.gov (United States)

McCaig, Duncan; Bhatia, Sudeep; Elliott, Mark T; Walasek, Lukasz; Meyer, Caroline

2018-05-07

Text-mining offers a technique to identify and extract information from a large corpus of textual data. As an example, this study presents the application of text-mining to assess and compare interest in fitness tracking technology across eating disorder and health-related online communities. A list of fitness tracking technology terms was developed, and communities (i.e., 'subreddits') on a large online discussion platform (Reddit) were compared regarding the frequency with which these terms occurred. The corpus used in this study comprised all comments posted between May 2015 and January 2018 (inclusive) on six subreddits: three eating disorder-related, and three relating to either fitness, weight-management, or nutrition. All comments relating to the same 'thread' (i.e., conversation) were concatenated, and formed the cases used in this study (N = 377,276). Within the eating disorder-related subreddits, the findings indicated that a 'pro-eating disorder' subreddit, which is less recovery focused than the other eating disorder subreddits, had the highest frequency of fitness tracker terms. Across all subreddits, the weight-management subreddit had the highest frequency of fitness tracker terms, and MyFitnessPal was the most frequently mentioned fitness tracker. The technique exemplified here can potentially be used to assess group differences to identify at-risk populations, generate and explore clinically relevant research questions in populations who are difficult to recruit, and scope an area for which there is little extant literature. The technique also facilitates methodological triangulation of research findings obtained through more 'traditional' techniques, such as surveys or interviews. © 2018 Wiley Periodicals, Inc.
n-Gram-Based Text Compression

Science.gov (United States)

Duong, Hieu N.; Snasel, Vaclav

2016-01-01

We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigram to five-gram to obtain the best encoding stream. Each n-gram is encoded in two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigram to five-gram, obtaining dictionaries with a total size of 12 GB. To evaluate our method, we collected a test set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods. PMID: 27965708
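The dictionary-based encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy longest-match strategy, the `(n, index)` code format, and the helper names `build_dicts`, `encode`, and `decode` are all assumptions made for illustration, and the toy dictionaries are built from the sample text itself rather than from a large news corpus. The byte-level packing of codes (two to four bytes per n-gram in the abstract) is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical structure: one dictionary per n-gram order, mapping n-gram -> index.
NgramDicts = Dict[int, Dict[Tuple[str, ...], int]]


def build_dicts(corpus_tokens: List[str], max_n: int = 5) -> NgramDicts:
    """Collect every n-gram (n = 1..max_n) seen in the corpus and number it."""
    dicts: NgramDicts = {n: {} for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(corpus_tokens) - n + 1):
            gram = tuple(corpus_tokens[i:i + n])
            dicts[n].setdefault(gram, len(dicts[n]))
    return dicts


def encode(tokens: List[str], dicts: NgramDicts, max_n: int = 5) -> List[Tuple[int, int]]:
    """Greedy longest match (an assumption): try the longest n-gram first."""
    codes: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in dicts[n]:
                codes.append((n, dicts[n][gram]))
                i += n
                break
        else:
            # No dictionary, not even the unigram one, contains this token.
            raise KeyError(f"token not in unigram dictionary: {tokens[i]!r}")
    return codes


def decode(codes: List[Tuple[int, int]], dicts: NgramDicts) -> List[str]:
    """Invert the dictionaries and expand each code back into its tokens."""
    inverse = {n: {idx: gram for gram, idx in d.items()} for n, d in dicts.items()}
    out: List[str] = []
    for n, idx in codes:
        out.extend(inverse[n][idx])
    return out
```

In this sketch a text of five tokens that matches a known five-gram collapses to a single code, which is where the compression comes from; a real encoder would also need a fallback for tokens missing from every dictionary.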
Representation of Gamblers in the Singaporean Press since Casino Legalization: A Corpus-driven Critical Analysis

Directory of Open Access Journals (Sweden)

Ray Leung

2016-11-01

Capitalizing on the lack of gambling-related research among discourse analysts and the recent liberalization of casino operations in Singapore, the present article reports on the discursive representation of gamblers in Singapore newspaper texts by merging corpus linguistics and critical discourse analysis. 889 articles from the popular daily paper The Straits Times (Singapore) were retrieved via LexisNexis in accordance with a series of criteria. The extracted texts, which were dated from 17 April 2005 to 28 April 2013, constitute the 615,827-word corpus of the current study. WordSmith Tools 6.0 was used to perform collocation analysis, which was enriched by critical examination of the concordance lines. The findings indicate that apart from gender stereotyping, social alienation is manifested in various ways while gamblers are being portrayed. For instance, the pronoun collocate ‘we’ of the node ‘gambler*’ tends to signify the non-gamblers’ voice which is geared towards the institutional stance. The verb collocate ‘say’ is frequently used in contexts where the gamblers are being commented upon or criticized. The analytic outcomes of the research have once again confirmed the ‘hegemonizing’ character of newspaper texts.
The English Definite Article: What ESL/EFL Grammars Say and What Corpus Findings Show

Science.gov (United States)

WonHo Yoo, Isaiah

2009-01-01

To ascertain whether what ESL/EFL grammars say is informed by what scholars discuss in the literature and supported by what corpus findings actually show, this paper first presents a brief overview of the literature on the English definite article and then compares popular ESL/EFL grammars' coverage of "the" and corpus findings on definite article…
JaSlo: Integration of a Japanese-Slovene Bilingual Dictionary with a Corpus Search System

Directory of Open Access Journals (Sweden)

HMELJAK SANGAWA, Kristina

2012-12-01

The paper presents a set of integrated on-line language resources targeted at Japanese language learners, primarily those whose mother tongue is Slovene. The resources consist of the on-line Japanese-Slovene learners’ dictionary jaSlo and two corpora, a 1 million word Japanese-Slovene parallel corpus and a 300 million word corpus of web pages, where each word and sentence is marked by its difficulty level; this corpus is furthermore available as a set of five distinct corpora, each one containing sentences of a particular level. The corpora are available for exploration through NoSketch Engine, the open source version of the commercial state-of-the-art corpus analysis software Sketch Engine. The dictionary is available for Web searching, and dictionary entries have direct links to examples from the corpora, thus offering (a) a wider picture of possible translations in concrete contextualised examples, and (b) monolingual Japanese usage examples of different difficulty levels to support language learning.
Emotional Intelligence in Agenesis of the Corpus Callosum.

Science.gov (United States)

Anderson, Luke B; Paul, Lynn K; Brown, Warren S

2017-05-01

People with agenesis of the corpus callosum (AgCC) with normal general intelligence have deficits in complex cognitive processing, as well as in social cognition. The extent to which impoverished processing of emotions may contribute to social processing deficiencies is uncertain. We used the Mayer-Salovey-Caruso Emotional Intelligence Test to clarify the nature of emotional intelligence in 16 adults with AgCC. As hypothesized, persons with AgCC exhibited greater disparities from norms on tests involving more socially complex aspects of emotions. The AgCC group did not differ from norms on the Experiential subscale, but they were significantly below norms on the Strategic subscale. These findings suggest that the corpus callosum is not essential for experiencing and thinking about basic emotions in a "normal" way, but is necessary for more complex processes involving emotions in the context of social interactions. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Improving Terminology Mapping in Clinical Text with Context-Sensitive Spelling Correction.

Science.gov (United States)

Dziadek, Juliusz; Henriksson, Aron; Duneld, Martin

2017-01-01

The mapping of unstructured clinical text to an ontology facilitates meaningful secondary use of health records but is non-trivial due to lexical variation and the abundance of misspellings in hurriedly produced notes. Here, we apply several spelling correction methods to Swedish medical text and evaluate their impact on SNOMED CT mapping; first in a controlled evaluation using medical literature text with induced errors, followed by a partial evaluation on clinical notes. It is shown that the best-performing method is context-sensitive, taking into account trigram frequencies and utilizing a corpus-based dictionary.
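To make the idea of context-sensitive correction concrete, the following is a minimal sketch, not the method evaluated in the study. It assumes a Norvig-style candidate generator restricted to edit distance 1, a dictionary taken from corpus unigram counts, and a ranking that prefers the candidate with the highest trigram count in its left and right context, falling back to unigram frequency. The names `edits1`, `train`, and `correct`, the alphabet, and the padding tokens are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed alphabet: basic Latin letters plus the Swedish letters å, ä, ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def train(sentences: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count unigrams (the corpus-based dictionary) and padded trigrams."""
    unigrams: Counter = Counter()
    trigrams: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        padded = ["<s>"] + tokens + ["</s>"]
        trigrams.update(zip(padded, padded[1:], padded[2:]))
    return unigrams, trigrams


def correct(prev: str, word: str, nxt: str, unigrams: Counter, trigrams: Counter) -> str:
    """Return the in-dictionary candidate that best fits the surrounding context."""
    if word in unigrams:
        return word  # known word: leave untouched
    candidates = [c for c in edits1(word) if c in unigrams]
    if not candidates:
        return word  # nothing plausible within one edit
    return max(candidates, key=lambda c: (trigrams[(prev, c, nxt)], unigrams[c]))
```

The point of the sketch is that the same misspelling can be corrected differently depending on its neighbours, which a context-free dictionary lookup cannot do.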
Integrity of the corpus callosum in patients with periventricular nodular heterotopia related epilepsy by FLNA mutation.

Science.gov (United States)

Liu, Wenyu; An, Dongmei; Niu, Running; Gong, Qiyong; Zhou, Dong

2018-01-01

To investigate the quantitative diffusion properties of the corpus callosum (CC) in a large group of patients with periventricular nodular heterotopia (PNH) related epilepsy and to further investigate the effect of Filamin A (FLNA) mutation on these properties. Patients with PNH (n = 34), subdivided into FLNA-mutated (n = 11) and FLNA-nonmutated patients (n = 23), and healthy controls (n = 34) underwent 3.0 T structural MRI and diffusion imaging scans (64 directions). Fractional anisotropy (FA) and mean diffusivity (MD) were measured in the three major subdivisions of the CC (genu, body and splenium). Correlations between DTI metric changes and clinical parameters were also evaluated. Furthermore, the effect of FLNA mutation on structural integrity of the corpus callosum was examined. Patients with PNH and epilepsy had significant reductions in FA for the genu and splenium of the CC, accompanied by increases in MD for the splenium, as compared to healthy controls. There were no correlations between clinical parameters of epilepsy and MD. The FA value in the splenium negatively correlated with epilepsy duration. Interestingly, FLNA-mutated patients showed significantly decreased FA for all three major subdivisions of the CC, and increased MD for the genu and splenium, as compared to healthy controls and FLNA-nonmutated patients. These findings support the conclusion that patients with epilepsy secondary to PNH present widespread microstructural changes in the corpus callosum that extend beyond the macroscopic MRI-visible lesions. This study also indicates that FLNA may affect white matter integrity in this disorder.
Agenesis of the corpus callosum and autism: a comprehensive comparison.

Science.gov (United States)

Paul, Lynn K; Corsello, Christina; Kennedy, Daniel P; Adolphs, Ralph

2014-06-01

The corpus callosum, with its ∼200 million axons, remains enigmatic in its contribution to cognition and behaviour. Agenesis of the corpus callosum is a congenital condition in which the corpus callosum fails to develop; such individuals exhibit localized deficits in non-literal language comprehension, humour, theory of mind and social reasoning. These findings together with parent reports suggest that behavioural and cognitive impairments in subjects with callosal agenesis may overlap with the profile of autism spectrum disorders, particularly with respect to impairments in social interaction and communication. To provide a comprehensive test of this hypothesis, we directly compared a group of 26 adults with callosal agenesis to a group of 28 adults with a diagnosis of autism spectrum disorder but no neurological abnormality. All participants had full-scale intelligence quotient scores >78 and groups were matched on age, handedness, and gender ratio. Using the Autism Diagnostic Observation Schedule together with current clinical presentation to assess autistic symptomatology, we found that 8/26 (about a third) of agenesis subjects presented with autism. However, more formal diagnosis additionally involving recollective parent-report measures regarding childhood behaviour showed that only 3/22 met complete formal criteria for an autism spectrum disorder (parent reports were unavailable for four subjects). We found no relationship between intelligence quotient and autism symptomatology in callosal agenesis, nor evidence that the presence of any residual corpus callosum differentiated those who exhibited current autism spectrum symptoms from those who did not. Relative to the autism spectrum comparison group, parent ratings of childhood behaviour indicated children with agenesis were less likely to meet diagnostic criteria for autism, even for those who met autism spectrum criteria as adults, and even though there was no group difference in parent report of current
Instruction and Interaction in an American Lecture Class. Observations from a Corpus

Directory of Open Access Journals (Sweden)

Carmen Pérez-Llantada

2012-05-01

Taking the Michigan Corpus of Academic Spoken English, this paper explores the pragmatic behavior of one-word tags – a common feature in conversational English – in academic speech. The analysis indicates that university professors use tags within textual metadiscourse patterns to signpost their audiences and facilitate comprehension. In addition, tags correlate with interpersonal metadiscourse elements typical of conversation that help lecturers adopt stances, convey solidarity and socialize with their undergraduates. The conclusion section relates the interpersonal semiotics of lectures to the communicative goals of university talk and suggests the need to approach listening comprehension through students’ awareness of genres as social actions.
A corpus-based analysis of textbooks used in the orientation course for immigrants in Germany: Ideological and pedagogic implications

Directory of Open Access Journals (Sweden)

Leung Ray C. H.

2016-09-01

Contextualized within immigrants’ acquisition of specialized knowledge about the host country at the institutional level, this article examines a 64,295-word corpus of textbooks written for participants of the orientation course in German politics, history and culture. Corpus-based techniques (“keyness,” collocation and qualitative examination of concordance lines) are deployed to explore the corpus. The findings reveal that the collocational patterns of the identified keywords construct particular world views vis-à-vis Germany. For instance, the keyword DDR [German Democratic Republic (GDR), aka East Germany] frequently co-occurs with negatively connoted lexis while collocates of the keywords denoting present-day Germany (e.g., Bundesrepublik Deutschland [Federal Republic of Germany] and Staat [nation, country, state]) facilitate the portrayal of Germany as a nurturing welfare state that is popular among foreigners. It is argued that such discursively-construed opposition between the “bad” GDR and the “good” Federal Republic of Germany helps to legitimize the German reunification. Furthermore, it is found that certain keywords (e.g., Sie [you], Kurs [course, class] and z.B. [e.g.]) are “metadiscourse resources” (Hyland, 2005). Their pedagogic effects are discussed in relation to the ideological implications of the research findings.
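"Keyness" in the abstract above refers to words that are unusually frequent in a study corpus relative to a reference corpus. The abstract does not say which statistic was used; the sketch below assumes the log-likelihood measure that is commonly used for keyword extraction in corpus linguistics, and the function name `log_likelihood` and the example counts are hypothetical. With a the word's frequency in the study corpus, b its frequency in the reference corpus, and c and d the two corpus sizes, the expected frequencies are E1 = c(a + b)/(c + d) and E2 = d(a + b)/(c + d), and the score is LL = 2[a ln(a/E1) + b ln(b/E2)].

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness score for one word.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    # A zero count contributes nothing (the limit of x * ln(x) as x -> 0 is 0).
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


if __name__ == "__main__":
    # Hypothetical counts: a word seen 50 times in a 1,000-token study corpus
    # and 10 times in a 10,000-token reference corpus.
    print(round(log_likelihood(50, 10, 1000, 10000), 1))
```

A score above 3.84 is conventionally read as significant at p < 0.05 (chi-squared distribution with one degree of freedom); the sign of the effect, overuse or underuse, has to be read from the relative frequencies a/c and b/d.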

Mesures de comparabilité pour la construction assistée de corpus comparables bilingues thématiques

OpenAIRE

Ke, Guiyao

2014-01-01

Thematic comparable corpora group texts on the same topic that are written in several languages and are highly similar but are not mutual translations. Compared with parallel corpora, which group pairs of translations, comparable corpora have three advantages: first, they are rich and large resources, both in volume and in the period covered; second, they provide original language and thematic resources; finally, they are less expensive to develop than parallel corpora. With the consider...
LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES

Directory of Open Access Journals (Sweden)

Yu. S. Hetsevich

2017-01-01

The article focuses on problems in text-to-speech synthesis. Different morphological, lexical and syntactic elements were located with the help of the Belarusian module of the NooJ program. The types of errors that occur in Belarusian texts were analyzed and corrected. A language model and a part-of-speech tagging model were built. The Belarusian corpus was processed with a newly developed algorithm based on machine learning. The precision of the developed machine learning models was 80–90%. The dictionary was enriched with new words for further use in Belarusian speech synthesis systems.
ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

DEFF Research Database (Denmark)

Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

2009-01-01

We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora.
Effect of hypothyroidism on the purinergic responses of corpus cavernosal smooth muscle in rabbits.

Science.gov (United States)

Yildirim, M K; Bagcivan, I; Sarac, B; Kilicarslan, H; Yildirim, S; Kaya, T

2008-01-01

Several studies have reported evidence of hormonal abnormalities in 25-35% of impotent men. Hypothyroidism has been reported to occur in 6% of impotent men. In the present study, we examined purinergic relaxation responses in hypothyroidism in an experimental rabbit model and compared them with controls to evaluate the possible involvement of the purinergic pathway. The study comprised 20 male New Zealand white rabbits. The rabbits were divided into two equal groups. We tested the effects of ATP, alpha beta ATP, and adenosine on isolated corpus cavernosum preparations, precontracted with phenylephrine, from control and hypothyroid rabbits. We also evaluated the effects of ATP, alpha beta ATP, and adenosine on the cGMP levels in the isolated corpus cavernosum preparations from control and hypothyroid rabbits. T3, T4, and testosterone levels were significantly lower in hypothyroid rabbits. ATP-, alpha beta ATP-, carbachol-, and electrical field stimulation (EFS)-induced frequency-dependent relaxation responses in isolated rabbit corpus cavernosum strips precontracted with phenylephrine were significantly reduced in hypothyroid rabbits. The reduced relaxation response in the corpus cavernosum of hypothyroid rabbits may depend on a decreased release of nitric oxide (NO) from nitrergic nerves and endothelium.
Low-cost, rapidly-developed, 3D printed in vitro corpus callosum model for mucopolysaccharidosis type I [version 2; referees: 2 approved]

Directory of Open Access Journals (Sweden)

Anthony Tabet

2017-03-01

The rising prevalence of high-throughput screening and the general inability of (1) two-dimensional (2D) cell culture and (2) in vitro release studies to predict in vivo neurobiological and pharmacokinetic responses in humans have led to greater interest in more realistic three-dimensional (3D) benchtop platforms. Advantages of 3D human cell culture over its 2D analogue, or even animal models, include taking the effects of microgeometry and long-range topological features into consideration. In the era of personalized medicine, it has become increasingly valuable to screen candidate molecules and synergistic therapeutics at a patient-specific level, in particular for diseases that manifest in highly variable ways. The lack of established standards and the relatively arbitrary choice of probing conditions have limited in vitro drug release to a largely qualitative assessment as opposed to a predictive, quantitative measure of pharmacokinetics and pharmacodynamics in tissue. Here we report the methods used in the rapid, low-cost development of a 3D model of a mucopolysaccharidosis type I patient’s corpus callosum, which may be used for cell culture and drug release. The CAD model is developed from in vivo brain MRI tracing of the corpus callosum using open-source software, printed with poly(lactic acid) on a Makerbot Replicator 5X, UV-sterilized, and coated with poly(lysine) for cellular adhesion. Adaptations of material and 3D printer for expanded applications are also discussed.
Reversible Restricted Diffusion in the Corpus Callosum in Various Pediatric Diseases

Energy Technology Data Exchange (ETDEWEB)

Kim, Won Kyung; Hong, Hyun Sook; Lee, A Leum; Cha, Jang Gyu; Lee, Hae Kyung [Dept. of Radiology, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon (Korea, Republic of); Bae, Won Kyung [Dept. of Radiology, Soonchunhyang University Cheonan Hospital, Soonchunhyang University College of Medicine, Cheonan (Korea, Republic of)

2012-04-15

To evaluate the reversible restricted diffusion in the corpus callosum in pediatric patients with clinical findings, and to discuss the possible pathogenesis of these lesions. Between 2007 and 2011, seven children with reversible signal abnormalities in the corpus callosum were identified and retrospectively reviewed. Diseases and conditions associated with lesions included: trauma (n = 3), neonatal seizure (n = 1), clinically suspected mild encephalopathy (n = 1), multiple sclerosis (n = 1), and seizure with subdural hygroma (n = 1). The callosal lesions were located in the splenium and the genu (n = 2), the splenium and the body (n = 1), and the splenium only (n = 4). The shape of the lesions was round-to-ovoid (n = 4) or linear (n = 3). Follow-up MRI scans showed completely resolved (n = 6) or persistent (n = 1) signal abnormalities on diffusion-weighted imaging as well as apparent diffusion coefficient mapping. Clinical outcomes were good in six of the patients but poor in the seventh. Reversible restricted diffusion in the corpus callosum can develop in various diseases. Knowledge of the MRI findings and associated diseases might be helpful in predicting patients' conditions and clinical outcomes.
Using machine learning to disentangle homonyms in large text corpora.

Science.gov (United States)

Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

2018-06-01

Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles

Science.gov (United States)

Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.

2010-01-01

This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of…
A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications

Science.gov (United States)

Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.

2018-04-01

Data on Indonesian research publications in the domain of biomedicine have been collected and text mined for the purpose of a scientometric study. The goal is to build a predictive model that classifies research publications by their potential for downstreaming. The model is based on drug development processes adapted from the literature. An effort is described to build the conceptual model and to develop a corpus of research publications in the domain of Indonesian biomedicine. An investigation is then conducted into the problems associated with building the corpus and validating the model. Based on our experience, a framework is proposed to manage a scientometric study based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to obtain a valid classification model. This valid model is mainly supported by iterative and close interaction with the domain experts, from identifying the issues and building a conceptual model to labelling, validation and interpretation of results.
Corpus Planning for the Southern Peruvian Quechua Language.

Science.gov (United States)

Coronel-Molina, Serafin M.

1997-01-01

The discussion of corpus planning for the Southern Quechua language variety of Peru examines issues of graphization, standardization, modernization, and renovation of Quechua in the face of increasing domination by the Spanish language. The efforts of three major groups of linguists and other scholars working on language planning in Peru, and the…
Corpus applications for the African languages, with special ...

African Journals Online (AJOL)

In order to illustrate the feasibility of corpus applications for the African languages at present, the article first considers 'fundamental linguistic research' in the fields of phonetics and question particles. It is shown how that research was boosted as a result of the utilisation of corpora. In a second section 'language teaching ...
Considering bilingual dictionaries against a corpus. Do English ...

African Journals Online (AJOL)

This article investigates the extent to which four representatives of the latest generation of English-French / French-English dictionaries present "real English", i.e. actually used meanings of actually used English word patterns. The findings of a corpus study of the verb CONSIDER are confronted with the entries for this verb ...
Cerebral Visual Impairment and Dysgenesis of Corpus Callosum in Multidisabled Children Aged 1 to 9 Years Old

Directory of Open Access Journals (Sweden)

Roxana CZIKER

2009-12-01

Aims: To emphasize the functional vision characteristics of visually impaired multiple disabled (MDVI) children aged 2 to 9 years in relation to brain damage seen on magnetic resonance imaging in different cortical and subcortical areas and in the corpus callosum region. Material and Method: 12 MDVI children with severe and mild neurological disorders were medically and neuropsychologically assessed. Clinical (psychological, neurological and ophthalmological) and paraclinical methods (visual evoked potential (VEP) and magnetic resonance imaging (MRI)) were carried out in order to outline the complete profile of each child. The assessment was completed by morphometric measurement of the corpus callosum and brain. Results: 10 of the infants with severe neurological disorders showed ocular disorders such as ocular motility and visual function abnormalities. Severe cognitive and psychomotor retardation were associated with visual disorders in MDVI children. Significant correlations between neurological disorders and neuropsychological evaluation [τ(12) = 0.783, p = 0.001] and visual acuity [τ(12) = 0.783, p = 0.001] were found in multiple disabled children. A significant difference in the diameter [t(22) = -4.858, p = 0.000] and surface [t(22) = -6.254, p = 0.000] of the corpus callosum was found in multiple disabled children compared with the control group. Conclusion: Structured assessment of children who are visually impaired due to neurological disorders, carried out as early as possible, is the key to revealing the child's functional abilities and outlining appropriate developmental and educational rehabilitation.
Microstructural damage of the posterior corpus callosum contributes to the clinical severity of neglect.

Directory of Open Access Journals (Sweden)

Marco Bozzali

One theory to account for neglect symptoms in patients with right focal damage invokes a release of inhibition of the right parietal cortex over the left parieto-frontal circuits, by disconnection mechanism. This theory is supported by transcranial magnetic stimulation studies showing the existence of asymmetric inhibitory interactions between the left and right posterior parietal cortex, with a right hemispheric advantage. These inhibitory mechanisms are mediated by direct transcallosal projections located in the posterior portions of the corpus callosum. The current study, using diffusion imaging and tract-based spatial statistics (TBSS), aims at assessing, in a data-driven fashion, the contribution of structural disconnection between hemispheres in determining the presence and severity of neglect. Eleven patients with right acute stroke and 11 healthy matched controls underwent MRI at 3T, including diffusion imaging, and T1-weighted volumes. TBSS was modified to account for the presence of the lesion and used to assess the presence and extension of changes in diffusion indices of microscopic white matter integrity in the left hemisphere of patients compared to controls, and to investigate, by correlation analysis, whether this damage might account for the presence and severity of patients' neglect, as assessed by the Behavioural Inattention Test (BIT). None of the patients had any macroscopic abnormality in the left hemisphere; however, 3 cases were discarded due to image artefacts in the MRI data. Conversely, TBSS analysis revealed widespread changes in diffusion indices in most of their left hemisphere tracts, with a predominant involvement of the corpus callosum and its projections on the parietal white matter. A region of association between patients' scores at BIT and brain FA values was found in the posterior part of the corpus callosum. This study strongly supports the hypothesis of a major role of structural disconnection between the
Établir un corpus oral de questions : L’analyse semi-automatisée avec Praat et Perl à l’exemple de cinq épisodes de Maya l’Abeille

Directory of Open Access Journals (Sweden)

Reinhardt Janina

2016-01-01

This paper gives guidelines for text selection and proposals for the use of the tools Praat and Perl, then applies them to an example. Computer processing is becoming increasingly important for corpus analysis. However, working with ready-to-use data sometimes makes us forget that a corpus must first of all be compiled appropriately. Moreover, automation can be very useful, but it is imperative to use it only for what computers can decide. Consequently, the contribution of this article is manual annotation in Praat, combined with a Perl script that carries out the automatable part. This article has three objectives: to support and improve corpus-based research, to develop a methodology for building and analysing a corpus of spoken questions, and finally to exemplify such a procedure with a small corpus, namely five episodes of the television programme Maya l'Abeille. In the latter, I show that intonation patterns cannot be directly associated with morphosyntactic structures. Furthermore, the results support the idea that morphosyntactic variation cannot be explained by a single category of variables (intralinguistic, extralinguistic or discursive) but by the full set of factors belonging to these three categories.
Incidence and lifetime risk of uterine corpus cancer in Taiwanese women from 1991 to 2010

Directory of Open Access Journals (Sweden)

Jerry Cheng-Yen Lai

2017-02-01

Conclusion: According to the observed changes in incidence rate, the burden of uterine corpus cancer in the general female population is expected to increase in the near future. From a public-health perspective, care providers should develop strategies for the prevention, early detection, and intervention to reduce the rapidly increasing incidence of uterine corpus cancer in Taiwan.
A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception

Science.gov (United States)

Sboev, A.; Moloshnikov, I.; Gudovskikh, D.; Rybka, R.

2017-12-01

In this work we compare several data-driven approaches to the task of identifying an author's gender for texts with or without gender imitation. The data corpus has been specially gathered with crowdsourcing for this task. The best models are a convolutional neural network with morphological data as input (F1-measure: 88% ± 3) for texts without imitation, and a gradient boosting model with a vector of character n-gram frequencies as input (F1-measure: 64% ± 3) for texts with gender imitation. A method for filtering the crowdsourced corpus using a limited reference sample of texts to increase the accuracy of the result is discussed.
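
The character n-gram frequency vector mentioned in this abstract can be illustrated with a minimal sketch. The function name `char_ngram_freqs`, the choice of n, and the normalisation by the total number of n-grams are assumptions made for illustration; the abstract does not specify how the authors built their feature vectors.

```python
from collections import Counter


def char_ngram_freqs(text, n=3):
    """Return relative frequencies of overlapping character n-grams.

    Hypothetical helper: the n-gram size and normalisation are
    illustrative choices, not taken from the paper.
    """
    # Slide a window of width n over the text, one character at a time.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        # Text shorter than n characters has no n-grams.
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Toy usage: bigram frequencies of a four-character string.
example = char_ngram_freqs("abab", n=2)
```

A sparse dictionary like this would normally be mapped onto a fixed vocabulary of n-grams before being passed to a classifier such as gradient boosting.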
Corpus callosotomy in a patient with startle epilepsy.

Science.gov (United States)

Gómez, Nicolás Garófalo; Hamad, Ana Paula; Marinho, Murilo; Tavares, Igor M; Carrete, Henrique; Caboclo, Luís Otávio; Yacubian, Elza Márcia; Centeno, Ricardo

2013-03-01

Startle epilepsy is a syndrome of reflex epilepsy in which the seizures are precipitated by a sudden and surprising, usually auditory, stimulus. We describe herein a girl who had been suffering with startle-induced seizures since 2 years of age. She had focal, tonic and tonic-clonic seizures, refractory to antiepileptic treatment. Daily tonic seizures led to very frequent falls and morbidity. Neurologically, she had no deficit. Interictal EEG showed slow waves and epileptiform discharges in central and fronto-central regions. Video-polygraphic recordings of seizures, triggered by stimuli, showed generalised symmetric tonic posturing with ictal EEG, characterised by an abrupt and diffuse electrodecremental pattern of fast activity, followed by alpha-theta rhythm superimposed by epileptic discharges predominantly over the vertex and anterior regions. Magnetic resonance imaging showed no abnormalities. Corpus callosotomy was performed when the patient was 17. Since surgery, the patient (one year follow-up) has remained seizure-free. Corpus callosotomy may be considered in patients with startle epilepsy and tonic seizures, in the absence of focal lesions amenable to surgery. [Published with video sequences].
Metric Ambiguity and Flow in Rap Music: A Corpus-Assisted Study of Outkast's "Mainstream" (1996)

Directory of Open Access Journals (Sweden)

Mitchell Ohriner

2017-01-01

Recent years have seen the rise of musical corpus studies, primarily detailing harmonic tendencies of tonal music. This article extends this scholarship by addressing a new genre (rap music) and a new parameter of focus (rhythm). More specifically, I use corpus methods to investigate the relation between metric ambivalence in the instrumental parts of a rap track (i.e., the beat) and an emcee's rap delivery (i.e., the flow). Unlike virtually every other rap track, the instrumental tracks of Outkast's "Mainstream" (1996) simultaneously afford hearing both a four-beat and a three-beat metric cycle. Because three-beat durations between rhymes, phrase endings, and reiterated rhythmic patterns are rare in rap music, an abundance of them within a verse of "Mainstream" suggests that an emcee highlights the three-beat cycle, especially if that emcee is not prone to such durations more generally. Through the construction of three corpora, one representative of the genre as a whole, and two that are artist specific, I show how the emcee T-Mo Goodie's expressive practice highlights the rare three-beat affordances of the track.
Learner features in a New Corpus-based Swahili dictionary ...

African Journals Online (AJOL)

As far as traditionally published Swahili language dictionaries are concerned, throughout the long history of Swahili lexicography, most new dictionaries were based on their predecessors. Thus far the only innovative traditionally printed corpus-based dictionary has been published by Finnish scholars (Abdulla et al. 2002).

Why size matters: differences in brain volume account for apparent sex differences in callosal anatomy: the sexual dimorphism of the corpus callosum.

Science.gov (United States)

Luders, Eileen; Toga, Arthur W; Thompson, Paul M

2014-01-01

Numerous studies have demonstrated a sexual dimorphism of the human corpus callosum. However, the question remains if sex differences in brain size, which typically is larger in men than in women, or biological sex per se account for the apparent sex differences in callosal morphology. Comparing callosal dimensions between men and women matched for overall brain size may clarify the true contribution of biological sex, as any observed group difference should indicate pure sex effects. We thus examined callosal morphology in 24 male and 24 female brains carefully matched for overall size. In addition, we selected 24 extremely large male brains and 24 extremely small female brains to explore if observed sex effects might vary depending on the degree to which male and female groups differed in brain size. Using the individual T1-weighted brain images (n=96), we delineated the corpus callosum at midline and applied a well-validated surface-based mesh-modeling approach to compare callosal thickness at 100 equidistant points between groups determined by brain size and sex. The corpus callosum was always thicker in men than in women. However, this callosal sex difference was strongly determined by the cerebral sex difference overall. That is, the larger the discrepancy in brain size between men and women, the more pronounced the sex difference in callosal thickness, with hardly any callosal differences remaining between brain-size matched men and women. Altogether, these findings suggest that individual differences in brain size account for apparent sex differences in the anatomy of the corpus callosum. © 2013.
Annotated text databases in the context of the Kaj Munk corpus

DEFF Research Database (Denmark)

Sandborg-Petersen, Ulrik

procedure described in Part I can be brought to bear on the task of making Kaj Munk’s works available electronically to the general public. I do so by describing how I have implemented a “Munk Browser” desktop application. Chapter 13 discusses ways in which the EMdF model and the MQL query language can...... language can be extended to support the requirements of the problem of storing and retrieving annotated text even better. Finally, Chapter 15 concludes the dissertation. Appendix A gives the grammar for the subset of the MQL query language which closely resembles Doedens’s QL. Seven already-published...
You Should Have the Body: Understanding Habeas Corpus

Science.gov (United States)

Landman, James

2008-01-01

English legal commentator William Blackstone described the writ of habeas corpus as a second Magna Carta, and Supreme Court Chief Justice John Marshall called it the "great writ." It has been part of the Anglo-American common law tradition since the Middle Ages. In the United States, it has been a source of tension between state and…
Afasia fluente. Materiales para su estudio.(Volumen 01 del Corpus PerLA)

OpenAIRE

Gallardo-Paúls, Beatriz; Sanmartín Sáez, Julia

2005-01-01

The PerLA corpus ("Percepción, Lenguaje y Afasia") arose in the General Linguistics area of the Universitat de València in response to the need to integrate the study of language pathologies into current trends in pragmatics and corpus linguistics. To meet the demands of these disciplines and to obtain ecologically valid data, recordings were made of different speakers with aphasia, in a setting that tries to move away from the conventions...
Introducing the Geneva Multimodal expression corpus for experimental research on emotion perception.

Science.gov (United States)

Bänziger, Tanja; Mortillaro, Marcello; Scherer, Klaus R

2012-10-01

Research on the perception of emotional expressions in faces and voices is exploding in psychology, the neurosciences, and affective computing. This article provides an overview of some of the major emotion expression (EE) corpora currently available for empirical research and introduces a new, dynamic, multimodal corpus of emotion expressions, the Geneva Multimodal Emotion Portrayals Core Set (GEMEP-CS). The design features of the corpus are outlined and justified, and detailed validation data for the core set selection are presented and discussed. Finally, an associated database with microcoded facial, vocal, and body action elements, as well as observer ratings, is introduced.
Comparação linguística e perfilação gramatical sistêmica em um corpus combinado

Directory of Open Access Journals (Sweden)

Francieli Silvéria Oliveira

2015-12-01

Based on the assumptions of Corpus Linguistics (BERBER SARDINHA, 2000; VIANA, 2011), this work investigates the grammatical and semantic organization of a combined corpus of instruction manuals in the English / Brazilian Portuguese language pair, aiming to present the linguistic variation characteristic of this register, as well as to compare the languages and describe the textual production of meaning in technical translation. Previous research shows that it is possible to study the registers of a language through corpus analysis (BIBER, 2010). Figueredo (2014) proposes the use of systemic grammatical profiling to find grammatical patterns. Drawing on these concepts, the results showed that the instruction manual has as its linguistic pattern the semantic functions 'explain', 'command', 'classify' and 'introduce', with their respective grammatical functions. As for its technical translation, we conclude that it consists of hybrid, multilingual texts that, in order to make meaning, draw on the pairing of the source text, the target language and new meanings.
Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic

Directory of Open Access Journals (Sweden)

Fawaz S. Al-Anzi

2017-04-01

Cosine similarity is one of the most popular distance measures in text classification problems. In this paper, we used this important measure to investigate the performance of Arabic language text classification. For textual features, the vector space model (VSM) is generally used to represent textual information as numerical vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique as it maintains semantic information between the words. Hence, we used the singular value decomposition (SVD) method to extract textual features based on LSI. In our experiments, we compared several well-known classification methods: Naïve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We used a corpus that contains 4,000 documents on ten topics (400 documents for each topic). The corpus contains 2,127,197 words with about 139,168 unique words. The testing set contains 400 documents, 40 documents for each topic. As a weighting scheme, we used Term Frequency.Inverse Document Frequency (TF.IDF). This study reveals that the classification methods that use LSI features significantly outperform the TF.IDF-based methods. It also reveals that k-Nearest Neighbors (based on cosine measure) and support vector machine are the best performing classifiers.
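
As a rough illustration of the TF.IDF weighting and cosine-based k-Nearest Neighbors classification named in this abstract, the following sketch uses a toy English corpus and plain TF.IDF features. It is not the authors' pipeline: the LSI/SVD step is omitted, and the function names (`fit_idf`, `tfidf`, `cosine`, `knn_predict`), the log-based IDF formula, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Inverse document frequency log(N / df) for every term in the training docs."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf(doc, idf):
    """Sparse TF.IDF vector (term -> weight); terms unseen in training are dropped."""
    return {t: count * idf[t] for t, count in Counter(doc).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(query, vectors, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical, already tokenized).
train_docs = [
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
    ["market", "stock", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sport", "sport", "economy", "economy"]

idf = fit_idf(train_docs)
train_vectors = [tfidf(doc, idf) for doc in train_docs]

# "referee" never occurs in training, so it is ignored.
query_vector = tfidf(["goal", "coach", "team", "referee"], idf)
predicted = knn_predict(query_vector, train_vectors, train_labels, k=3)
```

In the setup the abstract describes, the vectors handed to the classifier would be LSI features obtained through SVD rather than raw TF.IDF weights; the similarity and voting steps would stay the same.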
Estra: um corpus para o estudo do estilo da tradução

Directory of Open Access Journals (Sweden)

Célia Magalhães

2014-12-01

This article presents the evolution and contributions of corpus-oriented translation studies research in Brazil. It reviews the early work carried out at the Laboratório Experimental de Tradução (LETRA), showing that most of it adopted a contrastive-linguistics approach to translation and that the research gradually evolved towards a concern with translational stylistics and the style of the literary translator. It also reports on the compilation of a corpus for the study of translation style, ESTRA, designed exclusively for that purpose. It shows how corpus research with ESTRA promotes interdisciplinarity in translation studies and introduces the triangulation of results from analyses carried out with the methodological procedures of the different approaches used to study style. New methodological procedures are described, in particular the tagging of the corpus for some of the style categories. The article ends with a critical view of what has been done to date, presenting future prospects for research in translational stylistics at LETRA.
Olomouc Corpus of Spoken Czech: characterization and main features of the project

Directory of Open Access Journals (Sweden)

Pořízka, Petr

2009-01-01

This study presents the results of the author's research project called the Olomouc Corpus of Spoken Czech (OCSC). The paper focuses on the state and partial phases of constructing the corpus, its methodology and annotation. Within the OCSC we use a so-called dual system of transcription, which means (1) an orthographic one for the purpose of linguistic (morphological) analysis and tagging and (2) a phonetic version of the transcript, which consists of three layers of text: first the actual transcription, and then various types of metatexts as a second and third layer, including communication aspects of the texts. The criteria for the selection of speakers are also listed, and the highly important statistical analysis of the sociolinguistic categories (gender, age, type of education, types of recordings) is presented as well. This analysis can serve as a basis for partially correcting possible imbalance among those sociolinguistic parameters. The annotation rules and principles are described at the end of the study.
Hereditary spastic paraplegia associated with thin corpus callosum Paraplegia espástica hereditária associada a hipoplasia de corpo caloso

Directory of Open Access Journals (Sweden)

Hélio A. Ghizoni Teive

2001-09-01

Autosomal recessive hereditary spastic paraplegia (AR-HSP) associated with thin corpus callosum was recently described in Japan, and most families were linked to chromosome 15q13-15. We report two patients from two different Brazilian families with progressive gait disturbance starting in the second decade of life, spastic paraparesis, and mental deterioration. One patient presented cerebellar ataxia. Magnetic resonance imaging (MRI) of the head of both patients showed a thin corpus callosum. AR-HSP with a thin corpus callosum is a rare disorder, mainly described in Japanese patients. We found only 4 Caucasian families with AR-HSP with thin corpus callosum described in the literature. Further studies including additional Caucasian families with AR-HSP and thin corpus callosum are required to delineate the genetic profile of this syndrome in occidental countries.
Corpus callosum thickness on mid-sagittal MRI as a marker of brain volume: a pilot study in children with HIV-related brain disease and controls

Energy Technology Data Exchange (ETDEWEB)

Andronikou, Savvas [University of the Witwatersrand, Department of Radiology, Faculty of Health Sciences, Cape Town (South Africa); Ackermann, Christelle [University of Stellenbosch, Department of Radiology, Stellenbosch (South Africa); Laughton, Barbara; Cotton, Mark [Stellenbosch University and Tygerberg Children' s Hospital, Children' s Infectious Diseases Research Unit, Stellenbosch (South Africa); Tomazos, Nicollette [University of Cape Town, Faculty of Commerce, Department of Management Studies, Cape Town (South Africa); Spottiswoode, Bruce [University of Cape Town, MRC/UCT Medical Imaging Research Unit, Department of Human Biology, Cape Town (South Africa); Mauff, Katya [University of Cape Town, Department of Statistical Sciences, Cape Town (South Africa); Pettifor, John M. [University of the Witwatersrand, MRC/Wits Developmental Pathways for Health Research Unit, Department of Paediatrics, Faculty of Health Sciences, Witwatersrand (South Africa)

2015-07-15

Corpus callosum thickness measurement on mid-sagittal MRI may be a surrogate marker of brain volume. This is important for evaluation of diseases causing brain volume gain or loss, such as HIV-related brain disease and HIV encephalopathy. To determine if thickness of the corpus callosum on mid-sagittal MRI is a surrogate marker of brain volume in children with HIV-related brain disease and in controls without HIV. A retrospective MRI analysis in children (<5 years old) with HIV-related brain disease and controls used a custom-developed semi-automated tool, which divided the midline corpus callosum and measured its thickness in multiple locations. Brain volume was determined using volumetric analysis. Overall corpus callosum thickness and thickness of segments of the corpus callosum were correlated with overall and segmented (grey and white matter) brain volume. Forty-four children (33 HIV-infected patients and 11 controls) were included. Significant correlations included overall corpus callosum (mean) and total brain volume (P = 0.05); prefrontal corpus callosum maximum with white matter volume (P = 0.02); premotor corpus callosum mean with total brain volume (P = 0.04) and white matter volume (P = 0.02), premotor corpus callosum maximum with white matter volume (P = 0.02) and sensory corpus callosum mean with total brain volume (P = 0.02). Corpus callosum thickness correlates with brain volume both in HIV-infected patients and controls. (orig.)
Corpus callosum thickness on mid-sagittal MRI as a marker of brain volume: a pilot study in children with HIV-related brain disease and controls

International Nuclear Information System (INIS)

Andronikou, Savvas; Ackermann, Christelle; Laughton, Barbara; Cotton, Mark; Tomazos, Nicollette; Spottiswoode, Bruce; Mauff, Katya; Pettifor, John M.

2015-01-01

Corpus callosum thickness measurement on mid-sagittal MRI may be a surrogate marker of brain volume. This is important for evaluation of diseases causing brain volume gain or loss, such as HIV-related brain disease and HIV encephalopathy. To determine if thickness of the corpus callosum on mid-sagittal MRI is a surrogate marker of brain volume in children with HIV-related brain disease and in controls without HIV. A retrospective MRI analysis in children (<5 years old) with HIV-related brain disease and controls used a custom-developed semi-automated tool, which divided the midline corpus callosum and measured its thickness in multiple locations. Brain volume was determined using volumetric analysis. Overall corpus callosum thickness and thickness of segments of the corpus callosum were correlated with overall and segmented (grey and white matter) brain volume. Forty-four children (33 HIV-infected patients and 11 controls) were included. Significant correlations included overall corpus callosum (mean) and total brain volume (P = 0.05); prefrontal corpus callosum maximum with white matter volume (P = 0.02); premotor corpus callosum mean with total brain volume (P = 0.04) and white matter volume (P = 0.02), premotor corpus callosum maximum with white matter volume (P = 0.02) and sensory corpus callosum mean with total brain volume (P = 0.02). Corpus callosum thickness correlates with brain volume both in HIV-infected patients and controls. (orig.)
Morpheme matching based text tokenization for a scarce resourced language.

Science.gov (United States)

Rehman, Zobia; Anwar, Waqas; Bajwa, Usama Ijaz; Xuan, Wang; Chaoying, Zhou

2013-01-01

Text tokenization is a fundamental pre-processing step for almost all information processing applications. The task is nontrivial for scarce-resourced languages such as Urdu, because spaces between words are used inconsistently. In this paper a morpheme-matching-based approach is proposed for Urdu text tokenization, along with some other algorithms that address the additional issues of boundary detection for compound words, affixation, reduplication, names and abbreviations. The study achieved 97.28% precision, 93.71% recall, and 95.46% F1-measure when tokenizing a corpus of 57000 words using a morpheme list with 6400 entries.
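
One common way to match text against a morpheme list is greedy forward longest-match segmentation. The sketch below illustrates that general idea on a toy Latin-script example. It is an assumption for illustration, not the algorithm of the paper, which also handles compound words, affixation, reduplication, names and abbreviations. The function name `segment` and the toy morpheme list are hypothetical.

```python
def segment(text, morphemes):
    """Greedy forward longest-match segmentation against a morpheme list.

    At each position the longest listed morpheme is taken; if nothing
    matches, a single character is emitted so the scan always advances.
    """
    max_len = max(len(m) for m in morphemes)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in morphemes:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy morpheme list (hypothetical; the paper uses a list of 6400 Urdu entries).
toy_morphemes = {"un", "break", "able", "token", "ize"}

known = segment("unbreakable", toy_morphemes)
with_unknown = segment("tokenizer", toy_morphemes)
```

Greedy matching is simple but can pick a wrong boundary when a long morpheme overlaps two shorter ones, which is one reason a practical tokenizer needs additional rules of the kind the abstract lists.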
Enemy Combatant Detainees: Habeas Corpus Challenges in Federal Court

Science.gov (United States)

2006-09-26

Separation of Powers Issues. Eliminating Federal Court Jurisdiction Where There Is No State Court Review. Conclusion. In Rasul v. Bush, 542 U.S. 466 (2004), a divided Supreme Court declared that “a state
Assessing semantic similarity of texts - Methods and algorithms

Science.gov (United States)

Rozeva, Anna; Zerkova, Silvia

2017-12-01

Assessing the semantic similarity of texts is an important part of different text-related applications such as educational systems, information retrieval and text summarization. The task is performed by sophisticated analysis that implements text-mining techniques. Text mining involves several pre-processing steps, which yield a structured, representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic level. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and captures the text semantics better. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, are presented.
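
For orientation, the standard textbook formulation of LSA can be written as follows. The notation (term-document matrix $X$, rank $k$, document vectors $\hat{d}_j$) is assumed here for illustration and is not taken from the paper itself.

```latex
% Term-document matrix X (m terms, n documents) and its singular value decomposition
X = U \Sigma V^{\top}, \qquad X \in \mathbb{R}^{m \times n}

% Rank-k truncation: keep the k largest singular values and their singular vectors
X_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \min(m, n)

% Document j (column x_j of X) mapped into the k-dimensional latent space
\hat{d}_j = U_k^{\top} x_j = \Sigma_k V_k^{\top} e_j

% A new text q, expressed as a term vector, is folded in the same way
\hat{q} = U_k^{\top} q

% Semantic similarity as the cosine between latent vectors
\operatorname{sim}(\hat{d}_i, \hat{d}_j) =
  \frac{\hat{d}_i^{\top} \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}
```

Here $e_j$ is the $j$-th standard basis vector, so $\Sigma_k V_k^{\top} e_j$ is simply column $j$ of $\Sigma_k V_k^{\top}$. Terms that co-occur in similar documents end up close together in the latent space, which is how LSA captures meaning beyond exact word overlap.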
Understanding Depressive Symptoms and Psychosocial Stressors on Twitter: A Corpus-Based Study.

Science.gov (United States)

Mowery, Danielle; Smith, Hilary; Cheney, Tyler; Stoddard, Greg; Coppersmith, Glen; Bryan, Craig; Conway, Mike

2017-02-28

With a lifetime prevalence of 16.2%, major depressive disorder is the fifth biggest contributor to the disease burden in the United States. The aim of this study, building on previous work qualitatively analyzing depression-related Twitter data, was to describe the development of a comprehensive annotation scheme (ie, coding scheme) for manually annotating Twitter data with Diagnostic and Statistical Manual of Mental Disorders, Edition 5 (DSM 5) major depressive symptoms (eg, depressed mood, weight change, psychomotor agitation, or retardation) and Diagnostic and Statistical Manual of Mental Disorders, Edition IV (DSM-IV) psychosocial stressors (eg, educational problems, problems with primary support group, housing problems). Using this annotation scheme, we developed an annotated corpus, Depressive Symptom and Psychosocial Stressors Acquired Depression, the SAD corpus, consisting of 9300 tweets randomly sampled from the Twitter application programming interface (API) using depression-related keywords (eg, depressed, gloomy, grief). An analysis of our annotated corpus yielded several key results. First, 72.09% (6829/9473) of tweets containing relevant keywords were nonindicative of depressive symptoms (eg, "we're in for a new economic depression"). Second, the most prevalent symptoms in our dataset were depressed mood and fatigue or loss of energy. Third, less than 2% of tweets contained more than one depression related category (eg, diminished ability to think or concentrate, depressed mood). Finally, we found very high positive correlations between some depression-related symptoms in our annotated dataset (eg, fatigue or loss of energy and educational problems; educational problems and diminished ability to think). We successfully developed an annotation scheme and an annotated corpus, the SAD corpus, consisting of 9300 tweets randomly-selected from the Twitter application programming interface using depression-related keywords. Our analyses suggest that keyword
Syntactic organization of constituents around the infinitive in injunctive texts: A case study

Directory of Open Access Journals (Sweden)

Khodabocus Nooreeda

2016-01-01

We present a study of the syntactic organization of constituents around an injunctive infinitive. The use of the infinitive in injunctions led us to examine the relationship between sentence construction and the nature of the text in question. To this end, we propose a case study on a corpus made up of three different types of texts that use the injunctive infinitive: cooking recipes (Davies, 2003; Collectif, 2011), a manual for the care and maintenance of laboratory equipment (Organisation mondiale de la Santé, 2008), and a competency framework for first-aid workers (Direction de la Défense et de la Sécurité Civiles, 2007). These texts address different audiences and each has its own purpose. First, we briefly present the linguistic characteristics of the injunctive text, considering its morphosyntactic, semantic, pragmatic, and enunciative aspects, the notion of subject, and the properties of the injunctive infinitive as a verb form. Second, we carry out a corpus study. After explaining the choice of corpus and the construction of the database, we analyze the occurrences collected. We identify the different syntactic structures around the infinitive and examine them in light of the nature of the texts in which they appear. Finally, we offer an interpretation of the results obtained.
Musculoskeletal Fitness Measures Are Not Created Equal: An Assessment of School Children in Corpus Christi, Texas

Directory of Open Access Journals (Sweden)

Toyin Ajisafe

2018-05-01

This study investigated current obesity prevalence and associations between musculoskeletal fitness test scores and the odds of being underweight, overweight, or obese compared to having a healthy weight in elementary school children in Corpus Christi, Texas. The sample analyzed consisted of 492 public elementary school children between kindergarten and fifth grade. Their ages ranged from 5 to 11 years. Trunk lift, 90° push-up, curl-up, and back saver sit and reach tests were administered. Weight status was determined using BMI scores and the CDC growth charts. Obesity prevalence remains high among elementary school-aged children in Corpus Christi, Texas. Higher 90° push-up test scores were most consistently associated with decreased odds of being obese as compared to being overweight and having healthy weight except in kindergarten. Conversely, higher trunk lift test scores were associated with increased odds of being obese in second and fourth grades. When children achieved the minimum score to be classified in the Healthy Fitness Zone, those with healthy weight had similarly low musculoskeletal fitness (i.e., abdominal strength and endurance, hamstring flexibility, and trunk extensor strength and flexibility) as peers with overweight and obesity, especially in the lower grades. It was concluded that increased obesity prevalence in higher grades may be precipitated (at least in part) by low musculoskeletal fitness in the lower grades, especially kindergarten. Given previous associations in the literature, low musculoskeletal fitness may be symptomatic of poor motor skill competence in the current sample. These findings suggest a need for early and focused school-based interventions that leverage both known and novel strategies to combat pediatric obesity in Corpus Christi.
Comorbidity is an independent prognostic factor in women with uterine corpus cancer

DEFF Research Database (Denmark)

Noer, Mette C; Sperling, Cecilie; Christensen, Ib J

2014-01-01

OBJECTIVE: To determine whether comorbidity independently affects overall survival in women with uterine corpus cancer. DESIGN: Cohort study. SETTING: Denmark. STUDY POPULATION: A total of 4244 patients registered in the Danish Gynecologic Cancer database with uterine corpus cancer from 1 January....... RESULTS: Univariate survival analysis showed a significant (p independent prognostic factor with hazard ratios...... ranging from 1.27 to 1.42 in mild, 1.69 to 1.74 in moderate, and 1.72 to 2.48 in severe comorbidity. Performance status was independently associated to overall survival and was found to slightly reduce the prognostic impact of comorbidity. CONCLUSION: Comorbidity is an independent prognostic factor...
Recognizing Cursive Typewritten Text Using Segmentation-Free System

Directory of Open Access Journals (Sweden)

Mohammad S. Khorsheed

2015-01-01

Feature extraction plays an important role in text recognition, as it aims to capture the essential characteristics of the text image. Feature extraction algorithms range widely, from robust but hard-to-extract features to noise-sensitive but easy-to-extract features. Among those feature types are statistical features, which are derived from the statistical distribution of the image pixels. This paper presents a novel method for feature extraction in which simple statistical features are extracted from a one-pixel-wide window that slides across the text line. The feature set is clustered in the feature space using vector quantization. The feature vector sequence is then fed to a classification engine for training and recognition purposes. The recognition system is applied to a data corpus that includes cursive Arabic text of more than 600 A4-size sheets typewritten in multiple computer-generated fonts. The system performance is compared to a previously published system from the literature with a similar engine but a different feature set.
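The sliding-window and vector-quantization steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which statistical features are used, so the two features here (ink-pixel count and mean ink row per column) are hypothetical, as are the function names `column_features` and `quantize`; the codebook is given by hand rather than learned by clustering.

```python
def column_features(image):
    """Slide a one-pixel-wide window over a binary text-line image.

    image is a list of rows of 0/1 pixels. For each column, return an
    assumed feature pair: (number of ink pixels, mean row index of ink).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    features = []
    for x in range(width):
        ink_rows = [y for y in range(height) if image[y][x]]
        ink = len(ink_rows)
        centroid = sum(ink_rows) / ink if ink else 0.0
        features.append((float(ink), centroid))
    return features


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: dist2(f, codebook[i]))
        for f in features
    ]


# Toy 3x3 image: empty column, full vertical stroke, single top pixel.
image = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]
codebook = [(0.0, 0.0), (3.0, 1.0)]  # hand-picked centroids, for illustration only
symbols = quantize(column_features(image), codebook)
print(symbols)
```

The resulting symbol sequence is the kind of discrete observation stream that a sequence classifier could consume; in a real system the codebook would be trained on many text lines.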

[Structural change of the corpus callosum fibers in toddlers with autism spectrum disorder: two-year follow-up].

Science.gov (United States)

Chang, C; Qiu, N N; Xiao, T; Xiao, X; Chu, K K; Li, Y; Wu, Q R; Fang, H; Ke, X Y

2017-12-02

Objective: To conduct a follow-up investigation of structural changes of the corpus callosum fibers of toddlers (2 to 5 years of age) with autism spectrum disorder (ASD) and to explore the associations with clinical symptoms. Method: In this prospective randomized controlled study, ASD children who were diagnosed in the Child Mental Health Research Center, Nanjing Brain Hospital Affiliated to Nanjing Medical University from May 2011 to November 2012 were included in the ASD group, and developmentally delayed children were included in the control group (DD group). Diffusion tensor imaging (DTI) data from the two groups were obtained at two age levels: 2-3 years of age, and 4-5 years of age. Region of interest analysis was applied to assess characteristic values of the total area and sub-regions of the corpus callosum: the fractional anisotropy (FA), the mean diffusivity (MD), the radial diffusivity (RD) and the axial diffusivity (AD). All children were assessed using the Autism Diagnostic Interview-Revised (ADI-R) and Autism Treatment Evaluation Checklist (ATEC). The characteristic values of the total area and sub-regions of the corpus callosum of the ASD group at two age levels were analyzed by paired-sample t test; the characteristic values of the total area and sub-regions of the corpus callosum of the ASD group and DD group were analyzed by independent-sample t test; the correlations between FA values of the total area and sub-regions of the corpus callosum and ADI-R or ATEC scores were analyzed by Pearson correlation analysis. Result: Forty cases meeting inclusion criteria were enrolled in the ASD group, and 31 eligible cases were enrolled in the control group. Four children in the ASD group were lost to follow-up, and 5 children in the control group were lost to follow-up. Longitudinal comparison between the two age subgroups of ASD patients showed that the FA values of the total corpus callosum increased (0.499 55±0.027 59 vs. 0.505 83±0.086 64, t=4.88, P 0.05 for all comparisons); as compared
The incidence rate of corpus uteri cancer among females in Saudi Arabia: an observational descriptive epidemiological analysis of data from Saudi Cancer Registry 2001–2008

Directory of Open Access Journals (Sweden)

Alghamdi IG

2014-01-01

Ibrahim G Alghamdi,1 Issam I Hussain,1 Mohamed S Alghamdi,2 Mohamed A El-Sheemy1,3 1University of Lincoln, Brayford Pool, Lincoln, UK; 2Ministry of Health, General Directorate of Health Affairs, Al-Baha, Kingdom of Saudi Arabia; 3Research and Development, Lincoln Hospital, Lincolnshire Hospitals NHS Trust, Lincoln, UK. Background: The present study reviews the epidemiological data on corpus uteri cancer among Saudi women, including its frequency, crude incidence rate, and age-standardized incidence rate (ASIR), adjusted by region and year of diagnosis. Methods: A retrospective, descriptive epidemiological analysis was conducted of all the corpus uteri cancer cases recorded in the Saudi Cancer Registry between January 2001 and December 2008. The statistical analyses were performed using descriptive statistics, analysis of variance, Poisson regression, and a simple linear model. Results: A total of 1,060 corpus uteri cancer cases were included. Women aged 60–74 years were most affected by the disease. The region of Riyadh in Saudi Arabia had the highest overall ASIR, at 4.4 cases per 100,000 female patients, followed by the eastern region, at 4.2, and Makkah, at 3.7. Jazan, Najran, and Qassim had the lowest average ASIRs, ranging from 0.8 to 1.4. A Poisson regression model using Jazan as the reference revealed that the corpus uteri cancer incidence rate ratio was significantly higher for the regions of Makkah, at 16.5 times (95% confidence interval [CI]: 8.0–23.0), followed by Riyadh, at 16.0 times (95% CI: 9.0–22.0), and the eastern region, at 9.9 times (95% CI: 5.6–17.6). The northern region experienced the highest changes in ASIRs of corpus uteri cancer among female Saudi patients between 2001 and 2008. Conclusion: There was a slight increase in the crude incidence rates and ASIRs for corpus uteri cancer in Saudi Arabia between 2001 and 2008. Older Saudi women were most affected by the disease. Riyadh, the eastern region, and Makkah
A cascade of morphogenic signaling initiated by the meninges controls corpus callosum formation.

Science.gov (United States)

Choe, Youngshik; Siegenthaler, Julie A; Pleasure, Samuel J

2012-02-23

The corpus callosum is the most prominent commissural connection between the cortical hemispheres, and numerous neurodevelopmental disorders are associated with callosal agenesis. By using mice either with meningeal overgrowth or selective loss of meninges, we have identified a cascade of morphogenic signals initiated by the meninges that regulates corpus callosum development. The meninges produce BMP7, an inhibitor of callosal axon outgrowth. This activity is overcome by the induction of expression of Wnt3 by the callosal pathfinding neurons, which antagonize the inhibitory effects of BMP7. Wnt3 expression in the cingulate callosal pathfinding axons is developmentally regulated by another BMP family member, GDF5, which is produced by the adjacent Cajal-Retzius neurons and turns on before outgrowth of the callosal axons. The effects of GDF5 are in turn under the control of a soluble GDF5 inhibitor, Dan, made by the meninges. Thus, the meninges and medial neocortex use a cascade of signals to regulate corpus callosum development. Copyright © 2012 Elsevier Inc. All rights reserved.
The relationship between early life stress and microstructural integrity of the corpus callosum in a non-clinical population

Directory of Open Access Journals (Sweden)

Robert Paul

2008-03-01

Robert Paul1, Lorrie Henry2, Stuart M Grieve3, Thomas J Guilmette2,4, Raymond Niaura4, Richard Bryant5, Steven Bruce1, Leanne M Williams3,6, Clark C Richard7, Ronald A Cohen4, Evian Gordon3,7 1University of Missouri, St. Louis, St. Louis, MO, USA; 2Providence College, Providence, RI, USA; 3The Brain Resource International Database, The Brain Resource Company, Ultimo, NSW, Australia; 4Brown Medical School, Department of Psychiatry, Providence, RI, USA; 5School of Psychology, University of New South Wales, Sydney, NSW, Australia; 6Brain Dynamics Centre, Westmead Millennium Institute, Westmead Hospital, Westmead, NSW, Australia; 7Cognitive Neuroscience Laboratory and School of Psychology, Flinders University, Adelaide, SA, Australia. Background: Previous studies have examined the impact of early life stress (ELS) on the gross morphometry of brain regions, including the corpus callosum. However, studies have not examined the relationship between ELS and the microstructural integrity of the brain. Methods: In the present study we evaluated this relationship in healthy non-clinical participants using diffusion tensor imaging (DTI) and self-reported history of ELS. Results: Regression analyses revealed significant reductions in fractional anisotropy (FA) within the genu of the corpus callosum among those exposed to the greatest number of early life stressors, suggesting reduced microstructural integrity associated with increased ELS. These effects were most pronounced in the genu of the corpus callosum compared to the body and splenium, and were evident for females rather than males despite no differences in total ELS exposure between the sexes. In addition, a further comparison of those participants who were exposed to no ELS vs. three or more ELS events revealed lower FA in the genu of the corpus callosum among the ELS-exposed group, with trends of FA reduction in the body and the whole corpus callosum. By contrast, there were no relationships between ELS
Protective effects of erythropoietin against cuprizone-induced oxidative stress and demyelination in the mouse corpus callosum

Directory of Open Access Journals (Sweden)

Iraj Ragerdi Kashani

2017-08-01

Full Text Available Objective(s: Increasing evidence in both experimental and clinical studies suggests that oxidative stress plays a major role in the pathogenesis of multiple sclerosis. The aim of the present work is to investigate the protective effects of erythropoietin against cuprizone-induced oxidative stress. Materials and Methods: Adult male C57BL/6J mice were fed a chow containing 0.2 % cuprizone for 6 weeks. After 3 weeks, mice were simultaneously treated with erythropoietin (5,000 IU/ kg body weight by daily intraperitoneal injections. Results: Our results showed that cuprizone induced oxidative stress accompanied with down-regulation of subunits of the respiratory chain complex and demyelination of corpus callosum. Erythropoietin antagonized these effects. Biochemical analysis showed that oxidative stress induced by cuprizone was regulated by erythropoietin. Similarly, erythropoietin induced the expression of subunits of the respiratory chain complex over normal control values reflecting a mechanism to compensate cuprizone-mediated down-regulation of these genes. Conclusion: The data implicate that erythropoietin abolishes destructive cuprizone effects in the corpus callosum by decreasing oxidative stress and restoring mitochondrial respiratory enzyme activity.
A Corpus-Based Comparative Study of "Learn" and "Acquire"

Science.gov (United States)

Yang, Bei

2016-01-01

As an important yet intricate linguistic feature in English language, synonymy poses a great challenge for second language learners. Using the 100 million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as an analyzing tool, this article compares the usage of "learn" and "acquire" used in natural…
Rab proteins in the brain and corpus allatum of Bombyx mori.

Science.gov (United States)

Uno, Tomohide; Furutani, Masayuki; Watanabe, Chihiro; Sakamoto, Katsuhiko; Uno, Yuichi; Kanamaru, Kengo; Yamagata, Hiroshi; Mizoguchi, Akira; Takeda, Makio

2016-07-01

In eukaryotic cells, Rab guanosine triphosphate-ases serve as key regulators of membrane-trafficking events, such as exocytosis and endocytosis. Rab3, Rab6, and Rab27 control the regulatory secretory pathway of neuropeptides and neurotransmitters. The cDNAs of Rab3, Rab6, and Rab27 from B. mori were inserted into a plasmid, transformed into Escherichia coli, and then subsequently purified. We then produced antibodies against Rab3, Rab6, and Rab27 of Bombyx mori in rabbits and rats for use in western immunoblotting and immunohistochemistry. Western immunoblotting of brain tissue revealed a single band at approximately 26 kDa. Immunohistochemistry results revealed that Rab3, Rab6, and Rab27 expression was restricted to neurons in the pars intercerebralis and dorsolateral protocerebrum of the brain. Rab3 and Rab6 co-localized with bombyxin, an insect neuropeptide. However, there was no Rab that co-localized with prothoracicotropic hormone. The corpus allatum secretes neuropeptides synthesized in the brain into the hemolymph. Results showed that Rab3 and Rab6 co-localized with bombyxin in the corpus allatum. These findings suggest that Rab3 and Rab6 are involved in neurosecretion in B. mori. This study is the first to report a possible relationship between Rab and neurosecretion in the insect corpus allatum.
Touching the Void - Introducing CoST: Corpus of Social Touch

NARCIS (Netherlands)

Jung, Merel M.; Poppe, Ronald; Poel, Mannes; Heylen, Dirk K. J.

2014-01-01

Touch behavior is of great importance during social interaction. To transfer the tactile modality from interpersonal interaction to other areas such as Human-Robot Interaction (HRI) and remote communication, automatic recognition of social touch is necessary. This paper introduces CoST: Corpus of
The Danish NOMCO Corpus: Multimodal Interaction in First Acquaintance Conversations

DEFF Research Database (Denmark)

Paggio, Patrizia; Navarretta, Costanza

2016-01-01

, specifically head movements, facial expressions, and body posture. The corpus has served as the empirical basis for a number of studies of communication phenomena related to turn management, feedback exchange, information packaging and the expression of emotional attitudes. We describe the annotation scheme...
Defining Formats and Corpus-based Examples in the General ...

African Journals Online (AJOL)

rbr

Institute, University of Zimbabwe, Harare, Zimbabwe (langa@arts.uz.ac.zw). Abstract: In this article the writer ... sentative" in terms of size in order to be appropriately used as basis for such corpus-based dictionaries, the ISN editors ... (e) the format should suggest a preference rather than a restriction. For COBUILD, a good ...
Text in social networking Web sites: A word frequency analysis of Live Spaces

OpenAIRE

Thelwall, Mike

2008-01-01

Social networking sites are owned by a wide section of society and seem to dominate Web usage. Despite much research into this phenomenon, little systematic data is available. This article partially fills this gap with a pilot text analysis of one social networking site, Live Spaces. The text in 3,071 English language Live Spaces sites was monitored daily for six months and word frequency statistics calculated and compared with those from the British National Corpus. The results confirmed the...
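A word-frequency comparison of the kind mentioned in this abstract can be sketched as below. This is a minimal illustration, not the article's method: the tokenizer, the add-one smoothing, and the function names `per_million` and `frequency_ratios` are assumptions, and the two short strings stand in for a site sample and a reference corpus such as the British National Corpus.

```python
import re
from collections import Counter


def per_million(text):
    """Relative frequency of each word, scaled to occurrences per million tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def frequency_ratios(sample_text, reference_text, smoothing=1.0):
    """Rank words by how over-represented they are in the sample.

    The smoothing constant (an assumption) avoids division by zero for
    words that never occur in the reference text.
    """
    sample = per_million(sample_text)
    reference = per_million(reference_text)
    ratios = {
        word: (freq + smoothing) / (reference.get(word, 0.0) + smoothing)
        for word, freq in sample.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranked = frequency_ratios("lol my blog lol", "the blog of the day")
print(ranked[0][0], ranked[-1][0])
```

Words at the top of the ranking are characteristic of the sample relative to the reference; with real corpora a significance measure such as log-likelihood is commonly preferred over a raw ratio.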
Hereditary motor and sensory neuropathy with agenesis of the corpus callosum.

Science.gov (United States)

Dupré, Nicolas; Howard, Heidi C; Mathieu, Jean; Karpati, George; Vanasse, Michel; Bouchard, Jean-Pierre; Carpenter, Stirling; Rouleau, Guy A

2003-07-01

Hereditary motor and sensory neuropathy associated with agenesis of the corpus callosum (OMIM 218000) is an autosomal recessive disease of early onset characterized by a delay in developmental milestones, a severe sensory-motor polyneuropathy with areflexia, a variable degree of agenesis of the corpus callosum, amyotrophy, hypotonia, and cognitive impairment. Although this disorder has rarely been reported worldwide, it has a high prevalence in the Saguenay-Lac-St-Jean region of the province of Quebec (Canada) predominantly because of a founder effect. The gene defect responsible for this disorder recently has been identified, and it is a protein-truncating mutation in the SLC12A6 gene, which codes for a cotransporter protein known as KCC3. Herein, we provide the first extensive review of this disorder, covering epidemiological, clinical, and molecular genetic studies.
Using the Corpus of Spoken Afrikaans to generate an Afrikaans ...

African Journals Online (AJOL)

This paper presents two chatbot systems, ALICE and Elizabeth, illustrating the dialogue knowledge representation and pattern matching techniques of each. We discuss the problems which arise when using the Corpus of Spoken Afrikaans (Korpus Gesproke Afrikaans) to retrain the ALICE chatbot system with human ...
PEDANT: Parallel Texts in Göteborg

Directory of Open Access Journals (Sweden)

Daniel Ridings

2012-09-01

The article presents the status of the PEDANT project with parallel corpora at the Language Bank at Göteborg University. The solutions for access to the corpus data are presented. Access is provided by way of the internet, standard applications, and SGML-aware programming tools. The SGML format for encoding translation pairs is outlined as well. The methods allow working with everything from plain text to texts densely encoded with linguistic information.
The Agreement between Conjoined Subjects and Predicate: Croatian Church Slavonic Corpus Analysis

Directory of Open Access Journals (Sweden)

Ana Kovačević

2017-08-01

The abundance of grammatical categories in Slavonic and their overlap are particularly evident in the agreement between conjoined subjects and predicate. When they are accompanied by agreement conditions, such as word order and animacy in Slavic languages, different agreement patterns, dependent also on concrete context and speaker, are to be expected. In this paper the study of the agreement between conjoined subjects and predicate is based on an analysis of the medieval Glagolitic Croatian Church Slavonic corpus. Number, gender, and person are grammatical categories, i.e., features of conjoined noun phrases and predicate agreement. The analysis includes noun phrases conjoined by coordinating and some non-coordinating conjunctions as well as noun phrases conjoined by a gradational ‘not only [...] but also’ structure. Comitative and reciprocal noun phrases are included as well. The research in the given corpus shows that the agreement of conjoined noun phrases with the predicate can be syntactic (predicate showing agreement with one conjunct) or semantic (predicate showing agreement with all conjuncts). Syntactic agreement appears as the so-called contact agreement (predicate showing agreement with the closest conjunct) and as distant agreement (predicate showing agreement with the most distant conjunct). Semantic agreement is applied mostly in accordance with G. G. Corbett’s resolution rules for Slavic languages. However, the analysis shows that some resolution rules for number should be revised due to dual number. Although absent from the majority of contemporary Slavic languages, it is precisely in historical Slavic idioms that dual number reveals its identity, highlighted in agreement study as well.
Evaluation and Classification of Syntax Usage in Determining Short-Text Semantic Similarity

Directory of Open Access Journals (Sweden)

V. Batanović

2014-06-01

This paper outlines and categorizes ways of using syntactic information in a number of algorithms for determining the semantic similarity of short texts. We consider the use of word order information, part-of-speech tagging, parsing and semantic role labeling. We analyze and evaluate the effects of syntax usage on algorithm performance by utilizing the results of a paraphrase detection test on the Microsoft Research Paraphrase Corpus. We also propose a new classification of algorithms based on their applicability to languages with scarce natural language processing tools.
Corpus callosum atrophy is associated with mental slowing and executive deficits in subjects with age-related white matter hyperintensities: the LADIS Study

DEFF Research Database (Denmark)

Jokinen, Hanna; Ryberg, Charlotte; Kalska, Hely

2007-01-01

BACKGROUND: Previous research has indicated that corpus callosum atrophy is associated with global cognitive decline in neurodegenerative diseases, but few studies have investigated specific cognitive functions. OBJECTIVE: To investigate the role of regional corpus callosum atrophy in mental speed...... of the total corpus callosum area and its subregions with cognitive performance were analysed using multiple linear regression, controlling for volume of WMH and other confounding factors. RESULTS: Atrophy of the total corpus callosum area was associated with poor performance in tests assessing speed of mental...... processing--namely, trail making A and Stroop test parts I and II. Anterior, but not posterior, corpus callosum atrophy was associated with deficits of attention and executive functions as reflected by the symbol digit modalities and digit cancellation tests, as well as by the subtraction scores in the trail...
Corpus callosum atrophy is associated with mental slowing and executive deficits in subjects with age-related white matter hyperintensities. The LADIS study

DEFF Research Database (Denmark)

Jokinen, Hanne; Ryberg, Charlotte; Stegmann, Mikkel Bille

2007-01-01

Background: Previous research has indicated that corpus callosum atrophy is associated with global cognitive decline in neurodegenerative diseases, but few studies have investigated specific cognitive functions. Objective: To investigate the role of regional corpus callosum atrophy in mental speed...... of the total corpus callosum area and its subregions with cognitive performance were analysed using multiple linear regression, controlling for volume of WMH and other confounding factors. Results: Atrophy of the total corpus callosum area was associated with poor performance in tests assessing speed of mental...... processing - namely, trail making A and Stroop test parts I and II. Anterior, but not posterior, corpus callosum atrophy was associated with deficits of attention and executive functions as reflected by the symbol digit modalities and digit cancellation tests, as well as by the subtraction scores...
Phosphodiesterase-9 (PDE9) inhibition with BAY 73-6691 increases corpus cavernosum relaxations mediated by nitric oxide-cyclic GMP pathway in mice.

Science.gov (United States)

da Silva, F H; Pereira, M N; Franco-Penteado, C F; De Nucci, G; Antunes, E; Claudino, M A

2013-01-01

Phosphodiesterase-9 (PDE9) specifically hydrolyzes cyclic GMP, and was detected in human corpus cavernosum. However, no previous studies explored the selective PDE9 inhibition with BAY 73-6691 in corpus cavernosum relaxations. Therefore, this study aimed to characterize the PDE9 mRNA expression in mice corpus cavernosum, and investigate the effects of BAY 73-6691 in endothelium-dependent and -independent relaxations, along with the nitrergic corpus cavernosum relaxations. Male mice received daily gavage of BAY 73-6691 (or dimethylsulfoxide) at 3 mg kg⁻¹ per day for 21 days. Relaxant responses to acetylcholine (ACh), nitric oxide (NO) (as acidified sodium nitrite; NaNO2 solution), sildenafil and electrical-field stimulation (EFS) were obtained in corpus cavernosum in control and BAY 73-6691-treated mice. BAY 73-6691 was also added in vitro 30 min before construction of concentration-response and frequency curves. PDE9A and PDE5 mRNA expression was detected in the mice corpus cavernosum in a similar manner. In vitro addition of BAY 73-6691 neither itself relaxed mice corpus cavernosum nor changed the NaNO2, sildenafil and EFS-induced relaxations. However, in mice treated chronically with BAY 73-6691, the potency (pEC50) values for ACh, NaNO2 and sildenafil were significantly greater compared with control group. The maximal responses (Emax) to NaNO2 and sildenafil were also significantly greater in BAY 73-6691-treated mice. BAY 73-6691 treatment also significantly increased the magnitude and duration of the nitrergic corpus cavernosum relaxations (8-32 Hz). In conclusion, murine corpus cavernosum expresses PDE9 mRNA. Prolonged PDE9 inhibition with BAY 73-6691 amplifies the NO-cGMP-mediated cavernosal responses, and may be of therapeutic value for erectile dysfunction.
Effects of Icariside II on Corpus Cavernosum and Major Pelvic Ganglion Neuropathy in Streptozotocin-Induced Diabetic Rats

Directory of Open Access Journals (Sweden)

Guang-Yi Bai

2014-12-01

Diabetic erectile dysfunction is associated with penile dorsal nerve bundle neuropathy in the corpus cavernosum, and the mechanism is not well understood. We investigated the neuropathy changes in the corpus cavernosum of rats with streptozotocin-induced diabetes and the effects of Icariside II (ICA II) on improving neuropathy. Thirty-six 8-week-old Sprague-Dawley rats were randomly distributed into a normal control group, a diabetic group and an ICA II-treated group. Diabetes was induced by a one-time intraperitoneal injection of streptozotocin (60 mg/kg). Three days later, the diabetic rats were randomly divided into 2 groups: a saline-treated placebo group and an ICA II-treated group (5 mg/kg/day, by daily intragastric administration). Twelve weeks later, erectile function was measured by cavernous nerve electrostimulation with real-time intracorporal pressure assessment. The penis was harvested for histological examination (immunofluorescence and immunohistochemical staining) and transmission electron microscopy. Diabetic animals exhibited a decreased density of dorsal nerve bundle in the penis. The neurofilament of the dorsal nerve bundle was fragmented in the diabetic rats. There was a decreased expression of nNOS and NGF in the diabetic group. The ICA II group had a higher density of dorsal nerve bundle and higher expression of NGF and nNOS in the penis. The pathological change of the major pelvic nerve ganglion (including the microstructure by transmission electron microscope and the neurite outgrowth length of major pelvic nerve ganglion tissue cultured in vitro) was greatly attenuated in the ICA II-treated group (p < 0.01). ICA II treatment attenuates the diabetes-related impairment of corpus cavernosum and major pelvic ganglion neuropathy in rats with streptozotocin-induced diabetes.

ENTREVIS - a Spanish machine-readable text corpus

DEFF Research Database (Denmark)

Jensen, Kjær

1991-01-01

Presentation of the first half of a Spanish text corpus consisting of all interviews with Spaniards in the two weekly magazines Cambio16 and Tiempo in 1990. This corpus has since been supplemented with all interviews in the same magazines in 1995. Total size of the corpus: over 1.2 million words...
Enhanced muscarinic M1 receptor gene expression in the corpus striatum of streptozotocin-induced diabetic rats

Directory of Open Access Journals (Sweden)

Mathew Jobin

2009-04-01

Acetylcholine (ACh), the first neurotransmitter to be identified, regulates the activities of central and peripheral functions through interactions with muscarinic receptors. Changes in muscarinic acetylcholine receptors (mAChR) have been implicated in the pathophysiology of many major diseases of the central nervous system (CNS). Previous reports from our laboratory on streptozotocin (STZ)-induced diabetic rats showed down-regulation of muscarinic M1 receptors in the brainstem, hypothalamus, cerebral cortex and pancreatic islets. In this study, we investigated the changes in acetylcholine esterase (AChE) enzyme activity, total muscarinic and muscarinic M1 receptor binding and gene expression in the corpus striatum of STZ-diabetic rats and insulin-treated diabetic rats. The striatum, a neuronal nucleus intimately involved in motor behaviour, is one of the brain regions with the highest acetylcholine content. ACh has complex and clinically important actions in the striatum that are mediated predominantly by muscarinic receptors. We observed that insulin treatment brought the decreased maximal velocity (Vmax) of acetylcholine esterase in the corpus striatum during diabetes back to near the control state. In diabetic rats there was a decrease in the maximal number (Bmax) and affinity (Kd) of total muscarinic receptors, whereas muscarinic M1 receptors were increased with a decrease in affinity. We observed that, in all cases, the binding parameters were reversed to near control by the treatment of diabetic rats with insulin. A real-time PCR experiment confirmed the increase in muscarinic M1 receptor gene expression and a similar reversal with insulin treatment. These results suggest diabetes-induced changes of the cholinergic activity in the corpus striatum and a regulatory role of insulin on binding parameters and gene expression of total and muscarinic M1 receptors.
Academic writing in a corpus of 4th grade science notebooks: An analysis of student language use and adult expectations of the genres of school science

Science.gov (United States)

Esquinca, Alberto

This is a study of language use in the context of an inquiry-based science curriculum in which conceptual understanding ratings are used to split texts into groups of "successful" and "unsuccessful" texts. "Successful" texts could include known features of science language. The data sources are 420 texts generated by students in 14 classrooms from three school districts, culled from a prior study on the effectiveness of science notebooks to assess understanding, together with the aforementioned ratings. In science notebooks, students write in the process of learning (here, a unit on electricity). The analytical framework is systemic functional linguistics (Halliday and Matthiessen, 2004; Eggins, 2004), specifically the concepts of genre, register and nominalization. Genre classification involves an analysis of the purpose and register features in the text (Schleppegrell, 2004). The use of features of the scientific academic register, namely the use of relational processes and nominalization (Halliday and Martin, 1993), requires transitivity analysis and noun analysis. Transitivity analysis, consisting of the identification of the process type, is conducted on 4737 ranking clauses. A manual count of each noun used in the corpus allows for a typology of nouns. Four school science genres (procedures, procedural recounts, reports and explanations) are found. Most texts (85.4%) are factual, and 14.1% are classified as explanations, the analytical genre. Logistic regression analysis indicates that there is no significant probability that the texts classified as explanation are placed in the group of "successful" texts. In addition, material process clauses predominate in the corpus, followed by relational process clauses. Results of a logistic regression analysis indicate that there is a significant probability (Chi square = 15.23, p placed in the group of "successful" texts. In addition, 59.5% of 6511 nouns are references to physical materials, followed by references to
The readability of scientific texts is decreasing over time

Science.gov (United States)

2017-01-01

Clarity and accuracy of reporting are fundamental to the scientific process. Readability formulas can estimate how difficult a text is to read. Here, in a corpus consisting of 709,577 abstracts published between 1881 and 2015 from 123 scientific journals, we show that the readability of science is steadily decreasing. Our analyses show that this trend is indicative of a growing use of general scientific jargon. These results are concerning for scientists and for the wider public, as they impact both the reproducibility and accessibility of research findings. PMID:28873054
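As an illustration of what a readability formula computes, the sketch below implements the Flesch Reading Ease score. The abstract above does not say which formulas were used, so the choice of Flesch Reading Ease is an assumption made here for illustration. The function names and the vowel-group syllable counter are hypothetical simplifications, not the authors' tooling.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    jargon = "Immunohistochemical characterization demonstrated significant hypoexpression."
    print(flesch_reading_ease(plain), flesch_reading_ease(jargon))
```

Longer sentences and more syllables per word both lower the score, so text dense with polysyllabic jargon scores lower than plain text.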
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

Science.gov (United States)

Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

2011-01-01

Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences of exploring such methods mainly in three areas including historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL, have shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and our important lessons learned from the process of preparing and applying text mining to large unstructured system artifacts at JPL aiming to benefit future TM applications in similar problem domains and also in hope for being extended to broader areas of applications.
Floral foregrounding: A corpus-assisted, cognitive stylistic study of the foregrounding of flowers in Mrs Dalloway

DEFF Research Database (Denmark)

Jensen, Marie Møller; Lottrup, Katrine; Nordentoft, Signe

2018-01-01

The study reported here combines quantitative and qualitative methods from both cognitive stylistics and corpus stylistics to analyze the flower-motif in Virginia Woolf’s novel Mrs Dalloway. The quantitative analysis compared the frequency of flower lemmas in the novel to both a reference corpus...... consisting of Woolf’s other works as well as a general corpus (the BNC). The analysis found significant differences between the frequencies in the novel and both corpora. The qualitative analysis is based on the statistically significant results and considers cognitive entrenchment and salience...... in relation to these. Furthermore, the analysis also links these two notions to different types of foregrounding as conceptualized in stylistics proper. Finally, aspects of repetition, parallelism and symbolism in relation to the flower-motif are considered. In conclusion, it is found that the flower...
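A frequency comparison between a study text and a reference corpus can be sketched with the log-likelihood (G2) statistic, a common choice in corpus linguistics. The abstract above does not name the test that was used, so this is an assumption. The function name and the counts in the example are hypothetical and are not data from the study.

```python
import math


def log_likelihood(freq_study: int, size_study: int, freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) for one item's frequency in a study corpus versus a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts: 50 hits in a 1,000-token text versus 20 hits in a 2,000-token reference.
    print(log_likelihood(50, 1000, 20, 2000))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05, so larger values indicate a frequency difference that is less likely to be due to chance.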
Variation in Citational Practice in a Corpus of Student Biology Papers: From Parenthetical Plonking to Intertextual Storytelling

Science.gov (United States)

Swales, John M.

2014-01-01

This is a corpus-based study of a key aspect of academic writing in one discipline (biology) by final-year undergraduates and first-, second-, and third-year graduate students. The papers come from the Michigan Corpus of Upper-level Student Papers, a freely available electronic database. The principal aim of the study is to examine the extent of…
Constitution d’un Corpus de Français Langue Etrangère destiné aux Apprenants Allemands

Directory of Open Access Journals (Sweden)

Fauth Camille

2014-07-01

Nationale de la Recherche and Deutsche Forschungsgemeinschaft, awarded to the Parole team of LORIA UMR 7503 (Nancy, France) and to the Computational Linguistics and Phonetics team FR 4.7 of Saarland University (Saarbrücken, Germany), in which French and German are target languages. Few parallel corpora are available for the German-French pair. We present here the development of a corpus of spoken productions by native and non-native speakers for the German-French pair. Our corpus aims to bring to light the phonetic and phonological deviations that German speakers produce when learning French. This work is part of a larger project, which aims to study the difficulties that French speakers encounter when learning German, and vice versa. Accordingly, fifty German speakers will be recruited from university and school settings (upper-secondary level) in Germany, and fifty French speakers from the same settings in France. Both populations are to produce, on the one hand, the corpus in the foreign language (French for the German speakers and German for the French speakers) and, on the other, the corpus in their native language (German for the Germans and French for the French). The resulting corpora should allow us to identify the difficulties that German or French speakers encounter when learning French or German. The control data are twofold, since one can refer both to the learners' productions in their native language (here German) and to those of native speakers (here German speakers). Here we present only the construction of the French corpus.
Avaliação da anotação semântica do PALAVRAS e sua pós-edição manual para o Corpus Summ-it

Directory of Open Access Journals (Sweden)

Élen Cátia Tomazela

2011-01-01

This article presents an evaluation of the automatic semantic annotation of the PALAVRAS parser and its manual post-editing for a corpus of Portuguese texts, the Summ-it Corpus. This post-editing aimed at improving a linguistic model for automatic text summarization and sought to assign more adequate semantic tags to lexical items than those used by the parser. The task was carried out by linguists, and the problematic cases are presented in this article; they lead to considerations about the tagging model of PALAVRAS itself. The revised corpus will be made available to the community and may be useful for various Natural Language Processing applications.
Age-related signal intensity changes in the corpus callosum: assessment with three orthogonal FLAIR images

Energy Technology Data Exchange (ETDEWEB)

Yamamoto, Akira; Miki, Yukio; Kanagaki, Mitsunori; Takahashi, Takahiro; Fushimi, Yasutaka; Haque, Tabassum Laz; Togashi, Kaori [Kyoto University, Department of Nuclear Medicine and Diagnostic Imaging, Graduate School of Medicine, Kyoto (Japan); Tomimoto, Hidekazu [Kyoto University, Department of Neurology, Graduate School of Medicine, Kyoto (Japan); Konishi, Junya [Kobe University, Department of Radiology, Graduate School of Medicine, Kobe, Hyogo (Japan)

2005-11-01

The presence of age-related hyperintensities of the corpus callosum has not been thoroughly evaluated. Fifty-two patients of 50 years of age or older (mean, 71 years; range, 50-87 years) were included in this study. Fluid-attenuated inversion recovery images were obtained in three orthogonal planes. Periventricular hyperintensities (PVHs) and deep white matter hyperintensities (DWMHs) were graded according to Fazekas' rating scale. Correlations between the presence of hyperintensities in the corpus callosum and age, and the grade of PVH and DWMH were statistically analyzed. PVH was categorized as grade 0 (n=4), grade 1 (n=28), grade 2 (n=10), or grade 3 (n=10). DWMH was categorized as grade 0 (n=4), grade 1 (n=25), grade 2 (n=8), or grade 3 (n=15). Hyperintensity was considered present in the corpus callosum in 31 of the 52 patients (60%). In these 31 patients, PVH was categorized as grade 1 (n=16), grade 2 (n=7), or grade 3 (n=8), while DWMH was categorized as grade 0 (n=1), grade 1 (n=10), grade 2 (n=7), or grade 3 (n=13). The presence of callosal hyperintensities was significantly correlated with age (p=0.001), and with PVH (p=0.04) and DWMH grades (p=0.004). Hyperintensities may be present in the corpus callosum with aging, and are correlated with PVH and DWMH. (orig.)
Age-related signal intensity changes in the corpus callosum: assessment with three orthogonal FLAIR images

International Nuclear Information System (INIS)

Yamamoto, Akira; Miki, Yukio; Kanagaki, Mitsunori; Takahashi, Takahiro; Fushimi, Yasutaka; Haque, Tabassum Laz; Togashi, Kaori; Tomimoto, Hidekazu; Konishi, Junya

2005-01-01

The presence of age-related hyperintensities of the corpus callosum has not been thoroughly evaluated. Fifty-two patients of 50 years of age or older (mean, 71 years; range, 50-87 years) were included in this study. Fluid-attenuated inversion recovery images were obtained in three orthogonal planes. Periventricular hyperintensities (PVHs) and deep white matter hyperintensities (DWMHs) were graded according to Fazekas' rating scale. Correlations between the presence of hyperintensities in the corpus callosum and age, and the grade of PVH and DWMH were statistically analyzed. PVH was categorized as grade 0 (n=4), grade 1 (n=28), grade 2 (n=10), or grade 3 (n=10). DWMH was categorized as grade 0 (n=4), grade 1 (n=25), grade 2 (n=8), or grade 3 (n=15). Hyperintensity was considered present in the corpus callosum in 31 of the 52 patients (60%). In these 31 patients, PVH was categorized as grade 1 (n=16), grade 2 (n=7), or grade 3 (n=8), while DWMH was categorized as grade 0 (n=1), grade 1 (n=10), grade 2 (n=7), or grade 3 (n=13). The presence of callosal hyperintensities was significantly correlated with age (p=0.001), and with PVH (p=0.04) and DWMH grades (p=0.004). Hyperintensities may be present in the corpus callosum with aging, and are correlated with PVH and DWMH. (orig.)
Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach.

Science.gov (United States)

Yan, Erjia; Williams, Jake; Chen, Zheng

2017-01-01

Publication metadata help deliver rich analyses of scholarly communication. However, research concepts and ideas are more effectively expressed through unstructured fields such as full texts. Thus, the goals of this paper are to employ a full-text enabled method to extract terms relevant to disciplinary vocabularies, and through them, to understand the relationships between disciplines. This paper uses an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of terms present in each discipline, indicating a semantic richness potentially sufficient for further study and advanced analysis. The salient relationships amongst these vocabularies become apparent in application of a principal component analysis. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns along with Engineering and Physics; while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns along with the Earth Sciences and Chemistry. These results have implications for studies of scholarly communication as scholars attempt to identify the epistemological cultures of disciplines, and as a full text-based methodology could lead to machine learning applications in the automated classification of scholarly work according to disciplinary vocabularies.
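A power-law pattern in term frequencies can be checked roughly by regressing log frequency on log rank. The sketch below is a minimal illustration under that assumption; the abstract above does not describe how the authors fitted the distribution, and the function name is hypothetical. Least squares on log-log data is a crude estimator and is used here only for brevity.

```python
import math
from collections import Counter


def zipf_exponent(tokens: list[str]) -> float:
    """Estimate the exponent s in frequency ~ rank**(-s) by least squares on log-log data."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; report its magnitude.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic vocabulary whose frequencies are exactly 60 / rank.
    tokens = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12 + ["f"] * 10
    print(zipf_exponent(tokens))
```

An exponent near 1 corresponds to the classic Zipf pattern; real vocabularies usually deviate from a straight line at the highest and lowest ranks.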
MR measurement of normal brainstem cerebellum and corpus callosum on midsagittal section

International Nuclear Information System (INIS)

Kogame, Saeko; Sawa, S.; Inoue, Yuichi; Fukuda, Teruo; Tada, Takuji; Shakudo, Miyuki; Yahata, Kunifumi; Shimizu, Hiroshi; Onoyama, Yasuhito.

1989-01-01

The dimensions of the brainstem, cerebellum and corpus callosum were measured on magnetic resonance (MR) images with sagittal spin-echo sequence. Eighty-two normal adults (average 49.6 years old) were measured. The mesencephalic, pontine or cerebellar diameters and lengths could be measured more accurately and reproducibly than medullary diameter and length. The anteroposterior diameters of the pons and the cerebellum were 23.2±1.4 mm and 26.4±2.5 mm, respectively. The lengths of the pons and the cerebellum were 27.8±2 mm and 45.8±3.5 mm, respectively. We observed focal thinning at the body of the corpus callosum in 73%. This narrowing is almost unquestionably a normal variant. (author)
Expression profile of endothelin receptors (ETA and ETB) and microRNAs-155 and -199 in the corpus cavernosum of rats submitted to chronic alcoholism and diabetes mellitus

Directory of Open Access Journals (Sweden)

F.Z. Gonçalves

2018-03-01

Recent evidence shows that chronic ethanol consumption increases endothelin (ET)-1-induced sustained contraction of trabecular smooth muscle cells of the corpora cavernosa in the corpus cavernosum of rats by a mechanism that involves increased expression of ETA and ETB receptors. Our goal was to evaluate the effects of alcohol and diabetes and their relationship to miRNA-155, miRNA-199 and endothelin receptors in the corpus cavernosum and blood of rats submitted to the experimental model of diabetes mellitus and chronic alcoholism. Forty-eight male Wistar rats were divided into four groups: control (C), alcoholic (A), diabetic (D), and alcoholic-diabetic (AD). Samples of the corpus cavernosum were prepared to study the protein expression of endothelin receptors by immunohistochemistry and the expression of miRNAs-155 and -199 in serum and the cavernous tissue. Immunostaining for endothelin receptors was markedly higher in the A, D, and AD groups than in the C group. Moreover, a significant hypoexpression of miRNA-199 in the corpus cavernosum tissue from the AD group was observed, compared to the C group. When analyzing the microRNA profile in blood, a significant hypoexpression of miRNA-155 in the AD group was observed compared to the C group. The miRNA-199 analysis demonstrated significant hypoexpression in the D and AD groups compared to the C group. Our findings in the corpus cavernosum showed downregulated miRNA-155 and miRNA-199 levels associated with upregulated protein expression and unaltered mRNA expression of ET receptors, suggesting decreased ET receptor turnover, which can contribute to erectile dysfunction in diabetic rats exposed to high alcohol levels.
Ruptured corpus luteal cyst: Prediction of clinical outcomes with CT

Energy Technology Data Exchange (ETDEWEB)

Lee, Myoung Seok; Moon, Min Hoan; Woo, Hyun Sik; Sung, Chang Kyu; Jeon, Hye Won; Lee, Taek Sang [SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul (Korea, Republic of)

2017-08-01

To evaluate the determinant pretreatment CT findings that can predict surgical intervention for patients suffering from corpus luteal cyst rupture with hemoperitoneum. From January 2009 to December 2014, a total of 106 female patients (mean age, 26.1 years; range, 17–44 years) who visited the emergency room of our institute for acute abdominal pain and were subsequently diagnosed with ruptured corpus luteal cyst with hemoperitoneum were included in the retrospective study. The analysis of CT findings included cyst size, cyst shape, sentinel clot sign, ring of fire sign, hemoperitoneum depth, active bleeding in portal phase and attenuation of hemoperitoneum. The comparison of CT findings between the surgery and conservative management groups was performed with the Mann-Whitney U test or chi-square test. Logistic regression analysis was used to determine significant CT findings in predicting surgical intervention for a ruptured cyst. Comparative analysis revealed that the presence of active bleeding and the hemoperitoneum depth were significantly different between the surgery and conservative management groups and were confirmed as significant CT findings for predicting surgery, with adjusted odds ratios (ORs) of 3.773 and 1.318, respectively (p < 0.01). On the receiver-operating characteristic curve analysis for hemoperitoneum depth, the optimal cut-off value was 5.8 cm with 73.7% sensitivity and 58.6% specificity (Az = 0.711, p = 0.004). In cases with a hemoperitoneum depth > 5.8 cm and concurrent active bleeding, the OR for surgery increased to 5.786. The presence of active bleeding and the hemoperitoneum depth on a pretreatment CT scan can be predictive warning signs of surgery for a patient with a ruptured corpus luteal cyst with hemoperitoneum.
Corpus callosum atrophy in patients with mild Alzheimer's disease

DEFF Research Database (Denmark)

Frederiksen, Kristian Steen; Garde, Ellen; Skimminge, Arnold

2011-01-01

Several studies have found atrophy of the corpus callosum (CC) in patients with Alzheimer's disease (AD). However, it remains unclear whether callosal atrophy is already present in the early stages of AD, and to what extent it may be associated with other structural changes in the brain......, such as age-related white matter changes (ARWMC) and progression of the disease....
Fatty acid composition of the postmortem corpus callosum of patients with schizophrenia, bipolar disorder, or major depressive disorder.

Science.gov (United States)

Hamazaki, K; Maekawa, M; Toyota, T; Dean, B; Hamazaki, T; Yoshikawa, T

2017-01-01

Studies investigating the relationship between n-3 polyunsaturated fatty acid (PUFA) levels and psychiatric disorders have thus far focused mainly on analyzing gray matter, rather than white matter, in the postmortem brain. In this study, we investigated whether PUFA levels showed abnormalities in the corpus callosum, the largest area of white matter, in the postmortem brain tissue of patients with schizophrenia, bipolar disorder, or major depressive disorder. Fatty acids in the phospholipids of the postmortem corpus callosum were evaluated by thin-layer chromatography and gas chromatography. Specimens were evaluated for patients with schizophrenia (n=15), bipolar disorder (n=15), or major depressive disorder (n=15) and compared with unaffected controls (n=15). In contrast to some previous studies, no significant differences were found in the levels of PUFAs or other fatty acids in the corpus callosum between patients and controls. A subanalysis by sex gave the same results. No significant differences were found in any PUFAs between suicide completers and non-suicide cases regardless of psychiatric disorder diagnosis. Patients with psychiatric disorders did not exhibit n-3 PUFAs deficits in the postmortem corpus callosum relative to the unaffected controls, and the corpus callosum might not be involved in abnormalities of PUFA metabolism. This area of research is still at an early stage and requires further investigation. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Anthologie d'écrits de compositeurs extraits de recueils de motets, de messes et de livres d'orgue parus en France (XVIIe-XVIIIe siècles) : Textes rassemblés par Nathalie Berton-Blivet et Marie Demeilliez

OpenAIRE

Berton-Blivet, Nathalie; Demeilliez, Marie; Davy-Rigaux, Cécile

2014-01-01

This article presents an anthology of prefatory texts, written by composers or their publishers, on the practices of church music and their liturgical setting. This corpus comprises texts or excerpts of texts drawn from printed collections of motets, masses and organ books of the 17th and 18th centuries.
Anglophonic Influence in the Use of Sound Symbolism in Italian Disney Comics: A Corpus-based Analysis

Directory of Open Access Journals (Sweden)

Pischedda Pier Simone

2017-12-01

This article will explore the linguistic implications of employing and creating sound symbolism (ideophones, onomatopoeia and interjections) in Italian Disney comics. It will endeavour to investigate the way sound symbolic forms in both imported Disney US comics and original Italian stories have profoundly influenced the development of Italian sound symbolism in the last century. The diachronic analysis is carried out thanks to the creation of a corpus of ideophones and interjections from 210 Disney stories published between 1932 and 2013. The corpus will allow the author to investigate how these forms have changed diachronically throughout the eighty years under investigation with the final aim of highlighting changes and patterns in both original and translated Italian stories. The unique status of ideophones, confirmed by language, sociological and neurological studies, has led to interesting experimentations but also to complicated dynamics. Certain linguistic settings seem to foster a better affinity towards the device, particularly if compared to Romance languages, such as Italian and Spanish, that often have to rely on Anglophone renditions. Anglicisation has indeed overshadowed previous original attempts. Nevertheless, recent creations, particularly from cartoonists, bear witness to a willingness to stretch language again in order to enhance language iconicity.

Custodians of Sacred Space : Constructing the Franciscan Holy Land through texts and sacri monti (ca. 1480-1650)

NARCIS (Netherlands)

Ritsema van Eck, M.P.

2017-01-01

This dissertation investigates the construction of the Franciscan Holy Land as an ideological construct during the late medieval and early modern period. Based on an extensive corpus of texts, defined as Franciscan Holy Land writing, and a (re-)consideration of the sacri monti of Varallo and La
Informação como objeto para construção do corpus interdisciplinar entre Ciência da Informação e Ciência da Administração | Information from object to construction to interdisciplinary corpus between Information Science and Administrative Science

Directory of Open Access Journals (Sweden)

Joaquim Francisco Cavalcante de Oliveira

2011-03-01

Abstract A description and analysis of the theoretical foundations and methodological procedures for constructing an interdisciplinary corpus, understood as “a group of related interdisciplinary categories and similar meanings, recognized by two or more Sciences”, in this research between Information Science and Administrative Science, based on information and information management. The stages of its development include the theoretical framework on the interdisciplinarity of the areas studied, especially Heckhausen; documentary research to identify authors in Information Science and Administrative Science, found through the Currículo Lattes and the CNPq Directory of Research Groups, and foreign authors through Google. From citation analysis and content analysis of selected Brazilian journals of Administration and Information Science, and from the identification of the concepts contained in articles, the interdisciplinary corpus was constructed. This methodology has broader application and can be adopted in similar research on interdisciplinarity in other fields of knowledge. Keywords: interdisciplinarity; Information Science; Administrative Science; information; research methodology; interdisciplinary corpus; information management.
Wann ist ein terminus technicus ein terminus technicus? – Das Beispiel ἀτρεμής im Corpus Hippocraticum

Directory of Open Access Journals (Sweden)

Eva Wöckener-Gade

2017-07-01

In this paper, I present some results of my research done in the project eXChange. My main goal was to investigate to what extent and how the terminology of the Corpus Hippocraticum has been influenced by the language of the preceding (esp. lyric) literature. A common view in the linguistics concerned with languages for special purposes (the German ‘Fachsprachen’) holds that termini technici are often created by implementing a word from the common language into a technical context and narrowing its former meaning down to a specialized and strictly defined one. While this mechanism could be traced for several terms in the Corpus Hippocraticum, some terms could be found which denote special processes or methods and have been taken over into the medical context without a significant change of meaning. The question is raised whether these terms can be regarded as termini technici even if their meaning has not been coined in a specialized context.
Marchiafava-Bignami disease: magnetic resonance imaging findings in corpus callosum and subcortical white matter

Energy Technology Data Exchange (ETDEWEB)

Kawarabuki, Kentaro; Sakakibara, Takehiko; Hirai, Makoto; Yoshioka, Yuji; Yamamoto, Yasumasa; Yamaki, Tarumi

2003-11-01

A case of Marchiafava-Bignami disease (MBD) is presented using magnetic resonance imaging (MRI). A patient with a long history of alcoholism developed a gait disturbance with involuntary movements at the lower extremities. MRI scans taken at the onset showed no particular abnormalities. He progressed to a coma 10 days later. MRI scans taken 20 days after the onset showed a focal lesion at the genu of the corpus callosum and he was diagnosed as having MBD. In addition, multiple lesions were observed in bilateral frontoparietal subcortical white matter. These lesions demonstrated similar intense MRI signals as the corpus callosum.
Text Comprehension and Production in University Students: Text Reformulation

OpenAIRE

Tittarelli, Ana María; Piacente, Irma Telma

2006-01-01

This paper sets out to report on findings about features of task-specific reformulation observed in university students in the middle stretch of the Psychology degree course (N=58) and in a reference group of students from the degree courses in Modern Languages, Spanish and Library Studies (N=33) from the National University of La Plata (Argentina). Three types of reformulation were modeled: summary reformulation, comprehensive and productive reformulation. The study was based on a corpus of 6...
Text comprehension and production in university students: text reformulation

OpenAIRE

Tittarelli, Ana María; Piacente, Telma

2006-01-01

This paper sets out to report on findings about features of task-specific reformulation observed in university students in the middle stretch of the Psychology degree course (N=58) and in a reference group of students from the degree courses in Modern Languages, Spanish and Library Studies (N=33) from the National University of La Plata (Argentina). Three types of reformulation were modeled: summary reformulation, comprehensive and productive reformulation. The study was based on a corpus of 6...
From university research to innovation: Detecting knowledge transfer via text mining

Energy Technology Data Exchange (ETDEWEB)

Woltmann, S.; Clemmensen, L.; Alkærsig, L

2016-07-01

Knowledge transfer by universities is a top priority in innovation policy and a primary purpose of public research funding, because it is an important driver of technical change and innovation. Current empirical research on the impact of university research relies mainly on formal databases and indicators such as patents, collaborative publications and license agreements to assess the contribution to the socioeconomic surroundings of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media content, company websites and annual reports. The empirical focus of the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent text collections (corpora) to identify correlations between university publications and company webpages. One corpus represents the company sites, serving as a sample of the private economy, and a second corpus provides the reference to the university research, containing relevant publications. We associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods extrapolate the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using contemporary techniques for the measurement of public research impact. The approach aims to be applicable across universities and thus enable a more holistic, comparable assessment. It relies less on formal databases, which is certainly beneficial in terms of data reliability. We seek to provide a supplementary perspective for the detection of the dissemination of university research and hereby enable policy makers to gain additional insights of (informal) contributions of knowledge
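As an illustration of the kind of relatedness measure this abstract describes, the sketch below scores company webpages against university publications using TF-IDF weights and cosine similarity. The abstract does not specify the authors' actual method, so the technique, the function names (`tfidf_vectors`, `cosine`, `relatedness`) and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relatedness(company_pages, publications):
    """Similarity matrix: rows are company pages, columns are publications."""
    vectors = tfidf_vectors(list(company_pages) + list(publications))
    split = len(company_pages)
    return [[cosine(c, p) for p in vectors[split:]] for c in vectors[:split]]
```

Calling `relatedness(pages, papers)` on two lists of strings returns one row per company page. A real study would likely need proper tokenization, stop-word handling and a semantic model on top of this purely lexical score.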
Disconnection Syndrome and Verbal, Spatial and Tactile Amnesia following a Tumor of the Splenium of the Corpus Callosum

Directory of Open Access Journals (Sweden)

Marina Scarpa

1990-01-01

A patient with a severe amnesic syndrome following a glioma of the splenium of the corpus callosum is reported. The long-term memory deficit involved anterograde as well as retrograde events dating back 40 years and caused topographical disorientation. Short-term memory test performance was in the normal range, with the exception of tactile memory, which was severely impaired. The patient also showed disconnection symptoms, due to severing of occipito-parietal and parieto-temporal connections, while parieto-parietal connections were undamaged.
Splenial lesions of the corpus callosum: Disease Spectrum and MRI findings

Energy Technology Data Exchange (ETDEWEB)

Park, Sung Eun; Choi, Dae Seob; Shin, Hwa Seon; Baek, Hye Jin; Choi, Ho Cheol; Kim, Ji Eun; Choi, Hye Young; Park, Min Jung [Dept. of Radiology, Gyeongsang National University School of Medicine, Jinju (Korea, Republic of)

2017-08-01

The corpus callosum (CC) is the largest white matter structure in the brain, consisting of approximately 200–250 million axons that provide a large connection mainly between homologous cerebral cortical areas in mirror image sites. The posterior end of the CC is the thickest part, which is called the splenium. Various diseases, ranging from congenital to acquired lesions and including congenital anomalies, traumatic lesions, ischemic diseases, tumors, and metabolic, toxic, degenerative, and demyelinating diseases, can involve the splenium of the CC, and their clinical symptoms and signs are also variable. Therefore, knowledge of the disease entities and the imaging findings of lesions involving the splenium is valuable in clinical practice. MR imaging is useful for the detection and differential diagnosis of splenial lesions of the CC. In this study, we classify the disease entities and describe imaging findings of lesions involving the splenium of the CC based on our experiences and a review of the literature.
Corpus luteum blood flow in normal and abnormal early pregnancy: evaluation and analysis with transvaginal color and pulsed doppler sonography

International Nuclear Information System (INIS)

Tang Xiaoyi; Lin Meifang; Zheng Meirong; Liang Xiaoxian; Liu Jianfeng

2005-01-01

Objective: To detect and assess corpus luteum blood flow in normal and abnormal early pregnancy. Methods: Using transvaginal color and pulsed Doppler sonography, we examined 215 pregnant women, including 150 normal intrauterine pregnancies, 25 abortions and 29 ectopic pregnancies, and recorded the corpus luteum blood flow features and the blood flow indexes (Vmax, RI and PI). Results: 1) The corpus luteum was successfully identified in 148 of 150 normal early pregnancies, 25 of 26 cases of threatened abortion, and 22 of 29 ectopic pregnancies. 2) The three groups shared the same color Doppler imaging feature: a circumferential rim around the entire corpus luteum. 3) The flow indexes showed that mean PVS, RI and PI did not differ statistically between normal and abnormal early pregnancy; the mean PVS was lower in ectopic pregnancy than in normal pregnancy (P<0.05), while PI and PR showed no characteristic difference in the ectopic pregnancy group compared with the normal pregnancy group. Conclusion: The corpus luteum can be precisely identified in most pregnancies using transvaginal color Doppler and shows a characteristic rim on Doppler imaging. PVS may help in differentiating ectopic pregnancy from normal early pregnancy. (authors)
The Use of Corpus Examples for Language Comprehension and Production

Science.gov (United States)

Frankenberg-Garcia, Ana

2014-01-01

One of the many new features of English language learners' dictionaries derived from the technological developments that have taken place over recent decades is the presence of corpus-based examples to illustrate the use of words in context. However, empirical studies have generally not been able to produce conclusive evidence about their…
Interaction as 'involvement' in writing for students: a corpus linguistic ...

African Journals Online (AJOL)

Interaction as 'involvement' in writing for students: a corpus linguistic analysis of a key readability feature. E Hilton Hubbard. Abstract. The rapid change in the demographics of South Africa's tertiary level student population over the last decade — and most specifically the huge increase in those who have to study at a ...
Stochastic modeling and mathematical statistics a text for statisticians and quantitative scientists

CERN Document Server

Samaniego, Francisco J

2014-01-01

"Stochastic Modeling and Mathematical Statistics is a new and welcome addition to the corpus of undergraduate statistical textbooks in the market. The singular thing that struck me when I initially perused the book was its lucid and endearing conversational tone, which pervades the entire text. It radiated warmth. … In my course at the University of Michigan, I rely primarily on my own lecture notes and have used Rice as supplementary material. Having gone through this text, I am strongly inclined to add this to the supplementary list as well. I have little doubt that this book will be very s
Tracking Anglicisms in Domains by the Corpus-Linguistic Method

DEFF Research Database (Denmark)

Mousten, Birthe; Laursen, Anne Lise

2015-01-01

Lay investors and semi-professionals lean on professional stock bloggers and stock analysts for advice on stock investments; semi-professionals and professionals write about investments globally, and stock information has to be available in many local markets. Using the correct terminology......’s critical sense is not enough to make the right choices. Our corpus-linguistic tool can be a help in this specialized field....
Segmentation of the Canine Corpus Callosum using Diffusion Tensor Imaging Tractography

Science.gov (United States)

Pierce, T.T.; Calabrese, E.; White, L.E.; Chen, S.D.; Platt, S.R.; Provenzale, J.M.

2014-01-01

Background We set out to determine functional white matter (WM) connections passing through the canine corpus callosum useful for subsequent studies of canine brains that serve as models for human WM pathway disease. Based on prior studies, we anticipated that the anterior corpus callosum would send projections to the anterior cerebral cortex while progressively posterior segments would send projections to more posterior cortex. Methods A post mortem canine brain was imaged using a 7T MRI producing 100 micron isotropic resolution DTI analyzed by tractography. Using ROIs within cortical locations, which were confirmed by a Nissl stain that identified distinct cortical architecture, we successfully identified 6 important WM pathways. We also compared fractional anisotropy (FA), apparent diffusion coefficient (ADC), radial diffusivity (RD), and axial diffusivity (AD) in tracts passing through the genu and splenium. Results Callosal fibers were organized based upon cortical destination, i.e. fibers from the genu project to the frontal cortex. Histologic results identified the motor cortex based on cytoarchitectonic criteria that allowed placement of ROIs to discriminate between frontal and parietal lobes. We also identified cytoarchitecture typical of the orbital frontal, anterior frontal, and occipital regions and placed ROIs accordingly. FA, ADC, RD and AD values were all higher in posterior corpus callosum fiber tracts. Conclusions Using 6 cortical ROIs, we identified 6 major white matter tracts that reflect major functional divisions of the cerebral hemispheres and we derived quantitative values that can be used for study of canine models of human WM pathological states. PMID:24370161
Segmentation of the canine corpus callosum using diffusion-tensor imaging tractography.

Science.gov (United States)

Pierce, Theodore T; Calabrese, Evan; White, Leonard E; Chen, Steven D; Platt, Simon R; Provenzale, James M

2014-01-01

We set out to determine functional white matter (WM) connections passing through the canine corpus callosum; these WM connections would be useful for subsequent studies of canine brains that serve as models for human WM pathway disease. Based on prior studies, we anticipated that the anterior corpus callosum would send projections to the anterior cerebral cortex whereas progressively posterior segments would send projections to more posterior cortex. A postmortem canine brain was imaged using a 7-T MRI system producing 100-μm-isotropic-resolution diffusion-tensor imaging analyzed by tractography. Using regions of interest (ROIs) within cortical locations, which were confirmed by a Nissl stain that identified distinct cortical architecture, we successfully identified six important WM pathways. We also compared fractional anisotropy (FA), apparent diffusion coefficient (ADC), radial diffusivity, and axial diffusivity in tracts passing through the genu and splenium. Callosal fibers were organized on the basis of cortical destination (e.g., fibers from the genu project to the frontal cortex). Histologic results identified the motor cortex on the basis of cytoarchitectonic criteria that allowed placement of ROIs to discriminate between frontal and parietal lobes. We also identified cytoarchitecture typical of the orbital frontal, anterior frontal, and occipital regions and placed ROIs accordingly. FA, ADC, radial diffusivity, and axial diffusivity values were all higher in posterior corpus callosum fiber tracts. Using six cortical ROIs, we identified six major WM tracts that reflect major functional divisions of the cerebral hemispheres, and we derived quantitative values that can be used for study of canine models of human WM pathologic states.
Primary Diffuse Large B-cell Lymphoma of the Uterus Manifesting as a Leiomyoma: A Unique Presentation with Review of Literature

Directory of Open Access Journals (Sweden)

Rajan Dewar

2013-01-01

We report a primary diffuse large B-cell lymphoma of the uterine corpus in a 70-year-old woman who presented with symptoms of increased urinary frequency and a sense of bloating. Magnetic Resonance Imaging (MRI) findings were suggestive of a degenerating intramural fibroid. Histological examination of tissue samples obtained during hysteroscopy showed diffuse infiltration of fibrous stroma by atypical enlarged mononuclear cells. Immunohistochemical studies were consistent with the diagnosis of diffuse large B-cell lymphoma. Further imaging studies showed no evidence of lymphoma outside the uterus. To our knowledge, this represents the first well-documented case of primary uterine lymphoma presenting as a leiomyoma on imaging studies.
Advantages and Disadvantages in the Use of Internet as a Corpus

DEFF Research Database (Denmark)

Tarp, Sven; Fuertes-Olivera, Pedro A.

2016-01-01

This paper initially discusses some of the consequences which the technological development has for lexicography, especially in terms of the different types of empirical basis which can be used in dictionary projects. The most important advantages and disadvantages of using the Internet as a corpus...
Methodological Flaws in Corpus-Based Studies on Malaysian ESL Textbooks

Science.gov (United States)

Zarifi, Abdolvahed; Mukundan, Jayakaran; Rezvani Kalajahi, Seyed Ali

2014-01-01

With the increasing interest among the pedagogy researchers in the use of corpus linguistics methodologies to study textbooks, there has emerged a similar enthusiasm among the materials developers to draw on empirical findings in the development of the state-of-the-art curricula and syllabi. In order for these research findings to have their…
The significance of estradiol metabolites in human corpus luteum physiology.

Science.gov (United States)

Devoto, Luigi; Henríquez, Soledad; Kohen, Paulina; Strauss, Jerome F

2017-07-01

The human corpus luteum (CL) is a temporary endocrine gland derived from the ovulated follicle. Its formation and limited lifespan are critical for steroid hormone production required to support menstrual cyclicity, endometrial receptivity for successful implantation, and the maintenance of early pregnancy. Endocrine and paracrine-autocrine molecular mechanisms associated with progesterone production throughout the luteal phase are critical for the development, maintenance, regression, and rescue by hCG, which sustains CL function into early pregnancy. However, the signaling systems driving the regression of the primate corpus luteum in non-conception cycles are not well understood. Recently, there has been interest in the functional roles of estradiol metabolites (EMs), mostly in estrogen-producing tissues. The human CL produces a number of EMs, and it has been postulated that the EMs acting via paracrine-autocrine pathways affect angiogenesis or LH-mediated events. The present review describes advances in understanding the role of EMs in the functional lifespan and regression of the human CL in non-conception cycles. Copyright © 2017 Elsevier Inc. All rights reserved.

A Corpus-based Study of EFL Learners’ Errors in IELTS Essay Writing

Directory of Open Access Journals (Sweden)

Hoda Divsar

2017-03-01

The present study analyzed different types of errors in EFL learners’ IELTS essays. In order to determine the major types of errors, a corpus of 70 IELTS examinees’ writings was collected, and their errors were extracted and categorized qualitatively. Errors were categorized into 13 aspects based on a researcher-developed error-coding scheme. Based on the descriptive statistical analyses, the frequency of each error type was calculated and the commonest errors committed by the EFL learners in IELTS essays were identified. The results indicated that the two most frequent errors that IELTS candidates committed were related to word choice and verb forms. Based on the research results, pedagogical implications highlight analyzing EFL learners’ writing errors as a useful basis for instructional purposes, including creating pedagogical teaching materials that are in line with learners’ linguistic strengths and weaknesses.
Construction of a Learner Corpus for Japanese Language Learners: Natane and Nutmeg

Directory of Open Access Journals (Sweden)

Kikuko NISHINA

2014-12-01

Japanese language learners aim to acquire reading, listening, writing and speaking skills. We at the Hinoki project (https://hinoki-project.org/) have recently been working on the Natsume collocation search system (https://hinoki-project.org/natsume/), the Natane learner corpus to support Natsume (https://hinoki-project.org/natane/) and the Nutmeg writing support system (http://hinoki-project.org/nutmeg/). In order to test the effectiveness of Nutmeg, we conducted an online experiment with 36 participants who used the system's register misuse identification feature to correct four writing assignments. Results show that Nutmeg can be an effective tool in correcting common register-related errors, especially those involving auxiliary verbs. However, the accuracy of verb and adverb identification was too low, suggesting the need for improvements in the variety of corpora used for identifying register misuse.
Corpus methodology and training in specialized translation (English-Spanish): a proposal for improving the acquisition of specialized vocabulary

Directory of Open Access Journals (Sweden)

María del Mar Sánchez Ramos

2017-07-01

The main aim of this article is to illustrate corpus methodology in the initial stage of specialized vocabulary acquisition as part of the development of translators' lexical competence. The paper first examines the concept of translators' lexical competence. It then describes how corpus methodology occupies a central place in translator training and how it can help in the pre-translation phase (inverse translation) of a scientific text. Finally, it proposes the use of a monolingual virtual corpus as a preliminary documentation tool for acquiring specialized vocabulary and presents the students' initial assessment of the proposal.
Uptake of 3H-choline and synthesis of 3H-acetylcholine by human penile corpus cavernosum

International Nuclear Information System (INIS)

Blanco, R.; Saenz de Tejada, I.; Azadzoi, K.; Goldstein, I.; Krane, R.J.; Wotiz, H.H.; Cohen, R.A.

1986-01-01

The neuroeffectors which relax penile smooth muscle and lead to erection are unknown; physiological studies of human corpus cavernosum, in vitro, have suggested a significant role of cholinergic neurotransmission. To further characterize the importance of cholinergic nerves, biopsies of human corpus cavernosum were obtained at the time of penile prosthesis implantation. Tissues were incubated in 3H-choline (10⁻⁵ M, 80 Ci/mmol) in oxygenated physiological salt solution at 37 °C, pH 7.4 for 1 hour. Radiolabelled compounds were extracted with perchloric acid (0.4 M) and acetylcholine and choline were separated by HPLC; 14C-acetylcholine was used as internal standard. 3H-choline was accumulated by the tissues (20 +/- 1.9 fmol/mg), and 3H-acetylcholine was synthesized (4.0 +/- 1.1 fmol/mg). In control experiments, heating of the tissue blocked synthesis of 3H-acetylcholine. Inhibition of high affinity choline transport by hemicholinium-3 (10⁻⁵ M) diminished tissue accumulation of 3H-choline and significantly reduced the synthesis of 3H-acetylcholine (0.5 +/- 0.2 fmol/mg, p < 0.05). These results provide direct evidence of neuronal accumulation of choline and enzymatic conversion to acetylcholine in human corpus cavernosum. Taken together with the physiological studies, it can be concluded that cholinergic neurotransmission in human corpus cavernosum plays a role in penile erection.
The BioLexicon: a large-scale terminological resource for biomedical text mining

Directory of Open Access Journals (Sweden)

Thompson Paul

2011-10-01

Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized), together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is
The lexicographic corpus in languages with an oral tradition: the case ...

African Journals Online (AJOL)

rbr

languages with an oral tradition: the informants and the representativeness of the corpus. The latter, which must ..... the techniques, the instruments, fishing by men, fishing by .... this indispensable tool that is the dictionary. I think that these ...
Partial segmental thrombosis of the corpus cavernosum presenting with perineal pain.

Science.gov (United States)

Christodoulidou, Michelle; Parnham, Arie; Ramachandran, Navin; Muneer, Asif

2016-11-22

We describe the case of a man aged 43 years who presented with a 2-week history of a palpable lump in the right proximal penile shaft. This was preceded by a 6-month history of perineal pain, accompanied by erectile dysfunction. An urgent MRI scan of his penis identified a thrombus within the right crus and corpus of the penis. His thrombophilia screen was normal. The patient was started on oral anticoagulation and a phosphodiesterase inhibitor (PDE-5i) to prevent thrombus progression and maintain erectile function. At 5 months, the patient's symptoms had resolved and an MRI showed a reduction in the thrombus size. MRI is a useful imaging modality to diagnose a thrombus within the corpus cavernosum in patients presenting with a history of penile and perineal pain together with a palpable lump. The non-enhancement of the lesion helps to differentiate this from alternative rare lesions within the penis and perineum. 2016 BMJ Publishing Group Ltd.
Attitudinal Modeling of Affect, Behavior and Cognition: Semantic Mining of Disaster Text Corpus

Science.gov (United States)

2010-10-01

known as H1N1, H1N2, H3N1, H3N2, and H2N3. Swine influenza virus is common throughout pig populations worldwide. Transmission of the virus from pigs to...large region, for instance a continent, or even worldwide. Swine Flu Swine influenza (also called pig influenza, swine flu, hog flu and pig flu) is an...infection by any one of several types of swine influenza virus. Swine influenza virus (SIV) or S-OIV (swine-origin influenza virus) is any strain
Male non-insulin users with type 2 diabetes mellitus are predisposed to gastric corpus-predominant inflammation after H. pylori infection.

Science.gov (United States)

Yang, Yao-Jong; Wu, Chung-Tai; Ou, Horng-Yih; Lin, Chin-Han; Cheng, Hsiu-Chi; Chang, Wei-Lun; Chen, Wei-Ying; Yang, Hsiao-Bai; Lu, Cheng-Chan; Sheu, Bor-Shyang

2017-10-30

Both H. pylori infection and diabetes increase the risk of gastric cancer. This study investigated whether patients with type 2 diabetes mellitus (T2DM) and H. pylori infection had more severe corpus gastric inflammation and higher prevalence of precancerous lesions than non-diabetic controls. A total of 797 patients with type 2 diabetes mellitus were screened for H. pylori, of whom 264 had H. pylori infection. Of these patients, 129 underwent esophagogastroduodenoscopy to obtain topographic gastric specimens for gastric histology according to the modified Updated Sydney System, corpus-predominant gastritis index (CGI), Operative Link on Gastritis Assessment, and Operative Link on Gastric Intestinal Metaplasia Assessment. Non-diabetic dyspeptic patients who had H. pylori infection confirmed by esophagogastroduodenoscopy were enrolled as controls. The male as well as total T2DM patients had higher acute/chronic inflammatory and lymphoid follicle scores in the corpus than non-diabetic controls (p H. pylori-infected patients with type 2 diabetes mellitus. Patients with type 2 diabetes mellitus and H. pylori infection had more severe corpus gastric inflammation than non-diabetic controls. Moreover, male T2DM patients and those not using insulin were predisposed to corpus-predominant gastritis after H. pylori infection. ClinicalTrial: NCT02466919, retrospectively registered May 17, 2015.
Involvement of corpus callosum in amyotrophic lateral sclerosis shown by MRI

Energy Technology Data Exchange (ETDEWEB)

Zandijcke, M. van [Dept. of Neurology, Bruges (Belgium); Casselman, J. [Dept. of Medical Imaging, Bruges (Belgium)

1995-05-01

Abnormal high signal in the corticospinal tracts on MRI has been described in amyotrophic lateral sclerosis. We report a case with further high signal in fibres of the corpus callosum on proton density and T2-weighted spin-echo images, closely matching findings of earlier pathological reports. (orig.)
Involvement of corpus callosum in amyotrophic lateral sclerosis shown by MRI

International Nuclear Information System (INIS)

Zandijcke, M. van; Casselman, J.

1995-01-01

Abnormal high signal in the corticospinal tracts on MRI has been described in amyotrophic lateral sclerosis. We report a case with further high signal in fibres of the corpus callosum on proton density and T2-weighted spin-echo images, closely matching findings of earlier pathological reports. (orig.)
Usability-driven pruning of large ontologies: the case of SNOMED CT.

Science.gov (United States)

López-García, Pablo; Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

2012-06-01

To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Graph-traversal heuristics provided high coverage (71-96% of terms in the test sets of discharge summaries) at the expense of subset size (17-51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24-55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available.
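The evaluation measures named in this abstract can be sketched as simple set operations. The definitions below (coverage as the share of test-set concepts found in the subset, precision as the share of subset concepts that occur in the test set) and the frequency threshold are assumptions for illustration; the abstract does not give the authors' exact formulas, and the function names and concept identifiers are hypothetical.

```python
def evaluate_subset(subset, test_concepts, ontology_size):
    """Coverage, precision and relative size of an extracted ontology subset.

    Assumed definitions (not taken from the paper):
      coverage      = |subset ∩ test| / |test|
      precision     = |subset ∩ test| / |subset|
      relative_size = |subset| / size of the full ontology
    """
    subset, test_concepts = set(subset), set(test_concepts)
    hits = subset & test_concepts
    return {
        "coverage": len(hits) / len(test_concepts) if test_concepts else 0.0,
        "precision": len(hits) / len(subset) if subset else 0.0,
        "relative_size": len(subset) / ontology_size,
    }


def filter_by_frequency(subset, corpus_counts, min_count):
    """Keep only concepts seen at least min_count times in a reference corpus."""
    return {c for c in subset if corpus_counts.get(c, 0) >= min_count}
```

Filtering a large subset with `filter_by_frequency` and re-running `evaluate_subset` shows the trade-off the abstract reports: the subset shrinks, and coverage drops only to the extent that relevant concepts are rare in the reference corpus.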
Reversible splenial lesion on the corpus callosum in nonfulminant hepatitis A presenting as encephalopathy

Directory of Open Access Journals (Sweden)

Soon Young Ko

2014-12-01

Reversible focal lesions on the splenium of the corpus callosum (SCC) have been reported in patients with mild encephalitis/encephalopathy caused by various infectious agents, such as influenza, mumps, adenovirus, Varicella zoster, Escherichia coli, Legionella pneumophila, and Staphylococcus aureus. We report a case of a reversible SCC lesion causing reversible encephalopathy in nonfulminant hepatitis A. A 30-year-old healthy male with dysarthria and fever was admitted to our hospital. After admission he became confused, and so we performed electroencephalography (EEG) and magnetic resonance imaging (MRI) of the brain, which revealed an intensified signal on diffusion-weighted imaging (DWI) at the SCC. His mental status improved 5 days after admission, and the SCC lesion had completely disappeared 15 days after admission.
Quantitative analysis of the myelin g-ratio from electron microscopy images of the macaque corpus callosum

Directory of Open Access Journals (Sweden)

Nikola Stikov

2015-09-01

We provide a detailed morphometric analysis of eight transmission electron micrographs (TEMs) obtained from the corpus callosum of one cynomolgus macaque. The raw TEM images are included in the article, along with the distributions of the axon caliber and the myelin g-ratio in each image. The distributions are analyzed to determine the relationship between axon caliber and g-ratio, and compared against the aggregate metrics (myelin volume fraction, fiber volume fraction, and the aggregate g-ratio), as defined in the accompanying research article entitled ‘In vivo histology of the myelin g-ratio with magnetic resonance imaging’ (Stikov et al., NeuroImage, 2015).
Effect of biliary cirrhosis on nonadrenergic noncholinergic-mediated relaxation of rat corpus cavernosum: Role of nitric oxide pathway and endocannabinoid system

Directory of Open Access Journals (Sweden)

Dehpour A.R.

2008-06-01

Background: Relaxation of the corpus cavernosum plays a major role in penile erection. Nitric oxide (NO), which is mainly derived from nonadrenergic noncholinergic (NANC) nerves, is known to be the most important factor mediating relaxation of the corpus cavernosum. The aim of the present study was to investigate the effect of biliary cirrhosis on NANC-mediated relaxation of rat corpus cavernosum as well as the possible relevant roles of endocannabinoid and nitric oxide systems. Methods: Corporal strips from sham-operated and biliary cirrhotic rats were mounted under tension in a standard oxygenated organ bath with guanethidine sulfate (5 µM) and atropine (1 µM) to induce adrenergic and cholinergic blockade. The strips were precontracted with phenylephrine hydrochloride (7.5 µM) and electrical field stimulation was applied at different frequencies (2, 5, 10, 15 Hz) to obtain NANC-mediated relaxation. In separate precontracted strips of the sham and cirrhotic groups, the concentration-dependent relaxant responses to sodium nitroprusside (10 nM-1 mM), as an NO donor, were assessed. Results: The NANC-mediated relaxation was significantly enhanced in cirrhotic animals (P<0.01). Anandamide potentiated the relaxations in both groups (P<0.05). The cannabinoid CB1 receptor antagonist AM251 (10 µM) and the vanilloid receptor antagonist capsazepine (10 µM) each significantly prevented the enhanced relaxations in cirrhotic rats (P<0.01). The CB2 receptor antagonist AM630 had no effect on relaxations in the cirrhotic group. In a concentration-dependent manner, L-NAME (30-1000 nM) inhibited relaxations in both the sham and cirrhotic groups, although cirrhotic groups were more resistant to the inhibitory effects of L-NAME. The degree of relaxation induced by sodium nitroprusside (10 nM-1 mM) was similar in the two groups. Conclusions: Biliary cirrhosis enhances the neurogenic relaxation in rat corpus cavernosum probably via the NO pathway and
ParaText : scalable solutions for processing and searching very large document collections : final LDRD report.

Energy Technology Data Exchange (ETDEWEB)

Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.; Shead, Timothy M.

2010-09-01

This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.
Quantitative analysis of the corpus callosum in children with cerebral palsy and developmental delay: correlation with cerebral white matter volume

International Nuclear Information System (INIS)

Panigrahy, Ashok; Barnes, Patrick D.; Robertson, Robert L.; Sleeper, Lynn A.; Sayre, James W.

2005-01-01

This study was conducted to quantitatively correlate the thickness of the corpus callosum with the volume of cerebral white matter in children with cerebral palsy and developmental delay. Material and methods: A clinical database of 70 children with cerebral palsy and developmental delay was established with children between the ages of 1 and 5 years. These children also demonstrated abnormal periventricular T2 hyperintensities associated with and without ventriculomegaly. Mid-sagittal T1-weighted images were used to measure the thickness (genu, mid-body, and splenium) and length of the corpus callosum. Volumes of interest were digitized based on gray-scale densities to define the hemispheric cerebral white matter on axial T2-weighted and FLAIR images. The thickness of the mid-body of the corpus callosum was correlated with cerebral white matter volume. Subgroup analysis was also performed to examine the relationship of this correlation with both gestational age and neuromotor outcome. Statistical analysis was performed using analysis of variance and Pearson correlation coefficients. There was a positive correlation between the thickness of the mid-body of the corpus callosum and the volume of cerebral white matter across all children studied (R=0.665, P=0.0001). This correlation was not dependent on gestational age. The thickness of the mid-body of the corpus callosum was decreased in the spastic diplegia group compared to the two other groups (hypotonia and developmental delay only; P<0.0001). Within each neuromotor subgroup, there was a positive correlation between thickness of the mid-body of the corpus callosum and volume of the cerebral white matter. (orig.)
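The Pearson correlation coefficient used in the study above can be sketched in a few lines. This is a generic illustration of the statistic, not the authors' analysis code; the function name `pearson_r` and the sample values are hypothetical and do not come from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up thickness (mm) and volume (cm^3) values, for illustration only.
    thickness = [3.1, 4.0, 4.4, 5.2, 5.9]
    volume = [210.0, 250.0, 245.0, 300.0, 320.0]
    print(round(pearson_r(thickness, volume), 3))
```

A value near +1 indicates that the two measurements rise together, which is the direction of the relationship the abstract reports between mid-body callosal thickness and white matter volume.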
Quantitative analysis of the corpus callosum in children with cerebral palsy and developmental delay: correlation with cerebral white matter volume

Energy Technology Data Exchange (ETDEWEB)

Panigrahy, Ashok [Childrens Hospital Los Angeles, Department of Radiology, Los Angeles, CA (United States); Barnes, Patrick D. [Stanford University Medical Center, Department of Radiology, Lucile Salter Packard Children's Hospital, Palo Alto, CA (United States); Robertson, Robert L. [Children's Hospital Boston, Department of Radiology, Boston, MA (United States); Sleeper, Lynn A. [New England Research Institute, Watertown, MA (United States); Sayre, James W. [UCLA Medical Center, Departments of Radiology and Biostatistics, Los Angeles, CA (United States)

2005-12-01

This study was conducted to quantitatively correlate the thickness of the corpus callosum with the volume of cerebral white matter in children with cerebral palsy and developmental delay. Material and methods: A clinical database of 70 children with cerebral palsy and developmental delay was established with children between the ages of 1 and 5 years. These children also demonstrated abnormal periventricular T2 hyperintensities associated with and without ventriculomegaly. Mid-sagittal T1-weighted images were used to measure the thickness (genu, mid-body, and splenium) and length of the corpus callosum. Volumes of interest were digitized based on gray-scale densities to define the hemispheric cerebral white matter on axial T2-weighted and FLAIR images. The thickness of the mid-body of the corpus callosum was correlated with cerebral white matter volume. Subgroup analysis was also performed to examine the relationship of this correlation with both gestational age and neuromotor outcome. Statistical analysis was performed using analysis of variance and Pearson correlation coefficients. There was a positive correlation between the thickness of the mid-body of the corpus callosum and the volume of cerebral white matter across all children studied (R=0.665, P=0.0001). This correlation was not dependent on gestational age. The thickness of the mid-body of the corpus callosum was decreased in the spastic diplegia group compared to the two other groups (hypotonia and developmental delay only; P<0.0001). Within each neuromotor subgroup, there was a positive correlation between thickness of the mid-body of the corpus callosum and volume of the cerebral white matter. (orig.)
Development and Use of a Corpus Tailored for Legal English Learning

Science.gov (United States)

Skier, Jason; Vibulphol, Jutarat

2016-01-01

While corpus linguistics has been applied towards many specific academic purposes, reports are few regarding its use to facilitate learning of legal English by non-native English speakers. Specialized corpora are required because legal English often differs significantly from ordinary usage, with words such as bar, motion, and hearing having…
Automatic recognition of touch gestures in the corpus of social touch

NARCIS (Netherlands)

Jung, Merel Madeleine; Poel, Mannes; Poppe, Ronald Walter; Heylen, Dirk K.J.

For an artifact such as a robot or a virtual agent to respond appropriately to human social touch behavior, it should be able to automatically detect and recognize touch. This paper describes the data collection of CoST: Corpus of Social Touch, a data set containing 7805 captures of 14 different
