WorldWideScience

Sample records for arretes circulaires textes

  1. Statistiques circulaires et utilisations en psychologie

    Directory of Open Access Journals (Sweden)

    Catherine Mello

    2005-09-01

    Full Text Available Researchers in psychology who work with angular or cyclical data face the problems of periodicity and of an arbitrary measurement system that circular statistics pose. The usual methods for computing parameters such as the mean are then of no use. This introduction to circular statistics presents trigonometric functions for computing circular parameters: center of mass, concentration, dispersion, and homeward component. Commonly used circular distributions, as well as statistical inference methods developed for circular measures relevant to psychology, are also described. An example of a simple orientation experiment illustrates the application of the various statistical tests using Microsoft Excel.
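
    The circular parameters named in this abstract can be illustrated with a minimal sketch. It is not taken from the article (which works in Microsoft Excel); the function names `circular_summary` and `homeward_component` are hypothetical, and the formulas are the standard textbook definitions: the mean direction is the angle of the averaged unit vectors, the mean resultant length R measures concentration, 1 - R is one common dispersion measure, and the homeward component is R times the cosine of the deviation from the home direction.

```python
import math


def circular_summary(angles_deg):
    """Return (mean direction in degrees, mean resultant length R, circular variance 1 - R)."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    # Average the unit vectors (the "center of mass" of points on the unit circle).
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = math.hypot(c, s)                      # concentration, between 0 and 1
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    return mean_dir, r, 1.0 - r


def homeward_component(angles_deg, home_deg):
    """Projection of the mean vector onto the home direction (between -1 and 1)."""
    mean_dir, r, _ = circular_summary(angles_deg)
    return r * math.cos(math.radians(mean_dir - home_deg))


if __name__ == "__main__":
    # Two headings on either side of north: the arithmetic mean would be 180,
    # while the circular mean direction points close to 0 (north).
    print(circular_summary([350, 10]))
    print(homeward_component([350, 10], home_deg=0))
```

    The example with headings of 350 and 10 degrees shows why the arithmetic mean fails for angular data: it would point south, while the two observations both point almost due north.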

  2. Antennes lecteurs RFID à polarisation circulaire pour application robotique

    OpenAIRE

    Hebib, Sami; Bouaziz, Sofiene; Aubert, Hervé; Lerasle, Frédéric

    2011-01-01

    National audience In this paper, a new circularly polarized RFID reader antenna is developed for the radiolocation chain of the Rackham robot at LAAS-CNRS. The antenna (20 cm x 20 cm) covers the entire UHF RFID band (860-960 MHz) and has a simulated gain of 4 dBi. Two copies of the antenna were fabricated and measured. Radiolocation tests of these antennas show that they meet the requirements of the robotic application considered....

  3. G8: trois Francais, auteurs presumes de violences à Genève, arretes mercredi

    CERN Multimedia

    2003-01-01

    "Three French nationals suspected of taking part in the violence that occurred in Geneva on the sidelines of the G8 summit last June were arrested on Wednesday in Geneva after admitting the acts of which they are accused, local police announced" (1/2 page).

  4. La nouvelle circulaire adhérence de la Direction des routes nationales de France

    OpenAIRE

    Dupont, P.; BAUDUIN, A

    2005-01-01

    The policy of the French national road contracting authority on skid resistance (adhérence) is presented. The successive circulars published and the main reasons for their replacement are recalled. The latest circular, published in 2002, is presented in detail. It results from the work of a subgroup of the Groupe national des caractéristiques de surface (GNCDS), created by the Director of Roads of France in 1991. It defines specifications in terms of mean depth of...

  5. Chances for a circular economy in the Netherlands; Kansen voor de circulaire economie in Nederland

    Energy Technology Data Exchange (ETDEWEB)

    Bastein, T.; Roelofs, E.; Rietveld, E.; Hoogendoorn, A.

    2013-06-15

    The circular economy is an economic and industrial system that focuses on the reusability of products and raw materials, reduces value destruction in the overall system, and aims at value creation within each tier of the system. In this report the (economic) opportunities are quantified as much as possible, and impacts on employment and the environment are addressed. The study focuses specifically on the Dutch economy. The analysis starts with two detailed case studies: the use of biomass waste streams and the circular economy that may arise in the metal-electronics industry. [Dutch] Het begrip 'circulaire economie' is een economisch en industrieel systeem dat zich richt op de herbruikbaarheid van producten en grondstoffen, waarde vernietiging in het totale systeem minimaliseert en waarde creatie in iedere schakel van het systeem nastreeft. In dit rapport worden de (economische) kansen zoveel mogelijk gekwantificeerd, waarbij effecten op werkgelegenheid en milieudruk aan bod komen. De studie richt zich nadrukkelijk op de gehele Nederlandse economie. De analyse start aan de hand van twee gedetailleerde case studies: de benutting van reststromen uit biomassa en de circulaire economie die kan ontstaan t.b.v. producten uit de metaalelektro-sector.

  6. Collection of regulatory texts relative to radiation protection. Part 2: orders and decisions taken in application of the Public Health Code and Labour Code concerning the protection of populations, patients and workers against the risks of ionizing radiations; Recueil de textes reglementaires relatifs a la radioprotection. Partie 2: arretes et decisions pris en application du Code de Sante Publique et du Code du Travail concernant la protection de la population, des patients et des travailleurs contre les dangers des rayonnements ionisants

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-05-15

    This collection of texts covers the general measures for protecting the population, exposure to natural radiation, the general system of authorizations and declarations, protection of persons exposed to ionizing radiation for medical purposes, radiological emergencies and long-term exposure to ionizing radiation, penal provisions, application of the Public Health Code, and application of the Labour Code. A chronological table of contents by date of publication is given. (N.C.)

  7. J.O. no. 7, text no. 18. Decree, orders, general text. Decree no. 2004-25 of the 8 january 2004 allowing the ''Commissariat a l'Energie Atomique'' to modify the nuclear installation no. 35 (INB no. 35) named radioactive liquid effluents management area of the nuclear research center of Saclay (Essonne); J.O. no. 7, texte no. 18. Decrets, arretes, circulaires, textes generaux. Decret no. 2004-25 du 8 janvier 2004 autorisant le Commissariat a l'energie atomique a modifier l'installation nucleaire de base no. 35 (INB no.35) denommee zone de gestion des effluents liquides radioactifs du centre d'etudes nucleaires de Saclay (Essonne)

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2004-01-01

    The radioactive liquid effluents management area, INB no. 35, was declared by the CEA on 27 May 1964. On 6 June 2001 the CEA applied for authorization to modify this installation. The new installation, Stella, is to be operational in 2005-2006. This decree defines the operating conditions of the new installation. (A.L.B.)

  8. Boucliers Circulaires de l'Orient Musulman (Évolution et utilisation)

    Directory of Open Access Journals (Sweden)

    Kalus, Ludvik

    1974-12-01

    Full Text Available The shield, the simplest and oldest of the defensive arms of solitary warriors and of infantry or cavalry soldiers, was used by almost all peoples at some stage of their development, and it disappeared only with the introduction of modern weapons (gunpowder, chemical, and nuclear weapons). Following the history of the arms of all the peoples of the world, we observe very different shapes and sizes of shields, determined by the mobility of the army in which they were used, by the nature of the weapons against which they were meant to defend, by the weight of the material from which they were made, and no doubt by the traditions of the milieu in which they were used.

  9. Combat desertification, arret deforestation

    International Nuclear Information System (INIS)

    This article presents the major progress in the actions of the Forest Department and the Dry Zone Greening Department to arrest deforestation and to combat desertification in the dry zone of central Myanmar.

  10. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  11. Text Laws

    Czech Academy of Sciences Publication Activity Database

    Hřebíček, Luděk

    Vol. 26. Ein internationales Handbuch/An International Handbook. Berlin-New York : Walter de Gruyter, 2005 - (Köhler, R.; Altmann, G.; Piotrowski, R.), s. 348-361 ISBN 978-3-11-015578-5 Institutional research plan: CEZ:AV0Z90210515 Keywords : Text structure * Quantitative linguistics Subject RIV: AI - Linguistics

  12. Circular from January 26, 2004, taken for the enforcement of the by-law from January 26, 2004, relative to the national defense secrecy protection in the domain of nuclear materials protection and control; Circulaire du 26 janvier 2004 prise pour l'application de l'arrete du 26 janvier 2004 relatif a la protection du secret de la defense nationale dans le domaine de la protection et du controle des matieres nucleaires

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2004-01-15

    The by-law of January 26, 2004 gives a regulatory foundation to the classification of sensitive information relating to the security and physical protection of nuclear materials. Within this framework, this circular recalls the conditions for implementing the regulations on the protection of national defense secrets in the domain of the protection of nuclear facilities and materials. (J.S.)

  13. Contextual Text Mining

    Science.gov (United States)

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  14. Effective Classification of Text

    OpenAIRE

    A Saritha; N NaveenKumar

    2014-01-01

    Text mining is the process of obtaining useful and interesting information from text. A huge amount of text data is available in various formats, and most of it is unstructured. Text mining usually involves structuring the input text (parsing it and inserting the results into a database), deriving patterns from the structured data, and finally evaluating and interpreting the output. There are several data mining techniques proposed for mi...

  15. Text Coherence in Translation

    OpenAIRE

    Yanping Zheng

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can utilize the following approaches: retention of the continuity of senses of a text; reconstruction of the target text for the purpose of continuity;...

  16. Quality text editing

    Directory of Open Access Journals (Sweden)

    Gyöngyi Bujdosó

    2009-10-01

    Full Text Available Text editing is more than the knowledge of word processing techniques. Originally, typographers, printers, and text editors were the ones qualified to edit texts, which were well structured, legible, easily understandable, and clear, and which emphasized the core of the text. Times have changed: nowadays everyone has access to computers and to text editing software, and most users believe that having these tools is enough to edit texts. However, text editing requires more skills. Texts appearing in either printed or electronic form reveal that most users do not realize that they are not qualified to edit and publish their works. Analyzing the ‘text-products’ of the last decade, a clear tendency can be seen. More and more documents appear that, instead of emphasizing the subject matter, are lost in a maze of unstructured text slices. Different font types, colors, sizes, strange arrangements of objects, etc. are applied without further thought. We present examples of the most common typographic and text editing errors. Our aim is to call attention to these mistakes and to persuade users to spend time educating themselves in text editing. They have to realize that a well-structured text strengthens the effect on the reader, so that the original message reaches the target group.

  17. Text Mining: (Asynchronous Sequences)

    Directory of Open Access Journals (Sweden)

    Sheema Khan

    2014-12-01

    Full Text Available In this paper we try to correlate text sequences that share common topics, using those topics as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates them together with their timestamps. Step two takes a topic and tries to assign the timestamp of the text document. After multiple repetitions of step two, we could obtain an optimal result.

  18. Questioning the Text.

    Science.gov (United States)

    Harvey, Stephanie

    2001-01-01

    One way teachers can improve students' reading comprehension is to teach them to think while reading, questioning the text and carrying on an inner conversation. This involves: choosing the text for questioning; introducing the strategy to the class; modeling thinking aloud and marking the text with stick-on notes; and allowing time for guided…

  19. Text Coherence in Translation

    Science.gov (United States)

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  20. Arabic Short Text Compression

    Directory of Open Access Journals (Sweden)

    Eman Omer

    2010-01-01

    Full Text Available Problem statement: Text compression permits representing a document using less space. This is useful not only to save disk space but, more importantly, to save disk transfer and network transmission time. With the continuous increase in the number of Arabic short text messages sent by mobile phones, a suitable compression scheme would allow users to use more characters than the default value specified by the provider. Developing an efficient compression scheme for short Arabic texts is not a straightforward task. Approach: This study combined the benefits of pre-processing, entropy reduction through splitting files, and hybrid dynamic coding, a new technique proposed in this study that exploits the fact that Arabic texts have single-case letters. Experimental tests were performed on short Arabic texts, and a comparison with the well-known plain Huffman compression was made to measure the performance of the proposed scheme for Arabic short text. Results: The proposed scheme can achieve a compression ratio of around 4.6 bits/byte for very short Arabic text sequences of 15 bytes and around 4 bits/byte for 50-byte text sequences, using only 8 Kbytes of memory overhead. Conclusion: Furthermore, a reasonable compression ratio can be achieved using less than 0.4 KB of memory overhead. We recommend the proposed scheme for compressing short Arabic texts when resources are limited.
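
    The bits-per-byte figure used in this abstract can be made concrete with a small sketch of the baseline it compares against, plain (static) Huffman coding over the bytes of a message. This is not the hybrid dynamic coding scheme proposed in the study; the function names are hypothetical, and the sketch ignores the cost of transmitting the code table, which matters a great deal for very short messages.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: code length in bits} for a static Huffman code."""
    if not data:
        raise ValueError("need at least one byte")
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def bits_per_byte(data: bytes) -> float:
    """Average number of coded bits per input byte (code table not included)."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq) / len(data)


if __name__ == "__main__":
    message = "مرحبا بالعالم".encode("utf-8")  # illustrative short Arabic text
    print(len(message), "bytes,", round(bits_per_byte(message), 2), "bits/byte")
```

    A value below 8 bits/byte means the coded message is shorter than the input; the abstract reports its results on the same scale.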

  1. Vocabulary Constraint on Texts

    Directory of Open Access Journals (Sweden)

    C. Sutarsyah

    2008-01-01

    Full Text Available This case study was carried out in the English Education Department of State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading texts and to determine whether the texts are useful for reading skill development. A descriptive qualitative design was applied to obtain the data. For this purpose, some available computer programs were used to describe the vocabulary in the texts. It was found that the 20 texts, containing 7,945 words, are dominated by low frequency words, which account for 16.97% of the words in the texts. The high frequency words occurring in the texts were dominated by function words. In terms of word levels, it was found that the texts have a very limited number of words from the GSL (General Service List of English Words (West, 1953)). The first 1,000 words of the GSL account for only 44.6%. The data also show that the texts contain too large a proportion of words that are not in the three levels (the first 2,000 and UWL). These words account for 26.44% of the running words in the texts. It is believed that the constraints are due to the selection of the texts, which are made up of a series of short, unrelated texts. This kind of text is subject to the accumulation of low frequency words, especially content words, and to a limited number of words from the GSL. It could also defeat the development of students' reading skills and vocabulary enrichment.
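
    The percentages in this abstract are coverage figures: the share of running words in a text that belong to a given word list. A minimal sketch of that calculation follows. It is not the software used in the study; the function name `coverage`, the tokenizer, and the tiny word list in the example are all assumptions made for illustration.

```python
import re


def coverage(text, word_list):
    """Percentage of running words (tokens) in `text` that appear in `word_list`."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = {w.lower() for w in word_list}
    hits = sum(1 for t in tokens if t in known)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # Hypothetical stand-in for a frequency band such as the first 1,000 GSL words.
    sample_list = ["the", "on", "a", "is"]
    print(coverage("The cat sat on the mat", sample_list))
```

    Running the same function with one word list per frequency band gives the kind of per-level breakdown the abstract reports.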

  2. EMOTION DETECTION FROM TEXT

    Directory of Open Access Journals (Sweden)

    Shiv Naresh Shivhare

    2012-05-01

    Full Text Available Emotion can be expressed in many observable ways, such as facial expressions and gestures, speech, and written text. Emotion detection in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing and Machine Learning. In this paper, emotion recognition based on textual data and the techniques used in emotion detection are discussed.

  3. Automatic Arabic Text Classification

    OpenAIRE

    Al-harbi, S; Almuhareb, A.; Al-Thubaity , A; Khorsheed, M. S.; Al-Rajeh, A.

    2008-01-01

    Automated document classification is an important text mining task especially with the rapid growth of the number of online documents present in Arabic language. Text classification aims to automatically assign the text to a predefined category based on linguistic features. Such a process has different useful applications including, but not restricted to, e-mail spam detection, web page content filtering, and automatic message routing. This paper presents the results of experiments on documen...

  4. Planning Argumentative Texts

    CERN Document Server

    Huang, X

    1994-01-01

    This paper presents PROVERB, a text planner for argumentative texts. PROVERB's main feature is that it combines global hierarchical planning and unplanned organization of text with respect to local derivation relations in a complementary way. The former splits the task of presenting a particular proof into subtasks of presenting subproofs. The latter simulates how the next intermediate conclusion to be presented is chosen under the guidance of the local focus.

  5. Text Summarizing In Polish

    Directory of Open Access Journals (Sweden)

    Emilia Branny

    2005-01-01

    Full Text Available The aim of this article is to describe an existing implementation of a text summarizer for Polish, to analyze the results, and to propose possibilities for further development. The problem of text summarizing has already been addressed by science, but until now there has been no implementation designed for Polish. The implemented algorithm is based on existing developments in the field but also includes some improvements. It has been optimized for newspaper texts ranging from approx. 10 to 50 sentences. Evaluation has shown that it works better than known generic summarization tools when applied to Polish.

  6. Instant Sublime Text starter

    CERN Document Server

    Haughee, Eric

    2013-01-01

    A starter which teaches the basic tasks to be performed with Sublime Text with the necessary practical examples and screenshots. This book requires only basic knowledge of the Internet and basic familiarity with any one of the three major operating systems, Windows, Linux, or Mac OS X. However, as Sublime Text 2 is primarily a text editor for writing software, many of the topics discussed will be specifically relevant to software development. That being said, the Sublime Text 2 Starter is also suitable for someone without a programming background who may be looking to learn one of the tools of

  7. Mining text data

    CERN Document Server

    Aggarwal, Charu C

    2012-01-01

    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. "Mining Text Data" introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including

  8. Systematic text condensation

    DEFF Research Database (Denmark)

    Malterud, Kirsti

    2012-01-01

    To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies.

  9. Linguistics in Text Interpretation

    DEFF Research Database (Denmark)

    Togeby, Ole

    A model for how text interpretation proceeds from what is pronounced, through what is said, to what is communicated, and a definition of the concepts 'presupposition' and 'implicature'.

  10. Clustering Text Data Streams

    Institute of Scientific and Technical Information of China (English)

    Yu-Bao Liu; Jia-Rong Cai; Jian Yin; Ada Wai-Chee Fu

    2008-01-01

    Clustering text data streams is an important issue in the data mining community and has a number of applications, such as news group filtering, text crawling, document organization, and topic detection and tracing. However, most methods are similarity-based approaches that use only the TF*IDF scheme to represent the semantics of text data, which often leads to poor clustering quality. Recently, researchers have argued that the semantic smoothing model is more effective than the existing TF*IDF scheme for improving text clustering quality. However, the existing semantic smoothing model is not suitable for a dynamic text data context. In this paper, we first extend the semantic smoothing model to the text data stream context. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for clustering massive text data streams. In both algorithms, we also present a new cluster statistics structure named cluster profile, which can capture the semantics of text data streams dynamically and at the same time speed up the clustering process. Some efficient implementations of our algorithms are also given. Finally, we present a series of experimental results illustrating the effectiveness of our technique.

  11. Making Sense of Texts

    Science.gov (United States)

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary…

  12. Centroid Based Text Clustering

    Directory of Open Access Journals (Sweden)

    Priti Maheshwari

    2010-09-01

    Full Text Available Web mining is a burgeoning new field that attempts to glean meaningful information from natural language text. Web mining refers generally to the process of extracting interesting information and knowledge from unstructured text. Text clustering is one of the important Web mining functionalities. Text clustering is the task in which texts are classified into groups of similar objects based on their contents. Current research in the area of Web mining tackles problems of text data representation, classification, clustering, information extraction, and the search for and modeling of hidden patterns. In this paper we propose that, for mining large document collections, it is necessary to pre-process the web documents and store the information in a data structure that is more appropriate for further processing than a plain web file. We developed a PHP/MySQL-based utility to convert unstructured web documents into a structured tabular representation by preprocessing and indexing. We apply a centroid-based web clustering method to the preprocessed data. We apply three methods for clustering. Finally, we propose a method that can increase accuracy based on clustering of documents.

  13. EMOTION DETECTION FROM TEXT

    OpenAIRE

    Shiv Naresh Shivhare; Saritha Khethawat

    2012-01-01

    Emotion can be expressed in many observable ways, such as facial expressions and gestures, speech, and written text. Emotion detection in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing and Machine Learning. In this paper, emotion recognition based on textual data and the techniques used in emotion detection are discussed.

  14. Texts of Television Advertisements

    OpenAIRE

    Michalewski, Kazimierz

    1995-01-01

    Short advertisement films occupy a large part (especially around the peak viewing hours) of the everyday programmes of Polish state television. Even though it is possible to imagine an advertisement film employing only extralinguistic means of communication, advertisements in general have so far used written and spoken texts. The basic function of such a text, and of the whole film, is to encourage viewers to buy the advertised product. However, independently of th...

  15. Emotion Detection from Text

    CERN Document Server

    Shivhare, Shiv Naresh

    2012-01-01

    Emotion can be expressed in many observable ways, such as facial expressions and gestures, speech, and written text. Emotion detection in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing and Machine Learning. In this paper, emotion recognition based on textual data and the techniques used in emotion detection are discussed.

  16. Text simplification for children

    OpenAIRE

    De Belder, Jan; Moens, Marie-Francine

    2010-01-01

    The goal in this paper is to automatically transform text into a simpler text, so that it is easier to understand by children. We perform syntactic simplification, i.e. the splitting of sentences, and lexical simplification, i.e. replacing difficult words with easier synonyms. We test the performance of this approach for each component separately on a per sentence basis, and globally with the automatic construction of simplified news articles and encyclopedia articles. By including informatio...

  17. Polyglotte Texte : Einleitung

    OpenAIRE

    Zemanek, Evi; Willms, Weertje

    2014-01-01

    When polyglossia or multilingualism is discussed, different things can be meant: first, the literary multilingualism of individual authors or cultural communities who communicate and write texts in different languages, without one and the same "text" necessarily being multilingual. This is a phenomenon with a long tradition: one need only think of the centuries-long coexistence of the vernacular and Latin in several European cultures between S...

  18. Reading Authentic Texts

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2013-01-01

    Most research on cognates has focused on words presented in isolation that are easily defined as cognate between L1 and L2. In contrast, this study investigates what counts as cognate in authentic texts and how such cognates are read. Participants with L1 Danish read news articles in their highly...

  19. Reading Authorship into Texts.

    Science.gov (United States)

    Werner, Walter

    2000-01-01

    Provides eight concepts, with illustrative questions for interpreting the authorship of texts, that are borrowed from cultural studies literature: (1) representation; (2) the gaze; (3) voice; (4) intertextuality; (5) absence; (6) authority; (7) mediation; and (8) reflexivity. States that examples were taken from British Columbia's (Canada) social…

  20. 26. Text laws

    Czech Academy of Sciences Publication Activity Database

    Hřebíček, Luděk

    Vol. 26. Ein internationales Handbuch/An International Handbook. Berlin-New York : Walter de Gruyter, 2005 - (Köhler, R.; Altmann, G.; Piotrowski, R.), s. 348-361 ISBN 978-3-11-015578-5 Institutional research plan: CEZ:AV0Z9021901 Keywords : Text structure * Quantitative linguistics Subject RIV: AI - Linguistics

  1. EAL studying texts

    CERN Document Server

    Napthin, Melanie

    2013-01-01

    EAL: Studying texts has been developed out of Insight's best-selling ESL English for Year 12, which has helped thousands of ESL/ EAL students to achieve top marks. Offering comprehensive coverage of Area of Study 1: Reading and responding in VCE English, the book takes a highly practical approach that builds students' skills progressively.

  2. Text Induced Spelling Correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigram

  3. Texts in the landscape

    Directory of Open Access Journals (Sweden)

    James Graham-Campbell

    1998-11-01

    Full Text Available The Institute's members of UCL's "Celtic Inscribed Stones" project describe, in collaboration with Wendy Davies, Mark Handley and Paul Kershaw (Department of History), a major interdisciplinary study of inscriptions of the early middle ages from the Celtic areas of northwest Europe.

  4. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

    Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international agricultural information institute with headquarters in Britain. It aims to improve people’s lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI.

  5. Text Mining for Neuroscience

    Science.gov (United States)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature, which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researcher's current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded document sets ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form, from which an Association Graph is computed that represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms, such as transitive closure and cycle (circuit) detection, to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in
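
    The pipeline described in this abstract (co-occurrence of domain terms, an association graph, then transitive closure) can be sketched in a few lines. This is an illustration under simplifying assumptions, not the authors' implementation: documents are plain strings, a term counts as present if it appears as a whitespace-separated word, any single co-occurrence creates an edge, and the function names and example terms are hypothetical.

```python
from itertools import combinations


def association_graph(documents, terms):
    """Undirected graph linking domain terms that co-occur in at least one document."""
    edges = {t: set() for t in terms}
    for doc in documents:
        words = set(doc.lower().split())
        present = [t for t in terms if t in words]
        for a, b in combinations(present, 2):
            edges[a].add(b)
            edges[b].add(a)
    return edges


def transitive_closure(edges):
    """Warshall-style closure: closure[a] holds every term reachable from a."""
    reach = {a: set(nbrs) for a, nbrs in edges.items()}
    for k in reach:
        for i in reach:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def indirect_associations(edges):
    """Pairs connected only through intermediate terms (candidate novel links)."""
    closure = transitive_closure(edges)
    return {
        (a, b)
        for a in edges
        for b in closure[a]
        if a < b and b not in edges[a]
    }


if __name__ == "__main__":
    docs = ["dopamine modulates reward", "reward engages striatum"]
    terms = ["dopamine", "reward", "striatum"]
    print(indirect_associations(association_graph(docs, terms)))
```

    In the toy example, the first and third terms never co-occur in a document but are linked through the second, which is the kind of indirect association that transitive closure surfaces for a human researcher to assess.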

  6. Reading Text While Driving

    OpenAIRE

    Liang, Yulan; Horrey, William J.; Hoffman, Joshua D.

    2015-01-01

    Objective In this study, we investigated how drivers adapt secondary-task initiation and time-sharing behavior when faced with fluctuating driving demands. Background Reading text while driving is particularly detrimental; however, in real-world driving, drivers actively decide when to perform the task. Method In a test track experiment, participants were free to decide when to read messages while driving along a straight road consisting of an area with increased driving demands (demand zone)...

  7. Toponym Resolution in Text

    OpenAIRE

    Leidner, Jochen Lothar

    2007-01-01

    Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g.,...

  8. Text classification method review

    OpenAIRE

    Mahinovs, Aigars; Tiwari, Ashutosh; Roy, Rajkumar; Baxter, David

    2007-01-01

    With the explosion of information fuelled by the growth of the World Wide Web it is no longer feasible for a human observer to understand all the data coming in or even classify it into categories. With this growth of information and simultaneous growth of available computing power automatic classification of data, particularly textual data, gains increasingly high importance. This paper provides a review of generic text classification process, phases of that process and met...

  9. Text a jeho ilustrace (Text and Its Illustration)

    OpenAIRE

    SNÁŠELOVÁ, Karolína

    2015-01-01

    The thesis deals with the question of linking a visual representation to a literary work of art. It focuses primarily on the genre of book illustration and on the relationship between the verbal and visual components of a literary work, that is, the possibilities and limits of transforming language into visual representation. The theoretical explanation is accompanied by an analysis of several illustrations of specific works of Czech literature in their relation to visual text.

  10. Text and Music Revisited

    OpenAIRE

    Fornäs, Johan

    1997-01-01

    Are words and music two separate symbolic modes, or rather variants of the same human symbolic practice? Are they parallel, opposing or overlapping? What do they have in common and how does each of them exceed the other? Is music perhaps incomparably different from words, or even their anti-verbal Other? Distinctions between text (in the verbal sense of units of words rather than in the wide sense of symbolic webs in general) and music are regularly made – but also problematized – withi...

  11. Weaving with text

    DEFF Research Database (Denmark)

    Hagedorn-Rasmussen, Peter

    This paper explores how a school principal, by means of practical authorship, creates reservoirs of language that provide a possible context for collective sensemaking. The paper draws upon a field study in which a school principal and his managerial team were shadowed in a period of intensive changes. The paper explores how the manager weaves with text, extracted from stakeholders, administration, politicians, employees, public discourse etc., as a means of creating a new fabric, a texture, of diverse perspectives that aims for collective sensemaking.

  12. Weitere Texte physiognomischen Inhalts (Further Texts with Physiognomic Content)

    Directory of Open Access Journals (Sweden)

    Böck, Barbara

    2004-12-01

    Full Text Available The present article offers the editio princeps of three cuneiform texts, preserved in the British Museum (London) and the Vorderasiatisches Museum (Berlin), that belong to the Akkadian handbook of omens drawn from the physical appearance as well as the morals and behaviour of man. The book, entitled Alamdimmû ('form, figure') in antiquity, comprises up to 27 chapters with more than 100 omens each. The edition of the three cuneiform tablets thus completes the author's monographic study on the ancient Mesopotamian divinatory discipline of physiognomy (Die babylonisch-assyrische Morphoskopie, Wien 2000 [= AfO Beih. 27]).

  13. Documents and legal texts

    International Nuclear Information System (INIS)

    This section reprints the text of the following laws: 1 - United Arab Emirates: Federal Law by Decree No. 4 of 2012 concerning civil liability for nuclear damage; India: The Civil Liability for Nuclear Damage Act, 2010, No. 38 of 2010, 21 September 2010 (An Act to provide for civil liability for Nuclear Damage and prompt compensation to the victims of a Nuclear accident through a No Fault Liability Regime channeling liability to the operator, appointment of Claims Commissioner, establishment of Nuclear Damage Claims commission and for matters connected therewith or incidental thereto); 2 - Republic of Moldova - Parliament: Law No. 132 of 08.06.2012 on the safe conduct of nuclear and radiological activities (Published: 02.11.2012 in the Official Gazette No. 229-233 art. no: 739). The purpose of this law is to regulate nuclear and radiological activities in accordance with the international requirements in this field arising out of several treaties, conventions and directives.

  14. Documents and legal texts

    International Nuclear Information System (INIS)

    This section reprints a selection of recently published legislative texts and documents: - Russian Federation: Federal Law No.170 of 21 November 1995 on the use of atomic energy, Adopted by the State Duma on 20 October 1995; - Uruguay: Law No.19.056 On the Radiological Protection and Safety of Persons, Property and the Environment (4 January 2013); - Japan: Third Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (concerning Damages related to Rumour-Related Damage in the Agriculture, Forestry, Fishery and Food Industries), 30 January 2013; - France and the United States: Joint Statement on Liability for Nuclear Damage (Aug 2013); - Franco-Russian Nuclear Power Declaration (1 November 2013)

  15. Interconnectedness und digitale Texte (Interconnectedness and Digital Texts)

    Directory of Open Access Journals (Sweden)

    Detlev Doherr

    2013-04-01

    Full Text Available Summary: Multimedia information services on the Internet are becoming ever more extensive and comprehensive, and libraries are also digitizing documents that exist only in printed form and publishing them on the Web. These documents can be found through online document management systems or search engines and then provided in common formats such as PDF. This article examines how the Humboldt Digital Library (HDL) works; for more than ten years it has made documents by Alexander von Humboldt available on the Web in English translation, free of charge. Unlike a digital library, however, it does not merely provide digitized documents as scans or PDFs, but makes the text itself available in interlinked form. The system therefore resembles an information system more than a digital library, which is also reflected in the available functions for finding texts in different versions and translations, comparing paragraphs of different documents, and displaying images in their context. The development of dynamic hyperlinks based on the individual text paragraphs of Humboldt's works, in the form of media assets, makes it possible to use the Google Maps programming interface for both geographic and content-based navigation. Going beyond the service of a digital library, the HDL offers the prototype of a multidimensional information system that works with dynamic structures and enables extensive thematic analyses and comparisons.

  16. Documents and legal texts

    International Nuclear Information System (INIS)

    This section covers the following documents and legal texts: 1 - Canada: Nuclear Liability and Compensation Act (An Act respecting civil liability and compensation for damage in case of a nuclear incident, repealing the Nuclear Liability Act and making consequential amendments to other acts); 2 - Japan: Act on Compensation for Nuclear Damage (The purpose of this act is to protect persons suffering from nuclear damage and to contribute to the sound development of the nuclear industry by establishing a basic system regarding compensation in case of nuclear damage caused by reactor operation etc.); Act on Indemnity Agreements for Compensation of Nuclear Damage; 3 - Slovak Republic: Act on Civil Liability for Nuclear Damage and on its Financial Coverage and on Changes and Amendments to Certain Laws (This Act regulates: a) The civil liability for nuclear damage incurred in the causation of a nuclear incident, b) The scope of powers of the Nuclear Regulatory Authority (hereinafter only as the 'Authority') in relation to the application of this Act, c) The competence of the National Bank of Slovakia in relation to the supervised financial market entities in the financial coverage of liability for nuclear damage; and d) The penalties for violation of this Act)

  17. Text

    International Nuclear Information System (INIS)

    The purpose of this act is to safeguard against the dangers and harmful effects of radioactive waste and to contribute to public safety and environmental protection by laying down requirements for the safe and efficient management of radioactive waste. The act covers definitions, its interrelation with other legislation, the responsibilities of the state and local governments, the responsibilities of radioactive waste management companies and generators, the formulation of the basic plan for the control of radioactive waste, radioactive waste management (including public information, financing, and part of spent fuel management), the Korea Radioactive Waste Management Corporation (business activities, budget), the establishment of a radioactive waste fund to secure the financial resources required for radioactive waste management, and penalties for improper operation of radioactive waste management. (N.C.)

  18. A Survey on Web Text Information Retrieval in Text Mining

    OpenAIRE

    Tapaswini Nayak; Srinivash Prasad; Manas Ranjan Senapat

    2015-01-01

    In this study we have analyzed different techniques for information retrieval in text mining. The aim of the study is to identify web text information retrieval. Text mining is roughly equivalent to text analytics, the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concep...

  19. Circular letter from January 22, 2004 to the presidents of companies having the status of chartered storage facility; Lettre circulaire du 22 janvier 2004 a Messieurs les presidents de societes titulaires du statut d'entrepositaire agree

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2004-07-01

    This circular letter is intended for owners of storage facilities for petroleum products who are subject to the strategic storage obligation under article 2 of law no 92-1443 of December 31, 1992. The attached document recalls the reasons for and content of this obligation, the prevailing strategic storage rules in France (reference texts, products concerned, operators, stockpile locations, product substitution possibilities, etc.), the monthly declarations, the controls and sanctions, the annual plan of stock locations, the obligation of information, and the loss or renunciation of chartered status. A schematic synthesis of the system of stockpile constitution is presented in an appendix, for France and for the French overseas departments. The other appendixes concern: the list of petroleum products concerned by the legal obligation of strategic storage, the relations between the professional committee of strategic stockpiles (CPSSP) and the anonymous society of security stocks management (SAGESS), and some examples of monthly and annual declaration forms. (J.S.)

  20. Classroom Texting in College Students

    Science.gov (United States)

    Pettijohn, Terry F.; Frazier, Erik; Rieser, Elizabeth; Vaughn, Nicholas; Hupp-Wilds, Bobbi

    2015-01-01

    A 21-item survey on texting in the classroom was given to 235 college students. Overall, 99.6% of students owned a cellphone and 98% texted daily. Of the 138 students who texted in the classroom, most texted friends or significant others and indicated that the reason for classroom texting was boredom or work. Students who texted sent a mean of 12.21…

  1. Short Text Classification: A Survey

    Directory of Open Access Journals (Sweden)

    Ge Song

    2014-05-01

    Full Text Available With the recent explosive growth of e-commerce and online communication, a new genre of text, short text, has come into extensive use in many areas, and much research now focuses on short text mining. Classifying short text is challenging because of its inherent characteristics, such as sparseness, large scale, immediacy, and non-standardization. Traditional methods struggle with short text classification mainly because the few words in a short text cannot adequately represent the feature space or the relationship between words and documents. Several studies and reviews on text classification have appeared recently, but only a few focus on short text classification. This paper discusses the characteristics of short text and the difficulty of short text classification. We then introduce popular existing work on short text classifiers and models, including short text classification using semantic analysis, semi-supervised short text classification, ensemble short text classification, and real-time classification. We also analyze how short text classification is evaluated. Finally, we summarize existing classification technology and outline development trends in short text classification.

  2. Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

    Science.gov (United States)

    White, Sheida

    2012-01-01

    This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…

  3. Text-Attentional Convolutional Neural Network for Scene Text Detection.

    Science.gov (United States)

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results. PMID:27093723

  4. Text Classification using Artificial Intelligence

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of words, word relations, i.e., association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of the naïve Bayes classifier is then used on the derived features, and finally a single concept from genetic algorithms is added for final classification. A syste...

  5. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of words, word relations, i.e., association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of the Naive Bayes classifier is then used on the derived features, and finally a single concept from Genetic Algorithms is added for final classification. A system based on the...

  6. Text Mining Infrastructure in R

    OpenAIRE

    Kurt Hornik; Ingo Feinerer; David Meyer

    2008-01-01

    During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels. (authors' abstract)

  7. Text analysis devices, articles of manufacture, and text analysis methods

    Science.gov (United States)

    Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C

    2013-05-28

    Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes processing circuitry configured to analyze initial text to generate a measurement basis usable in analysis of subsequent text, wherein the measurement basis comprises a plurality of measurement features from the initial text, a plurality of dimension anchors from the initial text and a plurality of associations of the measurement features with the dimension anchors, and wherein the processing circuitry is configured to access a viewpoint indicative of a perspective of interest of a user with respect to the analysis of the subsequent text, and wherein the processing circuitry is configured to use the viewpoint to generate the measurement basis.

  8. Contrastive Study of Coherence in Chinese Text and English Text

    Institute of Scientific and Technical Information of China (English)

    王婷

    2013-01-01

    The paper presents the text-linguistic concepts on which the analysis of textual structure is based, including text and discourse, and coherence and cohesion. In addition, we try to identify how texts manifest differently in English text (ET) and Chinese text (CT), including their different coherence structures.

  9. Metamorphoses d'un texte (Metamorphoses of a Text).

    Science.gov (United States)

    Meitinger, Guy Roger

    1993-01-01

    A variety of exercises based on manipulation of a single text are described. The activities involve replacing words or phrases in the text with synonyms or opposites, transposing gender, changing tenses, filling in blanks, and answering multiple-choice questions about linguistic forms. Three brief sample texts are offered. (MSE)

  10. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects

  11. La quadrature de l'économie circulaire (Squaring the Circular Economy)

    OpenAIRE

    Pirard, Eric

    2016-01-01

    Squaring the economic circle is a source of inspiration and innovation for all engineers, in the North as in the South. Trying to solve it unquestionably contributes to a more efficient use of our planet's resources. To do so, however, it is essential to understand the entire value chain of materials and to avoid decoys and simplistic slogans such as "zero waste" or "closing the loop".

  12. Author Gender Identification from Text

    OpenAIRE

    Rezaei, Atoosa Mohammad

    2014-01-01

    ABSTRACT: The identification of an author's gender from a text has become a popular research area within the scope of text categorization. The number of users of social network applications based on text, such as Twitter, Facebook and text messaging services, has grown rapidly over the past few decades. As a result, text has become one of the most important and prevalent media types on the Internet. This thesis aims to determine the gender of an author from an arbitrary piece of text such as,...

  13. Working with text tools, techniques and approaches for text mining

    CERN Document Server

    Tourte, Gregory J L

    2016-01-01

    Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...

  14. Text mining: A Brief survey

    OpenAIRE

    Falguni N. Patel; Neha R. Soni

    2012-01-01

    Unstructured texts, which contain massive amounts of information, cannot simply be used for further processing by computers. Therefore, specific processing methods and algorithms are required in order to extract useful patterns. Text mining is the process of extracting interesting information and knowledge from unstructured text. In this paper, we discuss text mining as a recent and interesting field, detailing the steps involved in the overall process. We have...

  15. Informational Text and the CCSS

    Science.gov (United States)

    Aspen Institute, 2012

    2012-01-01

    What constitutes an informational text covers a broad swath of different types of texts. Biographies & memoirs, speeches, opinion pieces & argumentative essays, and historical, scientific or technical accounts of a non-narrative nature are all included in what the Common Core State Standards (CCSS) envisions as informational text. Also included…

  16. Too Dumb for Complex Texts?

    Science.gov (United States)

    Bauerlein, Mark

    2011-01-01

    High school students' lack of experience and practice with reading complex texts is a primary cause of their difficulties with college-level reading. Filling the syllabus with digital texts does little to address this deficiency. Complex texts demand three dispositions from readers: a willingness to probe works characterized by dense meanings, the…

  17. Slippery Texts and Evolving Literacies

    Science.gov (United States)

    Mackey, Margaret

    2007-01-01

    The idea of "slippery texts" provides a useful descriptor for materials that mutate and evolve across different media. Eight adult gamers, encountering the slippery text "American McGee's Alice," demonstrate a variety of ways in which players attempt to manage their attention as they encounter a new text with many resonances. The range of their…

  18. Text Association Analysis and Ambiguity in Text Mining

    Science.gov (United States)

    Bhonde, S. B.; Paikrao, R. L.; Rahane, K. U.

    2010-11-01

    Text Mining (TM) is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. Research in Text Mining will enhance humans' ability to process massive quantities of information, and it has high commercial value. The paper first introduces TM and its definition, and then gives an overview of the text mining process and its applications. Up to now, little research in text mining, especially in concept/entity extraction, has focused on the ambiguity problem. This paper addresses ambiguity issues in natural language texts and presents a new technique for resolving ambiguity when extracting concepts/entities from texts. Finally, it shows the importance of TM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.

  19. Multilingual Text Analysis for Text-to-Speech Synthesis

    CERN Document Server

    Sproat, R

    1996-01-01

    We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite-state transducers, which serves as the text-analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin and Japanese.

  20. A Survey on Web Text Information Retrieval in Text Mining

    Directory of Open Access Journals (Sweden)

    Tapaswini Nayak

    2015-08-01

    Full Text Available In this study we have analyzed different techniques for information retrieval in text mining. The aim of the study is to identify web text information retrieval. Text mining is roughly equivalent to text analytics, the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, creation of coarse taxonomies, sentiment analysis, document summarization and entity relation modeling. Text mining is used to mine hidden information from unstructured or semi-structured data. This capability is necessary because a large amount of Web information is semi-structured due to the nested structure of HTML code, is linked, and is redundant. Web content categorization with a content database is the most important tool for the efficient use of search engines. A customer requesting information on a particular subject or item would otherwise have to search through hundreds of results to find the information most relevant to the query. Text mining reduces these hundreds of results, which eliminates the aggravation and improves the navigation of information on the Web.

  1. Monitoring interaction and collective text production through text mining

    Directory of Open Access Journals (Sweden)

    Macedo, Alexandra Lorandi

    2014-04-01

    Full Text Available This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.
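    As a rough illustration of the idea described in this abstract (extracting the most frequent terms from a text and relating them in a graph), the following Python sketch counts terms and links those that appear in the same sentence. It is not the Concepts Network tool itself: the stopword list, the sentence-level co-occurrence rule, the `top_n` parameter, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the article).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "on"}


def concept_network(text, top_n=4):
    """Return (term frequencies, edges) for the top_n most frequent terms.

    An edge (a, b) counts the sentences in which both terms appear.
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenised = [
        [w for w in re.findall(r"[a-z]+", s) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for sent in tokenised for w in sent)
    top = {w for w, _ in freq.most_common(top_n)}
    edges = Counter()
    for sent in tokenised:
        present = sorted(set(sent) & top)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return freq, edges


# Hypothetical student text.
sample = "Plants need light. Plants need water. Light and water help plants grow."
freq, edges = concept_network(sample)
```

    Here the edge weight between "need" and "plants" is 2 because the two terms share two sentences; a teacher-facing view could draw the heaviest edges first to show which concepts a text actually connects.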

  2. Predicting Prosody from Text for Text-to-Speech Synthesis

    CERN Document Server

    Rao, K Sreenivasa

    2012-01-01

    Predicting Prosody from Text for Text-to-Speech Synthesis covers the specific aspects of prosody, mainly focusing on how to predict the prosodic information from linguistic text, and then how to exploit the predicted prosodic knowledge for various speech applications. Author K. Sreenivasa Rao discusses proposed methods along with state-of-the-art techniques for the acquisition and incorporation of prosodic knowledge for developing speech systems. Positional, contextual and phonological features are proposed for representing the linguistic and production constraints of the sound units present in the text. This book is intended for graduate students and researchers working in the area of speech processing.

  3. Text mining: A Brief survey

    Directory of Open Access Journals (Sweden)

    Falguni N. Patel, Neha R. Soni

    2012-12-01

    Full Text Available Unstructured texts, which contain massive amounts of information, cannot simply be used for further processing by computers. Therefore, specific processing methods and algorithms are required in order to extract useful patterns. The process of extracting interesting information and knowledge from unstructured text is accomplished using text mining. In this paper, we discuss text mining as a recent and interesting field, detailing the steps involved in the overall process. We also discuss different technologies that teach computers natural language so that they may analyze, understand, and even generate text. In addition, we briefly discuss a number of successful applications of text mining, both those in current use and those expected in the future.

  4. TEXT DEIXIS IN NARRATIVE SEQUENCES

    Directory of Open Access Journals (Sweden)

    Josep Rivera

    2007-06-01

    Full Text Available This study looks at demonstrative descriptions, regarding them as text-deictic procedures which contribute to weaving discourse reference. Text deixis is thought of as a metaphorical referential device which maps the ground of utterance onto the text itself. Demonstrative expressions with textual antecedent-triggers, considered as the most important text-deictic units, are identified in a narrative corpus consisting of J. M. Barrie’s Peter Pan and its translation into Catalan. Some linguistic and discourse variables related to DemNPs are analysed to characterise text deixis adequately. It is shown that this referential device is usually combined with abstract nouns, thus categorising and encapsulating (non-nominal) complex discourse entities as nouns, while performing a referential cohesive function by means of the text deixis + general noun type of lexical cohesion.

  5. Text structures in medical text processing: empirical evidence and a text understanding prototype.

    OpenAIRE

    Hahn, U.; Romacker, M

    1997-01-01

    We consider the role of textual structures in medical texts. In particular, we examine the impact that the failure to recognize text phenomena has on the validity of medical knowledge bases fed by a natural language understanding front-end. First, we review the results from an empirical study on a sample of medical texts, considering various forms of local coherence phenomena (anaphora and textual ellipses). We then discuss the representation bias emerging in the text knowledge base that is l...

  6. Texting while driving: is speech-based text entry less risky than handheld text entry?

    Science.gov (United States)

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration. PMID:25089769

  7. Situational Interest in Literary Text

    Science.gov (United States)

    Schraw

    1997-10-01

    This study examined relationships among text characteristics, situational interest, two measures of text understanding, and personal responses when reading a literary text. A factor analysis of ratings made after reading revealed six interrelated text characteristics. Of these, suspense, coherence and thematic complexity explained 54% of the variance in interest. Additional analyses found that situational interest was unrelated to a multiple choice test of main ideas, but was related to personal responses and holistic interpretations of the text. These results suggest that multiple aspects of literary texts are interesting to readers, and that interest is related to personal engagement variables, even when it is not related to the comprehension of main ideas. Copyright 1997 Academic Press. PMID:9356182

  8. Outer Texts in Bilingual Dictionaries

    OpenAIRE

    Rufus H Gouws

    2011-01-01

    Abstract: Dictionaries often display a central list bias with little or no attention to the use of outer texts. This article focuses on dictionaries as text compounds and carriers of different text types. Utilising either a partial or a complete frame structure, a variety of outer text types can be used to enhance the data distribution structure of a dictionary and to ensure a better information retrieval by the intended target user. A distinction is made between primary frame structures...

  9. Active Learning for Text Classification

    OpenAIRE

    Hu, Rong

    2011-01-01

    Text classification approaches are used extensively to solve real-world challenges. The success or failure of text classification systems hangs on the datasets used to train them; without a good dataset it is impossible to build a quality system. This thesis examines the applicability of active learning in text classification for the rapid and economical creation of labelled training data. Four main contributions are made in this thesis. First, we present two novel selection strategies to cho...

  10. Multimodal texts in kindergarten rooms

    OpenAIRE

    Granly, Astrid; Maagerø, Eva

    2012-01-01

    This article provides an overview of the results of our project “The Kindergarten Room: A Multimodal Pedagogical Text”. Our major initiative was to investigate what the multimodal texts in kindergarten represent and the extent to which they reflect and provide attributions to the children’s activities. In addition, we wanted to investigate whether kindergarten walls and floors can be called ‘pedagogical texts’, and the extent to which texts on walls and floors establish a particular text cult...

  11. Text Type and Translation Strategy

    Institute of Scientific and Technical Information of China (English)

    刘福娟

    2015-01-01

    Translation strategy and translation standards are undoubtedly the core problems translators are confronted with in translation. There have arisen many kinds of translation strategies in translation history, among which the text type theory is considered an important breakthrough and a significant complement of traditional translation standards. This essay attempts to demonstrate the value of text typology (informative, expressive, and operative) to translation strategy, emphasizing the importance of text types and their communicative functions.

  12. Typesafe Modeling in Text Mining

    OpenAIRE

    Steeg, Fabian

    2011-01-01

    Based on the concept of annotation-based agents, this report introduces tools and a formal notation for defining and running text mining experiments using a statically typed domain-specific language embedded in Scala. Using machine learning for classification as an example, the framework is used to develop and document text mining experiments, and to show how the concept of generic, typesafe annotation corresponds to a general information model that goes beyond text processing.

  13. Strategies for Translating Vocative Texts

    OpenAIRE

    Olga COJOCARU

    2014-01-01

    The paper deals with the linguistic and cultural elements of vocative texts and the techniques used in translating them by giving some examples of texts that are typically vocative (i.e. advertisements and instructions for use). Semantic and communicative strategies are popular in translation studies and each of them has its own advantages and disadvantages in translating vocative texts. The advantage of semantic translation is that it takes more account of the aesthetic value of the SL te...

  14. Approaches to Automatic Text Structuring

    OpenAIRE

    Erbs, Nicolai

    2015-01-01

    Structured text helps readers to better understand the content of documents. In classic newspaper texts or books, some structure already exists. In the Web 2.0, the amount of textual data, especially user-generated data, has increased dramatically. As a result, there exists a large amount of textual data which lacks structure, thus making it more difficult to understand. In this thesis, we will explore techniques for automatic text structuring to help readers to fulfill their information need...

  15. Task specific image text recognition

    OpenAIRE

    Ben-Haim, Nadav

    2008-01-01

    This thesis addresses the problem of reading image text, which we define here as a digital image of machine printed text. Images of license plates, signs, and scanned documents fall into this category, whereas images of handwriting do not. Automatically reading image text is a very well researched problem, which falls into the broader category of Optical Character Recognition (OCR). Virtually all work in this domain begins by segmenting characters from the image and proceeds with a classifica...

  16. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  17. TRANSLATION PROBLEMS IN MEDICAL TEXTS

    OpenAIRE

    OĞUZ, Derya

    2014-01-01

    In this study, our aim was to emphasize the fact that the translation of medical texts represents a special area in translation, and that the most important aspect of medical text translation is being aware of the purpose and intent behind these translations. Medical text translation is not an area in which any person engaged in translation can work effectively. The translator first needs to have a considerable scientific background and experience. The translator’s task, in this context, is t...

  18. Knowledge Representation in Travelling Texts

    DEFF Research Database (Denmark)

    Mousten, Birthe; Locmele, Gunta

    2014-01-01

    Today, information travels fast. Texts travel, too. In a corporate context, the question is how to manage which knowledge elements should travel to a new language area or market and in which form? The decision to let knowledge elements travel or not travel highly depends on the limitation...... and the purpose of the text in a new context as well as on predefined parameters for text travel. For texts used in marketing and in technology, the question is whether culture-bound knowledge representation should be domesticated or kept as foreign elements, or should be mirrored or moulded—or should not travel...

  19. Linguistic Dating of Biblical Texts

    DEFF Research Database (Denmark)

    Ehrensvärd, Martin Gustaf

    For two centuries, scholars have pointed to consistent differences in the Hebrew of certain biblical texts and interpreted these differences as reflecting the date of composition of the texts. Until the 1980s, this was quite uncontroversial as the linguistic findings largely confirmed the...

  20. Strategies for Translating Vocative Texts

    Directory of Open Access Journals (Sweden)

    Olga COJOCARU

    2014-12-01

    Full Text Available The paper deals with the linguistic and cultural elements of vocative texts and the techniques used in translating them by giving some examples of texts that are typically vocative (i.e. advertisements and instructions for use). Semantic and communicative strategies are popular in translation studies and each of them has its own advantages and disadvantages in translating vocative texts. The advantage of semantic translation is that it takes more account of the aesthetic value of the SL text, while communicative translation attempts to render the exact contextual meaning of the original text in such a way that both content and language are readily acceptable and comprehensible to the readership. The focus is on the strategies used in translating vocative texts, strategies that highlight and introduce a cultural context to the target audience in order to achieve their overall purpose, that is, to sell or to persuade the reader to behave in a certain way. To that end, a number of advertisements from the cosmetics industry and the electronic gadgets sector were selected for analysis. The aim is to gather insights into vocative text translation and to create new perspectives on this field of research, now considered a process of innovation and diversion, especially in areas as important as economy and marketing.

  1. English Metafunction Analysis in Chemistry Text: Characterization of Scientific Text

    Directory of Open Access Journals (Sweden)

    Ahmad Amin Dalimunte, M.Hum

    2013-09-01

    Full Text Available The objectives of this research are to identify which Metafunctions are applied in a chemistry text and how they characterize a scientific text. The research was conducted by applying content analysis. The data were a twelve-paragraph chemistry text, collected by applying a documentary technique. The document was read and analyzed to identify the Metafunctions. The analysis followed these procedures: identifying the types of process; counting the number of processes; categorizing and counting the cohesion devices; classifying the types of modulation and determining the modality value; and finally counting the number of sentences and clauses and scoring the grammatical intricacy index. The findings show that Material process (71 of 100) is used most, and that circumstance of spatial location (26 of 56) is more dominant than the others. Modality (5) is used sparingly in order to avoid subjectivity. Impersonality is implied through the limited use of reference, either pronouns (7) or demonstratives (7); conjunctions (60) are applied to develop ideas; and the total number of clauses (109) is much greater than the total number of sentences (40), which results in a high grammatical intricacy index. The Metafunctions found indicate that the chemistry text fulfils the characteristics of scientific or academic text, which truly reflects it as a natural science.
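
    The counts reported in this abstract can be turned into the ratios they imply. The abstract does not give its formula for the grammatical intricacy index; the sketch below assumes the common definition from systemic functional linguistics (clauses divided by sentences), and the function names are hypothetical.

```python
def grammatical_intricacy(clauses: int, sentences: int) -> float:
    """Clauses per sentence (assumed definition; not stated in the abstract)."""
    if sentences <= 0:
        raise ValueError("sentences must be a positive count")
    return clauses / sentences


def share(count: int, total: int) -> float:
    """Proportion of a category within its total."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return count / total


# Counts as reported in the abstract.
print(grammatical_intricacy(clauses=109, sentences=40))  # clauses per sentence
print(share(71, 100))  # Material processes among all processes
print(share(26, 56))   # spatial-location circumstances among all circumstances
```

    Under that assumed definition the text averages roughly 2.7 clauses per sentence, which is consistent with the abstract's description of the index as high.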

  2. Text Analytics to Data Warehousing

    Directory of Open Access Journals (Sweden)

    Kalli Srinivasa Nageswara Prasad

    2010-09-01

    Full Text Available Information hidden or stored in unstructured data can play a critical role in making decisions, understanding, and conducting other business functions. Integrating data stored in both structured and unstructured formats can add significant value to an organization. With the extent of development in text mining and in technologies for dealing with unstructured and semi-structured data, such as XML and MML (Mining Markup Language), to extract and analyze data, text analytics has evolved to handle unstructured data and to help unlock and predict business results via Business Intelligence and Data Warehousing. Text mining involves dealing with texts in documents and discovering hidden patterns, whereas text analytics enhances information retrieval in the form of search and clustering of results; moreover, text analytics is text mining plus visualization. In this paper we discuss handling unstructured data in documents so that they fit into business applications such as data warehouses for further analysis, and how this supports the framework we have used for the solution.

  3. Outer Texts in Bilingual Dictionaries

    Directory of Open Access Journals (Sweden)

    Rufus H. Gouws

    2011-10-01

    Full Text Available

    Abstract: Dictionaries often display a central list bias with little or no attention to the use of outer texts. This article focuses on dictionaries as text compounds and carriers of different text types. Utilising either a partial or a complete frame structure, a variety of outer text types can be used to enhance the data distribution structure of a dictionary and to ensure a better information retrieval by the intended target user. A distinction is made between primary frame structures and secondary frame structures and attention is drawn to the use of complex outer texts and the need of an extended complex outer text with its own table of contents to guide the user to the relevant texts in the complex outer text. It is emphasised that outer texts need to be planned in a meticulous way and that they should participate in the lexicographic functions of the specific dictionary, both knowledge-orientated and communication-orientated functions, to ensure a transtextual functional approach.

    Keywords: BACK MATTER, CENTRAL LIST, COMMUNICATION-ORIENTATED FUNCTIONS, COMPLEX TEXT, CULTURAL DATA, EXTENDED COMPLEX TEXT, EXTENDED TEXTS, FRONT MATTER, FRAME STRUCTURE, KNOWLEDGE-ORIENTATED FUNCTIONS, LEXICOGRAPHIC FUNCTIONS, OUTER TEXTS, PRIMARY FRAME, SECONDARY FRAME

    Summary (Opsomming, translated from Afrikaans): Outer texts in bilingual dictionaries. Dictionaries often display a bias in favour of the central list, with little or no attention to the outer texts. This article focuses on dictionaries as text compounds and carriers of different text types. By using either a partial or a complete frame structure, a variety of outer texts can be employed to improve the data distribution structure of a dictionary and to ensure better retrieval of information by the target user. A distinction is made between primary and secondary frame structures, and attention is drawn to complex outer texts and the need for an extended complex...

  4. Biomarker Identification Using Text Mining

    Directory of Open Access Journals (Sweden)

    Hui Li

    2012-01-01

    Full Text Available Identifying molecular biomarkers has become one of the important tasks for scientists seeking to assess the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our text mining method provides a highly reliable approach to discovering biomarkers in the PubMed database.
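
    A dictionary combined with a finite state machine can be sketched as a token-level trie, where each trie node is a state and each token a transition. This is only an illustration of the general technique: the abstract does not describe the authors' automaton, and the dictionary entries, the `$end` marker and the function names below are assumptions.

```python
import re


def build_trie(terms):
    """Compile dictionary terms into a token-level trie (a deterministic automaton)."""
    root = {}
    for term in terms:
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node["$end"] = term  # accepting state; "$end" can never be a real token
    return root


def find_mentions(text, trie):
    """Scan the text left to right, emitting the longest dictionary match at each position."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text.lower())
    mentions = []
    i = 0
    while i < len(tokens):
        node, j, last = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$end" in node:
                last = (node["$end"], j)
        if last:
            mentions.append(last[0])
            i = last[1]  # resume after the matched span
        else:
            i += 1
    return mentions


# Illustrative dictionary entries, not taken from the paper.
trie = build_trie(["PSA", "prostate specific antigen", "HER2"])
print(find_mentions("Serum prostate specific antigen and HER2 were elevated.", trie))
```

    Matching on tokens rather than characters keeps the automaton small and avoids partial hits inside longer words, at the cost of depending on the tokenizer.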

  5. Why is Light Text Harder to Read Than Dark Text?

    Science.gov (United States)

    Scharff, Lauren V.; Ahumada, Albert J.

    2005-01-01

    Scharff and Ahumada (2002, 2003) measured text legibility for light text and dark text. For paragraph readability and letter identification, responses to light text were slower and less accurate for a given contrast. Was this polarity effect (1) an artifact of our apparatus, (2) a physiological difference in the separate pathways for positive and negative contrast or (3) the result of increased experience with dark text on light backgrounds? To rule out the apparatus-artifact hypothesis, all data were collected on one monitor. Its luminance was measured at all levels used, and the spatial effects of the monitor were reduced by pixel doubling and quadrupling (increasing the viewing distance to maintain constant angular size). Luminances of vertical and horizontal square-wave gratings were compared to assess display speed effects. They existed, even for 4-pixel-wide bars. Tests for polarity asymmetries in display speed were negative. Increased experience might develop full letter templates for dark text, while recognition of light letters is based on component features. Earlier, an observer ran all conditions at one polarity and then switched. If dark and light letters were intermixed, the observer might use component features on all trials and do worse on the dark letters, reducing the polarity effect. We varied polarity blocking (completely blocked, alternating smaller blocks, and intermixed blocks). Letter identification response times showed polarity effects at all contrasts and display resolution levels. Observers were also more accurate with higher contrasts and more pixels per degree. Intermixed blocks increased the polarity effect by reducing performance on the light letters, but only if the randomized block occurred prior to the nonrandomized block. Perhaps observers tried to use poorly developed templates, or they did not work as hard on the more difficult items. The experience hypothesis and the physiological gain hypothesis remain viable explanations.

  6. Stemming Malay Text and Its Application in Automatic Text Categorization

    Science.gov (United States)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language, there are no conjugations or declensions, and affixes have important grammatical functions. The same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use precise words in formal speech or written texts. To make sentences clear, Malay uses derivative words, and derivation is achieved mainly through affixes. There are approximately a hundred possible derivative forms of a root word in the written language of educated Malay speakers, so the composition of Malay words may be complicated. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. Although several types of stemming algorithms are available for text processing in English and some other languages, they cannot be used to overcome the difficulties of Malay word stemming. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The set of rules is aimed at reducing the occurrence of under-stemming errors, while the dictionaries are believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. To demonstrate the effectiveness of our Malay stemming algorithm, the text data used in the experiment were actual web pages collected from the World Wide Web. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
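
    The combination of affix rules with a root-word dictionary and a derivative-word dictionary can be illustrated with a toy sketch. It is not the authors' stemmer: the tiny affix lists, the dictionary entries, the lookup order and the function name `stem` are assumptions chosen only to show how dictionaries can guard rule-based stripping against over-stemming.

```python
# Toy resources (assumptions for illustration; a real stemmer needs far larger lists).
ROOTS = {"makan", "jalan", "ajar", "baca"}          # root-word dictionary
DERIVATIVES = {"pelajar": "ajar"}                   # derivative-word dictionary
PREFIXES = ("mem", "men", "ber", "ter", "me", "di", "pe")
SUFFIXES = ("kan", "an", "i")


def stem(word: str) -> str:
    """Return the root of a Malay word, or the word itself if no rule applies."""
    word = word.lower()
    # 1. A known root is never stripped, which prevents over-stemming
    #    (for example, "makan" must not lose its final "-an").
    if word in ROOTS:
        return word
    # 2. Known derived forms are resolved by direct lookup.
    if word in DERIVATIVES:
        return DERIVATIVES[word]
    # 3. Try every prefix/suffix combination; accept a candidate only if the
    #    root-word dictionary confirms it.
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[: len(rest) - len(suffix)] if suffix else rest
            if candidate in ROOTS:
                return candidate
    # 4. Leave unknown words untouched rather than guess.
    return word


for w in ["makanan", "berjalan", "membaca", "dimakan", "pelajar", "komputer"]:
    print(w, "->", stem(w))
```

    Requiring dictionary confirmation before accepting a stripped form trades recall for precision: unknown roots pass through unchanged, so coverage depends on how complete the root-word dictionary is.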

  7. An Experimental Text-Commentary

    Science.gov (United States)

    O'Brien, Joan

    1976-01-01

    An experimental text-commentary of selected passages from Sophocles' "Antigone" is described. The commentary is intended for students seeking more than a conventional translation who do not know enough Greek to use a standard commentary. (RM)

  8. Anomaly Detection with Text Mining

    Data.gov (United States)

    National Aeronautics and Space Administration — Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The...

  9. Text Steganographic Approaches: A Comparison

    Directory of Open Access Journals (Sweden)

    Monika Agarwal

    2013-02-01

    Full Text Available This paper presents three novel approaches of text steganography. The first approach uses the theme of missing letter puzzle where each character of message is hidden by missing one or more letters in a word of cover. The average Jaro score was found to be 0.95 indicating closer similarity between cover and stego file. The second approach hides a message in a wordlist where ASCII value of embedded character determines length and starting letter of a word. The third approach conceals a message, without degrading cover, by using start and end letter of words of the cover. For enhancing the security of secret message, the message is scrambled using one-time pad scheme before being concealed and cipher text is then concealed in cover. We also present an empirical comparison of the proposed approaches with some of the popular text steganographic approaches and show that our approaches outperform the existing approaches.
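
    The scrambling step mentioned in this abstract, a one-time pad applied before concealment, can be sketched as a byte-wise XOR with a random key of the same length. The abstract does not say which alphabet or combining operation the authors use, so the XOR-over-bytes form and the function names below are assumptions; the embedding of the cipher text into a cover is not shown.

```python
import secrets


def otp_scramble(message: bytes) -> tuple[bytes, bytes]:
    """Encrypt with a fresh random key as long as the message; return (cipher, key)."""
    key = secrets.token_bytes(len(message))
    cipher = bytes(m ^ k for m, k in zip(message, key))
    return cipher, key


def otp_unscramble(cipher: bytes, key: bytes) -> bytes:
    """Recover the message; XOR with the same key undoes the scrambling."""
    if len(cipher) != len(key):
        raise ValueError("key must be exactly as long as the cipher text")
    return bytes(c ^ k for c, k in zip(cipher, key))


cipher, key = otp_scramble(b"meet at noon")
print(otp_unscramble(cipher, key))
```

    A one-time pad is only secure if the key is truly random, kept secret, and never reused, which is why the sketch generates a new key on every call.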

  10. System for Distributed Text Mining

    OpenAIRE

    Torgersen, Martin Nordseth

    2011-01-01

    Text mining presents us with new possibilities for the use of collections of documents. There exists a large amount of hidden implicit information inside these collections, which text mining techniques may help us to uncover. Unfortunately, these techniques generally require large amounts of computational power. This is addressed by the introduction of distributed systems and methods for distributed processing, such as Hadoop and MapReduce. This thesis aims to describe, design, implement and ev...

  11. Text Mining in Social Networks

    Science.gov (United States)

    Aggarwal, Charu C.; Wang, Haixun

    Social networks are rich in various kinds of contents such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for a wide variety of applications such as keyword search, classification, and clustering. While search and classification are well known applications for a wide variety of scenarios, social networks have a much richer structure both in terms of text and links. Much of the work in the area uses either purely the text content or purely the linkage structure. However, many recent algorithms use a combination of linkage and content information for mining purposes. In many cases, it turns out that the use of a combination of linkage and content information provides much more effective results than a system which is based purely on either of the two. This paper provides a survey of such algorithms, and the advantages observed by using such algorithms in different scenarios. We also present avenues for future research in this area.

  12. MMI Diversity Based Text Summarization

    Directory of Open Access Journals (Sweden)

    Ladda Suanmali

    2009-03-01

    Full Text Available Searching for interesting information in a huge data collection is a tough job that frustrates those seeking the information. Automatic text summarization has emerged to facilitate such searching. Selecting the distinct ideas (“diversity”) of the original document can produce an appropriate summary, and incorporating multiple means can help to find the diversity in the text. In this paper, we propose an approach to text summarization in which three kinds of evidence are employed (clustering, a binary tree and a diversity-based method) to help find the document's distinct ideas. The emphasis of our approach is on controlling redundancy in the summarized text. The role of clustering is very important, and some clustering algorithms perform better than others. We therefore conducted an experiment comparing two clustering algorithms (k-means and complete linkage) based on the performance of our method; the results show that k-means performs better than complete linkage. In general, the experimental results show that our method performs well for text summarization compared with the benchmark methods used in this study.

  13. Analysing ESP Texts, but How?

    Directory of Open Access Journals (Sweden)

    Borza Natalia

    2015-03-01

    Full Text Available English as a second language (ESL) teachers instructing general English and English for specific purposes (ESP) in bilingual secondary schools face various challenges when it comes to choosing the main linguistic foci of language preparatory courses that enable non-native students to study academic subjects in English. ESL teachers who intend to analyse English-language subject textbooks written for secondary school students, with the aim of learning what bilingual secondary school students need to know in terms of language to process academic textbooks, cannot avoid dealing with a dilemma: it must be decided which way of analysing the texts in question is most appropriate. Handbooks of English applied linguistics are not immensely helpful with regard to this problem, as they tend not to recommend which major text analytical approaches are advisable in a pre-college setting. The present theoretical research aims to address this lacuna. Accordingly, the purpose of this pedagogically motivated theoretical paper is to investigate two major approaches to ESP text analysis, register analysis and genre analysis, in order to find the more suitable one for exploring the language use of secondary school subject texts from the point of view of an English as a second language teacher. Comparing and contrasting the merits and limitations of the two contrasting approaches allows for a better understanding of the nature of the two different perspectives of text analysis. The study examines the goals, the scope of analysis, and the achievements of the register perspective and those of the genre approach alike. The paper also investigates and reviews in detail the starkly different methods of ESP text analysis applied by the two perspectives. Discovering text analysis from a theoretical and methodological angle supports a practical aspect of English teaching, namely making an informed choice when setting out to analyse

  14. Text segmentation with character-level text embeddings

    NARCIS (Netherlands)

    Chrupała, Grzegorz

    2013-01-01

    Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task and naturally occurring text is sometimes a mixture o

  15. GPU-Accelerated Text Mining

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Mueller, Frank [North Carolina State University; Zhang, Yongpeng [ORNL; Potok, Thomas E [ORNL

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices.

  16. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices

  17. Al-Hadith Text Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Naji Al-Kabi

    2005-01-01

    Full Text Available This study explores the implementation of a text classification method to classify the Prophet Mohammed (PBUH) hadiths (sayings) using the Sahih Al-Bukhari classification. The sayings explain the Holy Qur'an, which is considered by Muslims to be the direct word of Allah. The present method adopts TF/IDF (Term Frequency-Inverse Document Frequency), which is usually used for text search. TF/IDF was used for term weighting, in which document weights for the selected terms are computed, to classify non-vocalized sayings, after their terms (keywords) have been transformed to the corresponding canonical form (i.e., roots), into one of eight Books (classes), according to the Al-Bukhari classification. A term has a higher weight if it is a good descriptor for a particular book, i.e., it appears frequently in the book but is infrequent in the entire corpus.
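The term weighting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tf_idf_weights` and `classify`, the toy English tokens standing in for Arabic roots, and the choice to treat each Book as one "document" when computing inverse document frequency are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf_weights(class_docs):
    """Compute a TF-IDF weight for every term of every class.

    class_docs maps a class label to a list of tokens that are assumed
    to be already reduced to their canonical form (roots).
    """
    n_classes = len(class_docs)
    counts = {label: Counter(tokens) for label, tokens in class_docs.items()}
    # Document frequency: in how many classes does each term occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    weights = {}
    for label, c in counts.items():
        total = sum(c.values())
        # High weight: frequent in this class, rare across the corpus.
        weights[label] = {
            term: (freq / total) * math.log(n_classes / df[term])
            for term, freq in c.items()
        }
    return weights

def classify(tokens, weights):
    """Assign the class whose weighted terms best match the query tokens."""
    scores = {
        label: sum(w.get(t, 0.0) for t in tokens)
        for label, w in weights.items()
    }
    return max(scores, key=scores.get)

# Toy, made-up tokens (hypothetical data, two classes instead of eight).
books = {
    "prayer": ["pray", "mosque", "pray", "time", "wash"],
    "fasting": ["fast", "month", "fast", "time", "moon"],
}
weights = tf_idf_weights(books)
label = classify(["pray", "time"], weights)
print(label)
```

In this toy corpus the term "time" occurs in both classes, so its weight is zero and it does not influence the decision, which matches the abstract's remark that a good descriptor is frequent in one book but infrequent in the corpus as a whole.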

  18. Emotion Detection From Text Documents

    Directory of Open Access Journals (Sweden)

    Shiv Naresh Shivhare

    2014-11-01

    Full Text Available Emotion detection is one of the emerging issues in human-computer interaction. A sufficient amount of work has been done by researchers to detect emotions from facial and audio information, whereas recognizing emotions from textual data is still a fresh and active research area. This paper presents a knowledge-based survey of emotion detection from textual data and the methods used for this purpose. It then proposes a new architecture for recognizing emotions from a text document. The proposed architecture is composed of two main parts, an emotion ontology and an emotion detector algorithm. The proposed emotion detector system takes a text document and the emotion ontology as inputs and produces one of six emotion classes (i.e., love, joy, anger, sadness, fear, and surprise) as the output.

  19. Text Recognition from an Image

    Directory of Open Access Journals (Sweden)

    Shrinath Janvalkar

    2014-04-01

    Full Text Available To achieve high speed in data processing, it is necessary to convert analog data into digital data. Storing a hard copy of any document occupies a large space, and retrieving information from that document is time consuming. An optical character recognition (OCR) system is an effective way of recognizing printed characters. It provides an easy way to recognize the printed text in an image and convert it into editable text. It also increases the speed of data retrieval from the image. An image which contains characters can be scanned with a scanner, and the recognition engine of the OCR system then interprets the image and converts the images of printed characters into machine-readable characters [8]. It improves the interface between man and machine in many applications.

  20. Princess Brambilla - images/text

    Directory of Open Access Journals (Sweden)

    Maria Aparecida Barbosa

    2016-06-01

    Full Text Available Reading an illustrated literary text means thinking in pictures and words simultaneously. This articulation between the written text and the pictures adds potential, expands, and becomes complex. It coincides with current discussions of Giorgio Agamben's "contemporary", which add to adherence to one's own time the displacement and distance needed to understand it, and which shake linear notions of historical chronology. The coincidence is in some way related to the current interest in the concept of "Nachleben" (survival), which presupposes the recovery of images of the past, as postulated by the art historian Aby Warburg in his research on the features of ancient art depicting motion in Botticelli's Renaissance pictures. Such discussions were fundamental to the translation into Portuguese of Princesa Brambilla – um capriccio segundo Jakob Callot, de E. T. A. Hoffmann, com 8 gravuras cunhadas a partir de moldes originais de Callot (1820), as I try to show in this article.

  1. Fuzzy Swarm Based Text Summarization

    Directory of Open Access Journals (Sweden)

    Mohammed S. Binwahlan

    2009-01-01

    Full Text Available Problem statement: The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. The rapid daily growth of data on the internet makes achieving that aim a big challenge. Approach: In this study, we incorporated fuzzy logic with swarm intelligence, so that risks, uncertainty, ambiguity and imprecise values in choosing the feature weights (scores) could be flexibly tolerated. The weights obtained from the swarm experiment were used to adjust the text feature scores, and the feature scores were then used as inputs to the fuzzy inference system to produce the final sentence score. The sentences were ranked in descending order based on their scores, and the top n sentences were selected as the final summary. Results: The experiments showed that the incorporation of fuzzy logic with swarm intelligence could play an important role in the selection of the most important sentences to be included in the final summary. The results also showed that the proposed method performed well, outperforming the swarm model and the benchmark methods. Conclusion: Incorporating more than one technique for sentence scoring proved to be an effective mechanism. PSO was employed to produce the text feature weights. The purpose of this process was to treat the text features fairly according to their importance and to differentiate between more and less important features. The fuzzy inference system was employed to determine the final sentence score, on which the decision was made to include the sentence in the summary or not.
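The ranking step described above (weight the feature scores, compute a final score per sentence, sort in descending order, keep the top n) can be sketched in Python as follows. This is only an illustration under stated assumptions: the weights are hard-coded rather than learned by PSO, and the fuzzy inference system is replaced by a plain weighted mean, which is a stand-in and not the method of the paper. The names `sentence_score` and `summarize` and all numbers are hypothetical.

```python
def sentence_score(feature_scores, weights):
    """Stand-in for the fuzzy inference step: a weighted mean of feature scores."""
    adjusted = [w * f for w, f in zip(weights, feature_scores)]
    return sum(adjusted) / sum(weights)

def summarize(sentences, features, weights, n):
    """Rank sentences by final score (descending) and keep the top n."""
    scored = [(sentence_score(f, weights), s) for s, f in zip(sentences, features)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for _, s in ranked[:n]]

# Made-up data: three sentences, two features each (e.g. position, keyword overlap).
sentences = ["A", "B", "C"]
features = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
weights = [0.7, 0.3]  # hypothetical; the paper obtains these from the swarm experiment
summary = summarize(sentences, features, weights, 2)
print(summary)
```

With these numbers the first feature dominates, so sentence "A" ranks first and "C" second, while "B" is left out of the two-sentence summary.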

  2. Al-Hadith Text Classifier

    OpenAIRE

    Mohammed Naji Al-Kabi; Ghassan Kanaan; Riyad Al-Shalabi; Saja I. Al- Sinjilawi; Ronza S. Al- Mustafa

    2005-01-01

    This study explores the implementation of a text classification method to classify the Prophet Mohammed (PBUH) hadiths (sayings) using the Sahih Al-Bukhari classification. The sayings explain the Holy Qur'an, which is considered by Muslims to be the direct word of Allah. The present method adopts TF/IDF (Term Frequency-Inverse Document Frequency), which is usually used for text search. TF/IDF was used for term weighting, in which document weights for the selected terms are computed, to classify non-vocali...

  3. Multimodal interactive handwritten text transcription

    CERN Document Server

    Romero, Veronica; Vidal, Enrique

    2012-01-01

    This book presents an interactive multimodal approach for efficient transcription of handwritten text images. This approach, rather than full automation, assists the expert in the recognition and transcription process. Handwritten text recognition (HTR) systems are still far from perfect, and heavy human intervention is often required to check and correct the results of such systems. The interactive scenario studied in this book combines the efficiency of automatic handwriting recognition systems with the accuracy of the experts, leading to a cost-effective perfect transcription of th

  4. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases the...... classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  5. Quality Inspection of Printed Texts

    DEFF Research Database (Denmark)

    Pedersen, Jesper Ballisager; Nasrollahi, Kamal; Moeslund, Thomas B.

    2016-01-01

    Inspecting the quality of printed texts has its own importance in many industrial applications. To do so, this paper proposes a grading system which evaluates the performance of the printing task using some quality measures for each character and symbol. The purpose of this grading system is two-fold: for customers of the printing and verification system, the overall grade is used to verify whether the text is of sufficient quality, while for the printer's manufacturer, the detailed character/symbol grades and quality measurements are used for the improvement and optimization of the printing task. The...

  6. Ontological representation of texts, and its applications in text analysis

    OpenAIRE

    Solheim, Bent André; Vågsnes, Kristian

    2003-01-01

    For the management of a company, the need to know what people think of their products or services is becoming increasingly important in an increasingly competitive market. As the Internet can nearly be described as a digital mirror of events in the "real" world, being able to make sense of the semi-structured nature of natural language texts published in this ubiquitous medium has received growing interest. The approach proposed in the thesis combines natural language processin...

  7. A Guide Text or Many Texts? "That is the Question"

    Directory of Open Access Journals (Sweden)

    Delgado de Valencia Sonia

    2001-08-01

    Full Text Available The use of supplementary materials in the classroom has always been an essential part of the teaching and learning process. To restrict our teaching to the scope of one single textbook means to fall behind the advances of knowledge, in any area and context. Young learners appreciate any new and varied support that expands their knowledge of the world: diaries, letters, panels, free texts, magazines, short stories, poems or literary excerpts, and articles taken from the Internet are materials that will allow learners to share more and work more collaboratively. In this article we deal with some of these materials and with the criteria for selecting, adapting, and creating materials that may be of interest to the learner and that may promote reading and writing processes. Since no text can entirely satisfy the needs of students and teachers, the creativity of both parties will be necessary to improve the quality of teaching through the adequate use and adaptation of supplementary materials.

  8. Comparison of Text Categorization Algorithms

    Institute of Scientific and Technical Information of China (English)

    SHI Yong-feng; ZHAO Yan-ping

    2004-01-01

    This paper summarizes several automatic text categorization algorithms in common use recently, and analyzes and compares their advantages and disadvantages. It provides clues for choosing appropriate automatic classification algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.

  9. Multilingual text induced spelling correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams

  10. Values Education: Texts and Supplements.

    Science.gov (United States)

    Curriculum Review, 1979

    1979-01-01

    This column describes and evaluates almost 40 texts, instructional kits, and teacher resources on values, interpersonal relations, self-awareness, self-help skills, juvenile psychology, and youth suicide. Eight effective picture books for the primary grades and seven titles in values fiction for teens are also reviewed. (SJL)

  11. Solar Concepts: A Background Text.

    Science.gov (United States)

    Gorham, Jonathan W.

    This text is designed to provide teachers, students, and the general public with an overview of key solar energy concepts. Various energy terms are defined and explained. Basic thermodynamic laws are discussed. Alternative energy production is described in the context of the present energy situation. Described are the principal contemporary solar…

  12. Reviving "Walden": Mining the Text.

    Science.gov (United States)

    Hewitt, Julia

    2000-01-01

    Describes how the author and her high school English students begin their study of Thoreau's "Walden" by mining the text for quotations to inspire their own writing and discussion on the topic, "How does Thoreau speak to you or how could he speak to someone you know?" (SR)

  13. Presentation of the math text

    OpenAIRE

    KREJČOVÁ, Iva

    2009-01-01

    The aim of this bachelor thesis is a basic survey of the tools for creating and presenting mathematical texts and the acquisition of basic user skills in these programs. The tools are also compared in terms of availability and ease of use, their capabilities, and the quality of the output.

  14. Functional Stylistics and Peripeteic Texts

    DEFF Research Database (Denmark)

    Borchmann, Simon

    2008-01-01

    Using a pragmatically based linguistic description apparatus on literary use of language is not unproblematic. Observations show that literary use of language violates the norms contained by this apparatus. With this paper I suggest how we can deal with this problem by setting up a frame for the ... use of a functional linguistic description apparatus on literary texts. As an extension of this suggestion I present a model for describing a specific type of literary texts.

  15. TEXT tf coil bonding system

    International Nuclear Information System (INIS)

    An extensive bond test program was conducted prior to manufacturing and bonding the toroidal field (TF) coils for the Texas Experimental Tokamak (TEXT). The bonding materials consisted of fiberglass cloth with pre-impregnated, 'B' staged Hexcel F-159 resin. Approximately 100 double lap bond samples were constructed to test quality, strength, and repeatability of the bonds. The variables investigated included surface machining methods, surface preparations, bond sample size (planform area), bonding pressure, bonding temperature, and the number of laminations bonded simultaneously. Double lap shear tests conducted at room temperature resulted in ultimate shear strengths for all variables in the range of 3000 to 7000 psi with an average value of 5650 psi. Fatigue tests were also conducted to demonstrate bond integrity over the anticipated cycle lifetime of the TEXT machine (10^6 cycles) under simulated worst case conditions. 2 refs

  16. Challenges in Kurdish Text Processing

    OpenAIRE

    Esmaili, Kyumars Sheykh

    2012-01-01

    Despite having a large number of speakers, the Kurdish language is among the less-resourced languages. In this work we highlight the challenges and problems in providing the required tools and techniques for processing texts written in Kurdish. From a high-level perspective, the main challenges are: the inherent diversity of the language, standardization and segmentation issues, and the lack of language resources.

  17. Learning Context for Text Categorization

    OpenAIRE

    Haribhakta, Y. V.; Parag Kulkarni

    2011-01-01

    This paper describes our work, which is based on discovering context for text document categorization. The document categorization approach is derived from a combination of a learning paradigm known as relation extraction and a technique known as context discovery. We demonstrate the effectiveness of our categorization approach using the Reuters-21578 dataset and synthetic real-world data from the sports domain. Our experimental results indicate that the learned context greatly improves t...

  18. Psychologische Interpretation. Biographien, Texte, Tests

    OpenAIRE

    Fahrenberg, Jochen

    2002-01-01

    Biographies, texts, and tests are interpreted psychologically. Psychological interpretation is defined as the translation of a statement together with explanations that establish relationships. In this way, connections are uncovered and results are placed in context. Interpretation is translation and communication. It must combine heuristics with methodological critique. The book introduces these methodological foundations and the rules of psychological interpretation. The first chapters of the book lead in with an interpretation...

  19. Text Analytics to Data Warehousing

    OpenAIRE

    Kalli Srinivasa Nageswara Prasad; S. Ramakrishna

    2010-01-01

    Information hidden or stored in unstructured data can play a critical role in making decisions, understanding and conducting other business functions. Integrating data stored in both structured and unstructured formats can add significant value to an organization. With the extent of development happening in Text Mining and technologies to deal with unstructured and semi-structured data like XML and MML (Mining Markup Language) to extract and analyze data, text analytics has evolved to handle un...

  20. Survey on Text Document Clustering

    OpenAIRE

    M.Thangamani; Dr.P.Thangaraj

    2010-01-01

    Document clustering is also referred to as text clustering, and its concept is nearly the same as data clustering. It is quite difficult to find selected information in a series of 'N' items, which is why document clustering came into the picture. Basically, a cluster means a group of similar data, and document clustering means segregating the data into different groups of similar data. Clustering can be of mathematical, statistical or numerical domain. Clustering is a fundamental data analysi...

  1. Learning Context for Text Categorization

    CERN Document Server

    Haribhakta, Y V

    2011-01-01

    This paper describes our work, which is based on discovering context for text document categorization. The document categorization approach is derived from a combination of a learning paradigm known as relation extraction and a technique known as context discovery. We demonstrate the effectiveness of our categorization approach using the Reuters-21578 dataset and synthetic real-world data from the sports domain. Our experimental results indicate that the learned context greatly improves the categorization performance as compared to traditional categorization approaches.

  2. TEXT CATEGORIZATION USING QLEARNING ALOGRITHM

    OpenAIRE

    Dr. S.R. Suresh; T. Karthikeyan; D.B. Shanmugam; J. Dhilipan

    2011-01-01

    This paper aims at the creation of an efficient document classification process using reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision-making. One strength of reinforcement learning is that it provides a formalism for measuring the utility of actions that give benefit only in the future. An effective and flexible classifier learning algorithm is provided, which classifies a set of text documents into a more specific domain like Cricket, Tenn...
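The idea that reinforcement learning can credit an action whose benefit arrives only later is captured by the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)). The short Python sketch below shows that generic rule only; how the paper maps documents and categories onto states, actions, and rewards is not stated in the truncated abstract, so the state names, actions, and parameter values here are purely hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the estimate toward reward + discounted future value.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)          # all Q-values start at 0.0
actions = ["left", "right"]     # hypothetical action set

# Step 1: an action in s1 earns an immediate reward.
q_update(q, "s1", "right", 1.0, "goal", actions)
# Step 2: an action in s0 earns nothing now, but leads to s1,
# so it inherits part of s1's value through the discount factor.
value = q_update(q, "s0", "right", 0.0, "s1", actions)
print(value)
```

The second update yields a positive value even though its immediate reward is zero, which is the "benefit only in the future" property the abstract refers to.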

  3. [On two antique medical texts].

    Science.gov (United States)

    Rosa, Maria Carlota

    2005-01-01

    The two texts presented here--Regimento proueytoso contra ha pestenença [literally, "useful regime against pestilence"] and Modus curandi cum balsamo ["curing method using balm"]--represent the extent of Portugal's known medical library until circa 1530, produced in gothic letters by foreign printers: Germany's Valentim Fernandes, perhaps the era's most important printer, who worked in Lisbon between 1495 and 1518, and Germão Galharde, a Frenchman who practiced his trade in Lisbon and Coimbra between 1519 and 1560. Modus curandi, which came to light in 1974 thanks to bibliophile José de Pina Martins, is anonymous. Johannes Jacobi is believed to be the author of Regimento proueytoso, which was translated into Latin (Regimen contra pestilentiam), French, and English. Both texts are presented here in facsimile and in modern Portuguese, while the first has also been reproduced in archaic Portuguese using modern typographical characters. This philological venture into sixteenth-century medicine is supplemented by a scholarly glossary which serves as a valuable tool in interpreting not only Regimento proueytoso but also other texts from the era. Two articles place these documents in historical perspective. PMID:17500134

  4. Text Mining for Protein Docking.

    Directory of Open Access Journals (Sweden)

    Varsha D Badal

    2015-12-01

    Full Text Available The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound

  5. Les dislocations: textes et contextes

    OpenAIRE

    Leonarduzzi, Laetitia; Herry, Nadine

    2005-01-01

    In this paper we analyse the contexts in which left and right dislocations appear. Our corpus is based on written as well as oral discourse, with texts ranging from the year 1884 to 2005. After trying to define both types of dislocation, and seeing how far the definitions can be extended, we notice that the dislocated NP is most of the time definite (90% of our examples). This phenomenon may be explained by the notions of anaphora, deixis and thematisation. Dislocations appear both in oral an...

  6. Text writing in the air

    OpenAIRE

    Beg, Saira; Khan, M. Fahad; Baig, Faisal

    2016-01-01

    This paper presents a real-time, video-based pointing method which allows sketching and writing of English text in the air in front of a mobile camera. The proposed method has two main tasks: first it tracks the colored fingertip in the video frames, and then it applies English OCR to the plotted images in order to recognize the written characters. Moreover, the proposed method provides natural human-system interaction in that it does not require a keypad, stylus, pen, glove, etc. for character input. For...

  7. Thematic networks and text types

    OpenAIRE

    Thomas, Shirley

    2011-01-01

    In this article, the question of textual organization is approached through the analysis of thematic progression. We propose to study to what degree the different types of thematic progression established in a text are linked to its textual genre, in this case a scientific research article and a popular science article. We also consider certain didactic orientations arising from this study.

  8. New Historicism: Text and Context

    Directory of Open Access Journals (Sweden)

    Violeta M. Vesić

    2016-02-01

    Full Text Available During most of the twentieth century, history was seen as a phenomenon outside of literature that guaranteed the veracity of literary interpretation. History was unique, and it functioned as a basis for reading literary works. During the seventies of the twentieth century there occurred a change of attitude towards history in American literary theory, and there appeared a new theoretical approach which soon became known as New Historicism. Since its inception, New Historicism has been identified with the study of the Renaissance and Romanticism, but nowadays it is increasingly involved with other literary trends. Although there are great differences in the arguments and practices among the various representatives of this school, New Historicism has clearly recognizable features, and many new historicists will agree with the statement of Walter Cohen that New Historicism, when it appeared in the eighties, represented something quite new in reference to the studies of theory, criticism and history (Cohen 1987, 33). The theoretical connection with Bakhtin, Foucault and Marx is clear, as well as a kind of uneasy tie with deconstruction and the work of Paul de Man. At the center of this approach is a renewed interest in the study of literary works in the light of the historical and political circumstances in which they were created. Foucault encouraged readers to begin to move literary texts and to link them with discourses and representations that are not literary, as well as to examine the sociological aspects of the texts in order to take part in the social struggles of today. The study of literary works using New Historicism is the study of the politics, history, culture and circumstances in which these works were created. With regard to one of the main facts at the center of the criticism, that history cannot be viewed objectively and that reality can only be understood through a cultural context that reveals the work, re-reading and interpretation of

  9. IMPROVED TEXT CLUSTERING WITH NEIGHBORS

    Directory of Open Access Journals (Sweden)

    Sri Lalitha Y

    2015-03-01

    Full Text Available With the ever-increasing number of documents on the web and in other repositories, organizing and categorizing these documents manually to suit the diverse needs of users is a complicated job; hence a machine learning technique named clustering is very useful. Text documents are clustered by pairwise similarity of documents, with similarity measures like Cosine, Jaccard or Pearson. The best clustering results are seen when the overlap of terms between documents is small, that is, when clusters are distinguishable. Hence, for this problem, to find document similarity we apply the link and neighbor concepts introduced in ROCK. A link specifies the number of shared neighbors of a pair of documents. Significantly similar documents are called neighbors. This work applies links and neighbors to Bisecting K-means clustering in identifying seed documents in the dataset, as a heuristic measure in choosing a cluster to be partitioned, and as a means to find the number of partitions possible in the dataset. Our experiments on real-time datasets showed a significant improvement in terms of accuracy with minimum time.
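The neighbor and link notions used in this abstract can be made concrete with a small Python sketch. It assumes cosine similarity on raw term counts, a similarity threshold `theta` chosen arbitrarily, and the convention that a document counts as its own neighbor; the function names and the toy documents are hypothetical, and the Bisecting K-means part of the method is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def neighbor_sets(docs, theta):
    """For each document, the set of documents whose similarity is at least theta."""
    vecs = [Counter(d.split()) for d in docs]
    n = len(vecs)
    return [
        {j for j in range(n) if cosine(vecs[i], vecs[j]) >= theta}
        for i in range(n)
    ]

def link(neighbors, i, j):
    """Number of neighbors shared by documents i and j."""
    return len(neighbors[i] & neighbors[j])

# Made-up documents for illustration.
docs = [
    "text clustering with neighbors",
    "text clustering with links",
    "clustering text documents",
    "protein docking structure",
]
nbrs = neighbor_sets(docs, theta=0.5)
print(link(nbrs, 0, 1), link(nbrs, 0, 3))
```

The first three documents are mutual neighbors and so share many links, while the fourth shares none with them. A link count therefore reflects the neighborhood around a pair of documents rather than only their direct similarity, which is why it can serve as a heuristic when choosing seed documents or a cluster to split.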

  10. By-law no 36/MRA/SG/ANEA of 29 July 2004 on the role, organisation and functioning of the Autorite Nationale de l'Energie Atomique

    International Nuclear Information System (INIS)

    The text, called an Arrete, sets out in its general provisions the organization of the Authority, the attributions of its various parts (the secretariat and the various services), and their operation. The provisions emphasize the institutional anchoring of the Authority as well as the details of how its operating mechanisms are implemented.

  11. A programmed text in statistics

    CERN Document Server

    Hine, J

    1975-01-01

    Contents: Exercises for Section 2 (Physical sciences and engineering; Biological sciences; Social sciences); Solutions to Exercises, Section 1 (Physical sciences and engineering; Biological sciences; Social sciences); Solutions to Exercises, Section 2 (Physical sciences and engineering; Biological sciences; Social sciences); Tables (χ² tests involving variances; χ² one-tailed tests; χ² two-tailed tests; F-distribution). Preface: This project started some years ago when the Nuffield Foundation kindly gave a grant for writing a programmed text to use with service courses in statistics. The work was carried out by Mrs. Joan Hine and Professor G. B. Wetherill at Bath University, together with some other help from time to time by colleagues at Bath University and elsewhere. Testing was done at various colleges and universities, and some helpful comments were received, but we particularly mention King Edwards School, Bath, who provided some sixth formers as 'guinea pigs' for the fir...

  12. Initial bolometric measurements on text

    International Nuclear Information System (INIS)

    A platinum resistance bolometer has been used to measure the total radiated power in TEXT. Preliminary attempts to determine the scaling of the total radiated power with impurity content, toroidal field, electron density, and plasma current have been made. These measurements indicate that the radiated power is strongly dependent on the impurity content, proportional to the plasma current and electron density, and inversely proportional to the toroidal field. The density and toroidal field dependences are apparently connected with changes in impurity confinement with these parameters. Increases in total radiated power during different impurity injections have also been measured. Shot to shot radial scans of the bolometer across the plasma have been made for several plasma conditions. Estimates of the total radiated power have also been made for these conditions. Comparisons with the ohmic heating input power show that the radiated power is a large percentage of the input power, so that the radiated power is a significant term in thermal transport calculations. This report describes the experimental techniques used and preliminary results of the power measurements

  13. Text documents as social networks

    Science.gov (United States)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameters, the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small world networks. For two connected sentences, we can say which one is the first and which one is the second, according to their position in a document. This will make such a graph look like a small WWW-type network and PageRank type algorithms will produce interesting ranking of nodes in such a document.
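
    As a rough sketch of the ranking step described above, the following runs a plain power-iteration PageRank over a small directed sentence graph. The edge list is hypothetical (the Helmholtz-principle edge test is not reproduced here), and orienting each edge from the earlier sentence to the later one is an assumption; the abstract only says that position in the document determines the orientation.

```python
def pagerank(n, edges, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph with nodes 0..n-1."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for u in range(n):
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[u] / n
                for v in range(n):
                    new[v] += share
        rank = new
    return rank


# Hypothetical links between four sentences, oriented earlier -> later.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
ranks = pagerank(4, edges)
for sentence, score in enumerate(ranks):
    print(sentence, round(score, 4))
```

    With this orientation, rank flows toward later sentences; reversing each edge would instead favor sentences that later text points back to.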

  14. Orientalist discourse in media texts

    Directory of Open Access Journals (Sweden)

    Necla Mora

    2009-10-01

    Full Text Available By placing itself at the center of the world with a Eurocentric point of view, the West exploits other countries and communities through inflicting cultural change and transformation on them either from within via colonialist movements or from outside via “Orientalist” discourses in line with its imperialist objectives. The West has fictionalized the “image of the Orient” in terms of science by making use of social sciences like anthropology, history and philology and launched an intensive propaganda which covers literature, painting, cinema and other fields of art in order to actualize this fiction. Accordingly, the image of the Orient – which has been built firstly in terms of science then socially – has been engraved into the collective memory of both the Westerner and the Easterner. The internalized “Orientalist” point of view and discourse cause the Westerner to see and perceive the Easterner with the image formed in his/her memory while looking at them. The Easterner represents and expresses himself/herself from the eyes of the Westerner and with the image which the Westerner fictionalized for him/her. Hence, in order to gain acceptance from the West, the East tries to shape itself into the “Orientalist” mold which the Westerner fictionalized for it. Artists, intellectuals, writers and media professionals, who embrace and internalize the stereotypical hegemonic-driven “Orientalist” discourse of the Westerner and who rank among the elite group, reflect their internalized “Orientalist” discourse on their own actions. This condition causes the “Orientalist” clichés to be engraved in the memory of the society; causes the society to view itself with an “Orientalist” point of view and perceive itself with the clichés of the Westerner. Consequently, the second ring of the hegemony is reproduced by the symbolic elites who represent the power/authority within the country. The “Orientalist” discourse, which is

  15. What's so Simple about Simplified Texts? A Computational and Psycholinguistic Investigation of Text Comprehension and Text Processing

    Science.gov (United States)

    Crossley, Scott A.; Yang, Hae Sung; McNamara, Danielle S.

    2014-01-01

    This study uses a moving windows self-paced reading task to assess both text comprehension and processing time of authentic texts and these same texts simplified to beginning and intermediate levels. Forty-eight second language learners each read 9 texts (3 different authentic, beginning, and intermediate level texts). Repeated measures ANOVAs…

  16. Bengali text summarization by sentence extraction

    CERN Document Server

    Sarkar, Kamal

    2012-01-01

    Text summarization is a process to produce an abstract or a summary by selecting a significant portion of the information from one or more texts. In an automatic text summarization process, a text is given to the computer and the computer returns a shorter, less redundant extract or abstract of the original text(s). Many techniques have been developed for summarizing English text(s), but very few attempts have been made at Bengali text summarization. This paper presents a method for Bengali text summarization which extracts important sentences from a Bengali document to produce a summary.

  17. Text History of the Greek Exodus

    OpenAIRE

    Wevers, John William

    1992-01-01

    Chapter I: The Hexaplaric Recension; Chapter II: The Byzantine Text Group; Chapter III: The Catena Text; Chapter IV: The Texts of A and B; Chapter V: The Text of Cyril of Alexandria's De Adoratione and Glaphyra; Chapter VI: The Composition of Exod 35 to 40; Chapter VII: The Critical Text (Exod); Index of Passages

  18. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  19. Text Categorization with Latent Dirichlet Allocation

    Directory of Open Access Journals (Sweden)

    ZLACKÝ Daniel

    2014-05-01

    Full Text Available This paper focuses on the text categorization of Slovak text corpora using latent Dirichlet allocation. Our goal is to build text subcorpora that contain similar text documents. We want to use these better organized text subcorpora to build more robust language models that can be used in the area of speech recognition systems. Our previous research in the area of text categorization showed that we can achieve better results with categorized text corpora. In this paper we used latent Dirichlet allocation for text categorization. We divided the initial text corpus into 2, 5, 10, 20 or 100 subcorpora with various iterations and save steps. Language models were built on these subcorpora and adapted with linear interpolation to the judicial domain. The experimental results showed that text categorization using latent Dirichlet allocation can improve the system for automatic speech recognition by creating the language models from organized text corpora.
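
    The adaptation step mentioned above, linear interpolation of language models, can be illustrated with two toy unigram models. This is only a sketch under stated assumptions: the paper does not give its model order, weights, or data, so the unigram models, the sample sentences, and the weight `lam` are hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    vocab = set(p_a) | set(p_b)
    return {
        w: lam * p_a.get(w, 0.0) + (1.0 - lam) * p_b.get(w, 0.0)
        for w in vocab
    }


# Hypothetical data: one topic subcorpus and a small in-domain sample.
subcorpus = "the court heard the case".split()
domain = "the judge ruled on the appeal".split()
mixed = interpolate(unigram_model(domain), unigram_model(subcorpus), lam=0.7)
print(sorted(mixed.items()))
```

    Because both inputs are probability distributions and the weights sum to one, the interpolated model is again a distribution; in practice the weight would be tuned on held-out in-domain text.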

  20. TEXT CLASSIFICATION TOWARD A SCIENTIFIC FORUM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Text mining, also known as discovering knowledge from text, has emerged as a possible solution to the current information explosion; it refers to the process of extracting non-trivial and useful patterns from unstructured text. Among the general tasks of text mining, such as text clustering, summarization, etc., text classification is a subtask of intelligent information processing which employs supervised learning to construct a classifier from training text and uses it to predict the class of unlabeled text. Because of its simplicity and objectivity in performance evaluation, text classification is usually used as a standard tool to determine the advantage or weakness of a text processing method, such as text representation, text feature selection, etc. In this paper, text classification is carried out to classify the Web documents collected from the XSSC Website (http://www.xssc.ac.cn). The performance of support vector machine (SVM) and back propagation neural network (BPNN) is compared on this task. Specifically, binary text classification and multi-class text classification were conducted on the XSSC documents. Moreover, the classification results of both methods are combined to improve the accuracy of classification. An experiment is conducted to show that BPNN can compete with SVM in binary text classification, but for multi-class text classification SVM performs much better. Furthermore, the classification is improved in both the binary and multi-class cases with the combined method.

  1. Text Power: tools for the Cultural Heritage.

    OpenAIRE

    Picchi, Eugenio; Sassolini, Eva

    2009-01-01

    This article presents NLP techniques (text mining, text analysis) to create tools for the evaluation, analysis and classification of text materials available on the web. In particular we developed tools for the automatic extraction of mistic relevant information related to the cultural heritage domain and tools for linguistic resources creation. On this knowledge basis, we also developed a system for text browsing.

  2. Characterizing Reading Comprehension of Mathematical Texts

    Science.gov (United States)

    Osterholm, Magnus

    2006-01-01

    This study compares reading comprehension of three different texts: two mathematical texts and one historical text. The two mathematical texts both present basic concepts of group theory, but one does it using mathematical symbols and the other only uses natural language. A total of 95 upper secondary and university students read one of the…

  3. Examining Text Complexity in the Early Grades

    Science.gov (United States)

    Fitzgerald, Jill; Elmore, Jeff; Hiebert, Elfrieda H.; Koons, Heather H.; Bowen, Kimberly; Sanford-Moore, Eleanor E.; Stenner, A. Jackson

    2016-01-01

    The Common Core raises the stature of texts to new heights, creating a hubbub. The fuss is especially messy at the early grades, where children are expected to read more complex texts than in the past. But early-grades teachers have been given little actionable guidance about text complexity. The authors recently examined early-grades texts to…

  4. Open architecture for multilingual parallel texts

    CERN Document Server

    Benitez, M T Carrasco

    2008-01-01

    Multilingual parallel texts (abbreviated to parallel texts) are linguistic versions of the same content ("translations"); e.g., the Maastricht Treaty in English and Spanish are parallel texts. This document is about creating an open architecture for the whole Authoring, Translation and Publishing Chain (ATP-chain) for the processing of parallel texts.

  5. Scalable Text Mining with Sparse Generative Models

    OpenAIRE

    Puurula, Antti

    2016-01-01

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: gener...

  6. Using compression to identify acronyms in text

    OpenAIRE

    Yeates, Stuart; Bainbridge, David; Witten, Ian H

    2000-01-01

    Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study that showed how particular kinds of lexical tokens---names, dates, locations, etc.---can be identified and located in running text, using compression models to provide the leverage necessary to distinguish ...

  7. Discover Effective Pattern for Text Mining

    OpenAIRE

    Khade, A. D.; A. B. Karche

    2014-01-01

    Many data mining techniques have been developed for finding useful patterns in documents such as text documents. However, how to effectively use and update the discovered patterns is still an open research task, especially in the domain of text mining. Text mining is the discovery of interesting knowledge (or features) in text documents. It is a challenging task to find appropriate knowledge (or features) in text documents to help users find exactly what they want...

  8. Text and Context: Language Analytics in Finance

    OpenAIRE

    Das, Sanjiv Ranjan

    2014-01-01

    This monograph surveys the technology and empirics of text analytics in finance. I present various tools of information extraction and basic text analytics. I survey a range of techniques of classification and predictive analytics, and metrics used to assess the performance of text analytics algorithms. I then review the literature on text mining and predictive analytics in finance, and its connection to networks, covering a wide range of text sources such as blogs, news, web posts, corporate...

  9. Composing Measures for Computing Text Similarity

    OpenAIRE

    Bär, Daniel; Zesch, Torsten; Gurevych, Iryna

    2015-01-01

    We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used...

  10. Parsing Arabic Texts Using Rhetorical Structure Theory

    OpenAIRE

    H. I. Mathkour; A. A. Touir; W. A. Al-Sanea

    2008-01-01

    Problem Statement: Processing texts based on rhetorical structure theory has shown interesting results. Rhetorical Structure Theory (RST) improves the ability to extract the semantics behind the processed text. Different applications such as information retrieval, text summarization, and text generation have been shown to give better results using RST. The applicability of RST to process and understand texts has been studied in several languages, but little work is devoted to the Arabic language. Giv...

  11. Text To Speech System for Telugu Language

    OpenAIRE

    Siva kumar, M; E. Prakash Babu

    2014-01-01

    Telugu is one of the oldest languages in India. This paper describes the development of a Telugu Text-to-Speech System (TTS). In Telugu TTS the input is Telugu text in Unicode. The voices are sampled from real recorded speech. The objective of a text to speech system is to convert an arbitrary text into its corresponding spoken waveform. Speech synthesis is a process of building machinery that can generate human-like speech from any text input to imitate human speakers. Text proc...

  12. ERGONYMS AS COMPONENTS OF PHARMACEUTICALS ADVERTISING TEXTS

    OpenAIRE

    НАСАКІНА, Світлана Вікторівна

    2016-01-01

    The article deals with the functioning of ergonyms in pharmaceuticals advertising texts. The purpose of the article is the analysis of the ergonyms functioning in pharmaceuticals advertising texts. The tasks are defining the specifics of the ergonyms functioning in pharmaceuticals advertising texts and creating an ergonyms classification on the material under study. There were found similarities in the pharmaceuticals advertising texts in the Ukrainian, Bulgarian and Russian languages. The ar...

  13. Automatic text categorisation of racist webpages

    OpenAIRE

    Greevy, Edel

    2004-01-01

    Automatic Text Categorisation (TC) involves the assignment of one or more predefined categories to text documents in order that they can be effectively managed. In this thesis we examine the possibility of applying automatic text categorisation to the problem of categorising texts (web pages) based on whether or not they are racist. TC has proven successful for topic-based problems such as news story categorisation. However, the problem of detecting racism is dissimilar to topic-based pro...

  14. Arabic Text Mining Using Rule Based Classification

    OpenAIRE

    Fadi Thabtah; Omar Gharaibeh; Rashid Al-Zubaidy

    2012-01-01

    A well-known classification problem in the domain of text mining is text classification, which concerns mapping textual documents into one or more predefined categories based on their content. The text classification arena has recently attracted many researchers because of the massive amounts of online documents and text archives which hold essential information for a decision-making process. In this field, most such research focuses on classifying English documents while there are limited studi...

  15. A Survey on Preprocessing in Text Mining

    OpenAIRE

    Dr. Anadakumar. K; Ms. Padmavathy. V

    2013-01-01

    Nowadays information is stored electronically in databases. Extracting reliable, unknown and useful information from this abundant source is an important task. Data mining and text mining are processes for extracting unknown and useful information. Text mining is the process of extracting interesting and non-trivial patterns or knowledge from text documents. This paper presents the related activities and focuses on preprocessing steps in text mining.

  16. Effective Term Based Text Clustering Algorithms

    OpenAIRE

    P. Ponmuthuramalingam; T. Devi

    2010-01-01

    Text clustering methods can be used to group large sets of text documents. Most of the text clustering methods do not address the problems of text clustering such as very high dimensionality of the data and understandability of the clustering descriptions. In this paper, a frequent term based approach of clustering has been introduced; it provides a natural way of reducing a large dimensionality of the document vector space. This approach is based on clustering the low dimensionality frequent...

  17. Folklore text in a process of oblivion

    Directory of Open Access Journals (Sweden)

    Ilić Marija

    2005-01-01

    Full Text Available The paper presents contemporary research on traditional folklore from the perspective of ethnolinguistics and anthropological linguistics. The analysis is based on material collected among Serbs from Szigetcsep (Hungary, 2001) during an ethnolinguistic field survey. The paper specifically discusses the methods of collecting traditional folklore texts in the ethnolinguistic interview, discourse analysis of utterances commenting on folklore texts, and ways of memorization of folklore texts.

  18. On-Line Full Text Pathology Database

    OpenAIRE

    Fink, Daniel; Clark, Anthony; Sideli, Robert

    1988-01-01

    A free text database for pathology reports has been developed using the BRS/SEARCH free text management software. All pathology reports are stored in the free text pathology database. Standardized section headings make any word searchable both by itself or within the context of a specific part of the report. The free text management software supplies a rich set of Boolean, positional, and relational operators. These operators make an iterative search strategy an effective method of searching ...

  19. The Language of Ancient Greek Philological Texts

    OpenAIRE

    Brigita Kukjalko

    2011-01-01

    Annotation to the Doctoral Thesis by Brigita Aleksejeva: The Language of Ancient Greek Philological Texts An Ancient Greek philological text often combined the research of various language-related issues, which are nowadays studied by separate branches of linguistics – such as orthography, phonology, morphology, lexicology, syntax, and stylistics. The language of these texts differs from that of the fictional and non-theoretical texts of the period: since they represent the origins of the ...

  20. Mathematical Texts as Narrative: Rethinking Curriculum

    Science.gov (United States)

    Dietiker, Leslie

    2013-01-01

    This paper proposes a framework for reading mathematics texts as narratives. Building from a narrative framework of Mieke Bal, a reader's experience with the mathematical content as it unfolds in the text (the "mathematical story") is distinguished from his or her logical reconstruction of the content beyond the text (the…

  1. The Costs of Texting in the Classroom

    Science.gov (United States)

    Lawson, Dakota; Henderson, Bruce B.

    2015-01-01

    Many college students seem to find it impossible to resist the temptation to text on electronic devices during class lectures and discussions. One common response of college professors is to yield to the inevitable and try to ignore student texting. However, research indicates that because of limited cognitive capacities, even simple texting can…

  2. Knowledge discovery data and text mining

    CERN Document Server

    Olmer, Petr

    2008-01-01

    Data mining and text mining refer to techniques, models, algorithms, and processes for knowledge discovery and extraction. Basic definitions are given together with the description of a standard data mining process. Common models and algorithms are presented. Attention is given to text clustering, how to convert unstructured text to structured data (vectors), and how to compute their importance and position within clusters.
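
    One common way to turn unstructured text into vectors is TF-IDF weighting, sketched below. The abstract does not say which weighting the author uses, so TF-IDF, the whitespace tokenizer, and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumes every document contains at least one token.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy corpus.
docs = [
    "data mining finds patterns",
    "text mining finds patterns in text",
    "clusters group similar text",
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print([round(x, 3) for x in vectors[0]])
```

    Terms that occur in every document get weight zero, while terms concentrated in few documents get larger weights; the resulting vectors can then be fed to a clustering algorithm.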

  3. Role of Terms in Popular Science Text

    Directory of Open Access Journals (Sweden)

    Zhabbarova F. U.

    2013-01-01

    Full Text Available The article examines and determines the specifics of terminological vocabulary used in a popular science text. It differentiates the notions of cohesion and coherence. The article reveals the main terminological means realizing cohesion in the text of a popular science article.

  4. Text History of the Greek Numbers

    OpenAIRE

    Wevers, John William

    1982-01-01

    Chapter 1 The x Group; Chapter 2 The Byzantine Text; Chapter 3 The Hexaplaric Recension; Chapter 4 The Texts of B and A; Chapter 5 Papyrus 963 as Textual Witness; Chapter 6 The Critical Text (Num); Index of Passages

  5. Applying statistical methods to text steganography

    CERN Document Server

    Nechta, Ivan

    2011-01-01

    This paper presents a survey of text steganography methods used for hiding secret information inside some covertext. Widely known hiding techniques (such as translation based steganography, text generating and syntactic embedding) and detection are considered. It is shown that statistical analysis has an important role in text steganalysis.

  6. Figure text extraction in biomedical literature.

    Directory of Open Access Journals (Sweden)

    Daehyun Kim

    Full Text Available BACKGROUND: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. METHODOLOGY: We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. RESULTS/CONCLUSIONS: The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36
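
    The F1-scores quoted above follow from the reported precision and recall through the harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the three precision/recall pairs in the abstract; the function name is a hypothetical helper, not part of FigTExT.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = [
    ("text localization", 0.84, 0.98),
    ("figure text extraction", 0.625, 0.510),
    ("expert-judged important text", 0.873, 0.688),
]
for task, p, r in reported:
    print(f"{task}: P={p:.3f} R={r:.3f} F1={f1_score(p, r):.3f}")
```

    The recomputed values agree, after rounding, with the 90%, 56.2%, and 77% figures given in the abstract.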

  7. Text line Segmentation of Curved Document Images

    Directory of Open Access Journals (Sweden)

    Anusree.M

    2014-05-01

    Full Text Available Document image analysis has been widely used in historical and heritage studies, education and digital libraries. Document image analysis techniques are mainly used for improving the human readability and the OCR quality of the document. During digitization, camera-captured images contain warped documents due to perspective and geometric distortions. The main difficulty is text line detection in the document. Many algorithms have been proposed to address the problem of printed document text line detection, but they fail to extract text lines in curved documents. This paper describes a segmentation technique that detects curled text lines in camera-captured document images.

  8. An Embedded Application for Degraded Text Recognition

    Directory of Open Access Journals (Sweden)

    Thillou Céline

    2005-01-01

    Full Text Available This paper describes a mobile device which tries to give the blind or visually impaired access to text information. Three key technologies are required for this system: text detection, optical character recognition, and speech synthesis. Blind users and the mobile environment imply two strong constraints. First, pictures will be taken without control on camera settings and a priori information on text (font or size) and background. The second issue is to link several techniques together with an optimal compromise between computational constraints and recognition efficiency. We will present the overall description of the system from text detection to OCR error correction.

  9. Text To Speech System for Telugu Language

    Directory of Open Access Journals (Sweden)

    M. Siva Kumar

    2014-03-01

    Full Text Available Telugu is one of the oldest languages in India. This paper describes the development of a Telugu Text-to-Speech System (TTS). In Telugu TTS the input is Telugu text in Unicode. The voices are sampled from real recorded speech. The objective of a text to speech system is to convert an arbitrary text into its corresponding spoken waveform. Speech synthesis is a process of building machinery that can generate human-like speech from any text input to imitate human speakers. Text processing and speech generation are the two main components of a text to speech system. To build a natural sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units. Generation of the sequence of phonetic units for a given standard word is referred to as a letter to phoneme rule or text to phoneme rule. The complexity of these rules and their derivation depends upon the nature of the language. The quality of a speech synthesizer is judged by its closeness to the natural human voice and its understandability. In this paper we describe an approach to build a Telugu TTS system using the concatenative synthesis method with the syllable as the basic unit of concatenation.

  10. The Research of Chinese Text Proofreading Algorithm

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Generally, text proofreading consists of two procedures, finding the wrongly used words and then presenting the correct forms. At present, most of the Chinese text proofreading focuses on finding the wrongly used words, but pays less attention to correcting these errors. In this paper, the Chinese text features are interpreted first and then a Chinese text proofreading method and its algorithm are introduced. In this algorithm, text features, including text statistical feature and language structure feature, are properly used. Here, correcting errors goes on at the same time with finding errors. Experimental results show that this method has a performance of detecting 75% of wrongly used Chinese words and correcting about 60% of them with the first candidates.

  11. NEW TECHNIQUES USED IN AUTOMATED TEXT ANALYSIS

    Directory of Open Access Journals (Sweden)

    M. Istrate

    2010-12-01

    Full Text Available Automated analysis of natural language texts is one of the most important knowledge discovery tasks for any organization. According to Gartner Group, almost 90% of knowledge available at an organization today is dispersed throughout piles of documents buried within unstructured text. Analyzing huge volumes of textual information is often involved in making informed and correct business decisions. Traditional analysis methods based on statistics fail to help processing unstructured texts and the society is in search of new technologies for text analysis. There exist a variety of approaches to the analysis of natural language texts, but most of them do not provide results that could be successfully applied in practice. This article concentrates on recent ideas and practical implementations in this area.

  12. Text genre and time discursive construction

    Directory of Open Access Journals (Sweden)

    Maria Antónia Coutinho

    2012-12-01

    Full Text Available This paper adopts a text and discourse linguistics framework. We will assume discourses (or discourse types) represent an intermediate organizational level between text genre and the specific linguistic devices in use (Bronckart, 1997, 2008). Discourse types play a key role in language activity, as they allow the transition between individual and collective representations, and they involve temporal and agentive relationships. In our research, we focus on the linguistic devices associated with temporal relationships, in order to distinguish between expositive and narrative discourses and to address our main research issue: to understand how social activity and text genre can constrain the presence of different discourse types, and to describe what function the same discourse type can assume in different texts (of different genres). To obtain evidence to address these questions, we analyze texts produced in literary, familiar and scientific activities.

  13. Tabularité : des textes aux corpus

    OpenAIRE

    Florea, Marie-Laure

    2011-01-01

    This article proposes to consider a particular type of text, the tabular text, as opposed to the linear text, from the dual point of view of the text and the corpus. A tabular text is composed of several modules, each with relative autonomy yet interdependent on one another, grouped within a bounded material space. The objective is twofold: on the one hand, to describe, from a combined perspective of discourse analysis and text linguistics, the function...

  14. Holiday text message well-wishing

    Directory of Open Access Journals (Sweden)

    Ivanović-Barišić Milina

    2008-01-01

    Full Text Available Text messages have become an important means of everyday communication, especially among the younger generations. As a relatively new way of communicating, sending and receiving text messages is displacing other, classical means such as letter writing. Sending a written message via cellular phone has become a usual means of communication in almost all life circumstances. This paper discusses messages sent as well-wishing cards for the most important yearly holidays.

  15. Mining Quality Phrases from Massive Text Corpora

    OpenAIRE

    Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei

    2015-01-01

    Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quali...

  16. Extracting text data from the webpages

    OpenAIRE

    Mazal, Zdeněk

    2011-01-01

    This work focuses on data mining, and especially text mining, from Web pages, with an overview of programs for downloading text and of ways of extracting it. It also contains an overview of the most frequently used programs for extracting data from the internet. The output of this thesis is a Java program that can download text from a selection of servers and save it into an XML file.

  17. Frontiers of biomedical text mining: current progress

    OpenAIRE

    Zweigenbaum, Pierre; Demner-Fushman, Dina; Hong YU; Cohen, Kevin B.

    2007-01-01

    It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a ...

  18. Hex: dynamics and probabilistic text entry

    OpenAIRE

    J. Williamson; Murray-Smith, R.

    2005-01-01

    We present a gestural interface for entering text on a mobile device via continuous movements, with control based on feedback from a probabilistic language model. Text is represented by continuous trajectories over a hexagonal tessellation, and entry becomes a manual control task. The language model is used to infer user intentions and provide predictions about future actions, and the local dynamics adapt to reduce effort in entering probable text. This leads to an interface with a stable lay...

  19. Automatic summary evaluation based on text grammars

    OpenAIRE

    Branny, Emilia

    2007-01-01

    In this paper, I describe a method for evaluating automatically generated text summaries. The method is inspired by research in text grammars by Teun Van Dijk. It addresses a text as a complex structure, the elements of which are interconnected both on the level of form and meaning, and the well-formedness of which should be described on both of these levels. The method addresses current problems of summary evaluation methods, especially the problem of quantifying informativity, as well as th...

  20. Text Mining of Supreme Administrative Court Jurisdictions

    OpenAIRE

    Feinerer, Ingo; Hornik, Kurt

    2007-01-01

    Within the last decade text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the raw amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors. In this paper we use text mining methods to investigate Au...

  1. Scene text segmentation based on thresholding

    OpenAIRE

    Perez Sanmartín, Alejandro

    2014-01-01

    This research deals with the problem of text segmentation in scene images. Introduction deals with the information contained in an image and the different properties that will be useful for image segmentation. After that, the process of extraction of textual information is explained step by step. Furthermore, the problem of scene text segmentation is described more precisely and an overview of more popular existing methods is given. Text segmentation method is created and implemented using...

  2. Multilingual Text Detection with Nonlinear Neural Network

    OpenAIRE

    Lin Li; Shengsheng Yu; Luo Zhong; Xiaozhen Li

    2015-01-01

    Multilingual text detection in natural scenes is still a challenging task in computer vision. In this paper, we apply an unsupervised learning algorithm to learn language-independent stroke feature and combine unsupervised stroke feature learning and automatically multilayer feature extraction to improve the representational power of text feature. We also develop a novel nonlinear network based on traditional Convolutional Neural Network that is able to detect multilingual text regions in th...

  3. Translation Strategies of Non-literary Texts

    Institute of Scientific and Technical Information of China (English)

    杨静

    2015-01-01

    The translator's subjectivity is closely related to the choice of the style of the translated texts and of translation strategies. This paper presents an analytical study of translation strategies for non-literary texts. It introduces different non-literary texts and then generalizes some factors influencing the selection of translation strategies. Taking these influencing factors into account, translators should adopt different translation strategies

  4. Beyond Text Theory: Understanding Literary Response

    OpenAIRE

    Miall, David S.; Kuiken, Don

    1994-01-01

    Approaches to text comprehension that focus on propositional, inferential, and elaborative processes have often been considered capable of extension in principle to literary texts, such as stories or poems. However, we argue that literary response is influenced by stylistic features that result in defamiliarization; that defamiliarization invokes feeling which calls on personal perspectives and meanings; and that these aspects of literary response are not addressed by current text theories. T...

  5. Financial Statement Fraud Detection using Text Mining

    Directory of Open Access Journals (Sweden)

    Rajan Gupta

    2013-01-01

    Full Text Available Data mining techniques have been used extensively by the research community in detecting financial statement fraud. Most of the research in this direction has used the numbers (quantitative information, i.e. financial ratios) present in the financial statements for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement fraud by analyzing the hidden clues in the qualitative information (text) present in financial statements.

  6. Robust Text Detection in Natural Scene Images.

    Science.gov (United States)

    Yin, Xu-Cheng; Yin, Xuwang; Huang, Kaizhu; Hao, Hong-Wei

    2014-05-01

    Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose an accurate and robust method for detecting texts in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using the strategy of minimizing regularized variations. Character candidates are grouped into text candidates by the single-link clustering algorithm, where distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probabilities of text candidates corresponding to non-text are estimated with a character classifier; text candidates with high non-text probabilities are eliminated and texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation and even born-digital databases also demonstrate the effectiveness of the proposed method. PMID:26353230
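
    The single-link grouping step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain Euclidean distance and a fixed threshold, whereas the paper learns the distance weights and the threshold automatically. The function name `single_link_clusters` and the 2-D points standing in for character candidates are hypothetical.

```python
import math


def single_link_clusters(points, threshold):
    """Group points so that any two points closer than `threshold`
    (directly or through a chain of neighbours) share a cluster.

    Assumption: Euclidean distance with a hand-picked threshold; the
    paper instead learns distance weights and the threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical character-candidate centres: three close together, one far away.
candidates = [(0, 0), (1, 0), (2, 0), (10, 0)]
groups = single_link_clusters(candidates, threshold=1.5)
```

    Under single-link clustering the first and third points end up in the same group even though they are 2 units apart, because each is within the threshold of the point between them.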

  7. An Improved Algorithm of Bayesian Text Categorization

    Directory of Open Access Journals (Sweden)

    Tao Dong

    2011-08-01

    Full Text Available Text categorization is a fundamental methodology of text mining and has been a hot topic in data mining and web mining research in recent years. It plays an important role in building traditional information retrieval, web indexing architecture, Web information retrieval, and so on. This paper presents an improved text categorization algorithm that combines a feature weighting technique with the Naïve Bayesian classifier. Experimental results show that using the improved Gini index algorithm for feature weighting can effectively improve the performance of the Naïve Bayesian classifier. The algorithm has been applied successfully in a sensitive information recognition system.

  8. An approach for NL text interpretation

    Directory of Open Access Journals (Sweden)

    Anatol Popescu

    2007-11-01

    Full Text Available To model the interpretation process of NL sentences, we use mechanisms involving semantic networks that ensure syntactic-semantic text interpretation (SSI), including an axiomatic understanding model, an interpretation model and a denotation model to represent the result of SSI. These models also estimate the correctness and the consistency of texts. The approach also implements information extraction from NL texts. Our approach, based mainly upon semantic network grammars, has an extraordinary interpretation potential, involving a system of completely new concepts and processing methods.

  9. Rapid and effective synthesis of $^{40}\text{Ca}$-$^{27}\text{Al}$ ion pair towards quantum logic optical clock

    CERN Document Server

    Shang, Junjuan; Cao, Jian; Wang, Shaomao; Shu, Hualin; Huang, Xueren

    2016-01-01

    High precision atomic clocks have been applied not only to very important technological problems such as synchronization and global navigation systems, but also to fundamental precision measurement physics. Single $^{27}\text{Al}^+$ is one of the most attractive candidate systems because of its very low blackbody radiation effect, which dominates frequency shifts in other optical clock systems. Up to now, $^{27}\text{Al}^+$ could not be laser-cooled directly because of the absence of a 167 nm laser. Sympathetic cooling is a viable method to solve this problem. In this work, we used a single laser-cooled $^{40}\text{Ca}^+$ to sympathetically cool one $^{27}\text{Al}^+$ in a linear Paul trap. Compared with the laser ablation method, we obtained much lower velocity atoms sprayed from a home-made atom oven, which makes loading the aluminum ion more efficient and the sympathetic cooling much easier. By the method of precisely measuring the secular frequency of the ion pair, finally we prove...

  10. Flexible frontiers for text division into rows

    Directory of Open Access Journals (Sweden)

    Dan L. Lacrămă

    2009-01-01

    Full Text Available This paper presents an original solution for flexible hand-written text division into rows. Unlike the standard procedure, the proposed method avoids the isolated characters extensions amputation and reduces the recognition error rate in the final stage.

  11. Multimodality, Literacy and Texts: Developing a Discourse

    Science.gov (United States)

    Bearne, Eve

    2009-01-01

    This article argues for the development of a framework through which to describe children's multimodal texts. Such a shared discourse should be capable of including different modes and media and the ways in which children integrate and combine them for their own meaning-making purposes. It should also acknowledge that multimodal texts are not…

  12. Text mining and visualization using VOSviewer

    OpenAIRE

    van Eck, Nees Jan; Waltman, Ludo

    2011-01-01

    VOSviewer is a computer program for creating, visualizing, and exploring bibliometric maps of science. In this report, the new text mining functionality of VOSviewer is presented. A number of examples are given of applications in which VOSviewer is used for analyzing large amounts of text data.

  13. Readability Revisited? The Implications of Text Complexity

    Science.gov (United States)

    Wray, David; Janan, Dahlia

    2013-01-01

    The concept of readability has had a variable history, moving from a position where it was considered as a very important topic for those responsible for producing texts and matching those texts to the abilities and needs of learners, to its current declining visibility in the education literature. Some important work has been coming from the USA…

  14. Ontology Assisted Formal Specification Extraction from Text

    Directory of Open Access Journals (Sweden)

    Andreea Mihis

    2010-12-01

    Full Text Available In the field of knowledge processing, ontologies are the most important means. They make it possible for the computer to better understand natural language and to make judgments. In this paper, a method which uses ontologies in the semi-automatic extraction of formal specifications from a natural language text is proposed.

  15. Text comprehension strategy instruction with poor readers

    NARCIS (Netherlands)

    Van den Bos, K.P.; Aarnoudse, C.C.; Brand-Gruwel, S.

    1998-01-01

    The goal of this study was to investigate the effects of teaching text comprehension strategies to children with decoding and reading comprehension problems and with a poor or normal listening ability. Two experiments are reported. Four text comprehension strategies, viz., question generation, summa

  16. Modeling text with generalizable Gaussian mixtures

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Sigurdsson, Sigurdur; Kolenda, Thomas;

    2000-01-01

    We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts model complexity for a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and the sample size. We discuss...

  17. The socio-demographics of texting

    DEFF Research Database (Denmark)

    Ling, Richard; Bertel, Troels Fibæk; Sundsøy, Pål

    2012-01-01

    messages go to only five other persons. Finally, we find that there is pronounced homophily in terms of age and gender in texting relationships. These findings support previous claims that texting is an important element of teen culture and is an element in the construction of a bounded solidarity....

  18. Arabic Text Classification Using Support Vector Machines

    NARCIS (Netherlands)

    Gharib, Tarek Fouad; Habib, Mena Badieh; Fayed, Zaki Taha; Zhu, Qiang

    2009-01-01

    Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic language is highly inflectional and derivational language which makes text mining a complex task. In this paper we applied the Support Vector Machines (SVM) model in cl

  19. Teaching Theory through Popular Culture Texts

    Science.gov (United States)

    Trier, James

    2007-01-01

    In this article, the author describes a pedagogical approach to teaching theory to pre-service teachers. This approach involves articulating academic texts that introduce theoretical ideas and tools with carefully selected popular culture texts that can be taken up to illustrate the elements of a particular theory. Examples of the theories…

  20. Text, Talk, and Journalistic Quoting Practices.

    Science.gov (United States)

    Zelizer, Barbie

    1995-01-01

    Explores journalistic quoting practices as an interface between written and oral modes of communication, or between text and talk. Examines both prescriptive and performative dimensions of journalistic quoting across the media. States that when quoting, journalists creatively mix and meld text and talk. Suggests that the cogency of news…

  1. The illusion of creation of the text

    OpenAIRE

    Rahman, Md. Shaifur

    2005-01-01

    Is it really possible to create a (literary) text? It is actually impossible, as to create an authentic text we need an authentic context, which is impossible to create. So all the literary texts we have are not authentic and are not created authentically.

  2. Helping Children Become More Knowledgeable through Text

    Science.gov (United States)

    Neuman, Susan B.; Roskos, Kathleen

    2012-01-01

    With the adoption of the Common Core State Standards, curriculum resources are shifting from an emphasis on literary texts to a greater focus on informational texts. Although we need to understand the intention of these new Common Core State Standards, and the important drive toward greater content knowledge for all students, we must be wary of…

  3. A text in Romani from 1622

    DEFF Research Database (Denmark)

    Bakker, Peter

    This is a reprint of a 2012 article: A new old text in Romani: Lord's Prayer, 1622. International Journal of Romani Language and Culture 2 (2011): 193-212.

  4. Text Steganography with Multi level Shielding

    Directory of Open Access Journals (Sweden)

    Sharon Rose Govada

    2012-07-01

    Full Text Available Steganography is a form of security through obscurity. It is the art and science of writing hidden messages in such a way that no one except the sender and the intended recipient can understand the hidden message. The purpose of steganography is covert communication: to hide the existence of a message from a third party. Compared with the study of text steganography, research on text steganalysis is in its infancy. In this paper, we present a method capable of performing text steganography that is more reliable and secure than the existing algorithms. Our method is a combination of Word Shifting, Text Steganography and Synonym Text Steganography, so we call it “Three Phase Shielding Text Steganography”. This method overcomes various limitations faced by the existing steganographic algorithms. The experimental results are very encouraging when compared to the already existing algorithms. Our method also helps in finding out the embedding rate of a secret message in a text document.

  5. Role of Terms in Popular Science Text

    OpenAIRE

    Zhabbarova F. U.

    2013-01-01

    The article examines and determines the specifics of terminological vocabulary used in a popular science text. It differentiates the notions of cohesion and coherence. The article reveals the main terminological means realizing cohesion in the text of a popular science article.

  6. Bodily Pleasures and/as the Text

    Science.gov (United States)

    Hagood, Margaret C.

    2005-01-01

    Literacy education is at a crossroads. While traditional school experiences still prize disembodied experiences of reading print-based texts as the pinnacle of sound education, informal learning experiences provide fruitful examples of the ways that visual texts are read as they are embodied by readers. In this paper I draw from the literacy lives…

  7. Texting your way to healthier eating?

    DEFF Research Database (Denmark)

    Pedersen, Susanne; Grønhøj, Alice; Thøgersen, John

    2016-01-01

    This study investigates the effects of a feedback intervention employing text messaging during 11 weeks on adolescents’ behavior, self-efficacy and outcome expectations regarding fruit and vegetable intake. A pre- and post-survey was completed by 1488 adolescents school-wise randomly allocated to a...... control group and two experimental groups. Both experimental groups set weekly goals on fruit and vegetable intake, reported their consumption daily and subsequently received feedback on their performance via mobile text messaging (Short Message Service [SMS]). The second experimental group also received, in...... sent text messages, on intervention outcomes. Participants sending more than half of the possible text messages significantly increased their fruit and vegetable intake. Participants sending between 10% and 50% of the possible text messages experienced a significant drop in self-efficacy and those sending less...

  8. Using Genetic Algorithms for Texts Classification Problems

    Directory of Open Access Journals (Sweden)

    A. A. Shumeyko

    2009-01-01

    Full Text Available The avalanche of information produced by mankind has led to the concept of automated knowledge extraction, Data Mining [1]. This direction is connected with a wide spectrum of problems, from recognition of fuzzy sets to the creation of search engines. An important component of Data Mining is the processing of text information. Such problems rest on the concepts of classification and clustering [2]. Classification consists in determining the membership of some element (text) in one of a set of classes created in advance. Clustering means splitting a set of elements (texts) into clusters, the number of which is determined by the localization of the elements of the given set in the vicinity of certain natural centers of these clusters. Solving a classification problem should initially rest on given postulates, the most basic of which are a priori information about the primary set of texts and a measure of affinity between elements and classes.

  9. Integrating Text Plans for Conciseness and Coherence

    CERN Document Server

    Harvey, T; Harvey, Terrence; Carberry, Sandra

    1998-01-01

    Our experience with a critiquing system shows that when the system detects problems with the user's performance, multiple critiques are often produced. Analysis of a corpus of actual critiques revealed that even though each individual critique is concise and coherent, the set of critiques as a whole may exhibit several problems that detract from conciseness and coherence, and consequently assimilation. Thus a text planner was needed that could integrate the text plans for individual communicative goals to produce an overall text plan representing a concise, coherent message. This paper presents our general rule-based system for accomplishing this task. The system takes as input a set of individual text plans represented as RST-style trees, and produces a smaller set of more complex trees representing integrated messages that still achieve the multiple communicative goals of the individual text plans. Domain-independent rules are used to capture strategies across domains, while the facility for addition...

  10. Code-Mixing in Social Media Text

    Directory of Open Access Journals (Sweden)

    Amitava Das

    Full Text Available Automatic understanding of noisy social media text is one of the prime present-day research areas. Most research has so far concentrated on English texts; however, more than half of the users are writing in other languages, making language identification a prerequisite for comprehensive processing of social media text. Though language identification has been considered an almost solved problem in other applications, language detectors fail in the social media context due to phenomena such as code-mixing, code-switching, lexical borrowings, Anglicisms, and phonetic typing. This paper reports an initial study to understand the characteristics of code-mixing in the social media context and presents a system developed to automatically detect language boundaries in code-mixed social media text, here exemplified by Facebook messages in mixed English-Bengali and English-Hindi.

  11. Adaptive Text Entry for Mobile Devices

    DEFF Research Database (Denmark)

    Proschowsky, Morten Smidt

    for mobile devices and a framework for adaptive context-aware language models. Based on analysis of current text entry methods, the requirements to the new text entry methods are established. Transparent User guided Prediction (TUP) is a text entry method for devices with one dimensional touch input....... It can be touch sensitive wheels, sliders or similar input devices. The interaction design of TUP is done with a combination of high level task models and low level models of human motor behaviour. Three prototypes of TUP are designed and evaluated by more than 30 users. Observations from the...... evaluations are used to improve the models of human motor behaviour. TUP-Key is a variant of TUP, designed for 12 key phone keyboards. It is introduced in the thesis but has not been implemented or evaluated. Both text entry methods support adaptive context-aware language models. YourText is a framework for...

  12. A Survey on Text Mining in Clustering

    Directory of Open Access Journals (Sweden)

    S.Logeswari

    2011-02-01

    Full Text Available Text mining has important applications in the areas of data mining and information retrieval. One of the important tasks in text mining is document clustering. Many existing document clustering techniques use the bag-of-words model to represent the content of a document. This model is only effective for grouping related documents when these documents share a large proportion of lexically equivalent terms. The synonymy between related documents is ignored, which reduces the effectiveness of applications using a standard full-text document representation. This paper focuses on the various techniques that are used to cluster text documents based on keywords, phrases and concepts. It also covers the different performance measures that are used to evaluate the quality of clusters.
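
    The bag-of-words limitation described in the abstract above can be seen in a small sketch. It is an illustration only, assuming whitespace tokenization and raw term counts; the function name `cosine_similarity` and the sample sentences are hypothetical and not taken from the survey.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents under a bag-of-words model.

    Assumption: tokens are lower-cased, whitespace-separated words and
    the weights are raw term counts (no TF-IDF, no stemming).
    """
    bag_a = Counter(doc_a.lower().split())
    bag_b = Counter(doc_b.lower().split())
    # Only terms that appear in both documents contribute to the dot product.
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two related documents that share no lexically equivalent terms.
related_but_disjoint = cosine_similarity("car engine repair", "automobile motor fix")
# Two documents that share two of their three terms.
overlapping = cosine_similarity("car engine repair", "car engine noise")
```

    The first pair scores zero despite being near-synonymous, which is the gap that phrase- and concept-based clustering techniques aim to close.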

  13. Engaging Texts: Effects of Concreteness on Comprehensibility, Interest, and Recall in Four Text Types.

    Science.gov (United States)

    Sadoski, Mark; Goetz, Ernest T.; Rodriguez, Maximo

    2000-01-01

    Investigates concreteness as a text feature that engaged undergraduate readers' comprehension, interest, and learning in four text types: persuasion, exposition, literary stories, and narratives. Results show that concrete texts were recalled better than abstract texts, although the magnitude of the advantage varied across text types. Concreteness…

  14. Text-Based Recall and Extra-Textual Generations Resulting from Simplified and Authentic Texts

    Science.gov (United States)

    Crossley, Scott A.; McNamara, Danielle S.

    2016-01-01

    This study uses a moving windows self-paced reading task to assess text comprehension of beginning and intermediate-level simplified texts and authentic texts by L2 learners engaged in a text-retelling task. Linear mixed effects (LME) models revealed statistically significant main effects for reading proficiency and text level on the number of…

  15. How Popular Culture Texts Inform and Shape Students' Discussions of Social Studies Texts

    Science.gov (United States)

    Hall, Leigh A.

    2012-01-01

    In this article, I examine how 6th-grade students used pop culture texts to inform their understandings about social studies texts and shape their discussions of it. Discussions showed that students used pop culture texts in three ways when talking about social studies texts. First, students applied comprehension strategies to pop culture texts to…

  16. The network of concepts in written texts

    CERN Document Server

    Caldeira, S M G; Andrade, R F S; Neme, A; Miranda, J G V; Caldeira, Silvia M. G.; Lobao, Thierry C. Petit; Neme, Alexis

    2005-01-01

    Complex network theory is used to investigate the structure of meaningful concepts in written texts of individual authors. Networks have been constructed after a two-phase filtering, where words with less meaningful content are eliminated, and all remaining words are set to their canonical form, without any number, gender or time inflection. Each sentence in the text is added to the network as a clique. A large number of written texts have been scrutinized, and it is found that texts have small-world as well as scale-free structures. The growth process of these networks has also been investigated, and a universal evolution of network quantifiers has been found among the set of texts written by distinct authors. Further analyses, based on shuffling procedures applied either to the texts or to the constructed networks, provide hints on the role played by the word frequency and sentence length distributions in the network structure. Since the meaningful words are related to concepts in the author's mind, results for text...
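
    The construction step in the abstract above, where each sentence is added to the network as a clique, might be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the two-phase filtering is reduced to lower-casing plus a caller-supplied stopword set, and no reduction to canonical forms is attempted. The name `build_concept_network` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_network(sentences, stopwords=frozenset()):
    """Return an adjacency map in which every sentence contributes a clique.

    Assumption: words are whitespace-separated and lower-cased; the
    paper's filtering and canonical-form steps are replaced by a
    simple stopword set.
    """
    adjacency = defaultdict(set)
    for sentence in sentences:
        words = {w for w in sentence.lower().split() if w not in stopwords}
        for word in words:
            adjacency[word]  # touching the key registers single-word sentences as nodes
        # Connect every pair of distinct words in the sentence (a clique).
        for a, b in combinations(sorted(words), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


network = build_concept_network(
    ["the cat sat", "the dog sat"],
    stopwords={"the"},
)
```

    Here "sat" becomes a hub linking "cat" and "dog", while "cat" and "dog" stay unconnected because they never share a sentence.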

  17. Figure-associated text summarization and evaluation.

    Directory of Open Access Journals (Sweden)

    Balaji Polepalli Ramesh

    Full Text Available Biomedical literature incorporates millions of figures, which are a rich and important knowledge resource for biomedical researchers. Scientists need access to the figures and the knowledge they represent in order to validate research findings and to generate new hypotheses. By themselves, these figures are nearly always incomprehensible to both humans and machines and their associated texts are therefore essential for full comprehension. The associated text of a figure, however, is scattered throughout its full-text article and contains redundant information content. In this paper, we report the continued development and evaluation of several figure summarization systems, the FigSum+ systems, that automatically identify associated texts, remove redundant information, and generate a text summary for every figure in an article. Using a set of 94 annotated figures selected from 19 different journals, we conducted an intrinsic evaluation of FigSum+. We evaluate the performance by precision, recall, F1, and ROUGE scores. The best FigSum+ system is based on an unsupervised method, achieving F1 score of 0.66 and ROUGE-1 score of 0.97. The annotated data is available at figshare.com (http://figshare.com/articles/Figure_Associated_Text_Summarization_and_Evaluation/858903).
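
    The precision, recall and F1 scores named in the abstract above follow the standard set-overlap definitions, sketched below. The sentence identifiers are made up for illustration and are not the FigSum+ data; `precision_recall_f1` is a hypothetical helper, and how the authors matched system output to annotations is not stated here.

```python
def precision_recall_f1(predicted, gold):
    """Standard precision, recall and F1 over two collections of items.

    Assumption: an item counts as correct only if it appears in both
    the predicted set and the gold (annotated) set.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentence ids: a system selects four sentences, annotators marked three.
precision, recall, f1 = precision_recall_f1(predicted=[1, 2, 3, 4], gold=[2, 3, 5])
```

    With two of the four selected sentences correct and two of the three annotated sentences found, precision is 1/2, recall is 2/3 and F1 is their harmonic mean, 4/7.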

  18. HANDWRITTEN TEXT IMAGE AUTHENTICATION USING BACK PROPAGATION

    Directory of Open Access Journals (Sweden)

    A S N Chakravarthy

    2011-10-01

    Full Text Available Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claims to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Literary forgery can involve imitating the style of a famous author. If an original manuscript, typewritten text, or recording is available, then the medium itself (or its packaging, anything from a box to e-mail headers) can help prove or disprove the authenticity of the document. The use of digital images of handwritten historical documents has become more popular in recent years. Volunteers around the world now read thousands of these images as part of their indexing process. Handwritten text images of old documents are sometimes difficult to read or noisy due to the preservation of the document and quality of the image [1]. Handwritten text offers challenges that are rarely encountered in machine-printed text. In addition, most problems faced in reading machine-printed text (e.g., character recognition, word segmentation, letter segmentation, etc.) are more severe in handwritten text. In this paper we propose a method for authenticating handwritten text images using the back propagation algorithm.

  19. A New Text Location Approach Based Wavelet

    Institute of Scientific and Technical Information of China (English)

    Weihua Li; Zhen Fang; Shuozhong Wang

    2002-01-01

    With the advancement of content-based retrieval technology, the importance of semantics for text information contained in images attracts many researchers. An algorithm that automatically locates the textual regions in the input image will facilitate the retrieval task, and the optical character recognizer can then be applied only to those regions of the image which contain text. In this paper a new wavelet-based text location method is described, which can be used to locate textual regions in complex images and video frames. Experimental results show that the textual regions in an image can be located effectively and quickly.

  20. A New Text Location Approach Based Wavelet

    Institute of Scientific and Technical Information of China (English)

    Weihua Li; Zhen Fang; Shuozhong Wang

    2002-01-01

    With the advancement of content-based retrieval technology, the importance of semantics for text information contained in images attracts many researchers. An algorithm that automatically locates the textual regions in the input image will facilitate the retrieval task, and the optical character recognizer can then be applied only to those regions of the image which contain text. In this paper a new text location method is described, which can be used to locate textual regions in complex images and video frames. Experimental results show that the textual regions in an image can be located effectively and quickly.

  1. Editor of mind map's text equivalent

    OpenAIRE

    Hazuza, Petr

    2013-01-01

    The work analyzes the possibility of writing of mental maps in the text form and compares it with the classical creation of mental maps in graphical form. This work also tries to find the ideal solution of mental maps in the text form that fulfil the most functions possible as it is in the graphical version and at the same time it is a rival to the graphical version of mental maps thanks to its simplicity. Work then performs an analysis of the text editor of mental maps. It describes how and ...

  2. Extracting and Sharing Knowledge from Medical Texts

    Institute of Scientific and Technical Information of China (English)

    曹存根

    2002-01-01

    In recent years, we have been developing a new framework for acquiring medical knowledge from encyclopedic texts. This framework consists of three major parts. The first part is an extended high-level conceptual language (called HLCL 1.1) for use by knowledge engineers to formalize knowledge texts in an encyclopedia. The second part is an HLCL 1.1 compiler for parsing and analyzing the formalized texts into knowledge models. The third part is a set of domain-specific ontologies for sharing knowledge.

  3. The Evaluation of Ontology Matching versus Text

    OpenAIRE

    Andreea-Diana MIHIS

    2010-01-01

    Lately, the ontologies have become more and more complex, and they are used in different domains. Some of the ontologies are domain independent; some are specific to a domain. In the case of text processing and information retrieval, it is important to identify the corresponding ontology to a specific text. If the ontology is of a great scale, only a part of it may be reflected in the natural language text. This article presents metrics which evaluate the degree in which an ontology matches a...

  4. Text Clustering with String Kernels in R

    OpenAIRE

    Karatzoglou, Alexandros; Feinerer, Ingo

    2006-01-01

    We present a package which provides a general framework, including tools and algorithms, for text mining in R using the S4 class system. Using this package and the kernlab R package we explore the use of kernel methods for clustering (e.g., kernel k-means and spectral clustering) on a set of text documents, using string kernels. We compare these methods to a more traditional clustering technique like k-means on a bag of word representation of the text and evaluate the viability of kernel-base...

  5. Building Fluency through the Phrased Text Lesson

    Science.gov (United States)

    Rasinski, Timothy; Yildirim, Kasim; Nageldinger, James

    2012-01-01

    This Teaching Tip article explores the importance of phrasing while reading. It also presents an instructional intervention strategy for helping students develop greater proficiency in reading with phrases that reflect the meaning of the text.

  6. AUTOMATIC TEXT SUMMARIZATION BASED ON TEXTUAL COHESION

    Institute of Scientific and Technical Information of China (English)

    Chen Yanmin; Liu Bingquan; Wang Xiaolong

    2007-01-01

    This paper presents two different algorithms that derive the cohesion structure in the form of lexical chains from two kinds of language resources, HowNet and TongYiCiCiLin. The research that connects the cohesion structure of a text to the derivation of its summary is displayed. A novel model of automatic text summarization is devised, based on the data provided by lexical chains from original texts. Moreover, the construction rules of lexical chains are modified according to characteristics of the knowledge database in order to be more suitable for Chinese summarization. Evaluation results show that high quality indicative summaries are produced from Chinese texts.

  7. Text-Filled Stacked Area Graphs

    DEFF Research Database (Denmark)

    Kraus, Martin

    2011-01-01

    Text can add a significant amount of detail and value to an information visualization. In particular, it can integrate more of the data that a visualization is based on, and it can also integrate information that is personally relevant to readers of a visualization. This may influence readers to consider a visualization a detailed enrichment of their personal experience instead of an abstract representation of anonymous numbers. However, the integration of textual detail into a visualization is often very challenging. This work discusses one particular approach to this problem, namely text-filled stacked area graphs; i.e., graphs that feature stacked areas that are filled with small-typed text. Since these graphs allow for computing the text layout automatically, it is possible to include large amounts of textual detail with very little effort. We discuss the most important challenges and some...

  8. Understanding How Headings Influence Text Processing

    Directory of Open Access Journals (Sweden)

    Julie Lemarié

    2012-07-01

    Full Text Available Titles and headings are commonly used signaling devices in expository texts. Researchers in cognitive and educational psychology have demonstrated several important effects of headings and titles on text processing: headings improve memory for text organization; headings influence text comprehension by activating readers’ prior knowledge; and titles can bias text comprehension by their emphasis on a particular text topic. However, the lack of precise linguistic analyses of titles/headings has limited both the scope of empirical research and the precision of conclusions. We present a theory of signaling devices that provides a detailed analysis of variation in titles and headings and generates predictions concerning their effects. We discuss the implications of our analyses for research on titles and headings and summarize recent research findings that illustrate the validity of a central component of our analyses. Finally, we propose some future research directions integrating insights from linguistics for the study of how headings and titles affect text processing.

  9. Figures of thought mathematics and mathematical texts

    CERN Document Server

    Reed, David

    2003-01-01

    Examines the ways in which mathematical works can be read as texts, examines their textual strategies and demonstrates that such readings provide a rich source of philosophical debate regarding mathematics.

  10. Discovery of Recurring Anomalies in Text Reports

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining algorithms to...

  11. Punctuation effects in English and Esperanto texts

    CERN Document Server

    Ausloos, M

    2010-01-01

    A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute, at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style.
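
    The exponent of a length-rank power law, as discussed in this abstract, can be estimated with a least-squares fit on the log-log plot. The sketch below is one common way to do this and is not necessarily the procedure used in the paper; the function name and the synthetic data are assumptions made for illustration.

```python
import math


def fit_power_law_exponent(lengths):
    """Estimate a in length ~ rank**(-a) by least squares on log-log data.

    `lengths` must hold at least two positive sentence lengths.
    """
    ranked = sorted(lengths, reverse=True)  # rank 1 is the longest sentence
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(length) for length in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # the power-law exponent is the negated log-log slope


# Synthetic lengths that follow an exact power law with exponent 0.5.
synthetic = [100.0 * rank ** -0.5 for rank in range(1, 51)]
print(round(fit_power_law_exponent(synthetic), 3))
```

    Real sentence lengths are integers with many ties, so a fit on actual text would be noisier than this synthetic case.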

  12. Punctuation effects in english and esperanto texts

    Science.gov (United States)

    Ausloos, M.

    2010-07-01

    A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence-length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute, at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style.

  13. Talking, Texting Teen Drivers Take Deadly Toll

    Science.gov (United States)

    ... gov/medlineplus/news/fullstory_159138.html Talking, Texting Teen Drivers Take Deadly Toll Distractions played role in ... too many cases -- killing people in crashes involving teen drivers, a new report shows. A full 60 ...

  14. A new text book for forest planning

    OpenAIRE

    Scotti R

    2007-01-01

    A new text book by P. Corona (University of Tuscia, Viterbo, Italy) is presented, dealing with sampling and measuring methods for determining forest stand volumes and increments in the frame of forest planning. The book is written in Italian.

  15. Cohesion and Metaphor Aspects in Andabhuana Text

    Directory of Open Access Journals (Sweden)

    Ida Bagus Mahardika

    2015-02-01

    Full Text Available Cohesion and metaphor are unique and interesting language aspects of the Andabhuana text to research. They are quite dominant aspects of the story in developing its literary aesthetic. This research is based on the arts technical and analytical method. The result of the research on those two aspects shows that the traditional aesthetic style in arts, as described in the Andabhuana verses, emphasizes the reference, meaning, selection and variation of words. The language parts used are aimed at bringing the text ideology to a humanity perspective, especially the Śiwatattwa values as parts of Hindu teaching. Hence the cohesion and metaphor in the Andabhuana text are a semiotic description to transform to Balinese Hindus, as most of them follow the Śiwatattwa belief.

  16. The Relationship between Paraphrasing and Text Analysis

    Directory of Open Access Journals (Sweden)

    María Luisa Cepeda Islas

    2013-04-01

    Full Text Available Given the importance of paraphrasing in the process of comprehension for college students, this study assessed the level of text analysis and paraphrasing in the responses of a sample of senior students of the psychology program. We selected a group of freshmen in the Psychology course, who were asked to answer a questionnaire and write a summary of an empirical article. The results showed that participants had a low level of text analysis and, at the same time, low levels of paraphrasing. Verbatim copying of the text predominated. The authors envision some possibilities for the structure of a training workshop not only on paraphrasing but also on the analysis of text.

  17. Talking, Texting Teen Drivers Take Deadly Toll

    Science.gov (United States)

    ... medlineplus.gov/news/fullstory_159138.html Talking, Texting Teen Drivers Take Deadly Toll Distractions played role in ... too many cases -- killing people in crashes involving teen drivers, a new report shows. A full 60 ...

  18. Voice to Text Language Translation (VTLT) Project

    Data.gov (United States)

    National Aeronautics and Space Administration — A feasibility analysis of adding a second modality to pilot/Air Traffic Control (ATC) communications. The real time availability of text in Air Traffic Control...

  19. QuitNowTXT Text Messaging Library

    Data.gov (United States)

    U.S. Department of Health & Human Services — Overview: The QuitNowTXT text messaging program is designed as a resource that can be adapted to specific contexts including those outside the United States and in...

  20. Utterance and Text in Freshman English.

    Science.gov (United States)

    Lotto, Edward

    1989-01-01

    Analyzes the distinction between utterance and writing to determine why students have difficulty using specific details to explore their generalizations. Describes successful strategies and assignments to encourage student awareness of text and concrete expression. (KEH)

  1. Text document classification based on mixture models

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Malík, Antonín

    2004-01-01

    Vol. 40, No. 3 (2004), pp. 293-304. ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords: text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

  2. Multinomial Inverse Regression for Text Analysis

    OpenAIRE

    Taddy, Matt

    2010-01-01

    Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-preserving dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represen...

  3. Text mining for the biocuration workflow

    OpenAIRE

    Hirschman, L.; Burns, G. A. P. C.; Krallinger, M.; Arighi, C.; Cohen, K. B.; Valencia, A.; Wu, C H; Chatr-aryamontri, A; Dowell, K. G.; Huala, E; Lourenco, A.; Nash, R; Veuthey, A.-L.; Wiegers, T.; Winter, A. G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations too...

  4. Financial Statement Fraud Detection using Text Mining

    OpenAIRE

    Rajan Gupta; Nasib Singh Gill

    2013-01-01

    Data mining techniques have been used extensively by the research community in detecting financial statement fraud. Most of the research in this direction has used the numbers (quantitative information), i.e. financial ratios present in the financial statements, for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement frau...

  5. Chapter 16: Text Mining for Translational Bioinformatics

    OpenAIRE

    Bretonnel Cohen, K; Hunter, Lawrence E.

    2013-01-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. P...

  6. Text Messaging for Addiction: A Review

    OpenAIRE

    Keoleian, Victoria; Polcin, Douglas; Galloway, Gantt P.

    2015-01-01

    Individuals seeking treatment for addiction often experience barriers due to cost, lack of local treatment resources, or either school or work schedule conflicts. Text messaging-based addiction treatment is inexpensive and has the potential to be widely accessible in real time. We conducted a comprehensive literature review identifying 11 published randomized controlled trials (RCTs) evaluating text messaging-based interventions for tobacco smoking, 4 studies for reducing alcohol consumption,...

  7. Ellogon: A New Text Engineering Platform

    OpenAIRE

    Petasis, Georgios; Karkaletsis, Vangelis; Paliouras, Georgios; Androutsopoulos, Ion; Spyropoulos, Constantine D.

    2002-01-01

    This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components as well as visualising textual data and their associated linguis...

  8. CLUSTERING-BASED ANALYSIS OF TEXT SIMILARITY

    OpenAIRE

    Bovcon, Borja

    2013-01-01

    The focus of this thesis is a comparison of analyses of text-document similarity using clustering algorithms. We begin by defining the main problem and then proceed to describe the two most used text-document representation techniques, where we present word filtering methods and their importance, Porter's algorithm and the tf-idf term weighting algorithm. We then apply all previously described algorithms to selected data sets, which vary in size and compactness. Following this, we ...

  9. Learning semantic similarity for very short texts

    OpenAIRE

    De Boom, Cedric; Van Canneyt, Steven; Bohez, Steven; Demeester, Thomas; Dhoedt, Bart

    2015-01-01

    Levering data on social media, such as Twitter and Facebook, requires information retrieval algorithms to become able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level....
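
    The failure mode described in this abstract, where tf-idf cosine similarity breaks down when two short texts share no words, can be reproduced with a small sketch. It uses a smoothed idf variant, which is only one of several common weightings and is an assumption here, as are the function names and the example texts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {
                # Smoothed idf keeps the weight positive for very common terms.
                term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
                for term, count in term_freq.items()
            }
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "the film was great".split(),
    "an excellent movie".split(),  # same meaning as the first, no shared words
    "the film was long".split(),   # different meaning, heavy word overlap
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

    The paraphrase pair scores zero while the pair with different meanings scores well above zero, which is the mismatch that embedding-based methods aim to correct.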

  10. Metadiscoursal Markers in Medical and Literary Texts

    OpenAIRE

    Marzieh Mostafavi; Ghaffar Tajalli

    2012-01-01

    English medical and literary texts were compared and contrasted to find out whether there were any significant differences between the two kinds of texts in terms of the number and types of metadiscoursal markers. To this end, first, 30 medical and literary journal articles were chosen. Then, 3 successive paragraphs were extracted randomly from each of the selected articles which totaled 90 paragraphs out of which 45 were medical and 45 literary. The frequency and type of metadiscoursal marke...

  11. Text genre and time discursive construction

    OpenAIRE

    Maria Antónia Coutinho; Noémia Jorge

    2012-01-01

    This paper adopts a text and discourse linguistics framework. We will assume discourses (or discourse types) represent an intermediate organizational level between text genre and the specific linguistic devices in use (Bronckart, 1997, 2008). Discourse types play a key role in language activity, as they allow the transition between individual and collective representations, and they involve temporal and agentive relationships. In our research, we focus on the linguistic devices associated with t...

  12. Text mining for the biocuration workflow.

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  13. Statistical machine translation for automobile marketing texts

    OpenAIRE

    Läubli, Samuel; Fishel, Mark; Weibel, Manuela; Volk, Martin

    2013-01-01

    We describe a project on introducing an in-house statistical machine translation system for marketing texts from the automobile industry with the final aim of replacing manual translation with post-editing, based on the translation system. The focus of the paper is the suitability of such texts for SMT; we present experiments in domain adaptation and decompounding that improve the baseline translation systems, the results of which are evaluated using automatic metrics as well as manual evalua...

  14. Komparatistik online 2014 Heft 2 : Polyglotte Texte

    OpenAIRE

    Willms, Weertje; Zemanek, Evi

    2014-01-01

    The literary multilingualism of individual authors or cultural communities, who write different texts in different languages without any single text having to be multilingual, is an old phenomenon. One need only think of the historical coexistence of vernacular and Latin literature, or of the coexistence of written and colloquial language, which characterize various cultures and nations to this day. Equally old, and to be distinguished from it, is mixed-language...

  15. Revenue - similarities and differences of normative texts

    OpenAIRE

    Miloslav Janhuba

    2004-01-01

    This exposition focuses on some basic problems in defining the microeconomic (and accounting) category "revenue", and on its congruence and differences across canonical texts (laws, instructions, directives, etc.) that regulate a range of accounting outputs, primarily income statements. The canonical texts used were EU directives, International Accounting Standards, US Financial Accounting Standards and Czech law. Theoretically compared are the functions of revenue (and i...

  16. WRITTEN TEXT AUTHOR'S CHARACTERISTICS ASCERTAINMENT (PROFILING)

    OpenAIRE

    Litvinova, Tat'yana

    2012-01-01

    It is now considered proven that a text reflects its author's personality. The author of the article states that one of the effective ways of revealing personal characteristics is the analysis of deixis units, especially personal pronouns, prepositions and conjunctions, and suggests applying techniques for ascertaining a text author's personal characteristics through deixis unit analysis to the Russian language and considering the possibility of deixis analysis as the means of a wr...

  17. A Survey of Unstructured Text Summarization Techniques

    Directory of Open Access Journals (Sweden)

    Sherif Elfayoumy

    2014-05-01

    Full Text Available Due to the explosive amounts of text data being created and organizations' increased desire to leverage their data corpora, especially with the availability of Big Data platforms, there is not usually enough time to read and understand each document and make decisions based on document contents. Hence, there is a great demand for summarizing text documents to provide a representative substitute for the original documents. By improving summarizing techniques, precision of document retrieval through search queries against summarized documents is expected to improve in comparison to querying against the full spectrum of original documents. Several generic text summarization algorithms have been developed, each with its own advantages and disadvantages. For example, some algorithms are particularly good for summarizing short documents but not for long ones. Others perform well in identifying and summarizing single-topic documents but their precision degrades sharply with multi-topic documents. In this article we present a survey of the literature in text summarization. We also survey some of the most common evaluation methods for the quality of automated text summarization techniques. Last, we identify some of the challenging problems that are still open, in particular the need for a universal approach that yields good results for mixed types of documents.

  18. Text Entry by Gazing and Smiling

    Directory of Open Access Journals (Sweden)

    Outi Tuisku

    2013-01-01

    Full Text Available Face Interface is a wearable prototype that combines the use of voluntary gaze direction and facial activations, for pointing and selecting objects on a computer screen, respectively. The aim was to investigate the functionality of the prototype for entering text. First, three on-screen keyboard layout designs were developed and tested (n=10) to find a layout that would be more suitable for text entry with the prototype than the traditional QWERTY layout. The task was to enter one word ten times with each of the layouts by pointing at letters with gaze and selecting them by smiling. Subjective ratings showed that a layout with large keys on the edge and small keys near the center of the keyboard was rated as the most enjoyable, clearest, and most functional. Second, using this layout, the aim of the second experiment (n=12) was to compare entering text with Face Interface to entering text with a mouse. The results showed that the text entry rate was 20 characters per minute (cpm) for Face Interface and 27 cpm for the mouse. For Face Interface, the keystrokes per character (KSPC) value was 1.1 and the minimum string distance (MSD) error rate was 0.12. These values compare especially well with other similar techniques.
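
    The three text entry metrics reported in this abstract can be computed as in the sketch below. It assumes the widely used definitions (MSD as Levenshtein distance divided by the longer string's length, KSPC as keystrokes per transcribed character, and cpm as transcribed characters per minute); the abstract does not state the exact formulas or units the authors applied, and the function names and inputs are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(
                min(
                    previous[j] + 1,                        # deletion
                    current[j - 1] + 1,                     # insertion
                    previous[j - 1] + (char_a != char_b),   # substitution
                )
            )
        previous = current
    return previous[-1]


def msd_error_rate(presented, transcribed):
    """MSD error rate as a fraction of the longer string's length."""
    longest = max(len(presented), len(transcribed))
    return levenshtein(presented, transcribed) / longest if longest else 0.0


def kspc(keystrokes, transcribed):
    """Keystrokes per character of the final transcribed text."""
    return keystrokes / len(transcribed)


def chars_per_minute(transcribed, seconds):
    """Entry rate in characters per minute."""
    return len(transcribed) / (seconds / 60.0)


print(msd_error_rate("hello", "helo"))
print(kspc(11, "helloworld"))
print(chars_per_minute("helloworld", 30))
```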

  19. Automatic Arabic Hand Written Text Recognition System

    Directory of Open Access Journals (Sweden)

    I. A. Jannoud

    2007-01-01

    Full Text Available Despite the considerable development of pattern recognition applications in the last decade of the twentieth century and in this century, text recognition remains one of the most important problems in pattern recognition. To the best of our knowledge, little work has been done in the area of Arabic text recognition compared with that for Latin, Chinese and Japanese text. The main difficulty encountered when dealing with Arabic text is the cursive nature of Arabic writing in both printed and handwritten forms. An Automatic Arabic Hand-Written Text Recognition (AHTR) System is proposed. An efficient segmentation stage is required in order to divide a cursive word or sub-word into its constituent characters. After a word has been extracted from the scanned image, it is thinned and its base line is calculated by analysis of the horizontal density histogram. The pattern is then followed through the base line and the segmentation points are detected. Thus after the segmentation stage, the cursive word is represented by a sequence of isolated characters. The recognition problem thus reduces to that of classifying each character. A set of features is extracted from each individual character. A minimum distance classifier is used. Several approaches are used for processing the characters, and post-processing is added to enhance the results. Recognized characters are appended directly to a word file, which is in editable form.
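
    The base line step mentioned in this abstract can be sketched as picking the pixel row with the highest ink count in the horizontal density histogram. This is a simplified reading of the description, assuming a binarized word image stored as rows of 0/1 values; the function name and the toy image are hypothetical and the paper's actual procedure may differ.

```python
def baseline_row(image):
    """Return the index of the row with the most ink pixels.

    `image` is assumed to be a binarized word image given as a list of rows,
    where 1 marks ink and 0 marks background.
    """
    # Horizontal density histogram: one ink count per pixel row.
    row_density = [sum(row) for row in image]
    return max(range(len(row_density)), key=row_density.__getitem__)


# Toy 4x5 image: the third row (index 2) is the connecting stroke.
word_image = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
print(baseline_row(word_image))
```

    In cursive Arabic script the connecting strokes concentrate ink along one line, which is why the densest row is a plausible base line estimate.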

  20. Practical vision based degraded text recognition system

    Science.gov (United States)

    Mohammad, Khader; Agaian, Sos; Saleh, Hani

    2011-02-01

    Rapid growth and progress in the medical, industrial, security and technology fields means more and more consideration for the use of camera-based optical character recognition (OCR). Applying OCR to scanned documents is quite mature, and there are many commercial and research products available on this topic. These products achieve acceptable recognition accuracy and reasonable processing times, especially with trained software and constrained text characteristics. Even though the application space for OCR is huge, it is quite challenging to design a single system that is capable of performing automatic OCR for text embedded in an image irrespective of the application. Challenges for OCR systems include images taken under natural real-world conditions, surface curvature, text orientation, font, size, lighting conditions, and noise. These and many other conditions make it extremely difficult to achieve reasonable character recognition. Performance for conventional OCR systems drops dramatically as the degradation level of the text image quality increases. In this paper, a new recognition method is proposed to recognize solid or dotted line degraded characters. The degraded text string is localized and segmented using a new algorithm. The new method was implemented and tested using a development framework system that is capable of performing OCR on camera-captured images. The framework allows parameter tuning of the image-processing algorithm based on a training set of camera-captured text images. Novel methods were used for enhancement, text localization and the segmentation algorithm, which enables building a custom system that is capable of performing automatic OCR and can be used for different applications. The developed framework system includes new image enhancement, filtering, and segmentation techniques which enabled higher recognition accuracies, faster processing time, and lower energy consumption, compared with the best state of the art published

  1. Native Language Processing using Exegy Text Miner

    Energy Technology Data Exchange (ETDEWEB)

    Compton, J

    2007-10-18

    Lawrence Livermore National Laboratory's New Architectures Testbed recently evaluated Exegy's Text Miner appliance to assess its applicability to high-performance, automated native language analysis. The evaluation was performed with support from the Computing Applications and Research Department in close collaboration with Global Security programs, and institutional activities in native language analysis. The Exegy Text Miner is a special-purpose device for detecting and flagging user-supplied patterns of characters, whether in streaming text or in collections of documents, at very high rates. Patterns may consist of simple lists of words or complex expressions with sub-patterns linked by logical operators. These searches are accomplished through a combination of specialized hardware (i.e., one or more field-programmable gate arrays in addition to general-purpose processors) and proprietary software that exploits these individual components in an optimal manner (through parallelism and pipelining). For this application the Text Miner has performed accurately and reproducibly at high speeds approaching those documented by Exegy in its technical specifications. The Exegy Text Miner is primarily intended for the single-byte ASCII characters used in English, but at a technical level its capabilities are language-neutral and can be applied to multi-byte character sets such as those found in Arabic and Chinese. The system is used for searching databases or tracking streaming text with respect to one or more lexicons. In a real operational environment it is likely that data would need to be processed separately for each lexicon or search technique. However, the searches would be so fast that multiple passes should not be considered a limitation a priori. Indeed, it is conceivable that large databases could be searched as often as necessary if new queries were deemed worthwhile. This project is concerned with evaluating the Exegy Text Miner installed in the

  2. ERRORS AND DIFFICULTIES IN TRANSLATING LEGAL TEXTS

    Directory of Open Access Journals (Sweden)

    Camelia, CHIRILA

    2014-11-01

    Full Text Available Nowadays the accurate translation of legal texts has become highly important, as the mistranslation of a passage in a contract, for example, could lead to lawsuits and loss of money. Consequently, the translation of legal texts into other languages faces many difficulties, and only professional translators specialised in legal translation should deal with the translation of legal documents and scholarly writings. The purpose of this paper is to analyze translation from three perspectives: translation quality, errors and difficulties encountered in translating legal texts, and consequences of such errors in professional translation. First of all, the paper points out the importance of performing a good and correct translation, which is one of the most important elements to be considered when discussing translation. Furthermore, the paper presents an overview of the errors and difficulties in translating texts and of the consequences of errors in professional translation, with applications to the field of law. The paper also considers the differences between languages (English and Romanian) that can hinder comprehension for those who have embarked upon the difficult task of translation. The research method that I have used to achieve the objectives of the paper was the content analysis of various Romanian and foreign authors' works.

  3. Introduction, Critical Textology and Textual Criticism

    Directory of Open Access Journals (Sweden)

    فرزاد قائمی

    2013-06-01

    Full Text Available Asadi’s Shahnameh is a great epic consisting of twenty-four thousand distiches and is attributed to Asadi or another poet of the same nickname. This work was created in the same line of development as Ferdowsi’s Shahnameh. The main theme is the old campaign of Soleymān to Iran to confront Rostam and Keykhosrow and to repeat the pattern of Rostam’s battles with his children in a state of anonymity. The text structure is episodic with numerous central characters. The narratives are for the most part derived from oral literature. Textual evidence demonstrates that the poet is Shiite. The narrative content, chronogram as well as the literary and linguistic style of one of the manuscripts reveal that the text was written in the ninth century (probably 809 A.H.). The article first introduces the text and the origin of its narratives in oral literature; it then proceeds with the study of the narrative structure of the epic using three available manuscripts dating back to the thirteenth and fourteenth centuries (A.H.). Textology and Textual Criticism have been employed as the research methodology. The literary and linguistic features of the text have also been examined at three levels: lexical, syntactic and rhetorical.

  4. The Impact of Texting on Comprehension

    Directory of Open Access Journals (Sweden)

    Jamal K. M. Ali

    2015-07-01

    Full Text Available This paper presents a study of the effects of texting on English language comprehension. The authors believe that English used in texting causes a lack of comprehension for English speakers, learners, and texters. Wei, Xian-hai and Jiang (2008:3) declare “In Netspeak, there are some newly-created vocabularies, which people cannot comprehend them either from their partial pronunciation or from their figures.” Crystal (2007:23) claims: “variation causes problems of comprehension and acceptability. If you speak or write differently from the way I do, we may fail to understand each other.” In this paper, the authors administered a questionnaire at Aligarh Muslim University to ninety respondents from five different Faculties and four different levels. To measure respondents’ comprehension of English texting, the authors gave the respondents abbreviations used by texters and asked them to write the full forms of the abbreviations. The authors found that many abbreviations were not understood, which suggested that most of the respondents did not understand and did not use these abbreviations. Keywords: abbreviation, comprehension, texting, texters, variation

  5. @Note: a workbench for biomedical text mining.

    Science.gov (United States)

    Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel

    2009-08-01

    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, it has already allowed the development of applications that are currently being used. PMID:19393341

  6. Handwriting segmentation of unconstrained Oriya text

    Indian Academy of Sciences (India)

    N Tripathy; U Pal

    2006-12-01

    Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten text recognition process. In this paper we propose a water reservoir concept-based scheme for segmentation of unconstrained Oriya handwritten text into individual characters. Here, at first, the text image is segmented into lines, and the lines are then segmented into individual words. For line segmentation, the document is divided into vertical stripes. Analysing the heights of the water reservoirs obtained from different components of the document, the width of a stripe is calculated. Stripe-wise horizontal histograms are then computed and the relationship of the peak–valley points of the histograms is used for line segmentation. Based on vertical projection profiles and structural features of Oriya characters, text lines are segmented into words. For character segmentation, at first, the isolated and connected (touching) characters in a word are detected. Using structural, topological and water reservoir concept-based features, characters of the word that touch are then segmented. From experiments we have observed that the proposed “touching character” segmentation module has 96·7% accuracy for two-character touching strings.
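
    The histogram-based line segmentation step described in this abstract can be illustrated with a simplified Python sketch. It computes a horizontal projection histogram over a binary image and treats each run of rows containing ink as one text line. It does not implement the water reservoir concept or the peak-valley analysis of the paper; the function names and the `min_ink` threshold are assumptions.

```python
def horizontal_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (first_row, last_row) pairs for runs of rows with enough ink."""
    lines, start = [], None
    for y, ink in enumerate(horizontal_profile(image)):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # the line runs to the bottom edge
        lines.append((start, len(image) - 1))
    return lines
```

    A stripe-wise variant in the spirit of the abstract could apply `segment_lines` to a column slice such as `[row[x0:x1] for row in image]` for each vertical stripe.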

  7. Exploring the Effect of Background Knowledge and Text Cohesion on Learning from Texts in Computer Science

    Science.gov (United States)

    Gasparinatou, Alexandra; Grigoriadou, Maria

    2013-01-01

    In this study, we examine the effect of background knowledge and local cohesion on learning from texts. The study is based on construction-integration model. Participants were 176 undergraduate students who read a Computer Science text. Half of the participants read a text of maximum local cohesion and the other a text of minimum local cohesion.…

  8. OMG! Texting in Class = U Fail :( Empirical Evidence That Text Messaging During Class Disrupts Comprehension

    Science.gov (United States)

    Gingerich, Amanda C.; Lineweaver, Tara T.

    2014-01-01

    In two experiments, we examined the effects of text messaging during lecture on comprehension of lecture material. Students (in Experiment 1) and randomly assigned participants (in Experiment 2) in a text message condition texted a prescribed conversation while listening to a brief lecture. Students and participants in the no-text condition…

  9. Modified Approach to Transform Arc From Text to Linear Form Text : A Preprocessing Stage for OCR

    Directory of Open Access Journals (Sweden)

    Vijayashree C S

    2014-08-01

    Full Text Available Arc-form-text is an artistic text which is quite common in several documents such as certificates, advertisements and history documents. OCRs fail to read such arc-form-text and it is necessary to transform it to linear-form-text at the preprocessing stage. In this paper, we present a modification to an existing transformation model for better readability by OCRs. The method takes the segmented arc-form-text as input. Initially two concentric ellipses are approximated to enclose the arc-form-text and later the modified transformation model transforms the text in arc-form to linear-form. The proposed method is implemented on several upper semi-circular arc-form-text inputs and the readability of the transformed text is analyzed with an OCR.
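
    The idea of mapping arc-form text to a linear form can be sketched with a polar unrolling. The following Python sketch assumes a circular band (two concentric circles) over an upper semicircle and nearest-neighbour sampling, which is a simplification and not the elliptical transformation model of the paper. The function name and parameters are hypothetical.

```python
import math


def unroll_arc(image, center, r_inner, r_outer, width):
    """Map the upper semicircular band between two radii onto a rectangle.

    image is a list of rows of pixel values, center is (cx, cy) in pixel
    coordinates, and width (at least 2) is the number of output columns.
    """
    cx, cy = center
    height = r_outer - r_inner
    out = [[0] * width for _ in range(height)]
    for row in range(height):
        r = r_outer - row                                  # top row = outer radius
        for col in range(width):
            theta = math.pi * (1 - col / (width - 1))      # 180 degrees down to 0
            x = round(cx + r * math.cos(theta))
            y = round(cy - r * math.sin(theta))            # image y axis points down
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                out[row][col] = image[y][x]
    return out
```

    Reading columns from 180 degrees to 0 degrees keeps left-to-right text in reading order after unrolling.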

  10. Text mining patents for biomedical knowledge.

    Science.gov (United States)

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. PMID:27179985

  11. Multilingual access to full text databases

    International Nuclear Information System (INIS)

    Many full text databases are available in only one language, or they may contain documents in several languages. Even if the user is able to understand the language of the documents in the database, it could be easier for him to express his need in his own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and the first experiments of multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After recalling the general problems of searching full text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs

  12. PUNJABI TEXT CLUSTERING BY SENTENCE STRUCTURE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Saurabh Sharma

    2012-10-01

    Full Text Available Punjabi text document clustering is done by analyzing the sentence structure of similar documents sharing the same topics and grouping them into clusters. The prevalent algorithms in this field utilize the vector space model, which treats the documents as a bag of words. The meaning in natural language inherently depends on the word sequences, which are overlooked and ignored while clustering. The current paper deals with a new Punjabi text clustering algorithm named Clustering by Sentence Structure Analysis (CSSA), which has been carried out on 221 Punjabi news articles available on news sites. The phrases are extracted for processing by a meticulous analysis of the structure of a sentence, applying the basic grammatical rules of Karaka. Sequences formed from phrases are used to find the topic and to find similarities among all documents, which results in the formation of meaningful clusters.
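
    The limitation of the vector space model noted in this abstract, namely that a bag of words ignores word order, can be seen in a minimal Python sketch. It uses raw term counts and cosine similarity on English example sentences; the function names are hypothetical and this is not the CSSA algorithm itself.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector; word order is discarded."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

    Under this representation "dog bites man" and "man bites dog" receive identical vectors, so their similarity is maximal even though the meanings differ, which is the motivation for analysing sentence structure instead.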

  13. Monolingual accounting dictionaries for EFL text production

    DEFF Research Database (Denmark)

    Nielsen, Sandro

    2006-01-01

    Monolingual accounting dictionaries are important for producing financial reporting texts in English in an international setting, because of the lack of specialised bilingual dictionaries. As the intended user groups have different factual and linguistic competences, they require specific types of...... information. By identifying and analysing the users' factual and linguistic competences, user needs, use-situations and the stages involved in producing accounting texts in English as a foreign language, lexicographers will have a sound basis for designing the optimal English accounting dictionary for EFL...... text production. The monolingual accounting dictionary needs to include information about UK, US and international accounting terms, their grammatical properties, their potential for being combined with other words in collocations, phrases and sentences in order to meet user requirements. Data items...

  14. Monolingual Accounting Dictionaries for EFL Text Production

    DEFF Research Database (Denmark)

    Nielsen, Sandro

    2009-01-01

    Monolingual accounting dictionaries are important for producing financial reporting texts in English in an international setting, because of the lack of specialised bilingual dictionaries. As the intended user groups have different factual and linguistic competences, they require specific types of...... information. By identifying and analysing the users' factual and linguistic competences, user needs, use-situations and the stages involved in producing accounting texts in English as a foreign language, lexicographers will have a sound basis for designing the optimal English accounting dictionary for EFL...... text production. The monolingual accounting dictionary needs to include information about UK, US and international accounting terms, their grammatical properties, their potential for being combined with other words in collocations, phrases and sentences in order to meet user requirements. Data items...

  15. Preprocessing and Morphological Analysis in Text Mining

    Directory of Open Access Journals (Sweden)

    Krishna Kumar Mohbey; Sachin Tiwari

    2011-12-01

    Full Text Available This paper is based on the preprocessing activities which are performed by software or language translators before applying mining algorithms to huge data sets. Text mining is an important area of data mining and plays a vital role in extracting useful information from a huge database or data warehouse. However, before the text mining or information extraction process is applied, preprocessing is a must because the given data or dataset contains noisy, incomplete, inconsistent, dirty and unformatted data. In this paper we try to collect the necessary requirements for preprocessing. Once the preprocessing task is complete, we can easily extract useful knowledge using a mining strategy. This paper also provides information about data analysis steps such as tokenization and stemming, and semantic analysis steps such as phrase recognition and parsing. It also collects the procedures for preprocessing data, i.e. it describes how stemming, tokenization and parsing are applied.
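
    The preprocessing steps named in this abstract (tokenization and stemming, preceded here by stopword removal) can be sketched in a few lines of Python. The stopword list and the suffix-stripping rule below are deliberately tiny assumptions for illustration; they are not the procedures of the paper and are far cruder than an established stemmer.

```python
import re

# Hypothetical, deliberately small resources for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then stem the remaining tokens."""
    return [stem(token) for token in tokenize(text) if token not in STOPWORDS]
```

    Such naive suffix stripping over-stems some words (for example "mining" becomes "min"), which is one reason real pipelines rely on linguistically informed stemmers.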

  16. Context Based Word Sense Extraction in Text

    Directory of Open Access Journals (Sweden)

    Ranjeetsingh S.Suryawanshi

    2011-11-01

    Full Text Available In the era of modern e-document technology, everyone uses computerized documents for their own purposes. Because of the huge number of text documents available as pdf, doc, txt, html, and xml files, a user may be confused about the intended sense of these documents when the same word is interpreted in different senses. Word sense has always been an important problem in information retrieval and extraction, as well as in text mining, because machines do not have as much intelligence as humans for determining the sense of a word in a particular context. Users want to determine which sense of a word is used in a given context. Word sense is usage-based, and part of it can be created automatically from an electronic dictionary. This paper describes word sense as expressed by its WordNet synsets, arranged according to their relevance, and their context as expressed by means of word association.

  17. Text Classification Using Sentential Frequent Itemsets

    Institute of Scientific and Technical Information of China (English)

    Shi-Zhu Liu; He-Ping Hu

    2007-01-01

    Text classification techniques mostly rely on single term analysis of the document data set, while more concepts, especially the specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including frequent co-occurring words in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach using sentential frequent itemsets, a concept that comes from association rule mining, for text classification, which views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora are carried out, which validate the practicability of the proposed system.
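
    The notion of a sentential frequent itemset, where each sentence is one transaction, can be illustrated with a small Python sketch. It counts word sets of a fixed size that co-occur in at least `min_support` sentences. The naive sentence splitter, the parameter names and the brute-force enumeration are assumptions, and the rough-set weighting step of the paper is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentences(document):
    """Split on sentence-final punctuation (a naive assumption)."""
    return [s for s in re.split(r"[.!?]+", document) if s.strip()]


def sentential_frequent_itemsets(document, size=2, min_support=2):
    """Word sets of the given size occurring together in enough sentences."""
    counts = Counter()
    for sentence in sentences(document):
        # Each sentence is one transaction; duplicates within it count once.
        words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
        counts.update(combinations(words, size))
    return {items: n for items, n in counts.items() if n >= min_support}
```

    Brute-force enumeration is only practical for short sentences and small itemset sizes; an Apriori-style search would prune candidates on larger corpora.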

  18. Kombination av text och bild i undervisningen

    OpenAIRE

    Lindbom, Yvonne

    2008-01-01

    My work describes how one can work across subjects with image and text in combination. Through examples, I want to show the importance of weaving image and text together in teaching. My choice of topic is based on the fact that my pupils lacked an understanding of visual language. Words and texts therefore came in as a natural element of the teaching. The aim of the work is to investigate how text and image in combination can give a deeper understanding of visual language. A cross-curricular project ...

  19. Tagging and Morphological Disambiguation of Turkish Text

    CERN Document Server

    Oflazer, K; Oflazer, Kemal; Kuruoz, Ilker

    1994-01-01

    Automatic text tagging is an important component in higher level analysis of text corpora, and its output can be used in many natural language processing applications. In languages like Turkish or Finnish, with agglutinative morphology, morphological disambiguation is a very crucial process in tagging, as the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology that is based on a lexicon of about 24,000 root words. This is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics and a limited amount of statistical information. The tagger also has functionality for statistics compilation and fine tuning of the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Preliminary results indicate that the tagger can tag about 98-99% of the...

  20. Combinatorial Classification for Chunking Arabic Text

    Directory of Open Access Journals (Sweden)

    Feriel Ben Fraj

    2012-10-01

    Full Text Available Text parsing has always benefited from special attention since the first applications of natural language processing (NLP). The problem gets worse for the Arabic language because of its specific features that make it quite different and even more ambiguous than other natural languages when processed. In this paper, we discuss a new approach for chunking Arabic texts based on a combinatorial classification process. It is a modular chunker that identifies the chunk heads using a combinatorial binary classification before recognizing their types based on the parts-of-speech of the chunk heads, already identified. For the experimentation, we use more than 2300 words as training data. The evaluation of the chunker consists of two steps and gives results that we consider very satisfactory (average accuracy of 89.60% for the classification step and 80.46% for the full chunking process).

  1. Combinatorial Classification for Chunking Arabic Texts

    Directory of Open Access Journals (Sweden)

    Fériel Ben Fraj

    2012-09-01

    Full Text Available Text parsing has always benefited from special attention since the first applications of natural language processing (NLP). The problem gets worse for the Arabic language because of its specific features that make it quite different and even more ambiguous than other natural languages when processed. In this paper, we discuss a new approach for chunking Arabic texts based on a combinatorial classification process. It is a modular chunker that identifies the chunk heads using a combinatorial binary classification before recognizing their types based on the parts-of-speech of the chunk heads, already identified. For the experimentation, we use more than 2300 words as training data. The evaluation of the chunker consists of two steps and gives results that we consider very satisfactory (average accuracy of 89.60% for the classification step and 80.46% for the full chunking process).

  2. Runaway electrons in TEXT-U

    International Nuclear Information System (INIS)

    Runaway electrons have long been studied in tokamak plasmas. The previous results regarding runaway electrons and the detection of hard x-rays are reviewed. The hard x-ray energy on TEXT-U is measured and the scaling of energy with electron density, ne, is noted. This scaling suggests a runaway source term that scales roughly as ne/1. The results indicate that runaways are created throughout the discharges. An upper bound for Xe due to magnetic fluctuations was found to be 0.0343 m^2/s. This is an order of magnitude too low to explain the thermal transport in TEXT, implying that electrostatic fluctuations are important in thermal transport in TEXT.

  3. Segmentation of Handwritten Text in Gurmukhi Script

    Directory of Open Access Journals (Sweden)

    Rajiv K. Sharma

    2008-09-01

    Full Text Available Character segmentation is an important preprocessing step for text recognition. The size and shape of characters generally play an important role in the process of segmentation. But for any optical character recognition (OCR) system, the presence of touching characters in textual as well as handwritten documents further decreases correct segmentation as well as the recognition rate drastically. Because one cannot control the size and shape of characters in handwritten documents, the segmentation process for handwritten documents is very difficult. We tried to segment handwritten text by proposing some algorithms, which were implemented and have shown encouraging results. Algorithms have been proposed to segment the touching characters. These algorithms have shown a reasonable improvement in segmenting the touching handwritten characters in Gurmukhi script.

  4. WYLBUR reference manual. [For interactive text editing

    Energy Technology Data Exchange (ETDEWEB)

    Krupp, R.F.; Messina, P.C.; Peavler, J.M.; Schustack, S.; Starai, T.

    1977-04-01

    WYLBUR is a system for manipulating various kinds of text, such as computer programs, manuscripts, letters, forms, articles, or reports. Its on-line interactive text-editing capabilities allow the user to create, change, and correct text, and to search and display it. WYLBUR also has facilities for job submission and retrieval from remote terminals that make it possible for a user to inquire about the status of any job in the system, cancel jobs that are executing or awaiting execution, reroute output, raise job priority, or get information on the backlog of batch jobs. WYLBUR also has excellent recovery capabilities and a fast response time. This manual describes the WYLBUR version currently used at ANL. It is intended primarily as a reference manual; thus, examples of WYLBUR commands are kept to a minimum. (RWR)

  5. Text Mining the History of Medicine.

    Directory of Open Access Journals (Sweden)

    Paul Thompson

    Full Text Available Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research

  6. Quantum mechanics a comprehensive text for chemistry

    CERN Document Server

    Arora, Kishor

    2010-01-01

    This book contains 14 chapters. The text includes the inadequacy of classical mechanics and covers basic and fundamental concepts of quantum mechanics, including concepts of translational, vibrational, rotational and electronic energies, an introduction to concepts of angular momenta, approximate methods and their application, concepts related to electron spin, symmetry concepts and quantum mechanics, and ultimately the book features the theories of chemical bonding and the use of software in quantum mechanics. The text of the book is presented in a lucid manner with ample examples and illustrations wherever

  7. Text Data Mining: Theory and Methods

    Directory of Open Access Journals (Sweden)

    Jeffrey L. Solka

    2008-01-01

    Full Text Available This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies that are employed within this discipline area while at the same time making the reader aware of some of the interesting challenges that remain to be solved within the area. Finally, the article serves as a very rudimentary tutorial on some of the techniques while also providing the reader with a list of references for additional study.

  8. Events and Trends in Text Streams

    Energy Technology Data Exchange (ETDEWEB)

    Engel, David W.; Whitney, Paul D.; Cramer, Nicholas O.

    2010-03-04

    "Text streams--collections of documents or messages that are generated and observed over time--are ubiquitous. Our research and development are targeted at developing algorithms to find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to predefined temporally ordered document collections but is also suitable for application to near-real-time textual data streams."

  9. Stemming of Slovenian library science texts

    OpenAIRE

    Polona Vilar; Jasna Maver

    2002-01-01

    The theme of the article is the preparation of a stemming algorithm for Slovenian library science texts. The procedure consisted of three phases: learning, testing and evaluation. The preparation of the optimal stemmer for Slovenian texts from the field of library science is presented, along with its testing and comparison with two other stemmers for the Slovenian language: the Popovič stemmer and the Generic stemmer. A corpus of 790,000 words from the field of library science was used for learning. List...

  10. Extracting Conceptual Feature Structures from Text

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Jensen, Per Anker;

    2011-01-01

    This paper describes an approach to indexing texts by their conceptual content using ontologies along with lexico-syntactic information and semantic role assignment provided by lexical resources. The conceptual content of meaningful chunks of text is transformed into conceptual feature structures...... and mapped into concepts in a generative ontology. Synonymous but linguistically quite distinct expressions are mapped to the same concept in the ontology. This allows us to perform a content-based search which will retrieve relevant documents independently of the linguistic form of the query as well...

  11. Text Classification: A Sequential Reading Approach

    CERN Document Server

    Dulac-Arnold, Gabriel; Gallinari, Patrick

    2011-01-01

    We propose to model the text classification process as a sequential decision process. In this process, an agent learns to classify documents into topics while reading the document sentences sequentially and learns to stop as soon as enough information was read for deciding. The proposed algorithm is based on a modelisation of Text Classification as a Markov Decision Process and learns by using Reinforcement Learning. Experiments on four different classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided.

  12. Distinguishing Word Senses in Untagged Text

    CERN Document Server

    Pedersen, Ted; Bruce, Rebecca

    1997-01-01

    This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns rather than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high dimensional feature set.

  13. CCM: A Text Classification Method by Clustering

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock;

    2011-01-01

    In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks, is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results...... show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...... approach to text classification tasks simplifies the model and at the same time increases the accuracy....

  14. Multilingual Topic Models for Unaligned Text

    CERN Document Server

    Boyd-Graber, Jordan

    2012-01-01

    We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora.

  15. Choices of texts for literary education

    DEFF Research Database (Denmark)

    Skyggebjerg, Anna Karlskov

    literature studies at universities, where criteria concerning language and form are often more valued than criteria concerning character and content. This tendency to celebrate the formal aspects and the literariness of literature is recognized in governmental documents, teaching materials, and in the...... the possibility for positioning pupils/young adults? What does the choice of texts mean for pupils’/young adults’ possibilities as readers and individual interpreters? How are the pupils’ potentials for envisioning and engaging in literature with certain choices of texts?...

  16. Requirements to a text of advertisement

    OpenAIRE

    Shmilyk, Iryna

    2013-01-01

    In the given article the author has studied the peculiarities of the structure of advertisement: the title, introduction, content, conclusion, price and address. The author has also dealt in the article with the main requirements to a text of advertisement: its singularity, conciseness, expressiveness, civility and compliance with linguistic norms.

  17. Teaching "Paul's Case" as a Gay Text.

    Science.gov (United States)

    Meem, Deborah T.

    2002-01-01

    Notes how the author usually limits her instructional role to that of facilitator, focusing on methods more than on specific content, but occasionally she feels compelled to take a more proactive approach, to guide the students toward one reading of a text. Considers that teaching Willa Cather's 1905 short story "Paul's Case" as a piece of gay…

  18. Examining Response Confidence in Multiple Text Tasks

    Science.gov (United States)

    List, Alexandra; Alexander, Patricia A.

    2015-01-01

    Students' confidence in their responses to a multiple text-processing task and their justifications for those confidence ratings were investigated. Specifically, 215 undergraduates responded to two academic questions, differing by type (i.e., discrete and open-ended) and by domain (i.e., developmental psychology and astrophysics), using a digital…

  19. Database citation in full text biomedical articles.

    Science.gov (United States)

    Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

    2013-01-01

    Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176

  20. Writing Treatment for Aphasia: A Texting Approach

    Science.gov (United States)

    Beeson, Pelagie M.; Higginson, Kristina; Rising, Kindle

    2013-01-01

    Purpose: Treatment studies have documented the therapeutic and functional value of lexical writing treatment for individuals with severe aphasia. The purpose of this study was to determine whether such retraining could be accomplished using the typing feature of a cellular telephone, with the ultimate goal of using text messaging for…

  1. Electromagnetic Induction Rediscovered Using Original Texts.

    Science.gov (United States)

    Barth, Michael

    2000-01-01

    Describes a teaching unit on electromagnetic induction using historic texts. Uses some of Faraday's diary entries from 1831 to introduce the phenomenon of electromagnetic induction and teach about the properties of electricity, drawing conclusions from experiments, and scientific methodology. (ASK)

  2. Investigating Text Input Methods for Mobile Phones

    Directory of Open Access Journals (Sweden)

    Barry O’Riordan

    2005-01-01

    Full Text Available Human Computer Interaction is a primary factor in the success or failure of any device but if an objective view is taken of the current mobile phone market you would be forgiven for thinking usability was secondary to aesthetics. Many phone manufacturers modify the design of phones to be different than the competition and to target fashion trends, usually at the expense of usability and performance. There is a lack of awareness among many buyers of the usability of the device they are purchasing and the disposability of modern technology is an effect rather than a cause of this. Designing new text entry methods for mobile devices can be expensive and labour-intensive. The assessment and comparison of a new text entry method with current methods is a necessary part of the design process. The best way to do this is through an empirical evaluation. The aim of the study was to establish which mobile phone text input method best suits the requirements of a select group of target users. This study used a diverse range of users to compare devices that are in everyday use by most of the adult population. The proliferation of the devices is as yet unmatched by the study of their application and the consideration of their user friendliness.

  3. BaffleText: a Human Interactive Proof

    Science.gov (United States)

    Chew, Monica; Baird, Henry S.

    2003-01-01

    Internet services designed for human use are being abused by programs. We present a defense against such attacks in the form of a CAPTCHA (Completely Automatic Public Turing test to tell Computers and Humans Apart) that exploits the difference in ability between humans and machines in reading images of text. CAPTCHAs are a special case of 'human interactive proofs,' a broad class of security protocols that allow people to identify themselves over networks as members of given groups. We point out vulnerabilities of reading-based CAPTCHAs to dictionary and computer-vision attacks. We also draw on the literature on the psychophysics of human reading, which suggests fresh defenses available to CAPTCHAs. Motivated by these considerations, we propose BaffleText, a CAPTCHA which uses non-English pronounceable words to defend against dictionary attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks. Experiments on human subjects confirm the human legibility and user acceptance of BaffleText images. We have found an image-complexity measure that correlates well with user acceptance and assists in engineering the generation of challenges to fit the ability gap. Recent computer-vision attacks, run independently by Mori and Jitendra, suggest that BaffleText is stronger than two existing CAPTCHAs.

  4. Automatic Syntactic Analysis of Free Text.

    Science.gov (United States)

    Schwarz, Christoph

    1990-01-01

    Discusses problems encountered with the syntactic analysis of free text documents in indexing. Postcoordination and precoordination of terms are discussed, an automatic indexing system called COPSY (context operator syntax) that uses natural language processing techniques is described, and future developments are explained. (60 references) (LRW)

  5. Task-Driven Dynamic Text Summarization

    Science.gov (United States)

    Workman, Terri Elizabeth

    2011-01-01

    The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…

  6. Ontology Assisted Formal Specification Extraction from Text

    OpenAIRE

    Andreea Mihis

    2010-01-01

    In the field of knowledge processing, ontologies are the most important means. They make it possible for the computer to better understand natural language and to make judgments. In this paper, a method which uses ontologies in the semi-automatic extraction of formal specifications from a natural language text is proposed.

  7. Learn Japanese: Secondary School Text, Volume 5.

    Science.gov (United States)

    Hirai, Bernice; And Others

    This is the fifth in a series of ten texts designed for teaching Japanese at the secondary level. Also available are supplementary instructional materials and teacher's guides. Throughout the two units of four lessons each, the theme centers around life in Japan as seen through the eyes of an American student. Each unit contains conversations,…

  8. Fieldwork, Heritage and Engaging Landscape Texts

    Science.gov (United States)

    Mains, Susan P.

    2014-01-01

    This paper outlines and analyses efforts to critically engage with "heritage" through the development and responses to a series of undergraduate residential fieldwork trips held in the North Coast of Jamaica. The ways in which we read heritage through varied "texts"--specifically, material landscapes, guided heritage tours,…

  9. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
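    The two regularities named in this abstract can be measured on any token stream. The following is a minimal sketch, not the authors' generative model: it only collects the rank-frequency table used for Zipf's law and the vocabulary-growth curve used for Heaps' law. The function names, the simple regular-expression tokenizer and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first (Zipf plot data)."""
    counts = Counter(tokens)
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]


def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token (Heaps curve data)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


# Hypothetical toy input; a real study would use a large corpus.
sample = "the cat sat on the mat and the dog sat on the log"
tokens = tokenize(sample)
table = rank_frequency(tokens)
growth = vocabulary_growth(tokens)
```

    Plotting count against rank, and vocabulary size against text length, on log-log axes is the usual way to inspect the power-law and sublinear-growth behaviour the abstract refers to.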

  10. Validation Study of Waray Text Readability Instrument

    Science.gov (United States)

    Oyzon, Voltaire Q.; Corrales, Juven B.; Estardo, Wilfredo M., Jr.

    2015-01-01

    In 2012 the Leyte Normal University developed computer software, modelled after the Spache Readability Formula (1953) made for English, to help rank texts, which can be used by teachers or research groups in selecting appropriate reading materials to support the DepEd's MTB-MLE program in Region VIII, in the Philippines. However,…

  11. Text Memorisation in Chinese Foreign Language Education

    Science.gov (United States)

    Yu, Xia

    2012-01-01

    In China, a widespread learning practice for foreign languages is reading, reciting and memorising texts. This book investigates this practice against a background of Confucian heritage learning and western attitudes towards memorising, particularly audio-lingual approaches to language teaching and later largely negative attitudes. The author…

  12. Automatic Induction of Rule Based Text Categorization

    Directory of Open Access Journals (Sweden)

    D.Maghesh Kumar

    2010-12-01

    Full Text Available The automated categorization of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describes a novel method for the automatic induction of rule-based text classifiers. This method supports a hypothesis language of the form "if T1, … or Tn occurs in document d, and none of Tn+1, … Tn+m occurs in d, then classify d under category c," where each Ti is a conjunction of terms. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. Issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation, are discussed in detail.
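    The hypothesis language quoted in this abstract can be illustrated by evaluating a single hand-written rule. This is a sketch under stated assumptions, not the paper's induction method: the rule is given rather than learned, each Ti is modelled as a set of terms that must all occur, and the example category, terms and documents are hypothetical.

```python
def conjunction_holds(terms, doc_terms):
    """A conjunction Ti occurs in a document when all of its terms are present."""
    return all(term in doc_terms for term in terms)


def rule_fires(positive, negative, doc_terms):
    """Return True if at least one positive conjunction occurs in the document
    and none of the negative conjunctions occurs in it."""
    has_positive = any(conjunction_holds(c, doc_terms) for c in positive)
    has_negative = any(conjunction_holds(c, doc_terms) for c in negative)
    return has_positive and not has_negative


# Hypothetical rule for a category "sport":
# classify under "sport" if ("football") or ("match" and "goal") occurs,
# and ("stock" and "market") does not occur.
positive = [{"football"}, {"match", "goal"}]
negative = [{"stock", "market"}]

doc_a = set("the goal decided the match".split())
doc_b = set("football club shares fall on the stock market".split())

result_a = rule_fires(positive, negative, doc_a)  # positive conjunction, no negative one
result_b = rule_fires(positive, negative, doc_b)  # blocked by the negative conjunction
```

    Representing each document as a set of terms keeps membership tests cheap; an induction procedure would search for the positive and negative conjunctions instead of having them supplied by hand.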

  13. CONAN : Text Mining in the Biomedical Domain

    NARCIS (Netherlands)

    Malik, R.

    2006-01-01

    This thesis is about text mining: extracting important information from literature. In recent years, the number of biomedical articles and journals has been growing exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was cons

  14. Relations between Adolescents' Text Processing and Reasoning

    Science.gov (United States)

    Wolfe, Michael B. W.; Goldman, Susan R.

    2005-01-01

    This research examines adolescents' learning about a historical issue from multiple information sources. Adolescents read 2 contradictory texts explaining the Fall of Rome and thought out loud after each sentence. After reading, a series of questions probed their understanding and ability to reason with the information. Think-aloud protocols were…

  15. Interword spacing in Chinese text layout.

    Science.gov (United States)

    Hsu, S H; Huang, K C

    2000-10-01

    Three experiments using Chinese text were conducted to investigate word spacing and its effect on reading performance. In Exp. 1, a sonogram detector was used to analyze interword and intercharacter (within a word) time intervals from text read aloud by professional TV broadcasters versus college graduates. The results showed interword intervals were significantly longer than intercharacter intervals, indicating that interword spacing has psychological reality in speech. Exp. 2 examined the effect on reading performance due to separating the characters that compose a word. Separating the characters of a word did not decrease reading accuracy but did result in significantly longer reading times. Exp. 3 explored the effect of word spacing in Chinese sentences on reading performance. Analysis showed that word spacing did not affect reading accuracy, but half character and whole-character spacing significantly reduced reading time. The results of the present study suggest that word spacing in Chinese text layout enhances reading performance. Word spacing may help the reader to segment more quickly a string of characters into words and reduce the likelihood of misinterpretation. Also, ambiguity of sentence structure severely degraded reading accuracy. The implications of the results for word spacing design in Chinese text are discussed. PMID:11065294

  16. Assessing Assessment Texts: Where Is Planning?

    Science.gov (United States)

    Fives, Helenrose; Barnes, Nicole; Dacey, Charity; Gillis, Anna

    2016-01-01

    We conducted a content analysis of 27 assessment textbooks to determine how assessment planning was framed in texts for preservice teachers. We identified eight assessment planning themes: alignment, assessment purpose and types, reliability and validity, writing goals and objectives, planning specific assessments, unpacking, overall assessment…

  17. Subsegmental language detection in Celtic language text

    OpenAIRE

    Tyers, Francis Morton; Minocha, Akshay

    2014-01-01

    This paper describes an experiment to perform language identification on a sub-sentence basis. The typical case of language identification is to detect the language of documents or sentences. However, it may be the case that a single sentence or segment contains more than one language. This is especially the case in texts where code switching occurs.

  18. Full Text Journal Subscriptions: An Evolutionary Process.

    Science.gov (United States)

    Luther, Judy

    1997-01-01

    Provides an overview of companies offering Web accessible subscriptions to full text electronic versions of scientific, technical, and medical journals (Academic Press, Blackwell, EBSCO, Elsevier, Highwire Press, Information Quest, Institute of Physics, Johns Hopkins University Press, OCLC, OVID, Springer, and SWETS). Also lists guidelines for…

  19. A Scheme for Text Analysis Using Fortran.

    Science.gov (United States)

    Koether, Mary E.; Coke, Esther U.

    Using string-manipulation algorithms, FORTRAN computer programs were designed for analysis of written material. The programs measure length of a text and its complexity in terms of the average length of words and sentences, map the occurrences of keywords or phrases, calculate word frequency distribution and certain indicators of style. Trials of…
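    The measurements listed in this abstract (text length, average word and sentence length, keyword positions, word frequency distribution) are straightforward string operations. The sketch below is in Python rather than the FORTRAN of the original programs and does not reproduce them; the function names, the sentence-splitting rule and the sample text are assumptions.

```python
import re
from collections import Counter


def text_statistics(text):
    """Compute simple length and complexity measures for a text."""
    # Assumed rule: a sentence ends at one or more of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    word_count = len(words)
    return {
        "word_count": word_count,
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "avg_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "frequencies": Counter(words),
    }


def keyword_positions(text, keyword):
    """Return the word indices at which a keyword occurs (a simple occurrence map)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [i for i, w in enumerate(words) if w == keyword.lower()]


# Hypothetical sample text.
sample = "Text analysis is simple. Text analysis is useful!"
stats = text_statistics(sample)
positions = keyword_positions(sample, "text")
```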

  20. INNER DIALOGICITY OF MEDICAL SCIENTIFIC TEXTS

    Directory of Open Access Journals (Sweden)

    Efremova Nataliya Vladimirovna

    2015-06-01

    Full Text Available The author studies inner dialogicity as an integral property of a scientist's thinking, a way of developing a scientific idea, one of the cognitive and discursive mechanisms by which new knowledge is formed, crystallized and dementalised in a text, and a way of searching for truth. This approach to dialogicity in the study of a scientific text makes it possible to analyze the cogitative processes occurring in human consciousness and cognitive activity, to understand fully the stated scientific concept, to define the author's pragmatic strategies, and to enter his reflexive world. Drawing on the medical scientific texts of N.M. Amosov and F.G. Uglov, famous scientists in the field of cardiac surgery, the study establishes that traces of inner dialogicity in the scientists' textual space actualize the origin of new knowledge, changes in the author's semantic positions, and his ability to reflect on, compare and analyze his own thoughts and actions and to assess himself and the features of his thinking process. These are realized in the logic of the statement of the scientific concept, in the explanation of concepts and terms, in judgments on the points of view of contemporaries and predecessors, adherents and opponents, and also in orientation to the addressee's presuppositions and the activation of his cogitative activity. Linguistic, discursive and verbal analysis singles out the impact on the addressee and his mental activity.

  1. Exploring Academic Voice in Multimodal Quantitative Texts

    Directory of Open Access Journals (Sweden)

    Robert Prince

    2014-10-01

    Full Text Available Research on students’ academic literacies practices has tended to focus on the written mode in order to understand the academic conventions necessary to access Higher Education. However, the representation of quantitative information can be a challenge to many students. Quantitative information can be represented through a range of modes (such as writing, visuals and numbers) and different information graphics (such as tables, charts, graphs). This paper focuses on the semiotic aspects of graphic representation in academic work, using student and published data from the Health Science, and an information graphic from the social domain as a counterpoint to explore aspects about agency and choice in academic voice in multimodal texts. It explores voice in terms of three aspects which work across modes, namely authorial engagement, citation and modality. The work of different modes and their inter-relations in quantitative texts is established, as is the use of sources in citation. We also look at the ways in which credibility and validity are established through modality. This exploration reveals that there is a complex interplay of modes in the construction of academic voice, which are largely tacit. This has implications for the way we think about and teach writing and text-making in quantitative disciplines in Higher Education.

  2. Mining Texts in Reading to Write.

    Science.gov (United States)

    Greene, Stuart

    1992-01-01

    Proposes a set of strategies for connecting reading and writing, placing the discussion in the context of other pedagogical approaches designed to exploit the relationship between reading and writing. Explores ways in which students employ the strategies involved in "mining" a text--reconstructing context, inferring or imposing structure, and…

  3. Assessing Literary Reasoning: Text and Task Complexities

    Science.gov (United States)

    Lee, Carol D.; Goldman, Susan R.

    2015-01-01

    This article addresses 3 broad challenges of assessment in reading comprehension: (a) explicitly articulating the knowledge and skills students need to recognize and be able to use in comprehending complex texts; (b) understanding how knowledge and skills progress and successively deepen and develop over repeated opportunities to engage in tasks…

  4. Linguistic expertise of the advertising text

    Directory of Open Access Journals (Sweden)

    Milaeva O. V.

    2011-03-01

    Full Text Available This article analyses a bibliometric indicator of the development of Russian science: the number of publications in the world publication stream. The main period analysed is 1999-2008. Statistical data on the main directions of research are taken from a Thomson Reuters analytical report of January 2010.

  5. Text Independent Biometric Speaker Recognition System

    Directory of Open Access Journals (Sweden)

    Luqman Gbadamosi

    2013-11-01

    Full Text Available Designing a machine that mimics human behavior, particularly with the capability of responding properly to spoken language, has intrigued engineers and scientists for centuries. Earlier research on voice recognition systems was text-dependent, requiring the user to say exactly the same text or passphrase for both enrollment and verification before gaining access. In that method, the test speech is polluted by additive noise at different decibel levels, achieving only a 75% recognition rate, and full cooperation by the speaker is required, so it cannot be used for forensic investigation. This paper presents the historical background and technological advances in voice recognition and, most importantly, the study and implementation of a text-independent biometric voice recognition system which could be used for speaker identification with a 100% recognition rate. The technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, telephone shopping, database access services, information services, voice mail, and remote access to computers. The implementation mainly incorporates Mel-frequency cepstral coefficients (MFCCs), used for feature extraction, and vector quantization using the Linde-Buzo-Gray algorithm (VQLBG), used to minimize the amount of data to be handled. The matching result is given on the basis of minimum distortion distance. The project is coded in MATLAB.
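    The final matching step described here, choosing the speaker whose codebook gives the minimum distortion, can be sketched as follows. This is an illustrative sketch in Python, not the authors' MATLAB code: the two-dimensional vectors stand in for MFCC frames, the codebooks are assumed to have been trained already (for example with an LBG-style procedure, which is not shown), and all names and numbers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)


def identify_speaker(features, codebooks):
    """Return the enrolled speaker whose codebook yields the minimum distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))


# Hypothetical pre-trained codebooks, one per enrolled speaker.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}

# Hypothetical feature vectors extracted from a test utterance.
test_features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8)]

best_match = identify_speaker(test_features, codebooks)
```

    Because each frame is compared only with a small set of codewords rather than with all enrollment frames, the codebook keeps the amount of data handled at matching time small, which is the role the abstract assigns to vector quantization.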

  6. Studies of electron cyclotron emission on TEXT

    International Nuclear Information System (INIS)

    The Auburn University electron cyclotron emission (ECE) system has made many significant contributions to the TEXT experimental program during the past five years. Contributions include electron temperature information used in the following areas of study: electron cyclotron heating (ECH), pellet injection, and impurity/energy transport. Details of the role which the Auburn ECE system has played will now be discussed.

  7. A TWO STAGE METHOD FOR BENGALI TEXT EXTRACTION FROM STILL IMAGES CONTAINING TEXT

    Directory of Open Access Journals (Sweden)

    Ankita Sikdar

    2012-07-01

    Full Text Available Bengali text data present in multimedia images having multiple content forms, such as still images and text, contain information that when extracted finds a lot of applications. The images can be of different types, where objects and text may be completely separated or overlapped or embedded in each other. The Bengali text can be of different shapes and sizes. Extraction of text from these types of images becomes challenging because the textual portion has to be correctly separated from the rest of the background. The input image passes through two stages. The first step tries to locate the different components in the image using entropy filtering and the second stage distinguishes the components representing text from the non-textual components based on several features of Bengali text. The text thus obtained from the image can then be used in software such as Bengali OCR for character recognition.

  8. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures.

    Directory of Open Access Journals (Sweden)

    Xu-Cheng Yin

    Full Text Available Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.

  9. Text Mining the History of Medicine.

    Science.gov (United States)

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while

  10. Can An Evolutionary Process Create English Text?

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.

    2008-10-29

    Critics of the conventional theory of biological evolution have asserted that while natural processes might result in some limited diversity, nothing fundamentally new can arise from 'random' evolution. In response, biologists such as Richard Dawkins have demonstrated that a computer program can generate a specific short phrase via evolution-like iterations starting with random gibberish. While such demonstrations are intriguing, they are flawed in that they have a fixed, pre-specified future target, whereas in real biological evolution there is no fixed future target, but only a complicated 'fitness landscape'. In this study, a significantly more sophisticated evolutionary scheme is employed to produce text segments reminiscent of a Charles Dickens novel. The aggregate size of these segments is larger than the computer program and the input Dickens text, even when comparing compressed data (as a measure of information content).
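
    The fixed-target demonstration that this abstract criticises can be sketched in a few lines. The following is a minimal illustration of a Dawkins-style "weasel" loop only, not the more sophisticated scheme used in the study; the alphabet, mutation rate, offspring count and the function names `fitness`, `mutate` and `evolve` are all assumptions made for the example.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "  # assumed alphabet for the sketch


def fitness(candidate, target):
    """Number of positions at which candidate already matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy parent, replacing each character with a random one with probability rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent)


def evolve(target, seed=0, offspring=100, rate=0.05, max_generations=10_000):
    """Evolve random gibberish toward a fixed, pre-specified target phrase."""
    rng = random.Random(seed)
    current = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations):
        if current == target:
            return generation, current
        children = [mutate(current, rate, rng) for _ in range(offspring)]
        # Keep the fittest of the children and the parent, so fitness never decreases.
        current = max(children + [current], key=lambda c: fitness(c, target))
    return max_generations, current


if __name__ == "__main__":
    generations, phrase = evolve("METHINKS IT IS LIKE A WEASEL")
    print(generations, phrase)
```

    The selection step compares every candidate against the known target, which is exactly the "fixed future target" that the abstract identifies as the flaw in such demonstrations.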

  11. Extraction of information from unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H.; DeLand, S.M.; Crowder, S.V.

    1995-11-01

    Extracting information from unstructured text has become an emphasis in recent years due to the large amount of text now electronically available. This status report describes the findings and work done by the end of the first year of a two-year LDRD. Requirements of the approach included that it model the information in a domain independent way. This means that it would differ from current systems by not relying on previously built domain knowledge and that it would do more than keyword identification. Three areas that are discussed and expected to contribute to a solution include (1) identifying key entities through document level profiling and preprocessing, (2) identifying relationships between entities through sentence level syntax, and (3) combining the first two with semantic knowledge about the terms.

  12. Bimodal Emotion Recognition from Speech and Text

    Directory of Open Access Journals (Sweden)

    Weilin Ye

    2014-01-01

    Full Text Available This paper presents an approach to emotion recognition from speech signals and textual content. In the analysis of speech signals, thirty-seven acoustic features are extracted from the speech input. Two different classifiers, Support Vector Machines (SVMs) and a BP neural network, are adopted to classify the emotional states. In text analysis, we use a two-step classification method to recognize the emotional states. The final emotional state is determined based on the emotion outputs from the acoustic and textual analyses. In this paper we have two parallel classifiers for acoustic information and two serial classifiers for textual information, and a final decision is made by combining these classifiers in decision-level fusion. Experimental results show that the emotion recognition accuracy of the integrated system is better than that of either of the two individual approaches.
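
    Decision-level fusion combines the outputs of separate classifiers rather than their input features. The abstract does not state which fusion rule the authors used, so the sketch below assumes a simple weighted average of per-class scores; the function name `fuse_decisions`, the weight and the example scores are hypothetical.

```python
def fuse_decisions(acoustic, textual, acoustic_weight=0.5):
    """Fuse two classifiers' per-class scores by weighted averaging (assumed rule).

    acoustic and textual map emotion labels to scores in [0, 1].
    Returns the winning label and the fused score for every label.
    """
    labels = set(acoustic) | set(textual)
    fused = {
        label: acoustic_weight * acoustic.get(label, 0.0)
        + (1.0 - acoustic_weight) * textual.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Hypothetical scores: the acoustic model leans "angry", the text model leans "happy".
    label, scores = fuse_decisions(
        {"angry": 0.6, "happy": 0.4},
        {"angry": 0.2, "happy": 0.8},
    )
    print(label, scores)
```

    Raising `acoustic_weight` shifts the final decision toward the speech-based classifier, which is one way to reflect differing reliability of the two modalities.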

  13. Improved VSM for Incremental Text Classification

    Science.gov (United States)

    Yang, Zhen; Lei, Jianjun; Wang, Jian; Zhang, Xing; Guo, Jim

    2008-11-01

    As a simple classification method, VSM has been widely applied in the field of text information processing. Traditional VSM has difficulty selecting a refined vector model representation that makes a good tradeoff between complexity and performance, especially for incremental text mining. To solve these problems, this paper discusses several improvements, such as VSM based on improved TF, TFIDF and BM25. Maximum mutual information feature selection is then introduced to achieve a low-dimensional VSM with less complexity while keeping an acceptable precision. The experimental results of spam filtering and short message classification show that the algorithm can achieve higher precision than existing algorithms under the same conditions.
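
    For readers unfamiliar with the vector space model, a minimal sketch of a TF-IDF-weighted VSM with cosine similarity follows. It assumes the textbook weighting (raw term frequency times log(N/df)) and whitespace tokenization; it is not the improved TF, BM25 or feature-selection variant evaluated in the paper, and the function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using tf * log(N / df), an assumed weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = ["win cash prize now", "cash prize win now", "meeting agenda attached"]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

    A classifier built on this representation would assign a new message to the class whose documents (or centroid) are most similar to it.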

  14. Clustering Analysis within Text Classification Techniques

    Directory of Open Access Journals (Sweden)

    Madalina ZURINI

    2011-01-01

    Full Text Available The paper presents a personal approach to the main applications of classification in the area of the knowledge-based society, by means of methods and techniques widely used in the literature. Text classification is highlighted in chapter two, where the main techniques used are described, along with an integrated taxonomy. The transition is made through the concept of spatial representation. Building on elementary geometry and artificial intelligence analysis, spatial representation models are presented. Using a parallel approach, the spatial dimension is introduced into the process of classification. The main clustering methods are described in an aggregated taxonomy. As an example, spam and ham words are clustered and spatially represented, and the concepts of spam, ham, common and linkage words are presented and explained in the xOy space representation.

  15. An unpublished text of Jovellanos about mineralogy

    Directory of Open Access Journals (Sweden)

    Jorge ORDAZ GARGALLO

    2012-02-01

    Full Text Available An unpublished manuscript of Gaspar Melchor de Jovellanos about the history of mineralogy, written during his captivity in Bellver Castle (Palma de Mallorca), is presented and analyzed. In this writing the importance of chemical knowledge as a source of other branches of science and its applications in different fields of agriculture, mining and industry is considered. The author made a historical synthesis reviewing the men of science who contributed to a great extent to the advance of chemistry and mineralogy. The text clearly supports the new contributions of Lavoisier and other supporters of experimentation as a scientific method, which agrees with Jovellanos’ ideas about the development of the «useful» sciences for the progress of the countries.

  16. Ordinary differential equations a graduate text

    CERN Document Server

    Bhamra, K S

    2015-01-01

    ORDINARY DIFFERENTIAL EQUATIONS: A Graduate Text presents a systematic and comprehensive introduction to ODEs for graduate and postgraduate students. The systematically organized text deals in greater depth with differential inequalities, Gronwall's inequality, Nagumo's theorems, Osgood's criteria and applications of different equations of first order. The book discusses qualitative and quantitative aspects of the Sturm-Liouville problems, Green's function, integral equations and the Laplace transform, and is supported by a number of worked-out examples in each lesson to make the concepts clear. Considerable stress is laid on stability theory, especially on Lyapunov and Poincaré stability theory. Numerous figures in various lessons (in particular lessons dealing with stability theory) have been added to clarify the key concepts in DE theory. Nonlinear oscillation in conservative systems and Hamiltonian systems highlights the basic nature of the systems considered. Perturbation techniques lesson deals in fairly d...

  17. Word Sense Disambiguation Approach for Arabic Text

    OpenAIRE

    Nadia Bouhriz; Faouzia Benabbou; El Habib Ben Lahmar

    2016-01-01

    Word Sense Disambiguation (WSD) consists of identifying the correct sense of an ambiguous word occurring in a given context. Most Arabic WSD systems are generally based on information extracted from the local context of the word to be disambiguated. This information is usually not sufficient for the best disambiguation. To overcome this limit, we propose an approach that takes into consideration, in addition to the local context, the global context extracted from the full text. More ...

  18. Intertextuality in Text-based Discussions

    Directory of Open Access Journals (Sweden)

    Hamidah Mohd Ismail

    2011-01-01

    Full Text Available One of the main issues often discussed among academics is how to encourage active participation by students during classroom discussions. This applies particularly to students at the tertiary level who are expected to possess creative and critical thinking skills. Hence, this paper reports on a study that examined how these skills were demonstrated by a group of university students who employed intertextual links during a follow-up reading activity involving small-group text discussions. Thirty undergraduates who were in their fifth semester of a TESL degree programme were prescribed reading texts consisting of two chapters taken from a book. Findings reveal that intertextual links made during text discussions created successfully a “collaborative environment” where beliefs and values were shared judicially among participants. Pedagogical implications for ESL classroom practice include heightening the awareness amongst academics and students of the role of intertextuality in order to promote students’ use of their critical and creative thinking skills in a supportive classroom environment.

  19. Services for annotation of biomedical text

    OpenAIRE

    Hakenberg, Jörg

    2008-01-01

    Motivation: Text mining in the biomedical domain in recent years has focused on the development of tools for recognizing named entities and extracting relations. Such research resulted from the need for such tools as basic components for more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction now have reached a stage where they perform comparably to human annotators (considering inter-annotator agreement, measured in many studies to be aro...

  20. Stemming of Slovenian library science texts

    Directory of Open Access Journals (Sweden)

    Polona Vilar

    2002-01-01

    Full Text Available The theme of the article is the preparation of a stemming algorithm for Slovenian library science texts. The procedure consisted of three phases: learning, testing and evaluation. The preparation of the optimal stemmer for Slovenian texts from the field of library science is presented, along with its testing and comparison with two other stemmers for the Slovenian language: the Popovič stemmer and the Generic stemmer. A corpus of 790,000 words from the field of library science was used for learning. Lists of stems, word endings and stop-words were built. In the testing phase, the component parts of the algorithm were tested on an additional corpus of 167,000 words. In the evaluation phase, a comparison of the three stemmers processing the same word corpus was made. The results of each stemmer were compared with an intellectually prepared control result of the stemming of the corpus. It consisted of groups of semantically connected words with no errors. Understemming was especially monitored – the number of stems for semantically connected words, produced by an algorithm. The results were statistically processed with the Kruskal-Wallis test. The Optimal stemmer produced the best results. It matched best with the reference results and also gave the smallest number of stems for one semantic meaning. The Popovič stemmer followed closely. The Generic stemmer proved to be the least accurate. The procedures described in the thesis can represent a platform for the development of tools for automatic indexing and retrieval for library science texts in the Slovenian language.
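
    The understemming measure described above (how many distinct stems an algorithm produces for one group of semantically connected words) can be illustrated with a toy suffix-stripping stemmer. This is a sketch under stated assumptions only: the ending list, the minimum stem length and the function names are hypothetical and are not taken from the Optimal, Popovič or Generic stemmers.

```python
def stem(word, endings, min_stem=3):
    """Strip the longest matching ending, keeping at least min_stem characters.

    endings must contain non-empty strings only.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return word[: -len(ending)]
    return word


def stems_per_group(group, endings):
    """Count distinct stems produced for a group of semantically connected words.

    A count of 1 is ideal; higher counts indicate understemming.
    """
    return len({stem(word, endings) for word in group})


if __name__ == "__main__":
    endings = ["a", "e", "o", "ami"]  # illustrative endings, not the study's list
    group = ["knjižnica", "knjižnice", "knjižnicami"]  # inflected forms of "library"
    print(stems_per_group(group, endings))
```

    Averaging this count over all control groups would give a single understemming score per stemmer, which could then be compared statistically.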

  1. Sublanguage, text type and machine translation

    OpenAIRE

    O'Brien, Sharon

    1993-01-01

    This thesis explores the domains of sublanguage, machine translation and textual analysis. Chapter 1 discusses the definitions and characteristics of sublanguage put forward by researchers to date, as well as the background of textual analysis in linguistics. This discussion reveals that, although there is much to be gained from textual analysis, little consideration has been given to the notion of "text" in the sublanguage approach to machine translation (MT). Before any sublanguage anal...

  2. Identification of anatomical terminology in medical text.

    OpenAIRE

    Sneiderman, C. A.; Rindflesch, T. C.; Bean, C. A.

    1998-01-01

    We report on an experiment to use the natural language processing tools being developed in the SPECIALIST system to accurately identify terminology associated with the coronary arteries as expressed in coronary catheterization reports. The ultimate goal is to map from any anatomically-oriented medical text to online images, using the UMLS as an intermediate knowledge source. We describe some of the problems encountered when processing coronary artery terminology and report on the results of a...

  3. Arabic multi-document text summarisation

    OpenAIRE

    El-Haj, Mahmoud

    2012-01-01

    Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language; relevant resources are numerous and readily available. These resources include human generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to su...

  4. Automatic Induction of Rule Based Text Categorization

    OpenAIRE

    D.Maghesh Kumar

    2010-01-01

    The automated categorization of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describ...

  5. Machine Learning in Automated Text Categorization

    OpenAIRE

    Sebastiani, Fabrizio

    2001-01-01

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categori...

  6. READING ALOUD STRATEGIES IN READING ENGLISH TEXTS

    OpenAIRE

    Iyen Nurlaelawati; Shofa Dzulqodah

    2014-01-01

    Abstract: Reading aloud by a young language learner shows unique patterns as the evidence of his/her language data processing. This study, thus, explored the strategies applied by an Indonesian young language learner to read English written texts aloud to identify errors that actually bring certain benefits in her language learning process such as making intelligent guesses when she encountered unfamiliar words. It adopted qualitative case study design involving a seven-year old girl as the s...

  7. Stochastic text models for music categorization

    OpenAIRE

    Pérez Sancho, Carlos; Rizo Valero, David; Iñesta Quereda, José Manuel

    2008-01-01

    Music genre meta-data is of paramount importance for the organization of music repositories. People use genre in a natural way when entering a music store or looking into music collections. Automatic genre classification has become a popular topic in music information retrieval research. This work brings to symbolic music recognition some technologies, like the stochastic language models, already successfully applied to text categorization. In this work we model chord progressions and melodie...

  8. Clustering Analysis within Text Classification Techniques

    OpenAIRE

    Madalina ZURINI; Catalin SBORA

    2011-01-01

    The paper presents a personal approach to the main applications of classification in the area of the knowledge-based society, by means of methods and techniques widely used in the literature. Text classification is highlighted in chapter two, where the main techniques used are described, along with an integrated taxonomy. The transition is made through the concept of spatial representation. Having the elementary elements of geometry and the artificial intelligence analysis,...

  9. Library of Algorithms for Text Ciphering

    OpenAIRE

    Mikulka, Jiří

    2011-01-01

    This thesis deals with text ciphering. The presented paper first describes the basic theoretical background of cryptology and the basic classification of cryptographic algorithms. Then it describes a brief history of encryption from the beginning to the present. Theoretical description of ciphering methods and its implementation details are discussed here. All basic types of conventional encryption algorithms and also some modern ciphering me...

  10. TEXT SIGNAGE RECOGNITION IN ANDROID MOBILE DEVICES

    OpenAIRE

    Oi-Mean Foong; Suziah Sulaiman; Kiing Kiu Ling

    2013-01-01

    This study presents a Text Signage Recognition (TSR) model in Android mobile devices for Visually Impaired People (VIP). Independent navigation is always a challenge to VIP for indoor navigation in unfamiliar surroundings. Assistive Technology such as Android smart devices has great potential to assist VIPs in indoor navigation using built-in speech synthesizer. In contrast to previous TSR research which was deployed in standalone personal computer system using Otsu’s algorithm, we hav...

  11. Interdisciplinary Interpretation of the Bible Text

    OpenAIRE

    Ying Han

    2015-01-01

    Literature and critical and creative thinking complement each other. The Bible’s status as a classic has been demonstrated and affirmed by many Western and domestic scholars; accordingly, literary criticism of the Bible has been conducted extensively and from different angles. This paper makes an interdisciplinary interpretation of the Bible text from the perspective of literature, history, philosophy and so on. It suggests offering a course on literary criticism of the Bible and cultivating critical and crea...

  12. Logistic regression a self-learning text

    CERN Document Server

    Kleinbaum, David G

    1994-01-01

    This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained, and designed to be used both in class or as a tool for self-study. It arises from the author's many years of experience teaching this material and the notes on which it is based have been extensively used throughout the world.

  13. Generating text from functional brain images

    Directory of Open Access Journals (Sweden)

    Francisco ePereira

    2011-08-01

    Full Text Available Recent work has shown that it is possible to take brain images acquired during viewing of a scene and reconstruct an approximation of the scene from those images. Here we show that it is also possible to generate text about the mental content reflected in brain images. We began with images collected as participants read names of concrete items (e.g., "Apartment") while also seeing line drawings of the item named. We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. In order to validate this mapping, without accessing information about the items viewed for left-out individual brain images, we were able to generate from each one a collection of semantically pertinent words (e.g., "door," "window" for "Apartment"). Furthermore, we show that the ability to generate such words allows us to perform a classification task and thus validate our method quantitatively.

  14. TEXT SIGNAGE RECOGNITION IN ANDROID MOBILE DEVICES

    Directory of Open Access Journals (Sweden)

    Oi-Mean Foong

    2013-01-01

    Full Text Available This study presents a Text Signage Recognition (TSR) model in Android mobile devices for Visually Impaired People (VIP). Independent navigation is always a challenge to VIP for indoor navigation in unfamiliar surroundings. Assistive Technology such as Android smart devices has great potential to assist VIPs in indoor navigation using a built-in speech synthesizer. In contrast to previous TSR research, which was deployed in a standalone personal computer system using Otsu’s algorithm, we have developed an affordable Text Signage Recognition in Android Mobile Devices using the Tesseract OCR engine. The proposed TSR model used the input images from the International Conference on Document Analysis and Recognition (ICDAR) 2003 dataset for system training and testing. The TSR model was tested by four volunteers who were blind-folded. The system performance of the TSR model was assessed using different metrics (i.e., Precision, Recall, F-Score and Recognition Formulas) to determine its accuracy. Experimental results show that the proposed TSR model achieved a satisfactory recognition rate.

  15. PEDANT: Parallel Texts in Göteborg

    Directory of Open Access Journals (Sweden)

    Daniel Ridings

    2012-09-01

    Full Text Available The article presents the status of the PEDANT project with parallel corpora at the Language Bank at Göteborg University. The solutions for access to the corpus data are presented. Access is provided by way of the internet and standard applications and SGML-aware programming tools. The SGML format for encoding translation pairs is outlined together. The methods allow working with everything from plain text to texts densely encoded with linguistic information.

  16. Text Belief Consistency Effects in the Comprehension of Multiple Texts with Conflicting Information

    Science.gov (United States)

    Maier, Johanna; Richter, Tobias

    2013-01-01

    When reading multiple texts about controversial scientific issues, learners must construct a coherent mental representation of the issue based on conflicting information that can be more or less belief-consistent. The present experiment investigated the effects of text-belief consistency on the situation model and memory for text. Students read…

  17. Learning from Conflicting Texts: The Role of Intertextual Conflict Resolution in Between-Text Integration

    Science.gov (United States)

    Kobayashi, Keiichi

    2015-01-01

    The present study examined the effect of intertextual conflict resolution on learning from conflicting texts. In two experiments, participants read sets of two texts under the condition of being encouraged either to resolve a conflict between the texts' arguments (the resolution condition) or to comprehend the arguments (the comprehension…

  18. How Much Handwritten Text Is Needed for Text-Independent Writer Verification and Identification

    NARCIS (Netherlands)

    Brink, Axel; Bulacu, Marius; Schomaker, Lambert

    2008-01-01

    The performance of off-line text-independent writer verification and identification increases when the documents contain more text. This relation was examined by repeatedly conducting writer verification and identification performance tests while gradually increasing the amount of text on the pages.

  19. On the application of text input metrics to handwritten text input

    OpenAIRE

    Read, Janet C.

    2006-01-01

    This paper describes the current metrics used in text input research, considering those used for discrete text input as well as those used for spoken input. It examines how these metrics might be used for handwritten text input and provides some thoughts about different metrics that might allow for a more fine grained evaluation of recognition improvement or input accuracy.

  20. “Girls Text Really Weird”: Gender, Texting and Identity Among Teens

    DEFF Research Database (Denmark)

    Ling, Richard; Baron, Naomi; Lenhart, Amanda;

    2014-01-01

    and other paralinguistic devices. In addition, they use texts to characterize the opposite sex. Teen boys' texts are seen as short and perhaps brisk when viewed by girls. Boys see teen girls' texts as being overly long, prying and containing unneeded elements. The discussion of these practices shows...... how teens engage in their sense of gender...

  1. Layout-aware text extraction from full-text PDF of scientific articles

    Directory of Open Access Journals (Sweden)

    Ramakrishnan Cartic

    2012-05-01

    Full Text Available Background: The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results: Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF

  2. Text4Health: a qualitative evaluation of parental readiness for text message immunization reminders.

    Science.gov (United States)

    Kharbanda, Elyse Olshen; Stockwell, Melissa S; Fox, Harrison W; Rickert, Vaughn I

    2009-12-01

    We conducted focus groups and individual interviews in a diverse population of parents to qualitatively explore preferences and readiness for text message immunization reminders. We used content analysis to review and independently code transcripts. Text message reminders were well-accepted by parents; many thought they would be more effective than standard phone or mail reminders. Parents preferred text message reminders to be brief and personalized. Most parents were able to retrieve sample text messages but many had difficulty with interactive texting. PMID:19833982

  3. Computational text analysis and reading comprehension exam complexity towards automatic text classification

    CERN Document Server

    Liontou, Trisevgeni

    2014-01-01

    This book delineates a range of linguistic features that characterise the reading texts used at the B2 (Independent User) and C1 (Proficient User) levels of the Greek State Certificate of English Language Proficiency exams in order to help define text difficulty per level of competence. In addition, it examines whether specific reader variables influence test takers' perceptions of reading comprehension difficulty. The end product is a Text Classification Profile per level of competence and a formula for automatically estimating text difficulty and assigning levels to texts consistently and re

  4. Text summarization as a decision support aid

    Directory of Open Access Journals (Sweden)

    Workman T

    2012-05-01

    Full Text Available Background: PubMed data potentially can provide decision support information, but PubMed was not exclusively designed to be a point-of-care tool. Natural language processing applications that summarize PubMed citations hold promise for extracting decision support information. The objective of this study was to evaluate the efficiency of a text summarization application called Semantic MEDLINE, enhanced with a novel dynamic summarization method, in identifying decision support data. Methods: We downloaded PubMed citations addressing the prevention and drug treatment of four disease topics. We then processed the citations with Semantic MEDLINE, enhanced with the dynamic summarization method. We also processed the citations with a conventional summarization method, as well as with a baseline procedure. We evaluated the results using clinician-vetted reference standards built from recommendations in a commercial decision support product, DynaMed. Results: For the drug treatment data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.848 and 0.377, while conventional summarization produced 0.583 average recall and 0.712 average precision, and the baseline method yielded average recall and precision values of 0.252 and 0.277. For the prevention data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.655 and 0.329. The baseline technique resulted in recall and precision scores of 0.269 and 0.247. No conventional Semantic MEDLINE method accommodating summarization for prevention exists. Conclusion: Semantic MEDLINE with dynamic summarization outperformed conventional summarization in terms of recall, and outperformed the baseline method in both recall and precision. This new approach to text summarization demonstrates potential in identifying decision support data for multiple needs.
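
    Recall and precision against a reference standard, as reported in this abstract, can be computed from two sets: the items a system outputs and the items in the reference. The sketch below assumes exact-match, set-based scoring; the function name `precision_recall` and the example drug names are hypothetical and are not taken from the study's data.

```python
def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted items against a reference standard."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Precision: share of extracted items that appear in the reference.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference items that the system found.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical system output and reference standard for one disease topic.
    system_output = {"aspirin", "statin", "warfarin", "placebo"}
    reference_standard = {"aspirin", "statin", "heparin"}
    print(precision_recall(system_output, reference_standard))
```

    A summarizer that returns many candidate interventions tends to score high on recall and lower on precision, which matches the pattern the abstract reports for dynamic summarization on the drug treatment data.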

  5. Extracting laboratory test information from biomedical text

    Directory of Open Access Journals (Sweden)

    Yanna Shen Kang

    2013-01-01

    Full Text Available Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method: hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.

  6. Dialogical surface text features in abstracts

    OpenAIRE

    Ingrid García-Østbye

    2008-01-01

    A sample driven description of Research Article-Comment-Reply (RA-C-R) abstracts in terms of abstract sentence length, reference, possessive structures, modal verbs and word range was carried out to find out whether their surface text features showed some trace of a dialogical construction of knowledge within the psychology discourse community. The study served an exploratory purpose. A Boolean search was conducted in the PsycLIT database yielding a sample of 149 PsycLIT RA-C-R abstracts (13,...

  7. Text Data Mining: Theory and Methods

    OpenAIRE

    Solka, Jeffrey L.

    2008-01-01

    This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies that are employed within this discipline area while at the same time making the reader aware of some of the interesting challenges that remain to be solved within the area. Finally, the article serves as a very rudimentary tutorial on some of the techniques while also providing the reader wi...

  8. Methods for Mining and Summarizing Text Conversations

    CERN Document Server

    Carenini, Giuseppe; Murray, Gabriel

    2011-01-01

    Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods

  9. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  10. Text to reference ratios in scientific journals

    OpenAIRE

    Little, Anne E.; Roma M. Harris; Nicholls, Paul T.

    1990-01-01

    In 1987, Peter Jumars, the editor of Limnology and Oceanography, reported that the ratio of printed pages of text to number of references had decreased during the period 1980 to 1987. In other words, authors were using an increasing number of references, an observation which was of some concern because Limnology and Oceanography publishes only a fixed number of pages per year. In the present study, an attempt was made to determine whether journals from other scientific discipl...

  11. On the role of autocorrelations in texts

    CERN Document Server

    Lande, D V

    2007-01-01

    The task of finding a criterion for distinguishing a text from an arbitrary set of words is relevant in itself, for instance, for developing means of indexing internet content or separating signal from noise in communication channels. The Zipf law is currently considered to be the most reliable criterion of this kind [3]. At any rate, conventional stochastic word sets do not meet this law. The present paper deals with one possible criterion based on determining the degree of data compression.
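    As a rough illustration of a compression-based criterion, the sketch below compares how well a text compresses with how well a random permutation of its own words compresses. It is an assumption-laden stand-in, not the method of the paper: the abstract does not name a compressor, so `zlib` is swapped in here, and the helper names `compression_ratio` and `shuffled_words` are hypothetical.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more regularity."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


def shuffled_words(text: str, seed: int = 0) -> str:
    """Same words, random order: word frequencies survive, word order does not."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    sample = (
        "the task of finding a criterion for distinguishing a text "
        "from an arbitrary set of words is relevant in itself "
    ) * 20
    print("original:", round(compression_ratio(sample), 3))
    print("shuffled:", round(compression_ratio(shuffled_words(sample)), 3))
```

    Because shuffling preserves word frequencies, any gap between the two ratios reflects correlations in word order rather than vocabulary, which is the kind of signal a frequency law such as Zipf's cannot see.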

  12. On the role of autocorrelations in texts

    OpenAIRE

    Lande, D. V.; Snarskii, A. A.

    2007-01-01

    The task of finding a criterion for distinguishing a text from an arbitrary set of words is relevant in itself, for instance, for developing means of indexing internet content or separating signal from noise in communication channels. The Zipf law is currently considered to be the most reliable criterion of this kind [3]. At any rate, conventional stochastic word sets do not meet this law. The present paper deals with one possible criterion based on the determi...

  13. Convolutional Neural Networks for Direct Text Deblurring

    Czech Academy of Sciences Publication Activity Database

    Hradiš, M.; Kotera, Jan; Zemčík, P.; Šroubek, Filip

    Swansea: The British Machine Vision Association and Society for Pattern Recognition, 2015. ISBN 1-901725-53-7. [The British Machine Vision Conference (BMVC) 2015 /26./. Swansea (GB), 07.09.2015-10.09.2015] R&D Projects: GA ČR GA13-29225S; GA MŠk 7H14004 Grant ostatní: GA UK(CZ) 938213/2013 Institutional support: RVO:67985556 Keywords : image deblurring * text deblurring * convolutional neural networks * image restoration Subject RIV: JD - Computer Applications, Robotics http://library.utia.cas.cz/separaty/2015/ZOI/kotera-0450667.pdf

  14. Restructuring Compressed Texts without Explicit Decompression

    CERN Document Server

    Goto, Keisuke; Inenaga, Shunsuke; Bannai, Hideo; Sakamoto, Hiroshi; Takeda, Masayuki

    2011-01-01

    We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression algorithm, to representations produced by several specific compression algorithms including LZ77, LZ78, run length encoding, and some grammar based compression algorithms. These are the first algorithms that achieve running times polynomial in the size of the compressed input and output representations of $T$. Since most of the representations we consider can achieve exponential compression, our algorithms are theoretically faster in the worst case than any algorithm which first decompresses the string for the conversion.

  15. Combinatory hybrid elementary analysis of text (CHEAT)

    OpenAIRE

    Atwell, ES

    2007-01-01

    We propose the CHEAT approach to the MorphoChallenge contest: Combinatory Hybrid Elementary Analysis of Text. The idea is: acquire results from a number of other candidate systems; CHEAT will read in the output files of each of the other systems, and then line-by-line select the "majority vote" analysis - the analysis which most systems have gone for. If there is a tie, take the result produced by the system with the highest F-measure; if the other systems’ output files are ordered best-first...
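    The line-by-line voting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: every system is assumed to emit one analysis per line with all files the same length, the F-measure of each system is assumed to be known in advance, and a tie is resolved in favour of the highest-scoring system among those that proposed a tied analysis (the abstract leaves that detail open). The names `majority_vote` and `combine` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(analyses: Sequence[str], f_measures: Sequence[float]) -> str:
    """Pick the analysis most systems agree on for one input line.

    analyses[i] is the output of system i; f_measures[i] is its F-measure.
    """
    counts = Counter(analyses)
    top = max(counts.values())
    tied = {a for a, c in counts.items() if c == top}
    if len(tied) == 1:
        return next(iter(tied))
    # Tie: among systems whose analysis is tied, trust the best F-measure.
    best = max(
        (i for i, a in enumerate(analyses) if a in tied),
        key=lambda i: f_measures[i],
    )
    return analyses[best]


def combine(system_outputs: Sequence[Sequence[str]],
            f_measures: Sequence[float]) -> List[str]:
    """Apply the vote line by line across all systems' output files."""
    return [majority_vote(list(line), f_measures) for line in zip(*system_outputs)]


if __name__ == "__main__":
    outputs = [
        ["walk +ed", "un+do"],    # system 0
        ["walk +ed", "und+o"],    # system 1
        ["walked", "undo"],       # system 2
    ]
    print(combine(outputs, f_measures=[0.5, 0.6, 0.9]))
```

    In the usage example the first line is settled by a two-to-one vote, while the second line is a three-way tie and falls back to the system with the highest F-measure.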

  16. Quality of OCR for Degraded Text Images

    OpenAIRE

    Hartley, Roger T; Crumpton, Kathleen

    1999-01-01

    Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality, or because of excessive photocopying. The ability to predict the word failure rate of OCR from a statistical analysis of the image can help in making decisions in the trade-off between the success rate of OCR and the cost of human correction of errors. This paper describes an investigation of OCR of degraded text i...

  17. Reading Instruments: Objects, Texts and Museums

    Science.gov (United States)

    Anderson, Katharine; Frappier, Mélanie; Neswald, Elizabeth; Trim, Henry

    2013-05-01

    Science educators, historians of science and their students often share a curiosity about historical instruments as a tangible link between past and present practices in the sciences. We less often integrate instruments into our research and pedagogy, considering artefact study as the domain of museum specialists. We argue here that scholars and teachers new to material culture can readily use artefacts to reveal rich and complex networks of narratives. We illustrate this point by describing our own lay encounter with an artefact turned over for our analysis during a week-long workshop at the Canada Science and Technology Museum. The text explains how elements as disparate as the military appearance of the instrument, the crest stamped on its body, the manipulation of its telescopes, or a luggage tag revealed the object's scientific and political significance in different national contexts. In this way, the presence of the instrument in the classroom vividly conveyed the nature of geophysics as a field practice and an international science, and illuminated relationships between pure and applied science for early twentieth century geologists. We conclude that artefact study can be an unexpectedly powerful and accessible tool in the study of science, making visible the connections between past and present, laboratory and field, texts and instruments.

  18. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy. PMID:20047295

  19. Lexical Meaning in the Corpus of Texts

    Directory of Open Access Journals (Sweden)

    Andrey Evgenyevich Bochkarev

    2015-11-01

    Full Text Available The methods and procedures for solving problems in semantics are inevitably influenced by paradigm change. In particular, using text corpora, it is possible to re-examine problems that are traditional for lexical semantics, such as polysemy, the unity of meaning, and the opportunities for interpreting units provided by a context or an intertext. Moreover, by analyzing the uses of words registered in corpora, one can see that, alongside the language system, social norms are additional coding systems that also play a decisive role in determining meaning. Thus, the Russian Language National Corpus reveals that the notion of «high moral character», which is defined as «dignity» in dictionaries, may imply quite different properties depending on the distribution of roles customary for the given culture. For males it can imply intellect, fidelity to duty, poise or stalk; for an unmarried woman it can mean comeliness, good reputation and the art to dress with elegance; when applied to a wife, the notion can imply conjugal fidelity, debonnaire and homemaking; a maiden should be smooth-tempered and of rosy disposition; a subordinate is expected to be obedient and able to execute orders. The author comes to the conclusion that it is social norms that may have a modulating influence on the word meaning in every case of its actualization in speech.

  20. El manual como texto / Schoolbook as text

    Directory of Open Access Journals (Sweden)

    Agustín Escolano Benito

    2012-12-01

    Full Text Available This paper discusses the question of identifying a coursebook as a specific text genre in the context of classical and modern manualistics, situating the analysis within the traditional school culture and the digital revolution era, under a historical and theoretical perspective. It also covers the birth and initial development of manualistics as an intellectual and academic field and its contributions to the definition of the schoolbook identity.

  1. Handwritten Text Image Authentication using Back Propagation

    CERN Document Server

    Chakravarthy, A S N; Avadhani, P S

    2011-01-01

    Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claims to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Literary forgery can involve imitating the style of a famous author. If an original manuscript, typewritten text, or recording is available, then the medium itself (or its packaging - anything from a box to e-mail headers) can help prove or disprove the authenticity of the document. The use of digital images of handwritten historical documents has become more popular in recent years. Volunteers around the world now read thousands of these images as part of their indexing process. Handwritten text images of old documents are sometimes difficult to read or noisy du...

  2. L1 and L2 Spoken Word Processing: Evidence from Divided Attention Paradigm.

    Science.gov (United States)

    Shafiee Nahrkhalaji, Saeedeh; Lotfi, Ahmad Reza; Koosha, Mansour

    2016-10-01

    The present study aims to reveal some facts concerning first language (L1) and second language (L2) spoken-word processing in unbalanced proficient bilinguals using behavioral measures. The intention here is to examine the effects of auditory repetition word priming and semantic priming in first and second languages of these bilinguals. The other goal is to explore the effects of attention manipulation on implicit retrieval of perceptual and conceptual properties of spoken L1 and L2 words. In so doing, the participants performed auditory word priming and semantic priming as memory tests in their L1 and L2. In half of the trials of each experiment, they carried out the memory test while simultaneously performing a secondary task in visual modality. The results revealed that effects of auditory word priming and semantic priming were present when participants processed L1 and L2 words in full attention condition. Attention manipulation could reduce priming magnitude in both experiments in [Formula: see text]. Moreover, [Formula: see text] word retrieval increases the reaction times and reduces accuracy on the simultaneous secondary task to protect its own accuracy and speed. PMID:26643309

  3. Graphical support for comprehending science texts: The contributions of diagram design and text directives

    Science.gov (United States)

    McTigue, Erin M.

    The present study examined the combined effect of diagram design and text directives on the comprehension of explanatory science texts for middle school readers. Three types of diagram designs were compared. Each design contained the same graphical representation of a cycle but differed in the labels. The labels indicated either the (a) parts of the cycle, (b) steps of the cycle, or (c) both the parts and steps. Additionally, there were two conditions of text, with and without embedded directives. The directives guided the reader to the diagram to help readers integrate the two sources of information. Finally, each of the 189 sixth grade participants read two texts: a life-science text and a physical-science text. Results indicated that for the life-science text both the parts diagrams and the steps diagrams facilitated the readers' comprehension, but that the parts & steps diagram did not. Overall, the directives assisted readers in the life-science text when they were viewing the complex diagrams (the steps diagram and the parts & steps diagram), but not the parts diagram. Directives also helped girls who were reading at the below- and on-grade level, but not the girls reading above-grade level. Neither the diagrams nor directives facilitated comprehension of the physical-science text. There was a gender difference favoring boys on the physical-science text but no gender difference on the life-science text.

  4. MANAGING THE TRANSLATION OF ECONOMIC TEXTS

    Directory of Open Access Journals (Sweden)

    Pop Anamaria Mirabela

    2012-12-01

    Full Text Available Theoretically, translation may pass as science; practically, it seems closer to art. Translation is a challenging activity requiring a set of abilities and posing a few difficulties that appear during the translation process. This paper investigates the extent to which sub-technical vocabulary can constitute a problem to Romanian students of economics reading in English, by looking at the translations produced as independent or pair work during English classes and analyzing the various errors that appeared. The exigencies required by efficient business communication have increased in the past few decades because of rising international trade, increased migration, globalization, the recognition of linguistic minorities, and the expansion of the mass media and technology. All these led us to approach the topic of translation, which is actually a job that requires skills, stages of research necessary for disclosure of transfer characteristic into the target language, training, experience and a good sense of languages. The paper defines the theoretical issues and terminology (translation, types of translation, economic texts) and then focuses on the presentation of the practical work carried out by second-year students throughout the academic year. Considering that only 28% of the entire European population can read English, and even fewer people in South America and Asia can, it is obvious that an effective communication of business matters relies on an accurate understanding of terminology. Economics is a field of knowledge in accelerated scientific and technological development. As there is a permanent and ever increasing need to quickly update their knowledge, economists read and learn directly in the original language of the publication and stick to it in daily usage, including conferences, scientific events and articles written in Romanian. Besides researching properly the markets, finding distribution channels, and dealing with legal

  5. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    Directory of Open Access Journals (Sweden)

    Vishwadeepak Singh Baghela

    2012-05-01

    Full Text Available A handful of text data mining approaches are available to extract potentially useful information and associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mining deals with structured data (for example, relational databases), whereas text presents special characteristics and is unstructured. Unstructured data is totally different from databases, where mining techniques are usually applied and structured data is managed. Text mining can work with unstructured or semi-structured data sets. A brief review of some recent research related to mining associations from text documents is presented in this paper.

  6. How indexicals function in texts: Discourse, text, and one neo-Gricean account of indexical reference

    OpenAIRE

    Cornish, Francis

    2008-01-01

    My goal in this article is to compare the behavior of a variety of non clause-bound types of indexical expression in English across three texts from different genres, spoken as well as written. A key distinction is the one claimed to exist between the dimensions of text and discourse, and the comparison of the indexical types demonstrates its relevance. In a given text, certain lexically-specific types of indexical bearing an anaphoric interpretation may perform part...

  7. Sentence connexion and global text structures: a case study of a political text, English leader article

    OpenAIRE

    Stein, Dieter; Mattei, Adriana

    1993-01-01

    The paper first gives a brief overview of the history and theoretical status of discourse analysis, or "text linguistics." The main body of the paper consists of a detailed analysis of sentence connexion, i.e. the logical relationship between sentences and larger chunks of text, performed on a newspaper leader article. The results of this local analysis are then related to the global organisation of text structure with components such as macro- and super-structure by way of int...

  8. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

    OpenAIRE

    Tang, Jian; Qu, Meng; Mei, Qiaozhu

    2015-01-01

    Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, wit...

  9. Metaphor identification in large texts corpora.

    Directory of Open Access Journals (Sweden)

    Yair Neuman

    Full Text Available Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms' performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus.

  10. Algorithmic Detection of Computer Generated Text

    CERN Document Server

    Lavoie, Allen

    2010-01-01

    Computer generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such documents. After identifying and evaluating several quantifiable features of academic papers, we apply methods from machine learning to build a binary classifier. In tests with two hundred papers, the resulting classifier correctly labeled papers either as human written or as computer generated with no false classifications of computer generated papers as human and a 2% false classification rate for human papers as computer generated. We believe generalizations of these features are applicable to similar classification problems. While most current text-based spam detection techniques focus on the keyword-based classification of email messages, a new generation of unsolicited computer-generated advertisements masquerade as legitimate postings in online groups, message boards and social news sites. Our results show that taking the formatti...

  11. Dialogical surface text features in abstracts

    Directory of Open Access Journals (Sweden)

    Ingrid García-Østbye

    2008-04-01

    Full Text Available A sample driven description of Research Article-Comment-Reply (RA-C-R) abstracts in terms of abstract sentence length, reference, possessive structures, modal verbs and word range was carried out to find out whether their surface text features showed some trace of a dialogical construction of knowledge within the psychology discourse community. The study served an exploratory purpose. A Boolean search was conducted in the PsycLIT database yielding a sample of 149 PsycLIT RA-C-R abstracts (13,978 words). Relative frequency percent distributions were calculated for all variables, including reported speech verbs. Specific comparisons with a Medline corpus were conducted and variations were accounted for in terms of scientific discourse characteristics, field, database policies, and dialogical nature; that is, in the framework provided by the strands of research of quantitative applied linguistics, social concerns in genre analysis and the model monopoly theory developed in the implementation in sociology of the systems theory. The results suggest: (i) a word range affected by both psychology as a discipline and the dialogical content on which PsycLIT RA-C-R abstracts report; (ii) a complementarity of reference and possessive structures characterised by features of scientific discourse, feedback genres and dialogical dimensions; (iii) the presence of both deontic and epistemic modality in the modal verbs of our sample; (iv) and also that abstract length, sentence length and number of sentences per paragraph in our sample may not vary greatly in general terms from those of the social sciences.

  12. Generation Text: The Influence of Audience, Environment, and Social Impression on Text Message Construction

    Science.gov (United States)

    Camuti, Alice Kerlin

    2011-01-01

    The purpose of this interpretivist qualitative study is to discover what factors influence first-year college students as they construct their text messages. Using grounded theory methodology, 11 first-year college students at a university in the Southeast were interviewed one-on-one and through text messaging in order to gain insight into…

  13. The Link between Text Difficulty, Reading Speed and Exploration of Printed Text during Shared Book Reading

    Science.gov (United States)

    Roy-Charland, Annie; Perron, Melanie; Turgeon, Krystle-Lee; Hoffman, Nichola; Chamberland, Justin A.

    2016-01-01

    In the current study, the reading speed of the narration and the difficulty of the text were manipulated and links were explored with children's attention to the printed text in shared book reading. Thirty-nine children (24 grade 1 and 15 grade 2) were presented easy and difficult books at slow (syllable by syllable) or fast (adult reading speed)…

  14. "Romeo and Juliet" in the Minneapolis Public Schools: Accurate Text or Bowdlerized Text?

    Science.gov (United States)

    Reed, Margaret A.

    In 1984, parents of a Minneapolis, Minnesota, ninth grader came before the school district's "Students' Right to Learn Committee" to object to what they described as a bowdlerized version of "Romeo and Juliet" in the Scott, Foresman text, and the publisher's failure to acknowledge in the text that the play was abridged. The committee concurred…

  15. Expository Text Comprehension: Helping Primary-Grade Teachers Use Expository Texts to Full Advantage

    Science.gov (United States)

    Hall, Kendra M.; Sabey, Brenda L.; McClellan, Michelle

    2005-01-01

    This study investigated the effectiveness of an instructional program designed to teach expository text comprehension during guided reading. Participants included 72 second graders in six classrooms, organized into four guided reading groups in each class (n = 24). The six classes were randomly assigned to one of three groups: Text Structure,…

  16. Quality Control in Software Documentation Based on Measurement of Text Comprehension and Text Comprehensibility.

    Science.gov (United States)

    Lehner, Franz

    1993-01-01

    Discusses methods of textual documentation that can be used for software documentation. Highlights include measurement of text comprehensibility; methods for the measurement of documentation quality, including readability and the Cloze Procedure; tools for the measurement of text readability; and the development of the Reading Measurability…

  17. Powerful Vocabulary Acquisition through Texts Comparison

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Hasannejad

    2015-03-01

    Full Text Available This study aimed to investigate whether dual version reading comprehension had a positive effect on intermediate EFL students’ general vocabulary acquisition, receptive and productive knowledge of vocabulary and synonymous power of words. Two groups were selected: the experimental group and the control group. The study included (1) four pretests, (2) the dual version reading comprehension, and (3) four posttests. It was found that there was no significant difference between the two groups of students on the pretests. However, there was a significant difference between the two groups of students on the posttests. Overall, dual version reading comprehension vocabulary learning made the experimental group learners outperform the control group in terms of their performance on four types of vocabulary tests. This indicates that students following dual version reading comprehension were more successful in vocabulary acquisition, in developing their receptive knowledge of vocabulary, in transferring their receptive knowledge into productive knowledge and in enhancing the memorization of synonymous words. Key words: Dual version reading comprehension, Receptive knowledge, Productive knowledge, Synonymous power

  18. Linguistically informed digital fingerprints for text

    Science.gov (United States)

    Uzuner, Özlem

    2006-02-01

    Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.

  19. TEACHING SPEAKING REPORT TEXT USING SPEAKING PROMPT

    Directory of Open Access Journals (Sweden)

    Sunarti

    2015-01-01

    Full Text Available Learning a language means learning how to communicate orally or in writing, that is, how to listen, speak, read and write fluently, accurately and acceptably. However, students find these skills difficult to learn. In speaking sessions, students cannot express their ideas well because they have problems with vocabulary, with putting words together in the correct structure, and with pronunciation; they also lack information or background knowledge about the topic. These problems make students unwilling to speak, or mean that they need a long time to prepare. Another problem is that they are accustomed to writing their speech first and memorizing it before performing the speaking task. Given these problems, teaching strategies are needed, one of which is the speaking prompt. As a pre-activity, the teacher reviews the generic structure and the simple present tense, shows pictures related to the topic, introduces facts classified according to the generic structure, and leads pronunciation practice. In the main activity, students describe the picture based on the facts they have been given. The sentence pattern of the simple present tense is also shown. As a post-activity, the students comment on each other's performance. These activities can in fact solve their problems. The speaking prompt helps them speak: they do not need to think about the background knowledge, the generic structure or the sentence pattern.

  20. Semantic text mining support for lignocellulose research

    Directory of Open Access Journals (Sweden)

    Meurs Marie-Jean

    2012-04-01

    Full Text Available Abstract Background Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuels production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly-expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties. Results Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources. Conclusions Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines integrated in a Web-based interface to assist them in two main tasks: mining the literature for relevant knowledge, and at the same time providing rich and semantically linked information.

  1. AB027. Penile augmentation: informed text briefing

    Science.gov (United States)

    Park, Nam Cheol

    2016-01-01

    Men's desire for a larger and longer penis has created continual medical demand throughout human history. To date, various techniques for penile augmentation have been developed, with both experimental and clinical results. Recently, as socially unacceptable ideas have been discarded, the need for penile augmentation has come to be considered equivalent to mammoplasty for breast augmentation in women, for cosmetic and psychological reasons. At the same time, advances in medical materials and tissue engineering provide a variety of options for functional plastic surgery as well as for procedures that compensate for tissue defects. This book presents state-of-the-art knowledge on penile augmentation with more than 100 full-color illustrations clarifying penile surgical anatomy; operative procedures by experienced surgeons, from traditional fat transfer to the penile disassembly technique; the newest tissue engineering techniques, with data from leading researchers; auxiliary medical devices; and reconstruction of a penis damaged by an unqualified practitioner or by accident. This textbook is intended as a guide to clinical practice for all who are involved or interested in penile augmentation procedures.

  2. Named entity recognition in Slovene text

    Directory of Open Access Journals (Sweden)

    Tadej Štajner

    2013-12-01

    Full Text Available This paper presents an approach to, and an implementation of, a named entity extractor for the Slovene language, based on machine learning. It is designed as a supervised algorithm based on Conditional Random Fields and is trained on the ssj500k annotated corpus of Slovene. The corpus, which is available under a Creative Commons CC-BY-NC-SA licence, is annotated with morphosyntactic tags, as well as named entities for people, locations, organisations, and miscellaneous names. The paper discusses the influence of morphosyntactic tags, lexicons and conjunctions of features of neighbouring words. An important contribution of this investigation is that morphosyntactic tags benefit named entity extraction. Using all the best-performing features, the recognizer reaches a precision of 74% and a recall of 72%; it performs best on personal and geographical named entities, followed by organizations, but performs poorly on miscellaneous entities, since this class is very diverse and consequently difficult to predict. A major contribution of the paper is also showing the benefits of splitting the class of miscellaneous entities into organizations and other entities, which in turn improves performance even on personal and organizational names. The software developed in this research is freely available under the Apache 2.0 licence at http://ailab.ijs.si/~tadej/slner.zip, while development versions are available at https://github.com/tadejs/slner.
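
    The abstract above quotes precision and recall but no combined score. As a minimal sketch, the two figures can be folded into the balanced F-measure (F1), their harmonic mean. The function name `f1_score` is an illustrative assumption, and the resulting value is our own arithmetic on the quoted figures, not a number reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        # Both scores are zero: define F1 as zero instead of dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: precision 74%, recall 72%.
f1 = f1_score(0.74, 0.72)
print(f"F1 = {f1:.3f}")
```

    Because the harmonic mean is pulled toward the lower of the two inputs, the combined score sits just under the midpoint of the quoted precision and recall.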

  3. Hybrid Method for Tagging Arabic Text

    Directory of Open Access Journals (Sweden)

    Yamina Tlili-Guiassa

    2006-01-01

    Full Text Available Many natural language expressions are ambiguous and can be interpreted only by drawing on other sources of information. Whether the word ﺗﻌﺎون is interpreted as a noun or a verb depends on the presence of contextual cues. This study proposes a hybrid method for tagging Arabic words that combines a rule-based method with a machine learning method. The method first applies rules (which consider the post-position, the ending of a word, and patterns), and anomalies are then corrected by a memory-based learning (MBL) method. Memory-based learning is an efficient way to integrate various sources of information and to handle exceptional data in natural language processing tasks. Second, the exceptional cases of the rules are checked, and more information is made available to the learner for treating those cases. To evaluate the proposed method, a number of experiments were run, which also served to establish the importance of the various sources of information in learning.

  4. Mining Causality for Explanation Knowledge from Text

    Institute of Scientific and Technical Information of China (English)

    Chaveevan Pechsiri; Asanee Kawtrakul

    2007-01-01

    Mining causality is essential to provide a diagnosis. This research aims at extracting the causality existing within multiple sentences or EDUs (Elementary Discourse Units). The research emphasizes the use of causality verbs because they make explicit, in a certain way, the consequent events of a cause, e.g., "Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry." A verb can also be the causal-verb link between cause and effect within EDU(s), e.g., "Aphids suck the sap from rice leaves causing leaves to be shrunk" ("causing" is equivalent to a causal-verb link in Thai). The research confronts two main problems: identifying the interesting causality events in documents and identifying their boundaries. We therefore propose mining on verbs using two different machine learning techniques, a Naive Bayes classifier and a Support Vector Machine. The resulting mining rules are used for identifying and extracting causality across multiple EDUs in text. Our multiple-EDU extraction shows 0.88 precision with 0.75 recall for the Naive Bayes classifier and 0.89 precision with 0.76 recall for the Support Vector Machine.

  5. Sustainable packaging. Packaging for a circular economy; Duurzaam verpakken. Verpakken voor de circulaire economie

    Energy Technology Data Exchange (ETDEWEB)

    Haffmans, S. [Partners for Innovation, Amsterdam (Netherlands); Standhardt, G. [Nederlands Verpakkingscentrum NVC, Gouda (Netherlands); Hamer, A. [Agentschap NL, Utrecht (Netherlands)

    2013-10-15

    What is Sustainable Packaging? And what is the most sustainable packaging for a product? The publication is intended for anyone who wants to take the environment into account in the design of a product and its packaging. It offers concrete suggestions and inspiring examples for putting sustainable packaging into practice.

  6. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    OpenAIRE

    Chatzigogos, Charisis; Pecker, Alain; Salençon, J.

    2008-01-01

    Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database, containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure, is briefly presented, aiming at a better understanding of the stu...

  7. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    OpenAIRE

    Chatzigogos, Charisis; Pecker, Alain; Salençon, J.

    2007-01-01

    International audience Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database, containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure, is briefly presented, aiming at a bette...

  8. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    CERN Document Server

    Chatzigogos, Charisis; Salençon, J

    2008-01-01

    Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database, containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure, is briefly presented, aiming at a better understanding of the studied phenomenon and offering a number of case studies useful for validation of theoretical computations. In the second part of the paper, the aforementioned problem is addressed using the kinematic approach of the Yield Design theory, thus establishing optimal upper bounds for the ultimate seismic loads supported by the soil-footing system. The results lead to the establishment of some very simple guidelines that extend the existing formulae for the seismic bearing capacity contained in the European norms (proposed for st...

  9. Text from corners: a novel approach to detect text and caption in videos.

    Science.gov (United States)

    Zhao, Xu; Lin, Kai-Hsiang; Fu, Yun; Hu, Yuxiao; Liu, Yuncai; Huang, Thomas S

    2011-03-01

    Detecting text and caption from videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner based approach to detect text and caption from videos. This approach is inspired by the observation that there exist dense and orderly presences of corner points in characters, especially in text and caption. We use several discriminative features to describe the text regions formed by the corner points. The usage of these features is in a flexible manner, thus, can be adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, based upon the text features, we further develop a novel algorithm to detect moving captions in videos. In the algorithm, the motion features, extracted by optical flow, are combined with text features to detect the moving caption patterns. The decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of our proposed approaches and the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, by achieving the superior performance with the top ranking. PMID:20729170

  10. Analyzing Statistical and Syntactical English Text for Word Prediction and Text Generation

    OpenAIRE

    Taher S.K. Homeed; Mansoor Al-A'ali

    2007-01-01

    This research describes a technique for word and phrase prediction to help the writer of an English text in text generation and to speed up the typing process. The technique consists of a learning phase and a generation or prediction phase. The learning phase learns the English phrase and sentence syntax as well as keeps all the corpora for reference against the syntax and the actual text. The prediction of the next word or phrase is based on the preceding one or more words and the history an...
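
    The two phases named in the abstract above, learning and then prediction from the preceding words, can be illustrated with a deliberately simplified sketch. It assumes a plain bigram count model, which conditions on one preceding word only and ignores syntax; the paper's actual technique is not specified in this truncated abstract. The function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Learning phase: count which word follows which in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(following, word, k=3):
    """Prediction phase: return up to k most frequent successors of `word`."""
    successors = following.get(word.lower(), Counter())
    return [candidate for candidate, _ in successors.most_common(k)]


# Toy corpus (hypothetical, for illustration only).
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigrams(corpus)
print(predict_next(model, "the", k=1))
```

    A longer history (two or more preceding words) would use the same counting idea with tuples of words as keys, at the cost of sparser counts.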

  11. Using LSA and text segmentation to improve automatic Chinese dialogue text summarization

    Institute of Scientific and Technical Information of China (English)

    LIU Chuan-han; WANG Yong-cheng; ZHENG Fei; LIU De-rong

    2007-01-01

    Automatic Chinese text summarization for dialogue style is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document, all question paragraphs are identified, an automatic text segmentation approach analogous to TextTiling is exploited to improve the precision of correlating question paragraphs and answer paragraphs, and finally some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results showed that our approach is highly efficient and improves significantly the coherence of the summary while not compromising informativeness.

  12. Computer-Aided Generation of Result Text for Clinical Laboratory Texts

    OpenAIRE

    Kuzmak, Peter M.; Miller, R. E.

    1983-01-01

    Efficient processing of non-numeric textual data is a frequent requirement with medical computer applications such as clinical laboratory result reporting. In such instances, it is often desirable that the computer control the generation of the text to ensure that the intended meaning is conveyed. This paper describes a technique for interactively selecting predefined text segments to form complex textual reports for laboratory tests. The approach, which uses algorithms based on augmented tra...

  13. Connected Text Reading and Differences in Text Reading Fluency in Adult Readers

    OpenAIRE

    2013-01-01

    The process of connected text reading has received very little attention in contemporary cognitive psychology. This lack of attention is in parts due to a research tradition that emphasizes the role of basic lexical constituents, which can be studied in isolated words or sentences. However, this lack of attention is in parts also due to the lack of statistical analysis techniques, which accommodate interdependent time series. In this study, we investigate text reading performance with traditi...

  14. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    OpenAIRE

    Vishwadeepak Singh Baghela; S. P. Tripathi

    2012-01-01

    A handful of text data mining approaches are available to extract potentially useful information and associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mi...

  15. Making sense of text : skills that support text comprehension and its development.

    OpenAIRE

    Cain, Kate

    2009-01-01

    Skilled reading involves two main components: word reading and text comprehension. In this article, I focus on three skills that have been shown to support the latter: integration and inference, comprehension monitoring, and knowledge and use of story structure. Research has shown that children with unexpectedly poor reading comprehension have difficulties with each of these text processing skills and that each skill contributes to development in reading comprehension during middle childhood....

  16. The Machine in the Text, and the Text in the Machine

    OpenAIRE

    Portela, Manuel

    2010-01-01

    "The Machine in the Text, and the Text in the Machine" is a review essay on Electronic Literature: New Horizons for the Literary (Notre Dame, IN: University of Notre Dame, 2008), by N. Katherine Hayles, and Mechanisms: New Media and the Forensic Imagination (Cambridge, Mass.: MIT Press, 2008), by Matthew G. Kirschenbaum. Both works make remarkable contributions for the emerging field of digital literary studies and for the theory of digital media. While Hayles analyses the interaction between...

  17. PROSAIC TEXTS OF ABBÂS VESIM AND INVESTIGATING OF THE POEMS IN THESE TEXTS

    Directory of Open Access Journals (Sweden)

    İbrahim HALİL TUĞLUK

    2015-12-01

    Full Text Available Classical Turkish literature was shaped to a large extent by Persian literature, developed its own language, and became a major period of Turkish literature. Its basic means of expression is poetry. Prose always remained in the shadow of poetry as a secondary form; prose texts usually deal with didactic subjects such as history, geography, religious sciences, astronomy, medicine and biography. Prose style also varies with purpose and content. An important feature of prose is its effort to approach the form of poetry: harmony in language, created above all by rhythm, brought prose texts closer to poetic expression. Studying the poetic passages in prose texts is therefore important for establishing how much poetry Classical Turkish prose literature contains, for identifying the combinations of meaning and harmony that poetry creates within prose, and for describing the classical cultural background of the Ottomans. Many scholars and craftsmen came to prominence in Ottoman history through their literary and scholarly identity. Abbâs Vesȋm, who lived in the 18th century, is one of them, and his works therefore deserve investigation from many angles. In this context, it is important to examine the poetic sections of the works on medicine and astrology that Abbâs Vesȋm wrote outside literature, in order to identify poetic expression in prose texts and to describe the stylistic features of those texts. This study aims to examine the prose works of Abbâs Vesȋm, an 18th-century poet, to identify their copies, to transcribe the poetic sections in these works, and to examine these works in terms of form and content.

  18. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

    CERN Document Server

    Miner, Gary; Hill, Thomas; Nisbet, Robert; Delen, Dursun

    2012-01-01

    The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase d

  19. The Original Text and Translated Text in Derrida's Deconstruction Theory of Translation

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Since the 1960s translation has made great progress on the way to becoming a systematic and scientific discipline. The theory of deconstruction, originating in France, has made great impact on traditional translation. It has become more influential in recent days. Through the discussion of deconstruction and its idea of translation, this thesis clarifies people's skeptical attitudes towards deconstruction and explains radical changes it has brought for translation field, especially in explaining the relationship between the original text and the translated text in Derrida's deconstruction theory. At the end of this thesis, the application and limitations of deconstruction are discussed.

  20. Automatic Summarization of Opinionated Texts (Résumé automatique de textes d'opinion)

    Directory of Open Access Journals (Sweden)

    Thierry Poibeau

    2011-04-01

    Full Text Available In this paper, we present a summarization system that is specifically designed to process blog posts, where factual information is mixed with opinions on the discussed facts. Our approach combines redundancy analysis with new information tracking and is enriched by a module that computes the polarity of textual fragments in order to summarize blog posts more efficiently. The system is evaluated against English data, especially through the participation in TAC (Text Analysis Conference), an international evaluation framework for automatic summarization, in which our system obtained interesting results.

  1. Psychologie des discours et didactique des textes (Psychology of Discourse and the Teaching of Texts).

    Science.gov (United States)

    Bronckart, Jean-Paul, Ed.

    1995-01-01

    This collection of articles on the nature of discourse and writing instruction include: "Une demarche de psychologie de discours; quelques aspects introductifs" ("An Application of Discourse Psychology; Introductory Thoughts") (Jean-Paul Bronckart); "Les procedes de prise en charge enonciative dans trois genres de texts expositifs" ("The Processes…

  2. Connected text reading and differences in text reading fluency in adult readers

    NARCIS (Netherlands)

    Wallot, S.; Hollis, G.; Rooij, M. de

    2013-01-01

    The process of connected text reading has received very little attention in contemporary cognitive psychology. This lack of attention is in parts due to a research tradition that emphasizes the role of basic lexical constituents, which can be studied in isolated words or sentences. However, this lac

  3. Does Compare-Contrast Text Structure Help Students with Autism Spectrum Disorder Comprehend Science Text?

    Science.gov (United States)

    Carnahan, Christina R.; Williamson, Pamela S.

    2013-01-01

    Using a single-subject reversal design, this study evaluated the use of a compare-contrast strategy on the ability of students with autism spectrum disorder to comprehend science text. Three middle school students with high-functioning autism and their teacher participated in this study. A content analysis comparing the number of meaning units in…

  4. Text(ing) in Context: The Future of Workplace Communication in the United States

    Science.gov (United States)

    Kiddie, Thomas J.

    2014-01-01

    Following Rogers's theory of the diffusion of innovations, the author questions whether youth entering the workforce will act as change agents to evolve primary business communication channels from email to text-messaging. Expanding on research performed in 2009, the author investigates three communication scenarios: scheduling meetings,…

  5. Processing the Text of the Holy Quran: a Text Mining Study

    Directory of Open Access Journals (Sweden)

    Mohammad Alhawarat

    2015-02-01

    Full Text Available The Holy Quran is the reference book for more than 1.6 billion Muslims all around the world. Extracting information and knowledge from the Holy Quran is of high benefit to both specialists in Islamic studies and non-specialists. This paper initiates a series of research studies that aim to serve the Holy Quran and provide helpful and accurate information and knowledge to all human beings. The planned studies also aim to lay out a framework for researchers in the field of Arabic natural language processing by providing a "Golden Dataset" along with useful techniques and information that will advance this field further. The aim of this paper is to find an approach for analyzing Arabic text and then to provide statistical information that might be helpful to people in this research area. In this paper the text of the Holy Quran is preprocessed, and different text mining operations are then applied to it to reveal simple facts about the terms of the Holy Quran. The results show a variety of characteristics of the Holy Quran, such as its most important words, its word cloud, and the chapters with high term frequencies. All these results are based on term frequencies calculated using both the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) methods.
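
    The two weighting schemes named at the end of the abstract above can be sketched in a few lines. This is a minimal illustration, assuming relative term frequency and an unsmoothed logarithmic inverse document frequency; the abstract does not say which TF-IDF variant or preprocessing the author used. The function names and the three toy documents are hypothetical and stand in for chapters of a real corpus.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF: share of the document's tokens taken up by each term."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def tf_idf(documents):
    """TF-IDF: TF scaled by log(N / number of documents containing the term)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        tf = term_frequencies(tokens)
        scores.append(
            {term: f * math.log(n_docs / doc_freq[term]) for term, f in tf.items()}
        )
    return scores


# Toy documents (hypothetical); each list of tokens plays the role of a chapter.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the bird flew away".split(),
]
scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)
print(top_term, round(scores[0][top_term], 3))
```

    A term that occurs in every document gets a weight of zero under this variant, which is why raw TF and TF-IDF can rank the "most important words" of the same text very differently.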

  6. Comprehending expository texts: The dynamic neurobiological correlates of building a coherent text representation

    Directory of Open Access Journals (Sweden)

    Amanda Miller

    2013-12-01

    Full Text Available Little is known about the neural correlates of expository text comprehension. In this study, we sought to identify neural networks underlying expository text comprehension, how those networks change over the course of comprehension, and whether information central to the overall meaning of the text is functionally distinct from peripheral information. Seventeen adult subjects read expository passages while being scanned using functional magnetic resonance imaging (fMRI). By convolving phrase onsets with the hemodynamic response function (HRF), we were able to identify regions that increase and decrease in activation over the course of passage comprehension. We found that expository text comprehension relies on the co-activation of the semantic control network and regions in the posterior midline previously associated with mental model updating and integration (posterior cingulate cortex (PCC) and precuneus (PCU)). When compared to single word comprehension, left PCC and left Angular Gyrus (AG) were activated only for discourse-level comprehension. Over the course of comprehension, reliance on the same regions in the semantic control network and posterior midline increased, while a parietal region associated with attention (intraparietal sulcus (IPS)) decreased. These results parallel previous findings in narrative comprehension that the initial stages of mental model building require greater visuospatial attention processes, while maintenance of the model increasingly relies on semantic integration regions. Additionally, we used an event-related analysis to examine phrases central to the text’s overall meaning versus peripheral phrases. It was found that central ideas are functionally distinct from peripheral ideas (showing greater activation in the PCC and PCU), and also recruit different parts of the semantic control network over time than peripheral ideas. These findings support previous behavioral models on the cognitive importance of distinguishing

  7. Mobile characters, mobile texts: homelessness and intertextuality in contemporary texts for young people

    Directory of Open Access Journals (Sweden)

    Mavis Reimer

    2013-06-01

    Full Text Available Since the 1990s, narratives about homelessness for and about young people have proliferated around the world. A cluster of thematic elements shared by many of these narratives of the age of globalization points to the deep anxiety that is being expressed about a social, economic, and cultural system under stress or struggling to find a new formation. More surprisingly, many of the narratives also use canonical cultural texts extensively as intertexts. This article considers three novels from three different national traditions to address the work of intertextuality in narratives about homelessness: Skellig by UK author David Almond, which was published in 1998; Chronicler of the Winds by Swedish author Henning Mankell, which was first published in 1988 in Swedish as Comédia Infantil and published in an English translation in 2006; and Stained Glass by Canadian author Michael Bedard, which was published in 2002. Using Julia Kristeva's definition of intertextuality as the “transposition of one (or several sign systems into another,” I propose that all intertexts can be thought of as metaphoric texts, in the precise sense that they carry one text into another. In the narratives under discussion in this article, the idea of homelessness is in perpetual motion between texts and intertexts, ground and figure, the literal and the symbolic. What the child characters and the readers who take up the position offered to implied readers are asked to do, I argue, is to put on a way of seeing that does not settle, a way of being that strains forward toward the new.

  8. On the reduction of generalized polylogarithms to $\text{Li}_n$ and $\text{Li}_{2,2}$ and on the evaluation thereof

    CERN Document Server

    Frellesvig, Hjalte; Wever, Christopher

    2016-01-01

    We give expressions for all generalized polylogarithms up to weight four in terms of the functions log, $\text{Li}_n$, and $\text{Li}_{2,2}$, valid for arbitrary complex variables. Furthermore we provide algorithms for manipulation and numerical evaluation of $\text{Li}_n$ and $\text{Li}_{2,2}$, and add codes in Mathematica and C++ implementing the results. With these results we calculate a number of previously unknown integrals, which we add in App. C.

  9. Text2Video: text-driven facial animation using MPEG-4

    Science.gov (United States)

    Rurainsky, J.; Eisert, P.

    2005-07-01

    We present a complete system for the automatic creation of talking head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and synthetic voice. A user selected 3D character will perform lip movements synchronized to the speech data. The 3D models created from a single image vary from realistic people to cartoon characters. A voice selection for different languages and gender as well as a pitch shift component enables a personalization of the animation. The animation can be shown on different displays and devices ranging from 3GPP players on mobile phones to real-time 3D render engines. Therefore, our system can be used in mobile communication for the conversion of regular SMS messages to MMS animations.

  10. Aspects in developing of a text analizer for processing unstructured text data

    OpenAIRE

    Petic, Mircea; Osoian, Ecaterina

    2015-01-01

    The article presents our approach to developing a system that processes unstructured text data to produce structured output as computational linguistics resources, using a lexicon of markers. First, a description of the research on the proposed topic, as well as its relation to national and international research, is presented, followed by a description of a functionality useful to this particular research: a PoS tagger for Romanian. A special section is de...

  11. On the Place of Text Data in Lifelogs, and Text Analysis via Semantic Facets

    OpenAIRE

    Grefenstette, Gregory; Muchemi, Lawrence

    2016-01-01

    Current research in lifelog data has not paid enough attention to analysis of cognitive activities in comparison to physical activities. We argue that as we look into the future, wearable devices are going to be cheaper and more prevalent and textual data will play a more significant role. Data captured by lifelogging devices will increasingly include speech and text, potentially useful in analysis of intellectual activities. Analyzing what a person hears, reads, and sees, we should be able t...

  12. Text-based Research of Early Warning Platform from Food Complaint Texts

    OpenAIRE

    Yueyi Zhang; Taiyi Chen; Jing Hu; Xinghua Fang

    2015-01-01

    This study proposes a food complaint text early warning method based on the guidance of ontology and establishes a scientific and reasonable system of early warning, builds and improves the food security early warning platform. All of those make this study play a supplementary role in the research content of food safety regulators. Based on traditional early warning system, this study constructs food safety complaints warning platform model and builds the food domain ontology and expands food...

  13. A new graph based text segmentation using Wikipedia for automatic text summarization

    Directory of Open Access Journals (Sweden)

    Mohsen Pourvali

    2012-01-01

    Full Text Available The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization aims to produce a summary delivering the majority of information content from a set of documents about an explicit or implicit main topic. According to the input text, in this paper we use the knowledge base of Wikipedia and the words of the main text to create independent graphs. We then determine the importance of the graphs and identify the sentences whose topics have high importance. Finally, we extract the sentences with high importance. The experimental results on the open benchmark datasets DUC01 and DUC02 show that our proposed approach can improve performance compared to state-of-the-art summarization approaches.

  14. Advanced text authorship detection methods and their application to biblical texts

    Science.gov (United States)

    Putniņš, Tālis; Signoriello, Domenic J.; Jain, Samant; Berryman, Matthew J.; Abbott, Derek

    2005-12-01

    Authorship attribution has a range of applications in a growing number of fields such as forensic evidence, plagiarism detection, email filtering, and web information management. In this study, three attribution techniques are extended, tested on a corpus of English texts, and applied to a book in the New Testament of disputed authorship. The word recurrence interval based method compares standard deviations of the number of words between successive occurrences of a keyword both graphically and with chi-squared tests. The trigram Markov method compares the probabilities of the occurrence of words conditional on the preceding two words to determine the similarity between texts. The third method extracts stylometric measures such as the frequency of occurrence of function words and from these constructs text classification models using multiple discriminant analysis. The effectiveness of these techniques is compared. The accuracy of the results obtained by some of these extended methods is higher than many of the current state of the art approaches. Statistical evidence is presented about the authorship of the selected book from the New Testament.
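
The word recurrence interval idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the use of the population standard deviation are assumptions made for illustration only, and the chi-squared comparison mentioned in the abstract is not shown.

```python
from statistics import pstdev


def recurrence_intervals(tokens, keyword):
    """Count the words lying between successive occurrences of `keyword`."""
    positions = [i for i, tok in enumerate(tokens) if tok == keyword]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def interval_spread(text, keyword):
    """Standard deviation of the recurrence intervals (0.0 if fewer than two gaps).

    Assumption: naive lowercase whitespace tokenization stands in for
    whatever preprocessing the original study used.
    """
    tokens = text.lower().split()
    gaps = recurrence_intervals(tokens, keyword.lower())
    return pstdev(gaps) if len(gaps) >= 2 else 0.0
```

Two texts could then be compared by computing `interval_spread` for the same keyword in each; a larger difference would suggest a different usage pattern for that keyword.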

  15. Relating interesting quantitative time series patterns with text events and text features

    Science.gov (United States)

    Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.

    2013-12-01

    In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent and highly frequent quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features like the degree of sentence nesting, noun phrase complexity, the vocabulary richness, etc. are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features which significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies, showing the effectiveness of our combined quantitative and textual analysis work flow. The workflow can also be generalized to other

  16. Strategic Use of Multiple Texts for the Evaluation of Arguments

    Science.gov (United States)

    Kobayashi, Keiichi

    2010-01-01

    Two experiments were conducted to examine whether students use arguments with refutation in one text for evaluating the opposite arguments without refutation in another text. Undergraduate students read two conflicting texts in either of the two orders: pro arguments text first and con arguments text first. After reading each text, they evaluated…

  17. Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

    Science.gov (United States)

    2001-01-01

    In this contract, which is a component of a larger contract that we plan to submit in the coming months, we plan to study the preprocessing issues which arise in applying natural language processing techniques to NASA-KSC problem reports. The goals of this work will be to deal with the issues of: a) automatically obtaining the problem reports from NASA-KSC data bases, b) the format of these reports and c) the conversion of these reports to a format that will be adequate for our natural language software. At the end of this contract, we expect that these problems will be solved and that we will be ready to apply our natural language software to a text database of over 1000 KSC problem reports.

  18. T-Scan: a new tool for analyzing Dutch text

    NARCIS (Netherlands)

    Pander Maat, H.L.W.; Kraf, R.L.; van den Bosch, Antal; van Gompel, Maarten; Kleijn, S.; Sanders, T.J.M.; van der Sloot, Ko

    2014-01-01

    T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives it

  19. Texting, Textese and Literacy Abilities: A Naturalistic Study

    Science.gov (United States)

    Drouin, Michelle; Driver, Brent

    2014-01-01

    In this study, we examined texting behaviours, text message characteristics (textese) of actual sent text messages and the relationships between texting, textese and literacy abilities in a sample of 183 American undergraduates. As compared to previous naturalistic and experimental studies with English-speaking adults, both texting frequency and…

  20. Acoustic Evaluation as a Variety of Text Metonymy

    Directory of Open Access Journals (Sweden)

    Ella V. Nesterik

    2013-01-01

    Full Text Available The article deals with sensorial evaluation, namely, acoustic evaluation as a text-forming category, studied in terms of text linguistics and text stylistics. Acoustic evaluation is considered as a variety of text metonymy, a sort of stylistic device expressing characters’ emotional state and time perception metonymically

  1. TEXT DETECTION, REMOVAL AND REGION FILLING USING IMAGE INPAINTING

    OpenAIRE

    RAJESH H. DAVDA; NOOR MOHAMMED

    2012-01-01

    In this paper, we explain how to detect text in an image using various text detection algorithms, how to generate a mask image of the detected text, and how to apply an image inpainting algorithm to the original image using the resulting mask to generate a text-free image.

  2. Closely Reading Informational Texts in the Primary Grades

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2014-01-01

    In this article we discuss the differences between close reading in the primary grades and upper elementary grades. We focus on text selection, initial reading, repeated reading, annotation, text-based discussions, and responding to texts.

  3. LITURGICAL TEXT IN ANTON CHEKHOV'S NOVELLA "THE DUEL"

    Directory of Open Access Journals (Sweden)

    Syzranov S. V.

    2008-11-01

    Full Text Available The article examines the principle of interaction between the sacred speech, embodied in liturgical texts, and the literary text, typical for Anton Chekhov's works, by the example of his novella "The Duel".

  4. A linguistic and navigational knowledge approach to text navigation

    OpenAIRE

    Couto, Javier; Minel, Jean-Luc

    2008-01-01

    We present an approach to text navigation conceived as a cognitive process exploiting linguistic information present in texts. We claim that the navigational knowledge involved in this process can be modeled in a declarative way with the Sextant language. Since Sextant refers exhaustively to specific linguistic phenomena, we have defined a customized text representation. These different components are implemented in the text navigation system NaviTexte. Two applications of NaviTexte are de...

  5. Text Analytics: the convergence of Big Data and Artificial Intelligence

    OpenAIRE

    Antonio Moreno; Teófilo Redondo

    2016-01-01

    The analysis of the text content in emails, blogs, tweets, forums and other forms of textual communication constitutes what we call text analytics. Text analytics is applicable to most industries: it can help analyze millions of emails; you can analyze customers’ comments and questions in forums; you can perform sentiment analysis using text analytics by measuring positive or negative perceptions of a company, brand, or product. Text Analytics has also been called text mining, and is a subcat...

  6. KACST Arabic Text Classification Project: Overview and Preliminary Results

    OpenAIRE

    Althubaity, A.; Almuhareb, A.; Alharbi, S.; Al-Rajeh, A.; Khorsheed, M.

    2008-01-01

    Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined ca...

  7. Reading Spaced and Unspaced Chinese Text: Evidence From Eye Movements

    OpenAIRE

    Bai, Xuejun; Yan, Guoli; Zang, Chuanli; Liversedge, Simon P.; Rayner, Keith

    2008-01-01

    Native Chinese readers’ eye movements were monitored as they read text that did or did not demark word boundary information. In Experiment 1, sentences had 4 types of spacing: normal unspaced text, text with spaces between words, text with spaces between characters that yielded nonwords, and finally text with spaces between every character. The authors investigated whether the introduction of spaces into unspaced Chinese text facilitates reading and whether the word or, alternatively, the cha...

  8. Techniques, Applications and Challenging Issue in Text Mining

    Directory of Open Access Journals (Sweden)

    Shaidah Jusoh

    2012-11-01

    Full Text Available Text mining is a very exciting research area as it tries to discover knowledge from unstructured texts. These texts can be found on a desktop, intranets and the internet. The aim of this paper is to give an overview of text mining in the contexts of its techniques, application domains and the most challenging issue. The focus is on fundamental methods of text mining, which include natural language processing and information extraction. This paper also gives a short review of domains which have employed text mining. The challenging issue in text mining, which is caused by the complexity of natural language, is also addressed in this paper.

  9. A Method for Text Summarization by Bacterial Foraging Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Morteza Dastkhosh Nikoo

    2012-07-01

    Full Text Available Due to the rapid and increasing growth of electronic texts and documents, we need techniques for the integration, communication and appropriate utilization of these texts. Summarizing the literature is one of the most fundamental tasks for integrating and taking advantage of these gathered texts. Selecting key words and then integrating them into a summary text is the most common method in text summarization. In this paper we present a new method of automatic text summarization using bacterial foraging optimization. The main idea of this method is to weight words, then score the sentences, and finally extract key sentences from the text as the summary. Here we use the TF-IDF term weighting method to determine the weights for each text. Bacterial foraging optimization is then used to converge the solutions obtained from each bacterium, and finally the best candidate summary is given.
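
The weighting and sentence-scoring steps named in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each sentence is treated as its own "document" when computing IDF, the function names are hypothetical, and a plain top-k selection replaces the bacterial foraging optimization step, which is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of the TF-IDF weights of its terms.

    Assumption: IDF is computed over the sentences themselves,
    i.e. every sentence counts as one document.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = sum(
            (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences made of terms that appear everywhere score low, while sentences carrying rare terms score high; an optimizer such as the one the abstract describes would search over candidate sentence subsets instead of taking the top k directly.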

  10. More Than Words can Tell - Using Multimodal Texts to Support Reading Comprehension of Literary Texts in English

    OpenAIRE

    Leismann, Silke

    2015-01-01

    This thesis explores the possibilities of multimodality in supporting text comprehension of literary texts in language learning of the L2. While multimodal texts offer multiple ways of meaning making that sometimes go beyond the written text, I have focussed on multimodal expressions that mirror the context of a given text. I conducted an empirical study with 114 students (grade 9; 13-14 years) in two schools in Trondheim, Norway. The material I used consisted of three literary texts (e...

  11. What oral text reading fluency can reveal about reading comprehension

    NARCIS (Netherlands)

    Veenendaal, N.J.; Groen, M.A.; Verhoeven, L.T.W.

    2015-01-01

    Text reading fluency – the ability to read quickly, accurately and with a natural intonation – has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor t

  12. Text Processing and Formatting: Composure, Composition and Eros.

    Science.gov (United States)

    Blair, John C., Jr.

    1984-01-01

    Review of computer software offering text editing/processing capabilities highlights work habits, elements of computer style and composition, buffers, the CRT, line- and screen-oriented text editors, video attributes, "swapping," "cache" memory, "disk emulators," text editing versus text processing, and the UNIX operating system. Specific programs…

  13. TEXT MINING – PREREQUISITE FOR KNOWLEDGE MANAGEMENT SYSTEMS

    OpenAIRE

    Dragoş Marcel VESPAN

    2009-01-01

    Text mining is an interdisciplinary field with the main purpose of retrieving new knowledge from large collections of text documents. This paper presents the main techniques used for knowledge extraction through text mining and their main areas of applicability and emphasizes the importance of text mining in knowledge management systems.

  14. Acoustic Evaluation as a Variety of Text Metonymy

    OpenAIRE

    Ella V. Nesterik; Anna D. Matrossova

    2013-01-01

    The article deals with sensorial evaluation, namely, acoustic evaluation as a text-forming category, studied in terms of text linguistics and text stylistics. Acoustic evaluation is considered as a variety of text metonymy, a sort of stylistic device expressing characters’ emotional state and time perception metonymically

  15. Introducing Text Analytics as a Graduate Business School Course

    Science.gov (United States)

    Edgington, Theresa M.

    2011-01-01

    Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily from data mining. While text mining has been covered extensively in various computer…

  16. IM Set to Talk with You with Text!

    Science.gov (United States)

    Descy, Don E.

    2007-01-01

    In this article, the author discusses text messaging and instant messaging (IM). In a nutshell, text messaging is another name for Short Message Service (SMS). SMS is a service available on most digital mobile phones that permits the sending of short messages (also known as SMSes, text messages, messages, or more colloquially texts or even txts)…

  17. Toward a Model of Text Comprehension and Production.

    Science.gov (United States)

    Kintsch, Walter; Van Dijk, Teun A.

    1978-01-01

    Described is the system of mental operations occurring in text comprehension and in recall and summarization. A processing model is outlined: 1) the meaning elements of a text become organized into a coherent whole, 2) the full meaning of the text is condensed into its gist, and 3) new texts are generated from the comprehension processes.…

  18. Syntactic Complexity as an Aspect of Text Complexity

    Science.gov (United States)

    Frantz, Roger S.; Starr, Laura E.; Bailey, Alison L.

    2015-01-01

    Students' ability to read complex texts is emphasized in the Common Core State Standards (CCSS) for English Language Arts and Literacy. The standards propose a three-part model for measuring text complexity. Although the model presents a robust means for determining text complexity based on a variety of features inherent to a text as well as…

  19. Drawing on Text Features for Reading Comprehension and Composing

    Science.gov (United States)

    Risko, Victoria J.; Walker-Dalhouse, Doris

    2011-01-01

    Students read multiple-genre texts such as graphic novels, poetry, brochures, digitized texts with videos, and informational and narrative texts. Features such as overlapping illustrations and implied cause-and-effect relationships can affect students' comprehension. Teaching with these texts and drawing attention to organizational features hold…

  20. Teaching Literature in an Age of Text Complexity

    Science.gov (United States)

    Alsup, Janet

    2013-01-01

    The recently released Common Core State Standards increase classroom emphasis on informational texts in high school and recommend a three-part measurement for text complexity when selecting texts for classroom use. In this commentary I argue that fictional narratives can not only meet these stated criteria for complex texts and result in critical…

  1. Fiction vs Informational Texts: Which Will Kindergartners Choose?

    Science.gov (United States)

    Correia, Marlene Ponte

    2011-01-01

    Informational texts include books as well as text in other formats such as magazines, newspapers, and online articles. The primary purpose of informational text is to provide information about the natural and social world. Literacy research cites many reasons why nonfiction/informational texts should be included in primary classrooms. The…

  2. What Oral Text Reading Fluency Can Reveal about Reading Comprehension

    Science.gov (United States)

    Veenendaal, Nathalie J.; Groen, Margriet A.; Verhoeven, Ludo

    2015-01-01

    Text reading fluency--the ability to read quickly, accurately and with a natural intonation--has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor to reading comprehension outcomes in addition to…

  3. A New Method to Extract Text from Natural Scenes

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This paper presents a new method for text detection, location and binarization from natural scenes. Several morphological steps are used to detect the general position of the text, including English, Chinese and Japanese characters. Next, bounding boxes are processed by a new "Expand, Break and Merge" (EBM) method to get the precise text areas. Finally, text is binarized by a hybrid method based on Otsu and Niblack. This new approach can extract different kinds of text from complicated natural scenes. It is insensitive to noise, distortion, and text orientation. It also performs well on extracting texts of various sizes.
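
For readers unfamiliar with the binarization step, the global Otsu threshold that the abstract mentions can be sketched in pure Python as follows. This is not the paper's hybrid Otsu/Niblack method: the Niblack (local) part and the EBM step are omitted, and the function names and the flat list-of-grey-levels input are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximizes the between-class variance.

    `pixels` is a flat sequence of integer grey levels in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_bg = 0        # number of pixels at or below the candidate threshold
    sum_bg = 0      # sum of their grey levels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # all pixels are background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

A local method such as Niblack would instead compute a separate threshold per neighbourhood from the local mean and standard deviation, which is why hybrid schemes are attractive for unevenly lit scenes.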

  4. Techniques, Applications and Challenging Issue in Text Mining

    OpenAIRE

    Shaidah Jusoh; Hejab M. Alfawareh

    2012-01-01

    Text mining is a very exciting research area as it tries to discover knowledge from unstructured texts. These texts can be found on a desktop, intranets and the internet. The aim of this paper is to give an overview of text mining in the contexts of its techniques, application domains and the most challenging issue. The focus is on fundamental methods of text mining, which include natural language processing and information extraction. This paper also gives a short review on domains whi...

  5. Research on Text Mining Based on Domain Ontology

    OpenAIRE

    Li-hua, Jiang; Neng-fu, Xie; Hong-bin, Zhang

    2013-01-01

    This paper improves the traditional text mining technology which cannot understand the text semantics. The author discusses the text mining methods based on ontology and puts forward text mining model based on domain ontology. Ontology structure is built firstly and the “concept-concept” similarity matrix is introduced, then a conception vector space model based on domain ontology is used to take the place of traditional vector space model to represent the documents in order to realize text m...

  6. Text Normalization and Diphone Preparation for Bangla Speech Synthesis

    OpenAIRE

    Muhammad Masud Rashid; Md. Akter Hussain; M. Shahidur Rahman

    2010-01-01

    This paper presents methodologies involved in text normalization and diphone preparation for Bangla Text to Speech (TTS) synthesis. A concatenation-based TTS system basically comprises two modules: one is natural language processing and the other is Digital Signal Processing (DSP). Natural language processing deals with converting text to its pronounceable form, called Text Normalization, and the diphone selection method based on the normalized text is called Grapheme to Phoneme (G2P) conve...

  7. A contrastive analysis of French and English social statistics texts

    OpenAIRE

    Creed, Mairead

    1995-01-01

    This thesis adopts the theoretical framework of contrastive textology (CT) developed by Hartmann (1980) for the analysis of the language of French and English expository texts from the domain of social statistics. CT results from a combination of two linguistic orientations: text linguistics and contrastive stylistics (CS). Hartmann uses the term parallel texts to describe (a) translated texts and (b) non-translated texts in two languages which were produced in circumstances so similar as ...

  8. Investigating text message classification using case-based reasoning

    OpenAIRE

    Healy, Matt, (Thesis)

    2007-01-01

    Text classification is the categorization of text into a predefined set of categories. Text classification is becoming increasingly important given the large volume of text stored electronically e.g. email, digital libraries and the World Wide Web (WWW). These documents represent a massive amount of information that can be accessed easily. To gain benefit from using this information requires organisation. One way of organising it automatically is to use text classification. A number of well k...

  9. A Survey On Various Approaches Of Text Extraction In Images

    Directory of Open Access Journals (Sweden)

    C.P. Sumathi

    2012-09-01

    Full Text Available Text extraction plays a major role in finding vital and valuable information. Text extraction involves detection, localization, tracking, binarization, extraction, enhancement and recognition of the text from the given image. These text characters are difficult to detect and recognize due to variations in size, font, style, orientation, alignment and contrast, and to complex colored or textured backgrounds. Due to the rapid growth of available multimedia documents and the growing requirement for information, identification, indexing and retrieval, much research has been done on text extraction in images. Several techniques have been developed for extracting the text from an image. The proposed methods were based on morphological operators, wavelet transform, artificial neural network, skeletonization operation, edge detection algorithm, histogram technique, etc. All these techniques have their benefits and restrictions. This article discusses various schemes proposed earlier for extracting the text from an image. This paper also provides a performance comparison of several existing methods proposed by researchers for extracting the text from an image.

  10. Why Are Some Texts Good and Others Not? Relationship between Text Quality and Management of the Writing Processes

    Science.gov (United States)

    Beauvais, Caroline; Olive, Thierry; Passerault, Jean-Michel

    2011-01-01

    Two experiments examined whether text quality is related to online management of the writing processes. Experiment 1 focused on the relationship between online management and text quality in narrative and argumentative texts. Experiment 2 investigated how this relationship might be affected by a goal emphasizing text quality. In both experiments,…

  11. How to Recover Deleted Text Messages from iPhone

    OpenAIRE

    Terry

    2015-01-01

    Most of us use text messages on our iPhones to contact friends and family every day. We also send much important information via text messages. So if you delete or lose important text messages from your iPhone, it can be a real disaster. But it is not the end of the world: iPhone Data Recovery software can easily retrieve deleted text messages from an iPhone for you. It provides 3 ways of recovering text messages from iPhone. It can recover deleted text mes...

  12. The Application of the Cooperative Principle in Text Messages

    Institute of Scientific and Technical Information of China (English)

    李军霞

    2015-01-01

    The language of text messages speeds up the transmission of information, shows the richness of languages, and contains all kinds of implication. Many studies on text messages have been published, but the analysis of the language of text messages in terms of Grice’s Cooperative Principle remains open to investigation. This paper explores the language of text messages based on Grice’s Cooperative Principle (CP) and its maxims, aiming to understand how the theory influences text message communication and creates humorous effects. It is of practical significance to research text messages as a kind of language phenomenon.

  13. The Application of the Cooperative Principle in Text Messages

    Institute of Scientific and Technical Information of China (English)

    李军霞

    2015-01-01

    The language of text messages speeds up the transmission of information, shows the richness of languages, and contains all kinds of implication. Many studies on text messages have been published, but the analysis of the language of text messages in terms of Grice’s Cooperative Principle remains open to investigation. This paper explores the language of text messages based on Grice’s Cooperative Principle (CP) and its maxims, aiming to understand how the theory influences text message communication and creates humorous effects. It is of practical significance to research text messages as a kind of language phenomenon.

  14. Generating an Arabic Calligraphy Text Blocks for Global Texture Analysis

    Directory of Open Access Journals (Sweden)

    Bilal Bataineh

    2011-01-01

    Full Text Available The objective of this paper is to improve the current method for generating Arabic Calligraphy text blocks. We test on seven types of Arabic Calligraphy text. We apply projection profiles and a proposed filter to discriminate each line of the Arabic Calligraphy scripts. After performing text detection, skew correction, and text and line normalization in turn, we generate Arabic Calligraphy text blocks for global texture analysis. We compare our proposed filter with the current method and with a median filter. The results show that the proposed filter performs better. The proposed method can be further improved to boost overall performance.
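The projection-profile step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a binarized page stored as a list of rows with 1 for ink and 0 for background; the function names and the `min_ink` threshold are hypothetical, and the paper's proposed filter is not described in the abstract, so it is not reproduced here.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def split_lines(image, min_ink=1):
    """Return (start, end) row ranges, end exclusive, for each text line.

    A row belongs to a text line when its projection value is at least
    min_ink (a hypothetical threshold chosen for illustration).
    """
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the text line ended at row y
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 5x4 "page": two text lines separated by an empty row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(split_lines(page))
```

On real scans the profile is usually smoothed before thresholding, since noise and touching strokes blur the gaps between lines; that is presumably where a filter such as the one the authors propose would apply.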

  15. The text plan concept: contributions to the writing planning process

    Directory of Open Access Journals (Sweden)

    Ana Lúcia Tinoco Cabral

    2013-12-01

    Full Text Available Students at all levels, from the early grades up to PhD, face problems in both text comprehension and text production. This paper focuses on the text plan concept according to the DTA (Discourse Text Analysis) approach, i.e., a principle of organization that allows students to put their production intention into practice and to arrange text information while writing, and that is responsible for the compositional structure of the text (Adam, 2008). The study analyzes the relation between the text plan and the writing planning process, in which the former provides the latter with theoretical support. To develop this research, the study covers some issues related to reading skills, analyzes an argumentative text in terms of its textual plan, and presents some reflections on the writing process, focusing on the relation between the textual plan and the writing planning process.

  16. Modeling, Learning, and Processing of Text Technological Data Structures

    CERN Document Server

    Kühnberger, Kai-Uwe; Lobin, Henning; Lüngen, Harald; Storrer, Angelika; Witt, Andreas

    2012-01-01

    Researchers in many disciplines have been concerned with modeling textual data in order to account for texts as the primary information unit of written communication. The book “Modelling, Learning and Processing of Text-Technological Data Structures” deals with this challenging information unit. It focuses on theoretical foundations of representing natural language texts as well as on concrete operations of automatic text processing. Following this integrated approach, the present volume includes contributions to a wide range of topics in the context of processing of textual data. This relates to the learning of ontologies from natural language texts, the annotation and automatic parsing of texts as well as the detection and tracking of topics in texts and hypertexts. In this way, the book brings together a wide range of approaches to procedural aspects of text technology as an emerging scientific discipline.

  17. The classical dramatic text and its value in contemporary theatre

    Directory of Open Access Journals (Sweden)

    Nina Žavbi Milojević

    2013-06-01

    Full Text Available This paper deals with the classical dramatic text and its staging in contemporary theatre. Specifically, it aims to show that classical texts can address topical issues. This is illustrated by the example of several stagings of Ivan Cankar’s Hlapci, one of the most influential dramatic texts in Slovene literature. The history of this dramatic text is presented from its first publication and reception to the different stagings in various Slovene professional theatres. The focus is on how the situation in Slovene society is reflected in each examined staging. The drama Hlapci was first staged almost one hundred years ago, when the staging followed closely the dramatic text. However, after 1980 stagings became more independent from the text and more artistic freedom was allowed. The paper will prove that classical dramatic texts are very appropriate for staging in contemporary theatre, especially with an innovative director’s approach.

  18. Towards Multi Label Text Classification through Label Propagation

    Directory of Open Access Journals (Sweden)

    Shweta C. Dharmadhikari

    2012-06-01

    Full Text Available Classifying text data has been an active area of research for a long time. A text document is a multifaceted object and is often inherently ambiguous. Multi-label learning deals with such ambiguous objects. This ambiguity makes it difficult for a classifier to assign the relevant classes to an input document. Traditional single-label and multi-class text classification paradigms cannot efficiently classify such a multifaceted text corpus. In this paper we propose a novel label propagation approach based on semi-supervised learning for multi-label text classification. Our approach models the relationship between class labels and also represents input text documents effectively. We use a semi-supervised learning technique to make effective use of both labeled and unlabeled data for classification. Our approach promises better classification accuracy and better handling of complexity, and is elaborated on the basis of standard datasets such as Enron, Slashdot and Bibtex.
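As a rough illustration of the general idea of label propagation for multi-label text (not the authors' specific method, which the abstract does not detail), the sketch below spreads label scores from labeled to unlabeled documents over a cosine-similarity graph. The function names, the bag-of-words representation, and the `threshold` parameter are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propagate(docs, seed_labels, labels, iterations=20, threshold=0.5):
    """Assign a set of labels to every document.

    seed_labels maps a document index to its known label set; those
    documents stay clamped to their labels during propagation.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [{l: 1.0 if l in seed_labels.get(i, ()) else 0.0 for l in labels}
              for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seed_labels:           # labeled documents are clamped
                updated.append(scores[i])
                continue
            total = sum(sim[i])
            # Similarity-weighted average of the neighbours' label scores.
            updated.append({
                l: (sum(sim[i][j] * scores[j][l] for j in range(n)) / total
                    if total else 0.0)
                for l in labels
            })
        scores = updated
    return [{l for l in labels if s[l] >= threshold} for s in scores]


docs = [
    "stock market shares fall",                   # labeled: finance
    "football match final score",                 # labeled: sport
    "football club shares rise on stock market",  # unlabeled
    "final score",                                # unlabeled
]
seeds = {0: {"finance"}, 1: {"sport"}}
result = propagate(docs, seeds, ["finance", "sport"], threshold=0.2)
print(result[2], result[3])
```

Each label keeps its own score here, so a document can cross the threshold for several labels at once, which is what makes the output multi-label rather than multi-class.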

  19. Hierarchical Three-level Ontology for Text Processing

    OpenAIRE

    Gladun, Victor; Velychko, Vitalii; Svyatogor, Leonid

    2008-01-01

    The principal feature of the ontology, which is developed for text processing, is a wider representation of knowledge about the external world, achieved by introducing a three-level hierarchy. This makes it possible to improve the semantic interpretation of natural language texts.

  20. Complex network analysis of literary and scientific texts

    CERN Document Server

    Grabska-Gradzinska, Iwona; Kwapien, Jaroslaw; Drozdz, Stanislaw

    2012-01-01

    We present results from our quantitative study of statistical and network properties of literary and scientific texts written in two languages: English and Polish. We show that Polish texts are described by the Zipf law with a scaling exponent smaller than the one for the English language. We also show that the scientific texts are typically characterized by rank-frequency plots with a relatively short range of power-law behavior as compared to the literary texts. We then transform the texts into their word-adjacency network representations and find another difference between the languages. For the majority of the literary texts in both languages, the corresponding networks revealed a scale-free structure, while this was not always the case for the scientific texts. However, all the network representations of texts were hierarchical. We do not observe any qualitative or quantitative difference between the languages. However, if we look at other network statistics like the clustering coefficient and the...
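The two representations named in this abstract, the rank-frequency (Zipf) distribution and the word-adjacency network, can be sketched in a few lines. This is an illustrative sketch only: the tokenizer, the function names, and the least-squares estimate of the exponent are assumptions, not the authors' procedure.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


def zipf_exponent(tokens):
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log data.

    Requires at least two distinct words.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


def adjacency_network(tokens):
    """Undirected graph linking words that occur next to each other."""
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


tokens = tokenize("The cat saw the dog and the dog saw the bird")
graph = adjacency_network(tokens)
print(rank_frequency(tokens)[0], len(graph["the"]))
```

A node's degree in this graph is the number of distinct words that appear directly before or after it; the degree distribution is the quantity one would inspect when checking for scale-free structure.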