WorldWideScience

Sample records for arretes circulaires textes

  1. Statistiques circulaires et utilisations en psychologie

    Directory of Open Access Journals (Sweden)

    Catherine Mello

    2005-09-01

Full Text Available Psychology researchers dealing with angular or cyclical data face the problems of periodicity and of the arbitrariness of the measurement system that circular statistics address. The usual methods for computing parameters such as the mean are then of no use. This introduction to circular statistics presents trigonometric functions that allow the computation of circular parameters: center of mass, concentration, dispersion and homeward component. Commonly used circular distributions as well as statistical inference methods developed for circular measures relevant to psychology are also described. An example of a simple orientation experiment illustrates the application of the various statistical tests using Microsoft Excel.
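
    As a rough illustration of the circular parameters mentioned above (center of mass, concentration, dispersion), here is a minimal Python sketch of the mean direction and mean resultant length. The article itself works its example in Microsoft Excel, so this translation into a script is only an illustrative assumption.

```python
import math

def circular_stats(angles_deg):
    """Mean direction and mean resultant length for angles in degrees.

    The linear mean of 10 and 350 degrees is 180, which is wrong for
    directions; summing unit vectors gives the circular 'center of mass'.
    """
    n = len(angles_deg)
    # Sum of the unit vectors pointing in each measured direction.
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    mean_dir = math.degrees(math.atan2(s, c)) % 360   # mean direction (center of mass)
    r = math.hypot(c, s) / n                          # mean resultant length, 0..1 (concentration)
    circ_variance = 1.0 - r                           # dispersion
    return mean_dir, r, circ_variance

# Example: headings clustered around north (0 degrees).
print(circular_stats([350.0, 10.0, 5.0, 355.0]))      # mean near 0, r close to 1
```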

  2. Antennes lecteurs RFID à polarisation circulaire pour application robotique

    OpenAIRE

    Hebib, Sami; Bouaziz, Sofiene; Aubert, Hervé; Lerasle, Frédéric

    2011-01-01

National audience. In this paper, a new circularly polarized RFID reader antenna is developed for the radiolocalization chain of the Rackham robot at LAAS-CNRS. This antenna (20 cm x 20 cm) covers the entire UHF RFID band (860-960 MHz) and has a simulated gain of 4 dBi. Two prototypes of this antenna were fabricated and measured. Radiolocalization tests of these antennas show that they meet the requirements of the targeted robotic application....

  3. G8: trois Francais, auteurs presumes de violences à Genève, arretes mercredi

    CERN Document Server

    2003-01-01

    "Trois Francais, soupconnes d'avoir participe a des violences survenues a Geneve en marge du sommet du G8 en juin dernier, ont ete arretes mercredi a Geneve, apres avoir reconnu les faits qui leur sont reproches, a annonce la police locale" (1/2 page).

  4. La nouvelle circulaire adhérence de la Direction des routes nationales de France

    OpenAIRE

    Dupont, P.; BAUDUIN, A

    2005-01-01

The policy of the French national road authority regarding skid resistance is presented. The various circulars published and the main reasons for their successive replacements are recalled. The latest circular, published in 2002, is presented in detail. It results from the work of a sub-group of the Groupe national des caractéristiques de surface (GNCDS), created by the Director of Roads of France in 1991. It defines specifications in terms of mean depth of...

  5. Chances for a circular economy in the Netherlands; Kansen voor de circulaire economie in Nederland

    Energy Technology Data Exchange (ETDEWEB)

    Bastein, T.; Roelofs, E.; Rietveld, E.; Hoogendoorn, A.

    2013-06-15

The concept of the circular economy refers to an economic and industrial system that focuses on the reusability of products and raw materials, reduces value destruction in the overall system and aims at value creation within each tier of the system. In this report the (economic) opportunities are quantified as much as possible, and impacts on employment and the environment are addressed. The study explicitly addresses the Dutch economy as a whole. The analysis starts from two detailed case studies: the use of biomass waste streams and the circular economy that may arise in the metal-electronics industry.

  6. Collection of regulatory texts relative to radiation protection. Part 2: orders and decisions taken in application of the Public Health Code and Labour Code concerning the protection of populations, patients and workers against the risks of ionizing radiations; Recueil de textes reglementaires relatifs a la radioprotection. Partie 2: arretes et decisions pris en application du Code de Sante Publique et du Code du Travail concernant la protection de la population, des patients et des travailleurs contre les dangers des rayonnements ionisants

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-05-15

This collection of texts covers the general measures for the protection of the population, exposure to natural radiation, the general system of authorizations and declarations, the protection of persons exposed to ionizing radiation for medical purposes, radiological emergency situations and long-term exposure to ionizing radiation, penal provisions, and the application of the Public Health Code and of the Labour Code. A chronological table of contents by date of publication is given. (N.C.)

  7. Conformity of nuclear construction codes with the requirements of the French order dated December 12, 2005 related to nuclear pressure equipment; Conformite des codes de construction nucleaires avec les exigences de l'arrete du 12 decembre 2005 relatif aux equipements sous pression nucleaires

    Energy Technology Data Exchange (ETDEWEB)

    Grandemange, J.M.; Renaut, P. [Areva-NP, Tour AREVA, 92084 - Paris La Defense cedex, (France); Paris, D. [EDF-Ceidre 2 rue Ampere - 93206 SAINT-DENIS Cedex (France); Faidy, C. [EDF-Septen 12/14, Avenue Dutrievoz 69628 Villeurbanne Cedex (France)

    2007-07-01

The French Decree dated December 13, 1999, transposing the Pressure Equipment Directive (PED), replaced the fundamental texts on which the regulation of pressure equipment important for the safety of nuclear reactors had until then also been based. By a Ministerial Order dated December 12, 2005 - known as the 'ESPN Order' - a new regulation was issued for nuclear pressure equipment. This text refers to the Decree transposing the PED while completing its provisions with supplementary requirements intended to provide a very high level of integrity guarantee for the equipment most important for safety, and to cover the prevention of radioactive release risks. These regulatory evolutions are presented in the plenary session of the ESOPE conference. Because it references the Decree, and thus the PED, and includes specific provisions, the Ministerial Order requires manufacturers to update their documents and, if necessary, their prescriptions in the following two domains: - the conformity of the codes and standards used, generally inspired by ASME Code Section III, with the essential safety requirements of the PED; - compliance with the complementary provisions introduced by the ESPN Order. This paper presents the most significant conclusions of this work and the resulting amendments to the RCC-M Code, introduced by the 2007 addendum to that code. The analysis leads to specifying the same type of complementary requirements when a manufacturer wishes to use the German KTA rules or ASME Code Section III. (authors)

  8. Le shunt circulaire dans la forme néonatale sévère de la maladie d’Ebstein : effet bénéfique ou délétère des prostaglandines?

    Science.gov (United States)

    Hakim, Kaouthar; Boussaada, Rafik; Ayari, Jihen; Imen, Hamdi; Msaad, Hela; Ouarda, Fatma; Chaker, Lilia

    2014-01-01

Abstract: Ebstein's anomaly with functional pulmonary atresia is a severe neonatal presentation of Ebstein's anomaly in which treatment is classically based on the prescription of prostaglandins. The circular shunt is a serious and often unrecognized hemodynamic complication that calls for stopping prostaglandins. We report a severe neonatal form of Ebstein's anomaly with hemodynamic deterioration attributed to a circular shunt. The diagnosis of Ebstein's anomaly with functional pulmonary atresia was made antenatally at 36 weeks of amenorrhea. The patient was born at 38 weeks of amenorrhea by cesarean section. A postnatal echocardiogram confirmed the diagnosis. Prostaglandin therapy was initially instituted to keep the substitute ductus arteriosus open. Despite this treatment, hemodynamic deterioration was observed. A follow-up echocardiogram showed findings consistent with a circular shunt: blood entering the pulmonary artery through the wide ductus arteriosus was "drawn" back into the right ventricle, then into the right atrium because of the tricuspid regurgitation, and from there into the left heart via the right-to-left shunting foramen ovale; it was then ejected into the aorta and the ductus arteriosus. Given this circular shunt, prostaglandin therapy was stopped and a treatment aimed instead at reducing pulmonary resistances was prescribed. However, the patient died before this treatment could be started. The neonatal form of Ebstein's anomaly is a severe form that can be complicated by a circular shunt. This hemodynamic phenomenon calls for early closure of the ductus arteriosus and thus contraindicates the prescription of prostaglandins. PMID:25642457

  9. Combat desertification, arret deforestation

    International Nuclear Information System (INIS)

This article presents the major progress in the actions of the Forest Department and the Dry Zone Greening Department to arrest deforestation and to combat desertification in the dry zone of central Myanmar.

  10. Circular from January 26, 2004, taken for the enforcement of the by-law from January 26, 2004, relative to the national defense secrecy protection in the domain of nuclear materials protection and control; Circulaire du 26 janvier 2004 prise pour l'application de l'arrete du 26 janvier 2004 relatif a la protection du secret de la defense nationale dans le domaine de la protection et du controle des matieres nucleaires

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2004-01-15

The by-law of January 26, 2004 gives a regulatory foundation to the classification of sensitive information relative to the security and physical protection of nuclear materials. This circular recalls, in this framework, the conditions of implementation of the regulation relative to the protection of national defense secrets in the domain of the protection of nuclear facilities and materials. (J.S.)

  11. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  12. Contextual Text Mining

    Science.gov (United States)

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  13. Quality text editing

    Directory of Open Access Journals (Sweden)

    Gyöngyi Bujdosó

    2009-10-01

Full Text Available Text editing is more than the knowledge of word processing techniques. Originally typographers, printers, text editors were the ones qualified to edit texts, which were well structured, legible, easily understandable, clear, and were able to emphasize the core of the text. Time has changed, and nowadays everyone has access to computers as well as to text editing software and most users believe that having these tools is enough to edit texts. However, text editing requires more skills. Texts appearing either in printed or in electronic form reveal that most of the users do not realize that they are not qualified to edit and publish their works. Analyzing the 'text-products' of the last decade a tendency can clearly be drawn. More and more documents appear, which instead of emphasizing the subject matter, are lost in the maze of unstructured text slices. Without further thoughts different font types, colors, sizes, strange arrangements of objects, etc. are applied. We present examples with the most common typographic and text editing errors. Our aim is to call the attention to these mistakes and persuade users to spend time to educate themselves in text editing. They have to realize that a well-structured text is able to strengthen the effect on the reader, thus the original message will reach the target group.

  14. Text Mining: (Asynchronous Sequences

    Directory of Open Access Journals (Sweden)

    Sheema Khan

    2014-12-01

Full Text Available In this paper we try to correlate text sequences that provide common topics for semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for common topics in the sequences and isolates them with their timestamps. Step two takes the topic and tries to give the timestamp of the text document. After multiple repetitions of step two, we can give an optimal result.
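
    The abstract only sketches the two steps, so the Python snippet below is a loose, hypothetical reading of them; the helper names and the median-timestamp rule in step two are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict

def common_topics(sequences):
    """Step one (as described above): find topic terms shared by all text
    sequences and isolate them together with their timestamps.

    `sequences` maps a sequence id to a list of (timestamp, set_of_terms).
    """
    seen_in = defaultdict(set)          # term -> sequence ids it occurs in
    stamps = defaultdict(list)          # term -> timestamps where it occurs
    for seq_id, items in sequences.items():
        for ts, terms in items:
            for term in terms:
                seen_in[term].add(seq_id)
                stamps[term].append(ts)
    shared = {t for t, ids in seen_in.items() if len(ids) == len(sequences)}
    return {t: sorted(stamps[t]) for t in shared}

def estimate_timestamp(topic, topic_stamps):
    """Step two: assign a timestamp to a document about `topic`;
    here simply the median of the timestamps collected in step one."""
    ts = topic_stamps[topic]
    return ts[len(ts) // 2]

seqs = {
    "blog": [(1, {"election", "poll"}), (3, {"storm"})],
    "news": [(2, {"election"}), (4, {"storm", "damage"})],
}
topics = common_topics(seqs)            # {'election': [1, 2], 'storm': [3, 4]}
print(topics, estimate_timestamp("storm", topics))
```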

  15. Text Coherence in Translation

    Science.gov (United States)

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  16. Arabic Short Text Compression

    Directory of Open Access Journals (Sweden)

    Eman Omer

    2010-01-01

Full Text Available Problem statement: Text compression permits representing a document using less space. This is useful not only to save disk space, but more importantly, to save disk transfer and network transmission time. With the continued increase in the number of Arabic short text messages sent by mobile phones, the use of a suitable compression scheme would allow users to use more characters than the default value specified by the provider. The development of an efficient compression scheme to compress short Arabic texts is not a straightforward task. Approach: This study combined the benefits of pre-processing, entropy reduction through splitting files and hybrid dynamic coding: a new technique proposed in this study that uses the fact that Arabic texts have single-case letters. Experimental tests were performed on short Arabic texts and a comparison with the well-known plain Huffman compression was made to measure the performance of the proposed scheme for Arabic short text. Results: The proposed scheme can achieve a compression ratio around 4.6 bits byte-1 for very short Arabic text sequences of 15 bytes and around 4 bits byte-1 for 50-byte text sequences, using only 8 Kbytes of memory overhead. Conclusion: Furthermore, a reasonable compression ratio can be achieved using less than 0.4 KB of memory overhead. We recommend the use of the proposed scheme to compress small Arabic texts with limited resources.
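
    The hybrid scheme itself is not specified in the abstract; the sketch below only shows how a bits-per-byte figure such as the reported ~4 bits byte-1 can be measured, using the zero-order entropy of a short Arabic string as a Huffman-style baseline. This is an illustration of the metric, not the proposed method.

```python
import math
from collections import Counter

def bits_per_byte(text, encoding="utf-8"):
    """Zero-order entropy of the encoded text, in bits per byte.

    This is the lower bound a symbol-by-symbol coder such as plain Huffman
    approaches; real schemes add model/header overhead on top of it.
    """
    data = text.encode(encoding)
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A short Arabic message (illustrative only).
msg = "السلام عليكم ورحمة الله وبركاته"
print(f"{bits_per_byte(msg):.2f} bits per byte over {len(msg.encode('utf-8'))} bytes")
```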

  17. Vocabulary Constraint on Texts

    Directory of Open Access Journals (Sweden)

    C. Sutarsyah

    2008-01-01

Full Text Available This case study was carried out in the English Education Department of the State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading texts and to determine whether the texts are useful for reading skill development. A descriptive qualitative design was applied to obtain the data. For this purpose, some available computer programs were used to produce a description of the vocabulary in the texts. It was found that the 20 texts, containing 7,945 words, are dominated by low-frequency words, which account for 16.97% of the words in the texts. The high-frequency words occurring in the texts were dominated by function words. In the case of word levels, it was found that the texts have a very limited number of words from the GSL (General Service List of English Words (West, 1953)). The proportion of the first 1,000 words of the GSL only accounts for 44.6%. The data also show that the texts contain too large a proportion of words that are not in the three levels (the first 2,000 words of the GSL and the UWL). These words account for 26.44% of the running words in the texts. It is believed that the constraints are due to the selection of the texts, which are made up of a series of short, unrelated texts. This kind of text is subject to the accumulation of low-frequency words, especially content words, and a limited number of words from the GSL. It could also hinder the development of students' reading skills and vocabulary enrichment.
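
    A coverage figure like the 44.6% reported for the first 1,000 GSL words can be computed along the following lines; the file names are placeholders and the tokenization is a simplification, not the study's actual procedure.

```python
import re
from collections import Counter

def coverage(text, word_list):
    """Share of running words in `text` that belong to `word_list`
    (e.g. the first 1,000 words of the GSL), as a percentage."""
    tokens = re.findall(r"[a-z]+", text.lower())
    in_list = sum(1 for t in tokens if t in word_list)
    return 100.0 * in_list / len(tokens) if tokens else 0.0

# Hypothetical inputs: the corpus of reading texts and the first 1,000 GSL
# words, one per line (placeholder files, not the study's materials).
corpus = open("reading_texts.txt", encoding="utf-8").read()
gsl_1000 = {w.strip().lower() for w in open("gsl_first_1000.txt", encoding="utf-8")}
print(f"GSL-1000 coverage: {coverage(corpus, gsl_1000):.1f}% of running words")
```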

  18. EMOTION DETECTION FROM TEXT

    Directory of Open Access Journals (Sweden)

    Shiv Naresh Shivhare

    2012-05-01

Full Text Available Emotion can be expressed in many ways that can be seen, such as facial expressions and gestures, speech, and written text. Emotion detection in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing as well as Machine Learning. In this paper emotion recognition based on textual data and the techniques used in emotion detection are discussed.

  19. Mining text data

    CERN Document Server

    Aggarwal, Charu C

    2012-01-01

Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. "Mining Text Data" introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including

  20. Instant Sublime Text starter

    CERN Document Server

    Haughee, Eric

    2013-01-01

    A starter which teaches the basic tasks to be performed with Sublime Text with the necessary practical examples and screenshots. This book requires only basic knowledge of the Internet and basic familiarity with any one of the three major operating systems, Windows, Linux, or Mac OS X. However, as Sublime Text 2 is primarily a text editor for writing software, many of the topics discussed will be specifically relevant to software development. That being said, the Sublime Text 2 Starter is also suitable for someone without a programming background who may be looking to learn one of the tools of

  1. Linguistics in Text Interpretation

    DEFF Research Database (Denmark)

    Togeby, Ole

    2011-01-01

A model for how text interpretation proceeds from what is pronounced, through what is said, to what is communicated, and a definition of the concepts 'presupposition' and 'implicature'.

  2. Systematic text condensation

    DEFF Research Database (Denmark)

    Malterud, Kirsti

    2012-01-01

To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies.

  3. Clustering Text Data Streams

    Institute of Scientific and Technical Information of China (English)

    Yu-Bao Liu; Jia-Rong Cai; Jian Yin; Ada Wai-Chee Fu

    2008-01-01

Clustering text data streams is an important issue in the data mining community and has a number of applications such as news group filtering, text crawling, document organization and topic detection and tracking, etc. However, most methods are similarity-based approaches that only use the TF*IDF scheme to represent the semantics of text data, which often leads to poor clustering quality. Recently, researchers have argued that a semantic smoothing model is more effective than the existing TF*IDF scheme for improving text clustering quality. However, the existing semantic smoothing model is not suitable for the dynamic text data context. In this paper, we first extend the semantic smoothing model to the text data stream context. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for the clustering of massive text data streams. In both algorithms, we also present a new cluster statistics structure named the cluster profile, which can capture the semantics of text data streams dynamically and at the same time speed up the clustering process. Some efficient implementations of our algorithms are also given. Finally, we present a series of experimental results illustrating the effectiveness of our technique.
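
    OCTS and OCTSM rely on semantic smoothing and a richer cluster-profile structure than can be shown here; the sketch below is only a much simplified single-pass, similarity-based stand-in that keeps a running term profile per cluster, to illustrate the online setting the paper addresses.

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class StreamClusterer:
    """Greedy single-pass clustering of a text stream.

    Each cluster keeps a running term-frequency profile, a much simplified
    stand-in for the paper's semantic-smoothing cluster profile.
    """
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.profiles = []                      # one Counter per cluster

    def add(self, text):
        doc = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, prof in enumerate(self.profiles):
            sim = cosine(doc, prof)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            self.profiles.append(doc)           # open a new cluster
            return len(self.profiles) - 1
        self.profiles[best].update(doc)         # fold the document into the profile
        return best

clus = StreamClusterer()
for d in ["stock market falls", "market stock rally", "football cup final", "cup final goal"]:
    print(d, "->", clus.add(d))
```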

  4. Centroid Based Text Clustering

    Directory of Open Access Journals (Sweden)

    Priti Maheshwari

    2010-09-01

Full Text Available Web mining is a burgeoning new field that attempts to glean meaningful information from natural language text. Web mining refers generally to the process of extracting interesting information and knowledge from unstructured text. Text clustering is one of the important Web mining functionalities; it is the task in which texts are classified into groups of similar objects based on their contents. Current research in the area of Web mining tackles problems of text data representation, classification, clustering, information extraction, or the search for and modeling of hidden patterns. In this paper we propose that for mining large document collections it is necessary to pre-process the web documents and store the information in a data structure which is more appropriate for further processing than a plain web file. We developed a php-mySql based utility to convert unstructured web documents into a structured tabular representation by preprocessing and indexing. We apply a centroid-based web clustering method on the preprocessed data, using three clustering methods. Finally we propose a method that can increase accuracy based on the clustering of documents.

  5. Extracting Text from Video

    Directory of Open Access Journals (Sweden)

    Jayshree Ghorpade

    2011-09-01

Full Text Available The text data present in images and video contain certain useful information for automatic annotation, indexing, and structuring of images. However, variations of the text due to differences in text style, font, size, orientation and alignment, as well as low image contrast and complex background, make the problem of automatic text extraction an extremely difficult and challenging job. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to design algorithms for each phase of extracting text from a video using Java libraries and classes. Here we first frame the input video into a stream of images using the Java Media Framework (JMF), with the input being a real-time feed or a video from the database. Then we apply pre-processing algorithms to convert the images to gray scale and remove disturbances such as superimposed lines over the text, discontinuities, and dots. We then continue with the algorithms for localization, segmentation and recognition, for which we use the neural network pattern matching technique. The performance of our approach is demonstrated by presenting experimental results for a set of static images.
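
    The paper frames the video with the Java Media Framework; the following Python/OpenCV sketch is merely an analogous illustration of the first two phases (framing the video and converting to grayscale), with a hypothetical input file, and omits localization, segmentation and recognition.

```python
import cv2  # OpenCV; the original work used the Java Media Framework instead

def frames_to_grayscale(video_path, step=30):
    """Sample every `step`-th frame of a video and return grayscale images,
    mirroring only the 'framing' and 'gray scale' pre-processing phases."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames

# Hypothetical input file; text localization, segmentation and recognition would follow.
gray_frames = frames_to_grayscale("news_clip.mp4", step=30)
print(f"extracted {len(gray_frames)} grayscale frames")
```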

  6. EXTRACTING TEXT FROM VIDEO

    Directory of Open Access Journals (Sweden)

    Jayshree Ghorpade

    2011-06-01

Full Text Available The text data present in images and video contain certain useful information for automatic annotation, indexing, and structuring of images. However, variations of the text due to differences in text style, font, size, orientation and alignment, as well as low image contrast and complex background, make the problem of automatic text extraction an extremely difficult and challenging job. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to design algorithms for each phase of extracting text from a video using Java libraries and classes. Here we first frame the input video into a stream of images using the Java Media Framework (JMF), with the input being a real-time feed or a video from the database. Then we apply pre-processing algorithms to convert the images to gray scale and remove disturbances such as superimposed lines over the text, discontinuities, and dots. We then continue with the algorithms for localization, segmentation and recognition, for which we use the neural network pattern matching technique. The performance of our approach is demonstrated by presenting experimental results for a set of static images.

  7. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI. CABI's full text repository is growing rapidly and has now been integrated into all our databases including CAB Abstracts, Global Health, our Internet Resources and Abstract Journals. There are currently over 60,000 full text articles available to access. These documents, made possible by agreement with third

  8. Texts of Television Advertisements

    OpenAIRE

    Michalewski, Kazimierz

    1995-01-01

Short advertisement films occupy a large part (especially around the peak viewing hours) of the everyday programmes of the Polish state television. Even though it is possible to imagine an advertisement film employing only extralinguistic means of communication, the advertisements in general have so far been using written and spoken texts. The basic function of such a text and of the whole film is to encourage the viewers to buy the advertised product. However, independently of th...

  9. Machine Translation from Text

    Science.gov (United States)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  10. Emotion Detection from Text

    CERN Document Server

    Shivhare, Shiv Naresh

    2012-01-01

Emotion can be expressed in many ways that can be seen, such as facial expressions and gestures, speech, and written text. Emotion detection in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing as well as Machine Learning. In this paper emotion recognition based on textual data and the techniques used in emotion detection are discussed.

  11. Text simplification for children

    OpenAIRE

    De Belder, Jan; Moens, Marie-Francine

    2010-01-01

The goal in this paper is to automatically transform a text into a simpler text, so that it is easier for children to understand. We perform syntactic simplification, i.e. the splitting of sentences, and lexical simplification, i.e. replacing difficult words with easier synonyms. We test the performance of this approach for each component separately on a per-sentence basis, and globally with the automatic construction of simplified news articles and encyclopedia articles. By including informatio...
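
    The actual system selects splits and synonyms with far more sophisticated syntactic and lexical machinery; the toy Python sketch below only illustrates the two-part idea with a hand-made synonym table and a crude comma/conjunction split, both of which are assumptions for illustration.

```python
import re

# Toy synonym dictionary; a real system chooses easier synonyms with
# word frequencies and language models, not a fixed table like this.
EASY_SYNONYMS = {"purchase": "buy", "assistance": "help", "commence": "start"}

def lexical_simplify(sentence):
    """Replace 'difficult' words with easier synonyms where we have one."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: EASY_SYNONYMS.get(m.group(0).lower(), m.group(0)),
                  sentence)

def syntactic_split(sentence):
    """Very rough sentence splitting on coordinating commas/conjunctions,
    standing in for real syntactic simplification."""
    parts = re.split(r",\s*(?:and|but)\s+", sentence)
    return [p.strip().rstrip(".") + "." for p in parts if p.strip()]

text = "The city will commence the project, and residents can purchase tickets for assistance."
for part in syntactic_split(text):
    print(lexical_simplify(part))
```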

  12. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2014-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI. CABI's full text repository is growing rapidly

  13. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI. CABI's full text repository is growing rapidly and has now been integrated into all our databases including CAB Abstracts, Global Health

  14. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI. CABI's full text repository is growing rapidly and has now been integrated into all our databases including CAB Abstracts, Global Health, our Internet Resources and Jour-

  15. Reading Authorship into Texts.

    Science.gov (United States)

    Werner, Walter

    2000-01-01

    Provides eight concepts, with illustrative questions for interpreting the authorship of texts, that are borrowed from cultural studies literature: (1) representation; (2) the gaze; (3) voice; (4) intertextuality; (5) absence; (6) authority; (7) mediation; and (8) reflexivity. States that examples were taken from British Columbia's (Canada) social…

  16. Reading Authentic Texts

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2013-01-01

    Most research on cognates has focused on words presented in isolation that are easily defined as cognate between L1 and L2. In contrast, this study investigates what counts as cognate in authentic texts and how such cognates are read. Participants with L1 Danish read news articles in their highly...

  17. Texts in the landscape

    Directory of Open Access Journals (Sweden)

    James Graham-Campbell

    1998-11-01

Full Text Available The Institute's members of UCL's "Celtic Inscribed Stones" project describe, in collaboration with Wendy Davies, Mark Handley and Paul Kershaw (Department of History), a major interdisciplinary study of inscriptions of the early middle ages from the Celtic areas of northwest Europe.

  18. Polymorphous Perversity in Texts

    Science.gov (United States)

    Johnson-Eilola, Johndan

    2012-01-01

    Here's the tricky part: If we teach ourselves and our students that texts are made to be broken apart, remixed, remade, do we lose the polymorphous perversity that brought us pleasure in the first place? Does the pleasure of transgression evaporate when the borders are opened?

  19. Text Induced Spelling Correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigram
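
    Based only on what the abstract states (a word-unigram lexicon derived without supervision from a large raw-text corpus and used to flag non-word errors), a minimal sketch of lexicon building and non-word flagging might look as follows; the tiny corpus and the frequency threshold are illustrative assumptions, and TISC's actual correction step is not reproduced.

```python
import re
from collections import Counter

def build_unigram_lexicon(raw_text, min_count=1):
    """Derive a word-unigram lexicon from raw text, without supervision:
    any token seen at least `min_count` times is treated as a valid word."""
    counts = Counter(re.findall(r"[a-z]+", raw_text.lower()))
    return {w for w, c in counts.items() if c >= min_count}

def non_words(text, lexicon):
    """Flag tokens that do not occur in the corpus-derived lexicon."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in lexicon]

# Tiny stand-in corpus; TISC derives its lexicon from a very large corpus.
corpus = "the cat sat on the mat . the cat ran ."
lexicon = build_unigram_lexicon(corpus)
print(non_words("the cta sat on teh mat", lexicon))   # ['cta', 'teh']
```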

  20. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI.

  1. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI.

  2. About CABI Full Text

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

Centre for Agriculture and Bioscience International (CABI) is a not-for-profit international Agricultural Information Institute with headquarters in Britain. It aims to improve people's lives by providing information and applying scientific expertise to solve problems in agriculture and the environment. CABI Full-text is one of the publishing products of CABI.

  3. Text as Image.

    Science.gov (United States)

    Woal, Michael; Corn, Marcia Lynn

    As electronically mediated communication becomes more prevalent, print is regaining the original pictorial qualities which graphemes (written signs) lost when primitive pictographs (or picture writing) and ideographs (simplified graphemes used to communicate ideas as well as to represent objects) evolved into first written, then printed, texts of…

  4. Text Mining for Neuroscience

    Science.gov (United States)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded sets of documents ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form, from which an association graph is computed that represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph-theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in
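
    A minimal version of the co-occurrence association graph described above can be sketched as follows; the choice of networkx and the example terms are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations
import networkx as nx  # graph library chosen here for illustration; the paper does not name one

def association_graph(documents, terms):
    """Build a term association graph weighted by document co-occurrence:
    an edge links two domain terms that appear in the same document."""
    g = nx.Graph()
    g.add_nodes_from(terms)
    for doc in documents:
        present = [t for t in terms if t in doc.lower()]
        for a, b in combinations(present, 2):
            w = g[a][b]["weight"] + 1 if g.has_edge(a, b) else 1
            g.add_edge(a, b, weight=w)
    return g

docs = [
    "dopamine release in the prefrontal cortex during reward tasks",
    "prefrontal cortex lesions impair working memory",
    "dopamine and working memory interact in the prefrontal cortex",
]
terms = ["dopamine", "prefrontal cortex", "working memory", "reward"]
g = association_graph(docs, terms)
print(list(g.edges(data="weight")))
print(nx.cycle_basis(g))   # closed chains of associated terms (cycle detection)
```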

  5. Toponym Resolution in Text

    OpenAIRE

    Leidner, Jochen Lothar

    2007-01-01

    Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g.,...

  6. Reading Text While Driving

    OpenAIRE

    Liang, Yulan; Horrey, William J.; Hoffman, Joshua D.

    2015-01-01

    Objective In this study, we investigated how drivers adapt secondary-task initiation and time-sharing behavior when faced with fluctuating driving demands. Background Reading text while driving is particularly detrimental; however, in real-world driving, drivers actively decide when to perform the task. Method In a test track experiment, participants were free to decide when to read messages while driving along a straight road consisting of an area with increased driving demands (demand zone)...

  7. Circular letter from January 22, 2004 to the presidents of companies having the status of chartered storage facility; Lettre circulaire du 22 janvier 2004 a Messieurs les presidents de societes titulaires du statut d'entrepositaire agree

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2004-07-01

This circular letter is intended for owners of storage facilities for petroleum products who are subject to the obligation of strategic storage according to Article 2 of Law No. 92-1443 of December 31, 1992. The attached document recalls the reasons for and the content of this obligation, the prevailing strategic storage rules in France (reference texts, products concerned, operators, location of stockpiles, product substitution possibilities...), the monthly declarations, the controls and sanctions, the annual plan of stockpile locations, the obligation of information, and the loss or renouncement of chartered status. A schematic synthesis of the system of stockpile constitution is presented in the appendix, for France and for the French overseas départements. The other appendices concern: the list of petroleum products subject to the legal obligation of strategic storage, the relations between the professional committee for strategic stockpiles (CPSSP) and the security stocks management company (SAGESS), and some examples of monthly and annual declaration forms. (J.S.)

  8. Text and Music Revisited

    OpenAIRE

    Fornäs, Johan

    1997-01-01

Are words and music two separate symbolic modes, or rather variants of the same human symbolic practice? Are they parallel, opposing or overlapping? What do they have in common and how does each of them exceed the other? Is music perhaps incomparably different from words, or even their anti-verbal Other? Distinctions between text (in the verbal sense of units of words rather than in the wide sense of symbolic webs in general) and music are regularly made – but also problematized – withi...

  9. TRMM Gridded Text Products

    Science.gov (United States)

    Stocker, Erich Franz

    2007-01-01

    NASA's Tropical Rainfall Measuring Mission (TRMM) has many products that contain instantaneous or gridded rain rates often among many other parameters. However, these products because of their completeness can often seem intimidating to users just desiring surface rain rates. For example one of the gridded monthly products contains well over 200 parameters. It is clear that if only rain rates are desired, this many parameters might prove intimidating. In addition, for many good reasons these products are archived and currently distributed in HDF format. This also can be an inhibiting factor in using TRMM rain rates. To provide a simple format and isolate just the rain rates from the many other parameters, the TRMM product created a series of gridded products in ASCII text format. This paper describes the various text rain rate products produced. It provides detailed information about parameters and how they are calculated. It also gives detailed format information. These products are used in a number of applications with the TRMM processing system. The products are produced from the swath instantaneous rain rates and contain information from the three major TRMM instruments: radar, radiometer, and combined. They are simple to use, human readable, and small for downloading.

  10. Weitere Texte physiognomischen Inhalts

    Directory of Open Access Journals (Sweden)

    Böck, Barbara

    2004-12-01

    Full Text Available The present article offers the edition of three cuneiform texts belonging to the Akkadian handbook of omens drawn from the physical appearance as well as the morals and behaviour of man. The book comprising up to 27 chapters with more than 100 omens each was entitled in antiquity Alamdimmû. The edition of the three cuneiform tablets completes, thus, the author's monographic study on the ancient Mesopotamian divinatory discipline of physiognomy (Die babylonisch-assyrische Morphoskopie (Wien 2000 [=AfO Beih. 27].

This article presents the editio princeps of three cuneiform texts held in the British Museum (London) and the Vorderasiatisches Museum (Berlin) that belong to the Assyro-Babylonian book of physiognomic omens. This book, originally entitled Alamdimmû ('form, figure'), consists of 27 chapters, each with more than one hundred omens written in the Akkadian language. The three texts thus complete the author's monographic study of the divinatory discipline of physiognomy in the ancient Near East (Die babylonisch-assyrische Morphoskopie (Wien 2000 [=AfO Beih. 27].

  11. Weaving with text

    DEFF Research Database (Denmark)

    Hagedorn-Rasmussen, Peter

This paper explores how a school principal, by means of practical authorship, creates reservoirs of language that provide a possible context for collective sensemaking. The paper draws upon a field study in which a school principal and his managerial team were shadowed during a period of intensive changes. It explores how the manager weaves with text, extracted from stakeholders, administration, politicians, employees, public discourse etc., as a means of creating a new fabric, a texture, of diverse perspectives that aims for collective sensemaking.

  12. Documents and legal texts

    International Nuclear Information System (INIS)

    This section reprints a selection of recently published legislative texts and documents: - Russian Federation: Federal Law No.170 of 21 November 1995 on the use of atomic energy, Adopted by the State Duma on 20 October 1995; - Uruguay: Law No.19.056 On the Radiological Protection and Safety of Persons, Property and the Environment (4 January 2013); - Japan: Third Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (concerning Damages related to Rumour-Related Damage in the Agriculture, Forestry, Fishery and Food Industries), 30 January 2013; - France and the United States: Joint Statement on Liability for Nuclear Damage (Aug 2013); - Franco-Russian Nuclear Power Declaration (1 November 2013)

  13. Interconnectedness und digitale Texte

    Directory of Open Access Journals (Sweden)

    Detlev Doherr

    2013-04-01

Full Text Available The multimedia information services on the Internet are becoming ever more extensive and comprehensive, and documents that exist only in printed form are being digitized by libraries and put online. These documents can be found via online document management systems or search engines and then provided in common formats such as PDF. This article examines how the Humboldt Digital Library (HDL) works; for more than ten years it has made documents by Alexander von Humboldt freely available on the web in English translation. Unlike a conventional digital library, however, it does not merely provide digitized documents as scans or PDFs, but makes the text itself available in networked form. The system therefore resembles an information system more than a digital library, which is also reflected in the available functions for finding texts in different versions and translations, comparing paragraphs of different documents, or displaying images in their context. The development of dynamic hyperlinks based on the individual text paragraphs of Humboldt's works, in the form of media assets, makes it possible to use the Google Maps programming interface for geographic as well as content-based navigation. Going beyond the services of a digital library, the HDL offers the prototype of a multidimensional information system that works with dynamic structures and enables extensive thematic analyses and comparisons.

  14. Documents and legal texts

    International Nuclear Information System (INIS)

    This section treats of the following Documents and legal texts: 1 - Canada: Nuclear Liability and Compensation Act (An Act respecting civil liability and compensation for damage in case of a nuclear incident, repealing the Nuclear Liability Act and making consequential amendments to other acts); 2 - Japan: Act on Compensation for Nuclear Damage (The purpose of this act is to protect persons suffering from nuclear damage and to contribute to the sound development of the nuclear industry by establishing a basic system regarding compensation in case of nuclear damage caused by reactor operation etc.); Act on Indemnity Agreements for Compensation of Nuclear Damage; 3 - Slovak Republic: Act on Civil Liability for Nuclear Damage and on its Financial Coverage and on Changes and Amendments to Certain Laws (This Act regulates: a) The civil liability for nuclear damage incurred in the causation of a nuclear incident, b) The scope of powers of the Nuclear Regulatory Authority (hereinafter only as the 'Authority') in relation to the application of this Act, c) The competence of the National Bank of Slovakia in relation to the supervised financial market entities in the financial coverage of liability for nuclear damage; and d) The penalties for violation of this Act)

  15. A Survey on Web Text Information Retrieval in Text Mining

    OpenAIRE

    Tapaswini Nayak; Srinivash Prasad; Manas Ranjan Senapat

    2015-01-01

In this study we have analyzed different techniques for information retrieval in text mining. The aim of the study is to identify techniques for web text information retrieval. Text mining is closely akin to text analytics, a process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends by means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concep...

  16. Classroom Texting in College Students

    Science.gov (United States)

    Pettijohn, Terry F.; Frazier, Erik; Rieser, Elizabeth; Vaughn, Nicholas; Hupp-Wilds, Bobbi

    2015-01-01

A 21-item survey on texting in the classroom was given to 235 college students. Overall, 99.6% of students owned a cellphone and 98% texted daily. Of the 138 students who texted in the classroom, most texted friends or significant others, and indicated that the reason for classroom texting was boredom or work. Students who texted sent a mean of 12.21…

  17. Short Text Classification: A Survey

    Directory of Open Access Journals (Sweden)

    Ge Song

    2014-05-01

Full Text Available With the recent explosive growth of e-commerce and online communication, a new genre of text, short text, has been extensively applied in many areas, and much research therefore focuses on short text mining. It is a challenge to classify short texts owing to their natural characteristics, such as sparseness, large scale, immediacy and non-standardization. It is difficult for traditional methods to deal with short text classification mainly because the limited number of words in a short text cannot represent the feature space and the relationship between words and documents. Several studies and reviews on text classification have appeared in recent times; however, only a few focus on short text classification. This paper discusses the characteristics of short text and the difficulty of short text classification. We then introduce the existing popular work on short text classifiers and models, including short text classification using semantic analysis, semi-supervised short text classification, ensemble short text classification, and real-time classification. The evaluation of short text classification is also analyzed. Finally, we summarize the existing classification technology and outline development trends in short text classification.

  18. The Challenge of Challenging Text

    Science.gov (United States)

    Shanahan, Timothy; Fisher, Douglas; Frey, Nancy

    2012-01-01

    The Common Core State Standards emphasize the value of teaching students to engage with complex text. But what exactly makes a text complex, and how can teachers help students develop their ability to learn from such texts? The authors of this article discuss five factors that determine text complexity: vocabulary, sentence structure, coherence,…

  19. Text-Attentional Convolutional Neural Network for Scene Text Detection.

    Science.gov (United States)

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results. PMID:27093723
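
    The published Text-CNN architecture is not reproduced here; the following PyTorch sketch only illustrates the multi-task idea of a shared convolutional trunk with text/non-text, character-label and region-mask heads, with made-up layer sizes and class counts.

```python
import torch
import torch.nn as nn

class TinyTextCNN(nn.Module):
    """Minimal multi-task sketch: shared conv trunk plus three heads
    (text/non-text, character label, coarse text-region mask).
    Layer sizes are illustrative, not those of the published Text-CNN."""
    def __init__(self, num_chars=36, patch=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 64 * (patch // 4) * (patch // 4)
        self.text_head = nn.Linear(feat, 2)            # main task: binary text / non-text
        self.char_head = nn.Linear(feat, num_chars)    # auxiliary: character label
        self.mask_head = nn.Sequential(                # auxiliary: coarse region mask
            nn.Conv2d(64, 1, 1), nn.Sigmoid()
        )

    def forward(self, x):
        fmap = self.trunk(x)
        flat = fmap.flatten(1)
        return self.text_head(flat), self.char_head(flat), self.mask_head(fmap)

model = TinyTextCNN()
patches = torch.randn(4, 3, 32, 32)                    # a batch of image components
text_logits, char_logits, mask = model(patches)
print(text_logits.shape, char_logits.shape, mask.shape)  # (4,2) (4,36) (4,1,8,8)
```

    In training, the three outputs would be combined in a weighted multi-task loss (for instance cross-entropy for the two classification heads and a pixel-wise binary loss for the mask), with the text/non-text term as the main task; the weighting used in the paper is not shown here.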

  20. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to build the feature set from pre-classified text documents. The concept of the Naive Bayes classifier is then applied to the derived features, and finally only a single concept of a Genetic Algorithm has been added for final classification. A system based on the...
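
    The paper's full pipeline (association-rule features plus a genetic-algorithm step) is not reproduced here; the scikit-learn sketch below shows only the Naive Bayes component on plain word counts, as a baseline illustration of supervised text classification.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; a real system needs far more documents,
# which is exactly the limitation the paper tries to reduce.
docs = [
    "the striker scored a late goal in the cup final",
    "the team won the match after a penalty",
    "the central bank raised interest rates again",
    "stock markets fell after the inflation report",
]
labels = ["sport", "sport", "finance", "finance"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["the goalkeeper saved a penalty in the final",
                   "bond markets reacted to the rate decision"]))
# expected: ['sport' 'finance']
```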

  1. Text Classification using Artificial Intelligence

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to build the feature set from pre-classified text documents. The concept of the naïve Bayes classifier is then applied to the derived features, and finally only a single concept of a genetic algorithm has been added for final classification. A syste...

  2. Text Mining Infrastructure in R

    OpenAIRE

    Kurt Hornik; Ingo Feinerer; David Meyer

    2008-01-01

During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey of text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels. (authors' abstract)

  3. Text analysis devices, articles of manufacture, and text analysis methods

    Science.gov (United States)

    Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C

    2013-05-28

    Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes processing circuitry configured to analyze initial text to generate a measurement basis usable in analysis of subsequent text, wherein the measurement basis comprises a plurality of measurement features from the initial text, a plurality of dimension anchors from the initial text and a plurality of associations of the measurement features with the dimension anchors, and wherein the processing circuitry is configured to access a viewpoint indicative of a perspective of interest of a user with respect to the analysis of the subsequent text, and wherein the processing circuitry is configured to use the viewpoint to generate the measurement basis.

  4. Text-Attentional Convolutional Neural Network for Scene Text Detection

    Science.gov (United States)

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called Contrast-Enhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with a F-measure of 0.82, improving the state-of-the-art results substantially.
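
    No code accompanies the record; the sketch below, which assumes PyTorch, only illustrates the general shape of a multi-task network with a main text/non-text head and an auxiliary character head. It is not the Text-CNN architecture, loss weighting, or training data described in the paper, and all sizes and targets are placeholders.

        # Minimal multi-task CNN sketch (assumes PyTorch); illustrative only.
        import torch
        import torch.nn as nn

        class TinyMultiTaskCNN(nn.Module):
            def __init__(self, num_chars=36):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.text_head = nn.Linear(32, 2)          # main task: text / non-text
                self.char_head = nn.Linear(32, num_chars)  # auxiliary task: character label

            def forward(self, x):
                h = self.features(x).flatten(1)
                return self.text_head(h), self.char_head(h)

        model = TinyMultiTaskCNN()
        patches = torch.randn(4, 3, 32, 32)                # a batch of image patches
        text_logits, char_logits = model(patches)
        # Joint loss: main task plus a down-weighted auxiliary task (placeholder targets).
        loss = (nn.functional.cross_entropy(text_logits, torch.tensor([1, 0, 1, 0]))
                + 0.3 * nn.functional.cross_entropy(char_logits, torch.randint(0, 36, (4,))))
        loss.backward()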

  5. Contrastive Study of Coherence in Chinese Text and English Text

    Institute of Scientific and Technical Information of China (English)

    王婷

    2013-01-01

    The paper presents the text-linguistic concepts on which the analysis of textual structure is based, including text and discourse, coherence and cohesion. In addition, we try to discover different manifestations of text between English text (ET) and Chinese text (CT), including different coherence structures.

  6. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects

  7. Working with text tools, techniques and approaches for text mining

    CERN Document Server

    Tourte, Gregory J L

    2016-01-01

    Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...

  8. Author Gender Identification from Text

    OpenAIRE

    Rezaei, Atoosa Mohammad

    2014-01-01

    ABSTRACT: The identification of an author's gender from a text has become a popular research area within the scope of text categorization. The number of users of social network applications based on text, such as Twitter, Facebook and text messaging services, has grown rapidly over the past few decades. As a result, text has become one of the most important and prevalent media types on the Internet. This thesis aims to determine the gender of an author from an arbitrary piece of text such as,...

  9. Text mining: A Brief survey

    OpenAIRE

    Falguni N. Patel; Neha R. Soni

    2012-01-01

    The unstructured texts which contain a massive amount of information cannot simply be used for further processing by computers. Therefore, specific processing methods and algorithms are required in order to extract useful patterns. The process of extracting interesting information and knowledge from unstructured text is accomplished by using text mining. In this paper, we have discussed text mining as a recent and interesting field, with the details of the steps involved in the overall process. We have...

  10. Text Association Analysis and Ambiguity in Text Mining

    Science.gov (United States)

    Bhonde, S. B.; Paikrao, R. L.; Rahane, K. U.

    2010-11-01

    Text Mining is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. The research in Text Mining will enhance humans' ability to process massive quantities of information, and it has high commercial value. Firstly, the paper introduces TM and its definition, and then gives an overview of the process of text mining and its applications. Up to now, not much research in text mining, especially in concept/entity extraction, has focused on the ambiguity problem. This paper addresses ambiguity issues in natural language texts, and presents a new technique for resolving the ambiguity problem in extracting concepts/entities from texts. In the end, it shows the importance of TM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.

  11. Predicting Prosody from Text for Text-to-Speech Synthesis

    CERN Document Server

    Rao, K Sreenivasa

    2012-01-01

    Predicting Prosody from Text for Text-to-Speech Synthesis covers the specific aspects of prosody, mainly focusing on how to predict the prosodic information from linguistic text, and then how to exploit the predicted prosodic knowledge for various speech applications. Author K. Sreenivasa Rao discusses proposed methods along with state-of-the-art techniques for the acquisition and incorporation of prosodic knowledge for developing speech systems. Positional, contextual and phonological features are proposed for representing the linguistic and production constraints of the sound units present in the text. This book is intended for graduate students and researchers working in the area of speech processing.

  12. Monitoring interaction and collective text production through text mining

    Directory of Open Access Journals (Sweden)

    Macedo, Alexandra Lorandi

    2014-04-01

    Full Text Available This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE), which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.
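
    As a hedged illustration of the underlying idea (terms of greatest incidence linked by co-occurrence into a graph), the following Python sketch uses plain frequency counts; it is not the Concepts Network or CTE implementation, and the sample text is invented.

        # Sketch of a concept network: most frequent terms linked by sentence-level
        # co-occurrence (no stop-word removal; purely illustrative).
        from collections import Counter
        from itertools import combinations

        text = ("Students edit the text together. The text grows as students "
                "discuss ideas. Ideas from the forum feed back into the text.")

        sentences = [s.lower().split() for s in text.split(".") if s.strip()]
        freq = Counter(w for sent in sentences for w in sent)
        top_terms = {w for w, _ in freq.most_common(5)}      # terms of greatest incidence

        edges = Counter()
        for sent in sentences:
            present = sorted(top_terms.intersection(sent))
            for a, b in combinations(present, 2):
                edges[(a, b)] += 1                           # weight = co-occurrence count

        for (a, b), weight in edges.most_common():
            print(f"{a} -- {b} (weight {weight})")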

  13. A Survey on Web Text Information Retrieval in Text Mining

    Directory of Open Access Journals (Sweden)

    Tapaswini Nayak

    2015-08-01

    Full Text Available In this study we have analyzed different techniques for information retrieval in text mining. The aim of the study is to identify web text information retrieval. Text mining is much like text analytics, a process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends by means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, creation of coarse taxonomies, sentiment analysis, document summarization and entity relation modeling. It is used to mine hidden information from unstructured or semi-structured data. This feature is necessary because a large amount of the Web information is semi-structured due to the nested structure of HTML code, is linked and is redundant. Web content categorization with a content database is the most important tool for the efficient use of search engines. A customer requesting information on a particular subject or item would otherwise have to search through hundreds of results to find the information most relevant to his query. Through the use of text mining, this step reduces those hundreds of results. This eliminates the aggravation and improves the navigation of information on the Web.

  14. Text comprehension practice in school

    Directory of Open Access Journals (Sweden)

    Hernández, José Emilio

    2010-01-01

    Full Text Available The starting point of the study is the existence of relations between the two dimensions of text comprehension: the instrumental dimension and the cognitive dimension. The first one includes the system of actions, the second one the system of knowledge. A description of identifying, describing, inferring, appraising and creating actions is suggested for each type of text. Likewise, the importance of implementing text comprehension is outlined on the basis of the assumption that the text is a tool for preserving and communicating culture, one that allows human beings to widen their respective cultural horizons and develop the cognitive and affective processes that allow them to grasp universal moral values.

  15. Exploring lexical patterns in text

    OpenAIRE

    Teich, Elke; Fankhauser, Peter

    2005-01-01

    We present a system for the linguistic exploration and analysis of lexical cohesion in English texts. Using an electronic thesaurus-like resource, Princeton WordNet, and the Brown Corpus of English, we have implemented a process of annotating text with lexical chains and a graphical user interface for inspection of the annotated text. We describe the system and report on some sample linguistic analyses carried out using the combined thesaurus-corpus resource.

  16. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  17. Text Type and Translation Strategy

    Institute of Scientific and Technical Information of China (English)

    刘福娟

    2015-01-01

    Translation strategy and translation standards are undoubtedly the core problems translators are confronted with in translation. There have arisen many kinds of translation strategies in translation history, among which the text type theory is considered an important breakthrough and a significant complement of traditional translation standards. This essay attempts to demonstrate the value of text typology (informative, expressive, and operative) to translation strategy, emphasizing the importance of text types and their communicative functions.

  18. Typesafe Modeling in Text Mining

    OpenAIRE

    Steeg, Fabian

    2011-01-01

    Based on the concept of annotation-based agents, this report introduces tools and a formal notation for defining and running text mining experiments using a statically typed domain-specific language embedded in Scala. Using machine learning for classification as an example, the framework is used to develop and document text mining experiments, and to show how the concept of generic, typesafe annotation corresponds to a general information model that goes beyond text processing.

  19. Knowledge Representation in Travelling Texts

    DEFF Research Database (Denmark)

    Mousten, Birthe; Locmele, Gunta

    2014-01-01

    and the purpose of the text in a new context as well as on predefined parameters for text travel. For texts used in marketing and in technology, the question is whether culture-bound knowledge representation should be domesticated or kept as foreign elements, or should be mirrored or moulded—or should not travel......Today, information travels fast. Texts travel, too. In a corporate context, the question is how to manage which knowledge elements should travel to a new language area or market and in which form? The decision to let knowledge elements travel or not travel highly depends on the limitation...

  20. Improve Reading with Complex Texts

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    The Common Core State Standards have cast a renewed light on reading instruction, presenting teachers with the new requirements to teach close reading of complex texts. Teachers and administrators should consider a number of essential features of close reading: They are short, complex texts; rich discussions based on worthy questions; revisiting…

  1. Strategies for Translating Vocative Texts

    Directory of Open Access Journals (Sweden)

    Olga COJOCARU

    2014-12-01

    Full Text Available The paper deals with the linguistic and cultural elements of vocative texts and the techniques used in translating them by giving some examples of texts that are typically vocative (i.e. advertisements and instructions for use). Semantic and communicative strategies are popular in translation studies and each of them has its own advantages and disadvantages in translating vocative texts. The advantage of semantic translation is that it takes more account of the aesthetic value of the SL text, while communicative translation attempts to render the exact contextual meaning of the original text in such a way that both content and language are readily acceptable and comprehensible to the readership. Focus is laid on the strategies used in translating vocative texts, strategies that highlight and introduce a cultural context to the target audience, in order to achieve their overall purpose, that is to sell or persuade the reader to behave in a certain way. Thus, in order to do that, a number of advertisements from the field of cosmetics industry and electronic gadgets were selected for analysis. The aim is to gather insights into vocative text translation and to create new perspectives on this field of research, now considered a process of innovation and diversion, especially in areas as important as economy and marketing.

  2. Linguistic Dating of Biblical Texts

    DEFF Research Database (Denmark)

    Ehrensvärd, Martin Gustaf

    2003-01-01

    the chronology of the texts established by other means: the Hebrew of Genesis-2 Kings was judged to be early and that of Esther, Daniel, Ezra, Nehemiah, and Chronicles to be late. In the current debate where revisionists have questioned the traditional dating, linguistic arguments in the dating of texts have......For two centuries, scholars have pointed to consistent differences in the Hebrew of certain biblical texts and interpreted these differences as reflecting the date of composition of the texts. Until the 1980s, this was quite uncontroversial as the linguistic findings largely confirmed...... come more into focus. The study critically examines some linguistic arguments adduced to support the traditional position, and reviewing the arguments it points to weaknesses in the linguistic dating of EBH texts to pre-exilic times. When viewing the linguistic evidence in isolation it will be clear...

  3. Text Analytics to Data Warehousing

    Directory of Open Access Journals (Sweden)

    Kalli Srinivasa Nageswara Prasad

    2010-09-01

    Full Text Available Information hidden or stored in unstructured data can play a critical role in making decisions, understanding and conducting other business functions. Integrating data stored in both structured and unstructured formats can add significant value to an organization. With the extent of development happening in text mining and in technologies for dealing with unstructured and semi-structured data, such as XML and MML (Mining Markup Language), to extract and analyze data, text analytics has evolved to handle unstructured data and to help unlock and predict business results via Business Intelligence and Data Warehousing. Text mining involves dealing with texts in documents and discovering hidden patterns, but text analytics enhances information retrieval in the form of search and enables clustering of results; moreover, text analytics is text mining plus visualization. In this paper we discuss handling unstructured data that reside in documents so that they fit into business applications like data warehouses for further analysis, and how this helps in the framework we have used for the solution.

  4. Biomarker Identification Using Text Mining

    Directory of Open Access Journals (Sweden)

    Hui Li

    2012-01-01

    Full Text Available Identifying molecular biomarkers has become one of the important tasks for scientists to assess the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our method of text mining provides a highly reliable approach to discover the biomarkers in the PubMed database.
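
    A toy illustration of the dictionary-lookup step is sketched below in Python; the finite state machine and the PubMed pipeline of the paper are not reproduced, and the dictionary entries and abstract are invented examples.

        # Toy dictionary-based matcher for candidate biomarker mentions (sketch only).
        import re

        biomarker_dictionary = {"psa", "crp", "her2", "troponin"}   # illustrative entries

        abstract = "Elevated CRP and troponin levels were observed in the cohort."

        tokens = re.findall(r"[a-z0-9]+", abstract.lower())
        hits = sorted(set(tokens) & biomarker_dictionary)
        print("candidate biomarkers:", hits)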

  5. Outer Texts in Bilingual Dictionaries

    Directory of Open Access Journals (Sweden)

    Rufus H. Gouws

    2011-10-01

    Full Text Available

    Abstract: Dictionaries often display a central list bias with little or no attention to the use of outer texts. This article focuses on dictionaries as text compounds and carriers of different text types. Utilising either a partial or a complete frame structure, a variety of outer text types can be used to enhance the data distribution structure of a dictionary and to ensure a better information retrieval by the intended target user. A distinction is made between primary frame structures and secondary frame structures and attention is drawn to the use of complex outer texts and the need of an extended complex outer text with its own table of contents to guide the user to the relevant texts in the complex outer text. It is emphasised that outer texts need to be planned in a meticulous way and that they should participate in the lexicographic functions of the specific dictionary, both knowledge-orientated and communication-orientated functions, to ensure a transtextual functional approach.

    Keywords: BACK MATTER, CENTRAL LIST, COMMUNICATION-ORIENTATED FUNCTIONS, COMPLEX TEXT, CULTURAL DATA, EXTENDED COMPLEX TEXT, EXTENDED TEXTS, FRONT MATTER, FRAME STRUCTURE, KNOWLEDGE-ORIENTATED FUNCTIONS, LEXICOGRAPHIC FUNCTIONS, OUTER TEXTS, PRIMARY FRAME, SECONDARY FRAME

    Summary: Outer texts in bilingual dictionaries. Dictionaries often display a bias in favour of the central list, with little or no attention to the outer texts. This article focuses on dictionaries as text compounds and carriers of different text types. By utilising either a partial or a complete frame structure, a variety of outer texts can be employed to improve the data distribution structure of a dictionary and to ensure a better retrieval of information by the target user. A distinction is made between primary and secondary frame structures, and attention is drawn to complex outer texts and the need for an extended complex

  6. Why is Light Text Harder to Read Than Dark Text?

    Science.gov (United States)

    Scharff, Lauren V.; Ahumada, Albert J.

    2005-01-01

    Scharff and Ahumada (2002, 2003) measured text legibility for light text and dark text. For paragraph readability and letter identification, responses to light text were slower and less accurate for a given contrast. Was this polarity effect (1) an artifact of our apparatus, (2) a physiological difference in the separate pathways for positive and negative contrast or (3) the result of increased experience with dark text on light backgrounds? To rule out the apparatus-artifact hypothesis, all data were collected on one monitor. Its luminance was measured at all levels used, and the spatial effects of the monitor were reduced by pixel doubling and quadrupling (increasing the viewing distance to maintain constant angular size). Luminances of vertical and horizontal square-wave gratings were compared to assess display speed effects. They existed, even for 4-pixel-wide bars. Tests for polarity asymmetries in display speed were negative. Increased experience might develop full letter templates for dark text, while recognition of light letters is based on component features. Earlier, an observer ran all conditions at one polarity and then switched. If dark and light letters were intermixed, the observer might use component features on all trials and do worse on the dark letters, reducing the polarity effect. We varied polarity blocking (completely blocked, alternating smaller blocks, and intermixed blocks). Letter identification responses times showed polarity effects at all contrasts and display resolution levels. Observers were also more accurate with higher contrasts and more pixels per degree. Intermixed blocks increased the polarity effect by reducing performance on the light letters, but only if the randomized block occurred prior to the nonrandomized block. Perhaps observers tried to use poorly developed templates, or they did not work as hard on the more difficult items. The experience hypothesis and the physiological gain hypothesis remain viable explanations.

  7. Stemming Malay Text and Its Application in Automatic Text Categorization

    Science.gov (United States)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language, there are no conjugations and declensions, and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversations, it is essential to use the precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of the educated Malay. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties in Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of the set of rules is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, the text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
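
    The following Python sketch illustrates the general rule-plus-dictionary idea (strip candidate affixes, accept the result only if it appears in a root-word dictionary); the affix lists and dictionary entries are small invented examples, not the authors' Malay rule set.

        # Toy rule-plus-dictionary stemmer (illustrative affixes and dictionary only).
        ROOT_WORDS = {"ajar", "main", "baca"}                 # root-word dictionary (toy)
        PREFIXES = ("mem", "peng", "ber", "di")
        SUFFIXES = ("kan", "an", "i")

        def stem(word):
            if word in ROOT_WORDS:                            # already a root: do nothing
                return word
            for prefix in PREFIXES:
                for suffix in ("",) + SUFFIXES:
                    candidate = word[len(prefix):] if word.startswith(prefix) else word
                    if suffix and candidate.endswith(suffix):
                        candidate = candidate[: -len(suffix)]
                    if candidate in ROOT_WORDS:               # dictionary check limits over-stemming
                        return candidate
            return word                                       # no rule matched: avoid guessing

        print(stem("pengajar"), stem("mainkan"), stem("membaca"))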

  8. Anomaly Detection with Text Mining

    Data.gov (United States)

    National Aeronautics and Space Administration — Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The...

  9. Text Steganographic Approaches: A Comparison

    Directory of Open Access Journals (Sweden)

    Monika Agarwal

    2013-02-01

    Full Text Available This paper presents three novel approaches to text steganography. The first approach uses the theme of a missing letter puzzle where each character of the message is hidden by missing one or more letters in a word of the cover. The average Jaro score was found to be 0.95, indicating closer similarity between the cover and stego file. The second approach hides a message in a wordlist where the ASCII value of the embedded character determines the length and starting letter of a word. The third approach conceals a message, without degrading the cover, by using the start and end letters of words of the cover. For enhancing the security of the secret message, the message is scrambled using a one-time pad scheme before being concealed, and the cipher text is then concealed in the cover. We also present an empirical comparison of the proposed approaches with some of the popular text steganographic approaches and show that our approaches outperform the existing approaches.
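
    A toy Python illustration in the spirit of the third approach (the cover is left untouched and letters of existing cover words carry the message) is given below; the cover-selection details and the one-time-pad step of the paper are omitted, and the cover sentence is invented.

        # Toy steganography sketch: the stego key is a list of word positions whose
        # first letters spell the secret; the cover text itself is never modified.
        cover = ("careful analysis tends to show every method hides information "
                 "differently even now").split()

        def hide(secret):
            # Assumes every secret letter begins at least one cover word.
            positions = []
            for ch in secret.lower():
                idx = next(i for i, w in enumerate(cover) if w[0] == ch)
                positions.append(idx)
            return positions

        def reveal(positions):
            return "".join(cover[i][0] for i in positions)

        key = hide("cat")
        print(key, "->", reveal(key))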

  10. System for Distributed Text Mining

    OpenAIRE

    Torgersen, Martin Nordseth

    2011-01-01

    Text mining presents us with new possibilities for the use of collections of documents. There exists a large amount of hidden implicit information inside these collections, which text mining techniques may help us to uncover. Unfortunately, these techniques generally require large amounts of computational power. This is addressed by the introduction of distributed systems and methods for distributed processing, such as Hadoop and MapReduce. This thesis aims to describe, design, implement and ev...

  11. Text Mining in Social Networks

    Science.gov (United States)

    Aggarwal, Charu C.; Wang, Haixun

    Social networks are rich in various kinds of contents such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for a wide variety of applications such as keyword search, classification, and clustering. While search and classification are well known applications for a wide variety of scenarios, social networks have a much richer structure both in terms of text and links. Much of the work in the area uses either purely the text content or purely the linkage structure. However, many recent algorithms use a combination of linkage and content information for mining purposes. In many cases, it turns out that the use of a combination of linkage and content information provides much more effective results than a system which is based purely on either of the two. This paper provides a survey of such algorithms, and the advantages observed by using such algorithms in different scenarios. We also present avenues for future research in this area.

  12. Text segmentation with character-level text embeddings

    NARCIS (Netherlands)

    Chrupała, Grzegorz

    2013-01-01

    Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task and naturally occurring text is sometimes a mixture o

  13. Analysing ESP Texts, but How?

    Directory of Open Access Journals (Sweden)

    Borza Natalia

    2015-03-01

    Full Text Available English as a second language (ESL) teachers instructing general English and English for specific purposes (ESP) in bilingual secondary schools face various challenges when it comes to choosing the main linguistic foci of language preparatory courses enabling non-native students to study academic subjects in English. ESL teachers intending to analyse English language subject textbooks written for secondary school students with the aim of gaining information about what bilingual secondary school students need to know in terms of language to process academic textbooks cannot avoid dealing with a dilemma. It needs to be decided which way it is most appropriate to analyse the texts in question. Handbooks of English applied linguistics are not immensely helpful with regard to this problem as they tend not to give recommendations as to which major text analytical approaches are advisable to follow in a pre-college setting. The present theoretical research aims to address this lacuna. Respectively, the purpose of this pedagogically motivated theoretical paper is to investigate two major approaches of ESP text analysis, the register and the genre analysis, in order to find the more suitable one for exploring the language use of secondary school subject texts from the point of view of an English as a second language teacher. Comparing and contrasting the merits and limitations of the two contrastive approaches allows for a better understanding of the nature of the two different perspectives of text analysis. The study examines the goals, the scope of analysis, and the achievements of the register perspective and those of the genre approach alike. The paper also investigates and reviews in detail the starkly different methods of ESP text analysis applied by the two perspectives. Discovering text analysis from a theoretical and methodological angle supports a practical aspect of English teaching, namely making an informed choice when setting out to analyse

  14. GPU-Accelerated Text Mining

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL]; Mueller, Frank [North Carolina State University]; Zhang, Yongpeng [ORNL]; Potok, Thomas E [ORNL]

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices.
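
    As a plain CPU sketch of the document-similarity metric being accelerated (term-frequency vectors compared by cosine similarity), the following Python fragment may help; the CUDA kernels and GPU-specific optimizations of the work are not shown, and the documents are invented.

        # CPU sketch of a document-similarity computation: term-frequency vectors
        # and cosine similarity (the GPU parallelisation is not shown here).
        from collections import Counter
        from math import sqrt

        def cosine(doc_a, doc_b):
            a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
            dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
            norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        print(cosine("gpu text mining at scale", "text mining on the gpu"))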

  15. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices

  16. Text Recognition from an Image

    Directory of Open Access Journals (Sweden)

    Shrinath Janvalkar

    2014-04-01

    Full Text Available To achieve high speed in data processing it is necessary to convert analog data into digital data. Storing a hard copy of any document occupies a large space, and retrieving information from that document is time consuming. An optical character recognition system is an effective way of recognizing printed characters. It provides an easy way to recognize and convert the printed text in an image into editable text. It also increases the speed of data retrieval from the image. The image which contains characters can be scanned with a scanner, and the recognition engine of the OCR system then interprets the images and converts images of printed characters into machine-readable characters [8]. It improves the interface between man and machine in many applications
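
    As a hedged, minimal example of the OCR step described above, the Python sketch below assumes the pytesseract wrapper for the Tesseract engine and Pillow are installed; "page.png" is a placeholder path, and this is not the recognition engine referenced in the record.

        # Minimal OCR sketch (assumes pytesseract + Pillow; path is a placeholder).
        from PIL import Image
        import pytesseract

        image = Image.open("page.png")            # scanned page containing printed text
        editable_text = pytesseract.image_to_string(image)
        print(editable_text)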

  17. Emotion Detection From Text Documents

    Directory of Open Access Journals (Sweden)

    Shiv Naresh Shivhare

    2014-11-01

    Full Text Available Emotion detection is one of the most emerging issues in human-computer interaction. A sufficient amount of work has been done by researchers to detect emotions from facial and audio information, whereas recognizing emotions from textual data is still a fresh and hot research area. This paper presents a knowledge-based survey on emotion detection from textual data and the methods used for this purpose. As a next step, the paper also proposes a new architecture for recognizing emotions from text documents. The proposed architecture is composed of two main parts, an emotion ontology and an emotion detector algorithm. The proposed emotion detector system takes a text document and the emotion ontology as inputs and produces one of the six emotion classes (i.e. love, joy, anger, sadness, fear and surprise) as the output.
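
    A toy keyword-lookup detector over the six emotion classes named above is sketched below in Python; the small word lists merely stand in for the emotion ontology and are invented.

        # Toy keyword-based emotion detector (word lists are illustrative only).
        from collections import Counter

        EMOTION_LEXICON = {
            "love": {"love", "adore", "cherish"},
            "joy": {"happy", "delighted", "joy"},
            "anger": {"angry", "furious", "rage"},
            "sadness": {"sad", "grief", "miserable"},
            "fear": {"afraid", "scared", "terrified"},
            "surprise": {"surprised", "astonished", "sudden"},
        }

        def detect_emotion(document):
            words = document.lower().split()
            scores = Counter({emo: sum(w in lex for w in words)
                              for emo, lex in EMOTION_LEXICON.items()})
            emotion, score = scores.most_common(1)[0]
            return emotion if score > 0 else "unknown"

        print(detect_emotion("I was terrified and scared during the storm"))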

  18. Princess Brambilla - images/text

    Directory of Open Access Journals (Sweden)

    Maria Aparecida Barbosa

    2016-06-01

    Full Text Available Reading an illustrated literary text means simultaneously thinking about pictures and words. This articulation between the written text and the pictures adds potential, expands and becomes complex. It coincides with current discussions of Giorgio Agamben's "contemporary", which adds to what adheres to its respective time the displacement and the distance needed to understand it, and which shakes linear notions of historical chronology. In some way this coincidence is related to the current interest in the concept of "Nachleben" (survival), which assumes the rescue of images of the past, postulated by the art historian Aby Warburg in research on the survival of motion characteristics of ancient art in Botticelli's Renaissance pictures. For the translation into Portuguese of Princesa Brambilla – um capriccio segundo Jakob Callot, de E. T. A. Hoffmann, com 8 gravuras cunhadas a partir de moldes originais de Callot (1820), such discussions were fundamental, as I try to present in this article.

  19. Fuzzy Swarm Based Text Summarization

    Directory of Open Access Journals (Sweden)

    Mohammed S. Binwahlan

    2009-01-01

    Full Text Available Problem statement: The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. The rapid daily growth of data on the internet makes achieving this aim a big challenge. Approach: In this study, we incorporated fuzzy logic with swarm intelligence, so that risks, uncertainty, ambiguity and imprecise values in choosing the feature weights (scores) could be flexibly tolerated. The weights obtained from the swarm experiment were used to adjust the text feature scores, and the feature scores were then used as inputs for the fuzzy inference system to produce the final sentence score. The sentences were ranked in descending order based on their scores and the top n sentences were selected as the final summary. Results: The experiments showed that the incorporation of fuzzy logic with swarm intelligence could play an important role in the selection of the most important sentences to be included in the final summary. The results also showed that the proposed method achieved good performance, outperforming the swarm model and the benchmark methods. Conclusion: Incorporating more than one technique for dealing with sentence scoring proved to be an effective mechanism. The PSO was employed for producing the text feature weights. The purpose of this process was to emphasize dealing with the text features fairly based on their importance and to differentiate between more and less important features. The fuzzy inference system was employed to determine the final sentence score, on which the decision was made to include the sentence in the summary or not.
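
    The selection step (weighted feature scores, ranking, taking the top n sentences) can be sketched as follows in Python; fixed toy weights and a plain weighted sum stand in for the PSO-derived weights and the fuzzy inference system of the study.

        # Sketch of the selection step only: score sentences with weighted features,
        # rank them, and keep the top n. Weights and features are illustrative.
        def sentence_features(sentence, position, total, keywords):
            words = sentence.lower().split()
            return {
                "length": min(len(words) / 20.0, 1.0),            # normalised length
                "position": 1.0 - position / max(total - 1, 1),   # earlier sentences score higher
                "keywords": sum(w in keywords for w in words) / max(len(words), 1),
            }

        WEIGHTS = {"length": 0.2, "position": 0.3, "keywords": 0.5}  # toy weights

        def summarise(text, n=2, keywords=frozenset({"text", "summary"})):
            sentences = [s.strip() for s in text.split(".") if s.strip()]
            scored = []
            for i, sent in enumerate(sentences):
                feats = sentence_features(sent, i, len(sentences), keywords)
                score = sum(WEIGHTS[name] * value for name, value in feats.items())
                scored.append((score, sent))
            return [s for _, s in sorted(scored, reverse=True)[:n]]

        print(summarise("Text summarisation selects sentences. The web keeps growing. "
                        "A good summary keeps the important text."))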

  20. Ontological representation of texts, and its applicationsin text analysis

    OpenAIRE

    Solheim, Bent André; Vågsnes, Kristian

    2003-01-01

    For the management of a company, the need to know what people think of their products or services is becoming increasingly important in an increasingly competitive market. As the Internet can nearly be described as a digital mirror of events in the ”real“ world, being able to make sense of the semi structured nature of natural language texts published in this ubiquitous medium has received growing interest. The approach proposed in the thesis combines natural language processin...

  1. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases......, the classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....
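
    A minimal sketch of the cluster-then-classify idea, assuming scikit-learn, is shown below: the training documents are clustered, one naive Bayes model is trained per cluster, and a test document is routed to its nearest cluster. It is illustrative only, not the authors' model or their suspicious-email setup, and the documents are invented.

        # Sketch of cluster-then-classify (assumes scikit-learn).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans
        from sklearn.naive_bayes import MultinomialNB

        train_docs = ["wheat prices rose", "corn harvest fell", "bank rates rose",
                      "central bank cut rates"]
        train_labels = ["grain", "grain", "finance", "finance"]

        vectorizer = TfidfVectorizer()
        X = vectorizer.fit_transform(train_docs)

        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
        per_cluster = {}
        for c in range(2):
            idx = [i for i, lab in enumerate(kmeans.labels_) if lab == c]
            per_cluster[c] = MultinomialNB().fit(X[idx], [train_labels[i] for i in idx])

        test = vectorizer.transform(["rates rose again"])
        cluster = kmeans.predict(test)[0]          # route to the nearest cluster
        print(per_cluster[cluster].predict(test))  # classify with that cluster's model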

  2. Quality Inspection of Printed Texts

    DEFF Research Database (Denmark)

    Pedersen, Jesper Ballisager; Nasrollahi, Kamal; Moeslund, Thomas B.

    2016-01-01

    Inspecting the quality of printed texts has its own importance in many industrial applications. To do so, this paper proposes a grading system which evaluates the performance of the printing task using some quality measures for each character and symbol. The purpose of this grading system is two......-folded: for customers of the printing and verification system, the overall grade is used to verify if the text is of sufficient quality, while for the printer's manufacturer, the detailed character/symbol grades and quality measurements are used for the improvement and optimization of the printing task. The proposed system...

  3. A Guide Text or Many Texts? "That is the Question”

    Directory of Open Access Journals (Sweden)

    Delgado de Valencia Sonia

    2001-08-01

    Full Text Available The use of supplementary materials in the classroom has always been an essential part of the teaching and learning process. To restrict our teaching to the scope of one single textbook means to lag behind the advances of knowledge, in any area and context. Young learners appreciate any new and varied support that expands their knowledge of the world: diaries, letters, panels, free texts, magazines, short stories, poems or literary excerpts, and articles taken from the Internet are materials that will allow learners to share more and work more collaboratively. In this article we are going to deal with some of these materials, with the criteria for selecting, adapting, and creating them so that they may be of interest to the learner and may promote reading and writing processes. Since no text can entirely satisfy the needs of students and teachers, the creativity of both parties will be necessary to improve the quality of teaching through the adequate use and adaptation of supplementary materials.

  4. Presentation of the math text

    OpenAIRE

    KREJČOVÁ, Iva

    2009-01-01

    The aim of this bachelor thesis is a basic mapping of the tools for creating mathematical texts and their presentation, and the acquisition of basic user skills in the usage of these programs. These tools are also compared in terms of availability and ease of use, their capabilities and the quality of their output.

  5. Seductive Texts with Serious Intentions.

    Science.gov (United States)

    Nielsen, Harriet Bjerrum

    1995-01-01

    Debates whether a text claiming to have scientific value is using seduction irresponsibly at the expense of the truth, and discusses who is the subject and who is the object of such seduction. It argues that, rather than being an assault against scientific ethics, seduction is a necessary premise for a sensible conversation to take place. (GR)

  6. Values Education: Texts and Supplements.

    Science.gov (United States)

    Curriculum Review, 1979

    1979-01-01

    This column describes and evaluates almost 40 texts, instructional kits, and teacher resources on values, interpersonal relations, self-awareness, self-help skills, juvenile psychology, and youth suicide. Eight effective picture books for the primary grades and seven titles in values fiction for teens are also reviewed. (SJL)

  7. Comparison of Text Categorization Algorithms

    Institute of Scientific and Technical Information of China (English)

    SHI Yong-feng; ZHAO Yan-ping

    2004-01-01

    This paper summarizes several automatic text categorization algorithms in common use recently, and analyzes and compares their advantages and disadvantages. It provides clues for making use of appropriate automatic classifying algorithms in different fields. Finally, some evaluations and summaries of these algorithms are discussed, and directions for further research are pointed out.

  8. Multilingual text induced spelling correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams
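
    A drastically simplified, corpus-derived correction step in the same spirit (a lexicon of word unigrams counted from raw text, with a non-word replaced by the most frequent known word within one edit) is sketched below in Python; TISC itself is context-sensitive and far more involved, so this is only an illustration.

        # Toy corpus-derived spelling correction: unigram lexicon + edit-distance-1 candidates.
        from collections import Counter
        import re

        corpus = "the quick brown fox jumps over the lazy dog the dog barks"
        unigrams = Counter(re.findall(r"[a-z]+", corpus.lower()))

        def edits1(word):
            letters = "abcdefghijklmnopqrstuvwxyz"
            splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
            deletes = [a + b[1:] for a, b in splits if b]
            replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
            inserts = [a + c + b for a, b in splits for c in letters]
            return set(deletes + replaces + inserts)

        def correct(word):
            if word in unigrams:
                return word                       # known word: leave it alone
            candidates = edits1(word) & unigrams.keys()
            return max(candidates, key=unigrams.get) if candidates else word

        print(correct("dogg"), correct("qick"))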

  9. COMPENDEX/TEXT-PAC: CIS.

    Science.gov (United States)

    Standera, Oldrich

    This report evaluates the engineering information services provided by the University of Calgary since implementation of the COMPENDEX (tape service of Engineering Index, Inc.) service using the IBM TEXT-PAC system. Evaluation was made by a survey of the users of the Current Information Selection (CIS) service, the interaction between the system…

  10. On Text Realization Image Steganography

    Directory of Open Access Journals (Sweden)

    Dr. Mohammed Nasser Hussein Al-Turfi

    2012-02-01

    Full Text Available In this paper the steganography strategy is implemented in a different way and from a different scope: the important data will neither be hidden in an image nor transferred through the communication channel inside an image. On the contrary, a well-known image that exists on both sides of the channel will be used, and a text message that contains the important data will be transmitted. With suitable operations, we can re-mix and re-make the source image. MATLAB 7 is the program in which the algorithm is implemented; the algorithm shows a high ability to achieve the task for different types and sizes of images. Perfect reconstruction was achieved on the receiving side. The most interesting point is that an algorithm that deals with secured image transmission transmits no images at all

  11. Linguistic dating of biblical texts

    DEFF Research Database (Denmark)

    Young, Ian; Rezetko, Robert; Ehrensvärd, Martin Gustaf

    at the university or divinity school level, but also to scholars of the Hebrew Bible in general who have not been exposed to the full scope of issues. The book is useful to a wide range of readers by introducing topics at a basic level before entering into detailed discussion. Among the many issues discussed......Since the beginning of critical scholarship biblical texts have been dated using linguistic evidence. In recent years this has become a controversial topic, especially with the publication of Ian Young (ed.), Biblical Hebrew: Studies in Chronology and Typology (2003). However, until now there has...... been no introduction and comprehensive study of the field. Volume 1 introduces the field of linguistic dating of biblical texts, particularly to intermediate and advanced students of biblical Hebrew who have a reasonable background in the language, having completed at least an introductory course...

  12. Challenges in Kurdish Text Processing

    OpenAIRE

    Esmaili, Kyumars Sheykh

    2012-01-01

    Despite having a large number of speakers, the Kurdish language is among the less-resourced languages. In this work we highlight the challenges and problems in providing the required tools and techniques for processing texts written in Kurdish. From a high-level perspective, the main challenges are: the inherent diversity of the language, standardization and segmentation issues, and the lack of language resources.

  13. Psychologische Interpretation. Biographien, Texte, Tests

    OpenAIRE

    Fahrenberg, Jochen

    2002-01-01

    Biographies, texts and tests are interpreted psychologically. Psychological interpretation is defined as the translation of a statement together with explanations that establish relationships. In this way connections are uncovered and results are put into context. Interpretation is translation and mutual understanding. It must combine heuristics and methodological critique. The book introduces these methodological foundations and rules of psychological interpretation. The first chapters of the book open with an interpretatio...

  14. Learning Context for Text Categorization

    CERN Document Server

    Haribhakta, Y V

    2011-01-01

    This paper describes our work which is based on discovering context for text document categorization. The document categorization approach is derived from a combination of a learning paradigm known as relation extraction and a technique known as context discovery. We demonstrate the effectiveness of our categorization approach using the Reuters-21578 dataset and synthetic real-world data from the sports domain. Our experimental results indicate that the learned context greatly improves the categorization performance as compared to traditional categorization approaches.

  15. TEXT CATEGORIZATION USING Q-LEARNING ALGORITHM

    OpenAIRE

    Dr. S. R. Suresh; T. Karthikeyan; D. B. Shanmugam; J. Dhilipan

    2011-01-01

    This paper aims at the creation of an efficient document classification process using reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision-making. One strength of reinforcement learning is that it provides a formalism for measuring the utility of actions that give benefit only in the future. An effective and flexible classifier learning algorithm is provided, which classifies a set of text documents into a more specific domain like Cricket, Tenn...

  16. Survey on Text Document Clustering

    OpenAIRE

    M. Thangamani; Dr. P. Thangaraj

    2010-01-01

    Document clustering is also referred to as text clustering, and its concept is essentially the same as that of data clustering. It is hard to find selective information within a large series of information, which is why document clustering came into the picture. Basically, a cluster means a group of similar data; document clustering means segregating the data into different groups of similar data. Clustering can be of a mathematical, statistical or numerical domain. Clustering is a fundamental data analysi...

  17. Text Analytics to Data Warehousing

    OpenAIRE

    Kalli Srinivasa Nageswara Prasad; S. Ramakrishna

    2010-01-01

    Information hidden or stored in unstructured data can play a critical role in making decisions, understanding and conducting other business functions. Integrating data stored in both structured and unstructured formats can add significant value to an organization. With the extent of development happening in text mining and in technologies for dealing with unstructured and semi-structured data, such as XML and MML (Mining Markup Language), to extract and analyze data, text analytics has evolved to handle un...

  18. Text Mining for Protein Docking.

    Directory of Open Access Journals (Sweden)

    Varsha D Badal

    2015-12-01

    Full Text Available The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound
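
    The abstract-filtering step can be sketched, under the assumption that scikit-learn is available, as a bag-of-words linear SVM separating docking-relevant from irrelevant text; the example sentences below are invented, not Dockground data or the authors' trained models.

        # Sketch of the relevance-filtering step: bag-of-words features + linear SVM.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        texts = [
            "mutation of residue Arg45 abolished binding to the partner protein",
            "the interface is stabilised by a salt bridge between the two chains",
            "the gene is widely expressed in liver tissue",
            "patients were recruited across three clinical centres",
        ]
        labels = [1, 1, 0, 0]   # 1 = relevant to the binding interface, 0 = irrelevant

        clf = make_pipeline(CountVectorizer(), LinearSVC())
        clf.fit(texts, labels)
        print(clf.predict(["a salt bridge at the binding interface was disrupted"]))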

  19. Text writing in the air

    OpenAIRE

    Beg, Saira; Khan, M. Fahad; Baig, Faisal

    2016-01-01

    This paper presents a real-time, video-based pointing method which allows sketching and writing of English text in the air in front of a mobile camera. The proposed method has two main tasks: first it tracks the colored finger tip in the video frames, and then it applies English OCR over the plotted images in order to recognize the written characters. Moreover, the proposed method provides natural human-system interaction in such a way that it does not require a keypad, stylus, pen or glove etc. for character input. For...

  20. New Historicism: Text and Context

    Directory of Open Access Journals (Sweden)

    Violeta M. Vesić

    2016-02-01

    Full Text Available During most of the twentieth century history was seen as a phenomenon outside of literature that guaranteed the veracity of literary interpretation. History was unique and it functioned as a basis for reading literary works. During the seventies of the twentieth century there occurred a change of attitude towards history in American literary theory, and there appeared a new theoretical approach which soon became known as New Historicism. Since its inception, New Historicism has been identified with the study of Renaissance and Romanticism, but nowadays it has been increasingly involved in other literary trends. Although there are great differences in the arguments and practices of various representatives of this school, New Historicism has clearly recognizable features and many new historicists will agree with the statement of Walter Cohen that New Historicism, when it appeared in the eighties, represented something quite new in reference to the studies of theory, criticism and history (Cohen 1987, 33). The theoretical connection with Bakhtin, Foucault and Marx is clear, as well as a kind of uneasy tie with deconstruction and the work of Paul de Man. At the center of this approach is a renewed interest in the study of literary works in the light of the historical and political circumstances in which they were created. Foucault encouraged readers to begin to move literary texts and to link them with discourses and representations that are not literary, as well as to examine the sociological aspects of the texts in order to take part in the social struggles of today. The study of literary works using New Historicism is the study of politics, history, culture and the circumstances in which these works were created. With regard to one of the main facts located at the center of the criticism, that history cannot be viewed objectively and that reality can only be understood through a cultural context that reveals the work, re-reading and interpretation of

  1. Succincter Text Indexing with Wildcards

    CERN Document Server

    Thachuk, Chris

    2011-01-01

    We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)---positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giving a succinct index requiring $(2 + o(1))n \\log \\sigma + O(n) + O(d \\log n) + O(k \\log k)$ bits for a text of length $n$ over an alphabet of size $\\sigma$ containing $d$ groups of $k$ wildcards. A key to the space reduction is a result we give showing how any compressed suffix array can be supplemented with auxiliary data structures occupying $O(n) + O(d \\log \\frac{n}{d})$ bits to also support efficient dictionary matching queries. The query algorithm for our wildcard index is faster than previous approaches using reasonable working space. More importantly our new algorithm greatly reduces the query working space to ...

  2. Everyday Life as a Text

    Directory of Open Access Journals (Sweden)

    Michael Lahey

    2016-02-01

    Full Text Available This article explores how audience data are utilized in the tentative partnerships created between television and social media companies. Specifically, it looks at the mutually beneficial relationship formed between the social media platform Twitter and television. It calls attention to how audience data are utilized as a way for the television industry to map itself onto the everyday lives of digital media audiences. I argue that the data-intensive monitoring of everyday life offers some measure of soft control over audiences in a digital media landscape. To do this, I explore “Social TV”—the relationships created between social media technologies and television—before explaining how Twitter leverages user data into partnerships with various television companies. Finally, the article explains what is fruitful about understanding the Twitter–television relationship as a form of soft control.

  3. Text documents as social networks

    Science.gov (United States)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameters, the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small world networks. For two connected sentences, we can say which one is the first and which one is the second, according to their position in a document. This will make such a graph look like a small WWW-type network and PageRank type algorithms will produce interesting ranking of nodes in such a document.
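    As a toy illustration of the sentence-graph idea (not the authors' Helmholtz-principle construction), the sketch below links sentences that share enough meaningful words, orients each edge from the earlier sentence to the later one, and ranks nodes with PageRank via networkx; the input file name and the word-sharing threshold are assumptions.

```python
# Build a directed sentence graph in document order and rank sentences with PageRank.
import itertools
import re
import networkx as nx

def sentence_graph(text, min_shared=2):
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [set(re.findall(r"[a-z]{4,}", s.lower())) for s in sentences]
    g = nx.DiGraph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_shared:
            g.add_edge(i, j)   # edge oriented from the earlier sentence to the later one
    return sentences, g

sentences, g = sentence_graph(open("document.txt", encoding="utf-8").read())
ranks = nx.pagerank(g)
for i in sorted(ranks, key=ranks.get, reverse=True)[:5]:
    print(round(ranks[i], 3), sentences[i][:80])
```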

  4. A programmed text in statistics

    CERN Document Server

    Hine, J

    1975-01-01

    [Contents excerpt: exercises for Sections 1 and 2 (physical sciences and engineering, biological sciences, social sciences), solutions to the exercises, and tables for χ² tests (variances, one-tailed and two-tailed) and the F-distribution, pp. 42-69.] Preface: This project started some years ago when the Nuffield Foundation kindly gave a grant for writing a programmed text to use with service courses in statistics. The work was carried out by Mrs. Joan Hine and Professor G. B. Wetherill at Bath University, together with some other help from time to time by colleagues at Bath University and elsewhere. Testing was done at various colleges and universities, and some helpful comments were received, but we particularly mention King Edwards School, Bath, who provided some sixth formers as 'guinea pigs' for the fir...

  5. Orientalist discourse in media texts

    Directory of Open Access Journals (Sweden)

    Necla Mora

    2009-10-01

    Full Text Available By placing itself at the center of the world with a Eurocentric point of view, the West exploits other countries and communities through inflicting cultural change and transformation on them either from within via colonialist movements or from outside via “Orientalist” discourses in line with its imperialist objectives. The West has fictionalized the “image of the Orient” in terms of science by making use of social sciences like anthropology, history and philology and launched an intensive propaganda which covers literature, painting, cinema and other fields of art in order to actualize this fiction. Accordingly, the image of the Orient – which has been built firstly in terms of science then socially – has been engraved into the collective memory of both the Westerner and the Easterner. The internalized “Orientalist” point of view and discourse cause the Westerner to see and perceive the Easterner with the image formed in his/her memory while looking at them. The Easterner represents and expresses himself/herself from the eyes of the Westerner and with the image which the Westerner fictionalized for him/her. Hence, in order to gain acceptance from the West, the East tries to shape itself into the “Orientalist” mold which the Westerner fictionalized for it. Artists, intellectuals, writers and media professionals, who embrace and internalize the stereotypical hegemonic-driven “Orientalist” discourse of the Westerner and who rank among the elite group, reflect their internalized “Orientalist” discourse on their own actions. This condition causes the “Orientalist” clichés to be engraved in the memory of the society; causes the society to view itself with an “Orientalist” point of view and perceive itself with the clichés of the Westerner. Consequently, the second ring of the hegemony is reproduced by the symbolic elites who represent the power/authority within the country. The “Orientalist” discourse, which is

  6. What's so Simple about Simplified Texts? A Computational and Psycholinguistic Investigation of Text Comprehension and Text Processing

    Science.gov (United States)

    Crossley, Scott A.; Yang, Hae Sung; McNamara, Danielle S.

    2014-01-01

    This study uses a moving windows self-paced reading task to assess both text comprehension and processing time of authentic texts and these same texts simplified to beginning and intermediate levels. Forty-eight second language learners each read 9 texts (3 different authentic, beginning, and intermediate level texts). Repeated measures ANOVAs…

  7. Bengali text summarization by sentence extraction

    CERN Document Server

    Sarkar, Kamal

    2012-01-01

    Text summarization is a process to produce an abstract or a summary by selecting significant portions of the information from one or more texts. In an automatic text summarization process, a text is given to the computer and the computer returns a shorter, less redundant extract or abstract of the original text(s). Many techniques have been developed for summarizing English text(s), but very few attempts have been made for Bengali text summarization. This paper presents a method for Bengali text summarization which extracts important sentences from a Bengali document to produce a summary.
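    A generic frequency-based sentence-extraction summarizer is sketched below to illustrate the approach; it omits the Bengali-specific preprocessing (stemming, stop-word removal) that the paper relies on, and the input file name is an assumption.

```python
# Score sentences by average word frequency and keep the top-scoring ones in document order.
import re
from collections import Counter

def summarize(text, n_sentences=3):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)
    selected = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in selected)  # preserve original order

print(summarize(open("article.txt", encoding="utf-8").read()))
```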

  8. Text History of the Greek Exodus

    OpenAIRE

    Wevers, John William

    1992-01-01

    Chapter I: The Hexaplaric Recension 9; Chapter II: The Byzantine Text Group 41; Chapter III: The Catena Text 64; Chapter IV: The Texts of A and B 81; Chapter V: The Text of Cyril of Alexandria's De Adoratione and Glaphyra 104; Chapter VI: The Composition of Exod 35 to 40 117; Chapter VII: The Critical Text (Exod) 147; Index of Passages 273

  9. Text Categorization with Latent Dirichlet Allocation

    Directory of Open Access Journals (Sweden)

    ZLACKÝ Daniel

    2014-05-01

    Full Text Available This paper focuses on the text categorization of Slovak text corpora using latent Dirichlet allocation. Our goal is to build text subcorpora that contain similar text documents. We want to use these better organized text subcorpora to build more robust language models that can be used in the area of speech recognition systems. Our previous research in the area of text categorization showed that we can achieve better results with categorized text corpora. In this paper we used latent Dirichlet allocation for text categorization. We divided the initial text corpus into 2, 5, 10, 20 or 100 subcorpora with various iterations and save steps. Language models were built on these subcorpora and adapted with linear interpolation to the judicial domain. The experiment results showed that text categorization using latent Dirichlet allocation can improve the system for automatic speech recognition by creating language models from organized text corpora.
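    A hedged sketch of the general workflow with scikit-learn (the paper's own toolchain is not specified in the record): fit LDA on a corpus, assign each document to its dominant topic, and collect the resulting subcorpora for separate language-model training; the corpus file, vocabulary size and topic count are illustrative assumptions.

```python
# LDA-based splitting of a corpus into topic subcorpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [line.strip() for line in open("corpus.txt", encoding="utf-8") if line.strip()]

vectorizer = CountVectorizer(max_features=20000)
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, max_iter=20, random_state=0)
doc_topic = lda.fit_transform(X)            # rows: documents, columns: topic weights

subcorpora = {}
for doc, weights in zip(docs, doc_topic):
    subcorpora.setdefault(int(weights.argmax()), []).append(doc)

for topic_id, subset in sorted(subcorpora.items()):
    print(topic_id, len(subset), "documents")
```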

  10. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  11. TEXT CLASSIFICATION TOWARD A SCIENTIFIC FORUM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Text mining, also known as discovering knowledge from text, which has emerged as a possible solution to the current information explosion, refers to the process of extracting non-trivial and useful patterns from unstructured text. Among the general tasks of text mining such as text clustering, summarization, etc., text classification is a subtask of intelligent information processing, which employs supervised learning to construct a classifier from training text by which to predict the class of unlabeled text. Because of its simplicity and objectivity in performance evaluation, text classification is usually used as a standard tool to determine the advantage or weakness of a text processing method, such as text representation, text feature selection, etc. In this paper, text classification is carried out to classify the Web documents collected from the XSSC Website (http://www.xssc.ac.cn). The performance of the support vector machine (SVM) and the back propagation neural network (BPNN) is compared on this task. Specifically, binary text classification and multi-class text classification were conducted on the XSSC documents. Moreover, the classification results of both methods are combined to improve the accuracy of classification. An experiment is conducted to show that BPNN can compete with SVM in binary text classification; but for multi-class text classification, SVM performs much better. Furthermore, the classification is improved in both the binary and multi-class cases with the combined method.
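    The comparison described above can be prototyped with scikit-learn, using a linear SVM and a back-propagation-trained multilayer perceptron on TF-IDF vectors; this is a sketch under assumptions, and load_documents() is a hypothetical placeholder since the XSSC document collection is not bundled here.

```python
# Compare a linear SVM and an MLP (trained with back-propagation) on the same TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

texts, labels = load_documents()   # placeholder: any labeled document collection

X = TfidfVectorizer(max_features=10000).fit_transform(texts)

for name, clf in [("SVM", LinearSVC()),
                  ("BPNN", MLPClassifier(hidden_layer_sizes=(100,), max_iter=300))]:
    scores = cross_val_score(clf, X, labels, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```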

  12. Noticeable Focuses in Reading a Text

    Institute of Scientific and Technical Information of China (English)

    李明

    2007-01-01

    This paper discusses the relationship between commanding the basic information contained in a text and the final purpose of comprehension in a text-reading process. By using the main topic and the central meaning that all texts have as two main examples, the author mainly illustrates what a reader should pay attention to in reading a text.

  13. What makes a written text written

    Institute of Scientific and Technical Information of China (English)

    赵亦倩

    2008-01-01

    Text can be used for both written and spoken language, and the different features of spoken and written texts give us the possibility to form a general idea of the division into two main categories--spoken English and written English. In this article, an attempt is made to analyze a sample text in order to discuss the general features of written texts.

  14. Examining Text Complexity in the Early Grades

    Science.gov (United States)

    Fitzgerald, Jill; Elmore, Jeff; Hiebert, Elfrieda H.; Koons, Heather H.; Bowen, Kimberly; Sanford-Moore, Eleanor E.; Stenner, A. Jackson

    2016-01-01

    The Common Core raises the stature of texts to new heights, creating a hubbub. The fuss is especially messy at the early grades, where children are expected to read more complex texts than in the past. But early-grades teachers have been given little actionable guidance about text complexity. The authors recently examined early-grades texts to…

  15. Text Power: tools for the Cultural Heritage.

    OpenAIRE

    Picchi, Eugenio; Sassolini, Eva

    2009-01-01

    This article presents NLP techniques (text mining, text analysis) to create tools for the evaluation, analysis and classification of text materials available on the web. In particular, we developed tools for the automatic extraction of relevant information related to the cultural heritage domain and tools for the creation of linguistic resources. On this knowledge base, we also developed a system for text browsing.

  16. Open architecture for multilingual parallel texts

    CERN Document Server

    Benitez, M T Carrasco

    2008-01-01

    Multilingual parallel texts (abbreviated to parallel texts) are linguistic versions of the same content ("translations"); e.g., the Maastricht Treaty in English and Spanish are parallel texts. This document is about creating an open architecture for the whole Authoring, Translation and Publishing Chain (ATP-chain) for the processing of parallel texts.

  17. Text Classification and Classifiers:A Survey

    Directory of Open Access Journals (Sweden)

    Vandana Korde

    2012-03-01

    Full Text Available As most information (over 80%) is stored as text, text mining is believed to have a high commercial potential value. Knowledge may be discovered from many sources of information; yet, unstructured texts remain the largest readily available source of knowledge. Text classification classifies documents according to predefined categories. In this paper we give an introduction to text classification and the text classification process, as well as an overview of classifiers, and compare some existing classifiers on the basis of a few criteria such as time complexity, principle and performance.

  18. Scalable Text Mining with Sparse Generative Models

    OpenAIRE

    Puurula, Antti

    2016-01-01

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: gener...

  19. Using compression to identify acronyms in text

    OpenAIRE

    Yeates, Stuart; Bainbridge, David; Witten, Ian H

    2000-01-01

    Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study that showed how particular kinds of lexical tokens---names, dates, locations, etc.---can be identified and located in running text, using compression models to provide the leverage necessary to distinguish ...

  20. Discover Effective Pattern for Text Mining

    OpenAIRE

    Khade, A. D.; A. B. Karche

    2014-01-01

    Many data mining techniques have been proposed for finding useful patterns in documents such as text documents. However, how to effectively use and keep up to date the discovered patterns is still an open research task, especially in the domain of text mining. Text mining is the discovery of interesting knowledge (or features) in text documents. It is a challenging task to find appropriate knowledge (or features) in text documents to help users find exactly what they want...

  1. ERGONYMS AS COMPONENTS OF PHARMACEUTICALS ADVERTISING TEXTS

    OpenAIRE

    НАСАКІНА, Світлана Вікторівна

    2016-01-01

    The article deals with the functioning of ergonyms in pharmaceuticals advertising texts. The purpose of the article is the analysis of the ergonyms functioning in pharmaceuticals advertising texts. The tasks are defining the specifics of the ergonyms functioning in pharmaceuticals advertising texts and creating an ergonyms classification on the material under study. There were found similarities in the pharmaceuticals advertising texts in the Ukrainian, Bulgarian and Russian languages. The ar...

  2. On-Line Full Text Pathology Database

    OpenAIRE

    Fink, Daniel; Clark, Anthony; Sideli, Robert

    1988-01-01

    A free text database for pathology reports has been developed using the BRS/SEARCH free text management software. All pathology reports are stored in the free text pathology database. Standardized section headings make any word searchable either by itself or within the context of a specific part of the report. The free text management software supplies a rich set of Boolean, positional, and relational operators. These operators make an iterative search strategy an effective method of searching ...

  3. Folklore text in a process of oblivion

    Directory of Open Access Journals (Sweden)

    Ilić Marija

    2005-01-01

    Full Text Available The paper presents contemporary research on traditional folklore from the perspective of ethnolinguistics and anthropological linguistics. The analysis is based on material collected among Serbs from Szigetcsep (Hungary, 2001) during an ethnolinguistic field survey. The paper specifically discusses the methods of collecting traditional folklore texts in the ethnolinguistic interview, discourse analysis of utterances commenting on folklore texts, and ways of memorization of folklore texts.

  4. Comparative Discourse Analysis of Parallel Texts

    CERN Document Server

    Van der Eijk, P

    1994-01-01

    A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent blocks of text. These representations have been proposed to deal with sub-topic text segmentation. In a parallel corpus, similar representations can be derived for versions of a text in various languages. These can be used for parallel segmentation and as an alternative measure of text-translation similarity.
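    A minimal sketch of the lexical-cohesion representation (TextTiling-style scoring between adjacent blocks) is given below; it illustrates the discourse-structure signal only and does not implement the paper's parallel-corpus segmentation or its similarity measure. The block size and file name are assumptions.

```python
# Cohesion profile: cosine similarity between adjacent blocks; low valleys suggest sub-topic boundaries.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cohesion_profile(text, block_size=5):
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    blocks = [" ".join(sentences[i:i + block_size])
              for i in range(0, len(sentences), block_size)]
    X = TfidfVectorizer().fit_transform(blocks)
    return [float(cosine_similarity(X[i], X[i + 1])[0, 0])
            for i in range(X.shape[0] - 1)]

profile = cohesion_profile(open("chapter.txt", encoding="utf-8").read())
print(["%.2f" % v for v in profile])
```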

  5. Automatic text categorisation of racist webpages

    OpenAIRE

    Greevy, Edel

    2004-01-01

    Automatic Text Categorisation (TC) involves the assignment of one or more predefined categories to text documents in order that they can be effectively managed. In this thesis we examine the possibility of applying automatic text categorisation to the problem of categorising texts (web pages) based on whether or not they are racist. TC has proven successful for topic-based problems such as news story categorisation. However, the problem of detecting racism is dissimilar to topic-based pro...

  6. Arabic Text Mining Using Rule Based Classification

    OpenAIRE

    Fadi Thabtah; Omar Gharaibeh; Rashid Al-Zubaidy

    2012-01-01

    A well-known classification problem in the domain of text mining is text classification, which concerns mapping textual documents into one or more predefined categories based on their content. The text classification arena has recently attracted many researchers because of the massive amounts of online documents and text archives which hold essential information for decision-making processes. In this field, most such research focuses on classifying English documents while there are limited studi...

  7. Mathematical Texts as Narrative: Rethinking Curriculum

    Science.gov (United States)

    Dietiker, Leslie

    2013-01-01

    This paper proposes a framework for reading mathematics texts as narratives. Building from a narrative framework of Meike Bal, a reader's experience with the mathematical content as it unfolds in the text (the "mathematical story") is distinguished from his or her logical reconstruction of the content beyond the text (the…

  8. Text Complexity: Primary Teachers' Views

    Science.gov (United States)

    Fitzgerald, Jill; Hiebert, Elfrieda H.; Bowen, Kimberly; Relyea-Kim, E. Jackie; Kung, Melody; Elmore, Jeff

    2015-01-01

    The research question was, "What text characteristics do primary teachers think are most important for early grades text complexity?" Teachers from across the United States accomplished a two-part task. First, to stimulate teachers' thinking about important text characteristics, primary teachers completed an online paired-text…

  9. Applying statistical methods to text steganography

    CERN Document Server

    Nechta, Ivan

    2011-01-01

    This paper presents a survey of text steganography methods used for hiding secret information inside some covertext. Widely known hiding techniques (such as translation-based steganography, text generating and syntactic embedding) and detection are considered. It is shown that statistical analysis has an important role in text steganalysis.

  10. Teacher Modeling Using Complex Informational Texts

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    Modeling in complex texts requires that teachers analyze the text for factors of qualitative complexity and then design lessons that introduce students to that complexity. In addition, teachers can model the disciplinary nature of content area texts as well as word solving and comprehension strategies. Included is a planning guide for think aloud.

  11. The Costs of Texting in the Classroom

    Science.gov (United States)

    Lawson, Dakota; Henderson, Bruce B.

    2015-01-01

    Many college students seem to find it impossible to resist the temptation to text on electronic devices during class lectures and discussions. One common response of college professors is to yield to the inevitable and try to ignore student texting. However, research indicates that because of limited cognitive capacities, even simple texting can…

  12. Interdisciplinary Approach to Understanding Literary Texts

    Science.gov (United States)

    Dossanova, Altynay Zh.; Ismakova, Bibissara S.; Tapanova, Saule E.; Ayupova, Gulbagira K.; Gotting, Valentina V.; Kaltayeva, Gulnar K.

    2016-01-01

    The primary purpose is the implementation of the interdisciplinary approach to understanding and the construction of integrative models of understanding literary texts. The interdisciplinary methodological paradigm of studying text understanding, based on the principles of various sciences facilitating the identification of the text understanding…

  13. Does Writing Summaries Improve Memory for Text?

    Science.gov (United States)

    Spirgel, Arie S.; Delaney, Peter F.

    2016-01-01

    In five experiments, we consistently found that items included in summaries were better remembered than items omitted from summaries. We did not, however, find evidence that summary writing was better than merely restudying the text. These patterns held with shorter and longer texts, when the text was present or absent during the summary writing,…

  14. Text History of the Greek Numbers

    OpenAIRE

    Wevers, John William

    1982-01-01

    Chapter 1 The x Group 7; Chapter 2 The Byzantine Text 17; Chapter 3 The Hexaplaric Recension 43; Chapter 4 The Texts of B and A 66; Chapter 5 Papyrus 963 as Textual Witness 86; Chapter 6 The Critical Text (Num) 94; Index of Passages 136

  15. Academic Journal Embargoes and Full Text Databases.

    Science.gov (United States)

    Brooks, Sam

    2003-01-01

    Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…

  16. Knowledge discovery data and text mining

    CERN Document Server

    Olmer, Petr

    2008-01-01

    Data mining and text mining refer to techniques, models, algorithms, and processes for knowledge discovery and extraction. Basic definitions are given together with the description of a standard data mining process. Common models and algorithms are presented. Attention is given to text clustering, how to convert unstructured text to structured data (vectors), and how to compute their importance and position within clusters.
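    A short sketch of the pipeline summarized above, assuming scikit-learn: unstructured documents are converted to TF-IDF vectors and clustered with k-means, and the top-weighted terms of each cluster are printed as a rough indication of term importance within the cluster; the file name, cluster count and feature limit are assumptions.

```python
# Text -> TF-IDF vectors -> k-means clusters -> top terms per cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [line.strip() for line in open("documents.txt", encoding="utf-8") if line.strip()]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(docs)

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for c, center in enumerate(km.cluster_centers_):
    top = center.argsort()[::-1][:8]        # highest-weighted terms in the cluster centroid
    print(c, [terms[i] for i in top])
```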

  17. Role of Terms in Popular Science Text

    Directory of Open Access Journals (Sweden)

    Zhabbarova F. U.

    2013-01-01

    Full Text Available The article examines and determines the specifics of terminological vocabulary used in a popular science text. It differentiates the notions of cohesion and coherence. The article reveals the main terminological means realizing cohesion in the text of a popular science article.

  18. Text line Segmentation of Curved Document Images

    Directory of Open Access Journals (Sweden)

    Anusree.M

    2014-05-01

    Full Text Available Document image analysis has been widely used in historical and heritage studies, education and digital libraries. Document image analysis techniques are mainly used for improving the human readability and the OCR quality of a document. During digitization, camera-captured images contain warped documents due to perspective and geometric distortions. The main difficulty is text line detection in the document. Many algorithms have been proposed to address the problem of printed-document text line detection, but they fail to extract text lines in curved documents. This paper describes a segmentation technique that detects curled text lines in camera-captured document images.
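    For contrast with the paper's curved-line method, the sketch below shows the classical horizontal projection-profile baseline (straight lines only), which is exactly the kind of approach that fails on warped, camera-captured pages; the input file name and the 5% ink threshold are assumptions.

```python
# Baseline text-line segmentation via horizontal projection profile.
import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)       # assumed input image
binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

profile = binary.sum(axis=1)                 # amount of ink per image row
is_text_row = profile > 0.05 * profile.max() # rows with enough ink belong to a text line

lines, start = [], None
for y, flag in enumerate(is_text_row):
    if flag and start is None:
        start = y                            # entering a text line
    elif not flag and start is not None:
        lines.append((start, y))             # leaving a text line
        start = None
if start is not None:
    lines.append((start, len(is_text_row)))

print(f"{len(lines)} text lines found")
```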

  19. An Embedded Application for Degraded Text Recognition

    Directory of Open Access Journals (Sweden)

    Thillou Céline

    2005-01-01

    Full Text Available This paper describes a mobile device which tries to give the blind or visually impaired access to text information. Three key technologies are required for this system: text detection, optical character recognition, and speech synthesis. Blind users and the mobile environment imply two strong constraints. First, pictures will be taken without control over camera settings and without a priori information on the text (font or size) or background. The second issue is to link several techniques together with an optimal compromise between computational constraints and recognition efficiency. We present an overall description of the system from text detection to OCR error correction.

  20. Adaptive Text Entry for Mobile Devices

    DEFF Research Database (Denmark)

    Proschowsky, Morten Smidt

    This thesis presents text entry methods for mobile devices and a framework for adaptive context-aware language models. Based on an analysis of current text entry methods, the requirements for the new text entry methods are established. Transparent User guided Prediction (TUP) is a text entry method for devices with one-dimensional touch input. It can ... to improve the models of human motor behaviour. TUP-Key is a variant of TUP, designed for 12-key phone keyboards. It is introduced in the thesis but has not been implemented or evaluated. Both text entry methods support adaptive context-aware language models. YourText is a framework for adaptive context-aware language models that is introduced in the thesis. YourText enables different language models to be combined into a new common language model. The framework is designed so it can be adapted to different text entry methods, thereby enabling the language model to be transferred between devices.
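    As a toy illustration of the context-aware prediction such text entry methods rely on (not the TUP or YourText implementation), the sketch below trains a character trigram model and ranks likely next characters for a typed prefix; the training-file name and the model order are assumptions.

```python
# Character n-gram next-key prediction for a text entry interface.
from collections import Counter, defaultdict

def train(corpus, order=2):
    model = defaultdict(Counter)
    padded = " " * order + corpus.lower()
    for i in range(order, len(padded)):
        model[padded[i - order:i]][padded[i]] += 1   # context -> next-character counts
    return model

def predict(model, prefix, order=2, k=3):
    context = (" " * order + prefix.lower())[-order:]
    return [ch for ch, _ in model[context].most_common(k)]

model = train(open("training_text.txt", encoding="utf-8").read())
print(predict(model, "adaptiv"))   # e.g. ['e', ...] depending on the corpus
```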

  1. The nuclear modification of charged particles in Pb-Pb at $\\sqrt{\\text{s}_\\text{NN}} = \\text{5.02}\\,\\text{TeV}$ measured with ALICE

    CERN Document Server

    Gronefeld, Julius

    2016-01-01

    The study of inclusive charged-particle production in heavy-ion collisions provides insights into the density of the medium and the energy-loss mechanisms. The observed suppression of high-$\textit{p}_\text{T}$ yield is generally attributed to energy loss of partons as they propagate through a deconfined state of quarks and gluons - Quark-Gluon Plasma (QGP) - predicted by QCD. Such measurements allow the characterization of the QGP by comparison with models. In these proceedings, results on high-$\textit{p}_\text{T}$ particle production measured by ALICE in Pb-Pb collisions at $ \sqrt{\text{s}_\text{NN}}\, = 5.02\ \rm{TeV}$ as well as in pp collisions at $\sqrt{\text{s}}\,=5.02\ \rm{TeV}$ are presented for the first time. The nuclear modification factors ($\text{R}_\text{AA}$) in Pb-Pb collisions are presented and compared with model calculations.
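    For context, the nuclear modification factor quoted above is conventionally defined (standard definition, not reproduced from the contribution itself) as

    \[
    R_\mathrm{AA}(p_\mathrm{T}) \;=\; \frac{1}{\langle T_\mathrm{AA}\rangle}\,
    \frac{\mathrm{d}^2 N_\mathrm{AA}/\mathrm{d}p_\mathrm{T}\,\mathrm{d}\eta}
         {\mathrm{d}^2 \sigma_\mathrm{pp}/\mathrm{d}p_\mathrm{T}\,\mathrm{d}\eta},
    \]

    where $\langle T_\mathrm{AA}\rangle$ is the average nuclear overlap function for the centrality class considered; $R_\mathrm{AA} = 1$ corresponds to the absence of nuclear effects.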

  2. NEW TECHNIQUES USED IN AUTOMATED TEXT ANALYSIS

    Directory of Open Access Journals (Sweden)

    M. Istrate

    2010-12-01

    Full Text Available Automated analysis of natural language texts is one of the most important knowledge discovery tasks for any organization. According to Gartner Group, almost 90% of knowledge available at an organization today is dispersed throughout piles of documents buried within unstructured text. Analyzing huge volumes of textual information is often involved in making informed and correct business decisions. Traditional analysis methods based on statistics fail to help processing unstructured texts and the society is in search of new technologies for text analysis. There exist a variety of approaches to the analysis of natural language texts, but most of them do not provide results that could be successfully applied in practice. This article concentrates on recent ideas and practical implementations in this area.

  3. The Research of Chinese Text Proofreading Algorithm

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Generally, text proofreading consists of two procedures: finding the wrongly used words and then presenting the correct forms. At present, most Chinese text proofreading focuses on finding the wrongly used words, but pays less attention to correcting these errors. In this paper, the features of Chinese text are interpreted first, and then a Chinese text proofreading method and its algorithm are introduced. In this algorithm, text features, including text statistical features and language structure features, are properly used, and error correction is carried out at the same time as error detection. Experimental results show that this method detects 75% of wrongly used Chinese words and corrects about 60% of them with the first candidates.

  4. Translation Strategies of Non-literary Texts

    Institute of Scientific and Technical Information of China (English)

    杨静

    2015-01-01

    Translator's subjectivity is closely related to the choice of the style of the translated texts and translation strategies. This paper presents an analytical study of translation strategies of non-literary texts. It introduces different non-literary texts, and then generalizes some factors influencing the selection of translation strategies. Taking these influencing factors into account, translators should adopt different translation strategies.

  5. Scene text segmentation based on thresholding

    OpenAIRE

    Perez Sanmartín, Alejandro

    2014-01-01

    This research deals with the problem of text segmentation in scene images. The introduction deals with the information contained in an image and the different properties that are useful for image segmentation. After that, the process of extraction of textual information is explained step by step. Furthermore, the problem of scene text segmentation is described more precisely and an overview of the more popular existing methods is given. A text segmentation method is created and implemented using...
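    The simplest threshold-based segmentation step can be sketched with OpenCV's global Otsu binarization, as below; real scene images usually require the local or adaptive thresholding such work surveys, and the input file name and polarity heuristic are assumptions.

```python
# Global Otsu thresholding of a cropped text region.
import cv2

region = cv2.imread("text_region.png", cv2.IMREAD_GRAYSCALE)   # assumed cropped text region
blur = cv2.GaussianBlur(region, (5, 5), 0)                      # reduce noise before Otsu
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Ensure text pixels end up white on black regardless of polarity,
# assuming text covers the minority of pixels in the region.
if cv2.countNonZero(mask) > mask.size / 2:
    mask = cv2.bitwise_not(mask)

cv2.imwrite("text_mask.png", mask)
```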

  6. Multilingual Text Detection with Nonlinear Neural Network

    OpenAIRE

    Lin Li; Shengsheng Yu; Luo Zhong; Xiaozhen Li

    2015-01-01

    Multilingual text detection in natural scenes is still a challenging task in computer vision. In this paper, we apply an unsupervised learning algorithm to learn language-independent stroke feature and combine unsupervised stroke feature learning and automatically multilayer feature extraction to improve the representational power of text feature. We also develop a novel nonlinear network based on traditional Convolutional Neural Network that is able to detect multilingual text regions in th...

  7. Hex: dynamics and probabilistic text entry

    OpenAIRE

    J. Williamson; Murray-Smith, R.

    2005-01-01

    We present a gestural interface for entering text on a mobile device via continuous movements, with control based on feedback from a probabilistic language model. Text is represented by continuous trajectories over a hexagonal tessellation, and entry becomes a manual control task. The language model is used to infer user intentions and provide predictions about future actions, and the local dynamics adapt to reduce effort in entering probable text. This leads to an interface with a stable lay...

  8. Automatic summary evaluation based on text grammars

    OpenAIRE

    Branny, Emilia

    2007-01-01

    In this paper, I describe a method for evaluating automatically generated text summaries. The method is inspired by research in text grammars by Teun Van Dijk. It addresses a text as a complex structure, the elements of which are interconnected both on the level of form and meaning, and the well-formedness of which should be described on both of these levels. The method addresses current problems of summary evaluation methods, especially the problem of quantifying informativity, as well as th...

  9. Frontiers of biomedical text mining: current progress

    OpenAIRE

    Zweigenbaum, Pierre; Demner-Fushman, Dina; Hong YU; Cohen, Kevin B.

    2007-01-01

    It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a ...

  10. Mining Quality Phrases from Massive Text Corpora

    OpenAIRE

    Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei

    2015-01-01

    Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quali...

  11. Beyond Text Theory: Understanding Literary Response

    OpenAIRE

    Miall, David S.; Kuiken, Don

    1994-01-01

    Approaches to text comprehension that focus on propositional, inferential, and elaborative processes have often been considered capable of extension in principle to literary texts, such as stories or poems. However, we argue that literary response is influenced by stylistic features that result in defamiliarization; that defamiliarization invokes feeling which calls on personal perspectives and meanings; and that these aspects of literary response are not addressed by current text theories. T...

  12. Multilingual Text Detection with Nonlinear Neural Network

    Directory of Open Access Journals (Sweden)

    Lin Li

    2015-01-01

    Full Text Available Multilingual text detection in natural scenes is still a challenging task in computer vision. In this paper, we apply an unsupervised learning algorithm to learn language-independent stroke feature and combine unsupervised stroke feature learning and automatically multilayer feature extraction to improve the representational power of text feature. We also develop a novel nonlinear network based on traditional Convolutional Neural Network that is able to detect multilingual text regions in the images. The proposed method is evaluated on standard benchmarks and multilingual dataset and demonstrates improvement over the previous work.

  13. An approach for NL text interpretation

    Directory of Open Access Journals (Sweden)

    Anatol Popescu

    2007-11-01

    Full Text Available For modeling the interpretation process of NL sentences we use mechanisms based on semantic networks that ensure syntactic-semantic text interpretation (SSI), including an axiomatic understanding model, an interpretation model and a denotation model to represent the result of SSI. These models also estimate the correctness and consistency of texts. The approach also implements information extraction from NL texts. Our approach, based mainly upon semantic network grammars, has extraordinary interpretation potential, implying a system of completely new concepts and processing methods.

  14. Financial Statement Fraud Detection using Text Mining

    Directory of Open Access Journals (Sweden)

    Rajan Gupta

    2013-01-01

    Full Text Available Data mining techniques have been used extensively by the research community in detecting financial statement fraud. Most of the research in this direction has used the numbers (quantitative information, i.e. financial ratios) present in the financial statements for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement fraud by analyzing the hidden clues in the qualitative information (text) present in financial statements.

  15. [Text comprehension, cognitive resources and aging].

    Science.gov (United States)

    Chesneau, Sophie; Jbabdi, Saad; Champagne-Lavau, Maud; Giroux, Francine; Ska, Bernadette

    2007-03-01

    Aging brings cognitive changes. Language is not immune to these changes. The use of compensation strategies may permit older adults to achieve a performance level identical to the one obtained by younger adults. This research aims to study text comprehension in aging and the reading strategies used by older and younger adults. Kintsch's cognitive model (1988) allows the identification of different levels of representation within text treatment (linguistic form, macrostructure, microstructure and situation model) and predicts the underlying cognitive components. Eye-tracking analyses during reading permit inference about the moments of reading treatment and detection of reading strategies. Sixty highly educated participants were assessed. They were divided into two age groups (20-40 and 60-80 years old). Participants were asked to read and understand three texts constructed to highlight the features of text comprehension within each one of the different levels of text representation. The amount of detail and the necessity of updating the situation model varied for each text. Eye movements were registered by an eye-tracker (Cambridge Research) during the reading process. Specific complementary tasks were administered to evaluate working memory, long-term memory, and executive functions. Analyses of variance showed significantly lower performance by older adults regarding: 1) recall of the microstructure of the two texts with a high degree of detail, 2) macrostructure of the text with fewer details, and 3) performance on all tasks that evaluated cognitive components. Aging influenced treatment of levels of text representation depending on text characteristics. However, cluster analysis of the text comprehension and eye-tracker data revealed a group of older adults whose performance in reading comprehension was identical to the performance of younger adults, with the same reading profile. This result seems to show that use of compensation strategies by older adults at

  16. Texts, Transmissions, Receptions. Modern Approaches to Narratives

    NARCIS (Netherlands)

    Lardinois, A.P.M.H.; Levie, S.A.; Hoeken, H.; Lüthy, C.H.

    2015-01-01

    The papers collected in this volume study the function and meaning of narrative texts from a variety of perspectives. The word 'text' is used here in the broadest sense of the term: it denotes literary books, but also oral tales, speeches, newspaper articles and comics. One of the purposes of this v

  17. The Patchwork Text in Teaching Greek Tragedy.

    Science.gov (United States)

    Parker, Jan

    2003-01-01

    Describes the rewards and challenges of using the Patchwork Text to teach Greek Tragedy to Cambridge University English final-year students. The article uses close reading of the students' texts, analysis and reflection to discuss both the products and the process of Patchwork writing. (Author/AEF)

  18. Flexible frontiers for text division into rows

    Directory of Open Access Journals (Sweden)

    Dan L. Lacrămă

    2009-01-01

    Full Text Available This paper presents an original solution for flexible division of handwritten text into rows. Unlike the standard procedure, the proposed method avoids cutting off the extensions of isolated characters and reduces the recognition error rate in the final stage.

  19. Texts in multiple versions: histories of editions

    NARCIS (Netherlands)

    L. Giuliani; H. Brinkman; G. Lernout; M. Mathijsen

    2006-01-01

    Texts in multiple versions constitute the core problem of textual scholarship. For texts from antiquity and the medieval period, the many versions may be the result of manuscript transmission, requiring editors and readers to discriminate between levels of authority in variant readings produced alon

  20. Opening Mathematics Texts: Resisting the Seduction

    Science.gov (United States)

    Wagner, David

    2012-01-01

    This analysis of the writing in a grade 7 mathematics textbook distinguishes between closed texts and open texts, which acknowledge multiple possibilities. I use tools that have recently been applied in mathematics contexts, focussing on grammatical features that include personal pronouns, modality, and types of imperatives, as well as on…

  1. Ontology Assisted Formal Specification Extraction from Text

    Directory of Open Access Journals (Sweden)

    Andreea Mihis

    2010-12-01

    Full Text Available In the field of knowledge processing, ontologies are the most important means. They make it possible for the computer to better understand natural language and to make judgments. In this paper, a method which uses ontologies in the semi-automatic extraction of formal specifications from a natural language text is proposed.

  2. Text comprehension strategy instruction with poor readers

    NARCIS (Netherlands)

    Van den Bos, K.P.; Aarnoudse, C.C.; Brand-Gruwel, S.

    1998-01-01

    The goal of this study was to investigate the effects of teaching text comprehension strategies to children with decoding and reading comprehension problems and with a poor or normal listening ability. Two experiments are reported. Four text comprehension strategies, viz., question generation, summa

  3. Readability Revisited? The Implications of Text Complexity

    Science.gov (United States)

    Wray, David; Janan, Dahlia

    2013-01-01

    The concept of readability has had a variable history, moving from a position where it was considered as a very important topic for those responsible for producing texts and matching those texts to the abilities and needs of learners, to its current declining visibility in the education literature. Some important work has been coming from the USA…

  4. On the Techniques of Journalistic Text Translation

    Institute of Scientific and Technical Information of China (English)

    林燕

    2015-01-01

    With the development of economic globalization, the translation of journalistic texts has become increasingly important to cultural exchange and economic communication among different countries. This paper briefly introduces the characteristics of news texts and provides some feasible techniques for translation from English to Chinese or Chinese to English based on case studies.

  5. The Weaknesses of Full-Text Searching

    Science.gov (United States)

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  6. Tree patterns with full text search

    NARCIS (Netherlands)

    M.H. Peetz; M. Marx

    2010-01-01

    Tree patterns with full text search form the core of both XQuery Full Text and the NEXI query language. On such queries, users expect a relevance-ranked list of XML elements as an answer. But this requirement may lead to undesirable behavior of XML retrieval systems: two queries which are intuitivel

  7. Text mining and visualization using VOSviewer

    OpenAIRE

    van Eck, Nees Jan; Waltman, Ludo

    2011-01-01

    VOSviewer is a computer program for creating, visualizing, and exploring bibliometric maps of science. In this report, the new text mining functionality of VOSviewer is presented. A number of examples are given of applications in which VOSviewer is used for analyzing large amounts of text data.

  8. Monolingual Accounting Dictionaries for EFL Text Production

    DEFF Research Database (Denmark)

    Nielsen, Sandro

    2009-01-01

    Dictionaries that deal with these aspects are necessary for the international user group as they produce subject-field specific and register-specific texts in a foreign language, and the data items are relevant for the various stages in text production: draft writing, copyediting, stylistic editing and proofreading....

  9. Monolingual accounting dictionaries for EFL text production

    DEFF Research Database (Denmark)

    Nielsen, Sandro

    2006-01-01

    Dictionaries that deal with these aspects are necessary for the international user group as they produce subject-field specific and register-specific texts in a foreign language, and the data items are relevant for the various stages in text production: draft writing, copyediting, stylistic editing and proofreading....

  10. Classifying Written Texts Through Rhythmic Features

    NARCIS (Netherlands)

    Balint, Mihaela; Dascalu, Mihai; Trausan-Matu, Stefan

    2016-01-01

    Rhythm analysis of written texts focuses on literary analysis and it mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing texts in prosaic form pertaining to different genres. Our contribution is threefold. First, we define a set of rhythmic featu

  11. The socio-demographics of texting

    DEFF Research Database (Denmark)

    Ling, Richard; Bertel, Troels Fibæk; Sundsøy, Pål

    2012-01-01

    messages go to only five other persons. Finally, we find that there is pronounced homophily in terms of age and gender in texting relationships. These findings support previous claims that texting is an important element of teen culture and is an element in the construction of a bounded solidarity....

  12. Text Writing at an Undergraduate College.

    Science.gov (United States)

    Myers, David G.

    Strategies for writing a text are offered by a college professor on the basis of his own experience of writing a text on social psychology. Suggestions are given on creating an efficient office environment, researching the topic, and drafting the manuscript. One way to improve efficiency is to compress teaching into a few days, leaving the…

  13. Rapid and effective synthesis of $^{40}\text{Ca}$-$^{27}\text{Al}$ ion pair towards quantum logic optical clock

    CERN Document Server

    Shang, Junjuan; Cao, Jian; Wang, Shaomao; Shu, Hualin; Huang, Xueren

    2016-01-01

    High-precision atomic clocks have been applied not only to very important technological problems such as synchronization and global navigation systems, but also to fundamental precision-measurement physics. A single $^{27}\text{Al}^+$ ion is one of the most attractive candidate systems due to its very low blackbody radiation shift, which dominates frequency shifts in other optical clock systems. Up to now, $^{27}\text{Al}^+$ still cannot be laser-cooled directly because of the absence of a 167 nm laser. Sympathetic cooling is a viable method to solve this problem. In this work, we used a single laser-cooled $^{40}\text{Ca}^+$ ion to sympathetically cool one $^{27}\text{Al}^+$ ion in a linear Paul trap. Compared to the laser-ablation method, we obtained atoms with much lower velocity sprayed from a home-made atom oven, which makes loading the aluminum ion more efficient and the sympathetic cooling much easier. By precisely measuring the secular frequency of the ion pair, finally we prove...
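    For orientation, the secular frequency mentioned above follows, to lowest order in the standard Mathieu treatment of a linear Paul trap (a textbook approximation, not a result of this preprint),

    \[
    \omega_\mathrm{sec} \;\approx\; \frac{\Omega}{2}\sqrt{a + \frac{q^2}{2}},
    \qquad q \propto \frac{Q\,V_\mathrm{RF}}{m\,r_0^{2}\,\Omega^{2}},
    \]

    where $\Omega$ is the RF drive frequency, $V_\mathrm{RF}$ the RF amplitude, $r_0$ the radial trap dimension and $Q/m$ the ion's charge-to-mass ratio; the mass dependence of $q$ is what makes the measured secular frequencies of the two-ion crystal sensitive to the co-trapped $^{27}\text{Al}^+$.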

  14. Modeling text with generalizable Gaussian mixtures

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Sigurdsson, Sigurdur; Kolenda, Thomas;

    2000-01-01

    We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts model complexity for a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and the sample size. We discuss...

  15. Arabic Text Classification Using Support Vector Machines

    NARCIS (Netherlands)

    Gharib, Tarek Fouad; Habib, Mena Badieh; Fayed, Zaki Taha; Zhu, Qiang

    2009-01-01

    Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper we applied the Support Vector Machines (SVM) model in cl

  16. Teaching Theory through Popular Culture Texts

    Science.gov (United States)

    Trier, James

    2007-01-01

    In this article, the author describes a pedagogical approach to teaching theory to pre-service teachers. This approach involves articulating academic texts that introduce theoretical ideas and tools with carefully selected popular culture texts that can be taken up to illustrate the elements of a particular theory. Examples of the theories…

  17. Code-Mixing in Social Media Text

    Directory of Open Access Journals (Sweden)

    Amitava Das

    Full Text Available Automatic understanding of noisy social media text is one of the prime presentday research areas. Most research has so far concentrated on English texts; however, more than half of the users are writing in other languages, making language identification a prerequisite for comprehensive processing of social media text. Though language identification has been considered an almost solved problem in other applications, language detectors fail in the social media context due to phenomena such as code-mixing, code-switching, lexical borrowings, Anglicisms, and phonetic typing. This paper reports an initial study to understand the characteristics of code-mixing in the social media context and presents a system developed to automatically detect language boundaries in code-mixed social media text, here exemplified by Facebook messages in mixed English-Bengali and English-Hindi.
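    A toy sketch of word-level language identification for code-mixed text using character n-gram profiles is given below; the system described above additionally handles phonetic typing, borrowings and context, which this illustration does not, and the training-file names are assumptions.

```python
# Word-level language tagging via character n-gram profile matching.
from collections import Counter

def profile(text, n=3):
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train(samples):
    # samples: {"en": "English training text ...", "hi": "romanized Hindi training text ..."}
    return {lang: profile(text) for lang, text in samples.items()}

def tag_word(word, models):
    grams = profile(word)
    def score(model):
        total = sum(model.values()) or 1
        return sum(count * model[g] / total for g, count in grams.items())
    return max(models, key=lambda lang: score(models[lang]))

models = train({"en": open("en.txt", encoding="utf-8").read(),
                "hi": open("hi_romanized.txt", encoding="utf-8").read()})
print([(w, tag_word(w, models)) for w in "main kal movie dekhne jaunga".split()])
```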

  18. Integrating Text Plans for Conciseness and Coherence

    CERN Document Server

    Harvey, T; Harvey, Terrence; Carberry, Sandra

    1998-01-01

    Our experience with a critiquing system shows that when the system detects problems with the user's performance, multiple critiques are often produced. Analysis of a corpus of actual critiques revealed that even though each individual critique is concise and coherent, the set of critiques as a whole may exhibit several problems that detract from conciseness and coherence, and consequently assimilation. Thus a text planner was needed that could integrate the text plans for individual communicative goals to produce an overall text plan representing a concise, coherent message. This paper presents our general rule-based system for accomplishing this task. The system takes as input a \\emph{set} of individual text plans represented as RST-style trees, and produces a smaller set of more complex trees representing integrated messages that still achieve the multiple communicative goals of the individual text plans. Domain-independent rules are used to capture strategies across domains, while the facility for addition...

  19. How Popular Culture Texts Inform and Shape Students' Discussions of Social Studies Texts

    Science.gov (United States)

    Hall, Leigh A.

    2012-01-01

    In this article, I examine how 6th-grade students used pop culture texts to inform their understandings about social studies texts and shape their discussions of it. Discussions showed that students used pop culture texts in three ways when talking about social studies texts. First, students applied comprehension strategies to pop culture texts to…

  20. Text-Based Recall and Extra-Textual Generations Resulting from Simplified and Authentic Texts

    Science.gov (United States)

    Crossley, Scott A.; McNamara, Danielle S.

    2016-01-01

    This study uses a moving windows self-paced reading task to assess text comprehension of beginning and intermediate-level simplified texts and authentic texts by L2 learners engaged in a text-retelling task. Linear mixed effects (LME) models revealed statistically significant main effects for reading proficiency and text level on the number of…

  1. The network of concepts in written texts

    CERN Document Server

    Caldeira, S M G; Andrade, R F S; Neme, A; Miranda, J G V; Caldeira, Silvia M. G.; Lobao, Thierry C. Petit; Neme, Alexis

    2005-01-01

    Complex network theory is used to investigate the structure of meaningful concepts in written texts of individual authors. Networks have been constructed after a two-phase filtering, where words with less meaning content are eliminated, and all remaining words are set to their canonical form, without any number, gender or tense inflection. Each sentence in the text is added to the network as a clique. A large number of written texts have been scrutinized, and it is found that texts have small-world as well as scale-free structures. The growth process of these networks has also been investigated, and a universal evolution of network quantifiers has been found among the set of texts written by distinct authors. Further analyses, based on shuffling procedures applied either to the texts or to the constructed networks, provide hints on the role played by the word frequency and sentence length distributions in the network structure. Since the meaningful words are related to concepts in the author's mind, results for text...

  2. Figure-associated text summarization and evaluation.

    Directory of Open Access Journals (Sweden)

    Balaji Polepalli Ramesh

    Full Text Available Biomedical literature incorporates millions of figures, which are a rich and important knowledge resource for biomedical researchers. Scientists need access to the figures and the knowledge they represent in order to validate research findings and to generate new hypotheses. By themselves, these figures are nearly always incomprehensible to both humans and machines and their associated texts are therefore essential for full comprehension. The associated text of a figure, however, is scattered throughout its full-text article and contains redundant information content. In this paper, we report the continued development and evaluation of several figure summarization systems, the FigSum+ systems, that automatically identify associated texts, remove redundant information, and generate a text summary for every figure in an article. Using a set of 94 annotated figures selected from 19 different journals, we conducted an intrinsic evaluation of FigSum+. We evaluate the performance by precision, recall, F1, and ROUGE scores. The best FigSum+ system is based on an unsupervised method, achieving F1 score of 0.66 and ROUGE-1 score of 0.97. The annotated data is available at figshare.com (http://figshare.com/articles/Figure_Associated_Text_Summarization_and_Evaluation/858903.

  3. HANDWRITTEN TEXT IMAGE AUTHENTICATION USING BACK PROPAGATION

    Directory of Open Access Journals (Sweden)

    A S N Chakravarthy

    2011-10-01

    Full Text Available Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claims to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Literary forgery can involve imitating the style of a famous author. If an original manuscript, typewritten text, or recording is available, then the medium itself (or its packaging - anything from a box to e-mail headers) can help prove or disprove the authenticity of the document. The use of digital images of handwritten historical documents has become more popular in recent years. Volunteers around the world now read thousands of these images as part of their indexing process. Handwritten text images of old documents are sometimes difficult to read or noisy due to the preservation of the document and the quality of the image [1]. Handwritten text offers challenges that are rarely encountered in machine-printed text. In addition, most problems faced in reading machine-printed text (e.g., character recognition, word segmentation, letter segmentation, etc.) are more severe in handwritten text. In this paper we propose a method for authenticating handwritten text images using the back propagation algorithm.

  4. Chapter 16: text mining for translational bioinformatics.

    Directory of Open Access Journals (Sweden)

    K Bretonnel Cohen

    2013-04-01

    Full Text Available Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  5. Extracting and Sharing Knowledge from Medical Texts

    Institute of Scientific and Technical Information of China (English)

    曹存根

    2002-01-01

    In recent years, we have been developing a new framework for acquiring medical knowledge from encyclopedic texts. This framework consists of three major parts. The first part is an extended high-level conceptual language (called HLCL 1.1) used by knowledge engineers to formalize knowledge texts in an encyclopedia. The second part is an HLCL 1.1 compiler for parsing and analyzing the formalized texts into knowledge models. The third part is a set of domain-specific ontologies for sharing knowledge.

  6. The Evaluation of Ontology Matching versus Text

    OpenAIRE

    Andreea-Diana MIHIS

    2010-01-01

    Lately, ontologies have become more and more complex, and they are used in different domains. Some ontologies are domain independent; some are specific to a domain. In the case of text processing and information retrieval, it is important to identify the ontology corresponding to a specific text. If the ontology is of a great scale, only a part of it may be reflected in the natural language text. This article presents metrics which evaluate the degree to which an ontology matches a...

  7. A New Text Location Approach Based Wavelet

    Institute of Scientific and Technical Information of China (English)

    Weihua Li; Zhen Fang; Shuozhong Wang

    2002-01-01

    With the advancement of content-based retrieval technology, the semantics of text information contained in images has attracted many researchers. An algorithm which automatically locates the textual regions in the input image will facilitate the retrieval task, and an optical character recognizer can then be applied to only those regions of the image which contain text. In this paper a new wavelet-based text location method is described, which can be used to locate textual regions in complex images and video frames. Experimental results show that the textual regions in an image can be located effectively and quickly.

  8. Editor of mind map's text equivalent

    OpenAIRE

    Hazuza, Petr

    2013-01-01

    The work analyzes the possibility of writing of mental maps in the text form and compares it with the classical creation of mental maps in graphical form. This work also tries to find the ideal solution of mental maps in the text form that fulfil the most functions possible as it is in the graphical version and at the same time it is a rival to the graphical version of mental maps thanks to its simplicity. Work then performs an analysis of the text editor of mental maps. It describes how and ...

  9. A New Text Location Approach Based Wavelet

    Institute of Scientific and Technical Information of China (English)

    Weihua Li; Zhen Fang; Shuozhong Wang

    2002-01-01

    With the advancement of content-based retrieval technology, the semantics of text information contained in images has attracted many researchers. An algorithm which automatically locates the textual regions in the input image will facilitate the retrieval task, and an optical character recognizer can then be applied to only those regions of the image which contain text. In this paper a new wavelet-based text location method is described, which can be used to locate textual regions in complex images and video frames. Experimental results show that the textual regions in an image can be located effectively and quickly.

  10. NOTICING AND TEXT-BASED CHAT

    Directory of Open Access Journals (Sweden)

    Chun Lai

    2006-09-01

    Full Text Available This study examined the capacity of text-based online chat to promote learners’ noticing of their problematic language productions and of the interactional feedback from their interlocutors. In this study, twelve ESL learners formed six mixed-proficiency dyads. The same dyads worked on two spot-the-difference tasks, one via online chat and the other through face-to-face conversation. Stimulated recall sessions were held subsequently to identify instances of noticing. It was found that text-based online chat promotes noticing more than face-to-face conversations, especially in terms of learners’ noticing of their own linguistic mistakes.

  11. Executive Decision: Text or Talk?

    Institute of Scientific and Technical Information of China (English)

    Rahma; Karam

    2011-01-01

    Nielsen Media, a global market research company, reported in March that spending on voice calls has gone down significantly over the last five years, while customers' text spending is increasing. It's anticipated that texting will eclipse voice calls totally in three years.

  12. Strategies to Increase Accuracy in Text Classification

    NARCIS (Netherlands)

    Blommesteijn, D.

    2014-01-01

    Text classification via supervised learning involves various steps from processing raw data, features extraction to training and validating classifiers. Within these steps implementation decisions are critical to the resulting classifier accuracy. This paper contains a report of the study performed

  13. Building Fluency through the Phrased Text Lesson

    Science.gov (United States)

    Rasinski, Timothy; Yildirim, Kasim; Nageldinger, James

    2012-01-01

    This Teaching Tip article explores the importance of phrasing while reading. It also presents an instructional intervention strategy for helping students develop greater proficiency in reading with phrases that reflect the meaning of the text.

  14. A new text book for forest planning

    OpenAIRE

    Scotti R

    2007-01-01

    A new text book by P. Corona (University of Tuscia, Viterbo, Italy) is presented, dealing with sampling and measuring methods for determining forest stand volumes and increments in the frame of forest planning. The book is written in Italian.

  15. Talking, Texting Teen Drivers Take Deadly Toll

    Science.gov (United States)

    ... medlineplus.gov/news/fullstory_159138.html Talking, Texting Teen Drivers Take Deadly Toll Distractions played role in ... too many cases -- killing people in crashes involving teen drivers, a new report shows. A full 60 ...

  16. Voice to Text Language Translation (VTLT) Project

    Data.gov (United States)

    National Aeronautics and Space Administration — A feasibility analysis of adding a second modality to pilot/Air Traffic Control (ATC) communications. The real time availability of text in Air Traffic Control...

  17. QuitNowTXT Text Messaging Library

    Data.gov (United States)

    U.S. Department of Health & Human Services — Overview: The QuitNowTXT text messaging program is designed as a resource that can be adapted to specific contexts including those outside the United States and in...

  18. The Relationship between Paraphrasing and Text Analysis

    Directory of Open Access Journals (Sweden)

    María Luisa Cepeda Islas

    2013-04-01

    Full Text Available Given the importance of paraphrasing in the comprehension process of college students, this study assessed the level of text analysis and paraphrasing in a sample of students in the psychology program. A group of psychology freshmen was asked to answer a questionnaire and to summarize an empirical article. The results showed that participants had a low level of text analysis as well as low levels of paraphrasing, and that verbatim copying of the text predominated. Some possibilities are envisioned for structuring a training workshop covering not only paraphrasing but also text analysis.

  19. AUTOMATIC TEXT SUMMARIZATION BASED ON TEXTUAL COHESION

    Institute of Scientific and Technical Information of China (English)

    Chen Yanmin; Liu Bingquan; Wang Xiaolong

    2007-01-01

    This paper presents two different algorithms that derive the cohesion structure, in the form of lexical chains, from two kinds of language resources, HowNet and TongYiCiCiLin. The research that connects the cohesion structure of a text to the derivation of its summary is presented. A novel model of automatic text summarization is devised, based on the data provided by lexical chains from the original texts. Moreover, the construction rules of lexical chains are modified according to characteristics of the knowledge database in order to be more suitable for Chinese summarization. Evaluation results show that high quality indicative summaries are produced from Chinese texts.
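    A minimal extractive sketch in the spirit of the cohesion-based model above: sentences are scored by how strongly their content words recur across the text, a crude stand-in for lexical chains since the HowNet and TongYiCiCiLin resources are not reproduced here.

```python
# Crude cohesion-flavoured extractive summarizer: score sentences by the
# corpus-wide frequency of their words and keep the top-scoring ones.
import re
from collections import Counter

def summarize(text, n_sentences=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-zA-Z]+", text.lower()))
    def score(sent):
        toks = re.findall(r"[a-zA-Z]+", sent.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)
    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # keep the selected sentences in their original order for readability
    return " ".join(s for s in sentences if s in ranked)

doc = ("Lexical cohesion ties sentences together. Chains of related words signal the main topic. "
       "A summary keeps the sentences that carry the strongest chains. Other sentences are dropped.")
print(summarize(doc))
```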

  20. Punctuation effects in English and Esperanto texts

    CERN Document Server

    Ausloos, M

    2010-01-01

    A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute, at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style.
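    The length-rank analysis described above can be reproduced in outline as follows: split the text on a chosen punctuation mark, rank sentence lengths in decreasing order, and estimate the exponent from a log-log fit. This is a schematic re-implementation, not the author's code, and the fitted value depends on which marks are treated as sentence boundaries.

```python
# Estimate the exponent of the (sentence length) ~ rank^(-exponent) relation.
import re
import numpy as np

def length_rank_exponent(text, sentence_end=r"[.!?]"):
    sentences = [s for s in re.split(sentence_end, text) if s.strip()]
    lengths = sorted((len(s.split()) for s in sentences), reverse=True)
    ranks = np.arange(1, len(lengths) + 1)
    # least-squares fit in log-log space (needs at least two sentences)
    slope, _ = np.polyfit(np.log(ranks), np.log(lengths), 1)
    return -slope

sample = ("Short one. A somewhat longer sentence follows here. Tiny. "
          "Another fairly long example sentence appears now.")
print(round(length_rank_exponent(sample), 2))
```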

  1. Punctuation effects in english and esperanto texts

    Science.gov (United States)

    Ausloos, M.

    2010-07-01

    A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence-length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute, at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style.

  2. Discovery of Recurring Anomalies in Text Reports

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining algorithms to...

  3. Utterance and Text in Freshman English.

    Science.gov (United States)

    Lotto, Edward

    1989-01-01

    Analyzes the distinction between utterance and writing to determine why students have difficulty using specific details to explore their generalizations. Describes successful strategies and assignments to encourage student awareness of text and concrete expression. (KEH)

  4. WRITTEN TEXT AUTHOR'S CHARACTERISTICS ASCERTAINMENT (PROFILING)

    OpenAIRE

    Litvinova, Tat'yana

    2012-01-01

    Nowadays it is considered proven that a text reflects its author's personality. The author of the article states that one of the effective ways of revealing personal peculiarities is the analysis of deixis units, especially personal pronouns, prepositions and conjunctions, and suggests applying the technique of ascertaining a text author's personality peculiarities through deixis unit analysis to the Russian language, and considering the possibility of deixis analysis as the means of a wr...

  5. Statistical machine translation for automobile marketing texts

    OpenAIRE

    Läubli, Samuel; Fishel, Mark; Weibel, Manuela; Volk, Martin

    2013-01-01

    We describe a project on introducing an in-house statistical machine translation system for marketing texts from the automobile industry with the final aim of replacing manual translation with post-editing, based on the translation system. The focus of the paper is the suitability of such texts for SMT; we present experiments in domain adaptation and decompounding that improve the baseline translation systems, the results of which are evaluated using automatic metrics as well as manual evalua...

  6. Challenges in Persian Electronic Text Analysis

    OpenAIRE

    QasemiZadeh, Behrang; Rahimi, Saeed; Ghalati, Mehdi Safaee

    2014-01-01

    Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. Farsi uses a unified Arabic script as its writing system. In this paper we briefly introduce the writing standards of Farsi and highlight problems one would face when analyzing Farsi electronic texts, especially during the development of Farsi corpora, regarding the transcription and encoding of Farsi e-texts. The points mentioned may sound easy but they are cru...

  7. Text Messaging for Addiction: A Review

    OpenAIRE

    Keoleian, Victoria; Polcin, Douglas; Galloway, Gantt P.

    2015-01-01

    Individuals seeking treatment for addiction often experience barriers due to cost, lack of local treatment resources, or either school or work schedule conflicts. Text messaging-based addiction treatment is inexpensive and has the potential to be widely accessible in real time. We conducted a comprehensive literature review identifying 11 published randomized controlled trials (RCTs) evaluating text messaging-based interventions for tobacco smoking, 4 studies for reducing alcohol consumption,...

  8. Comparison between Two Text Digital Watermarking Algorithms

    Institute of Scientific and Technical Information of China (English)

    TANG Sheng; XUE Xu-ce

    2011-01-01

    In this paper, two text digital watermarking methods are compared in terms of their robustness performance. A nonlinear watermarking algorithm embeds the watermark into the reordered DCT coefficients of a text image, and utilizes a nonlinear detector to detect the watermark under several attacks. Compared with the classical watermarking algorithm, experimental results show that this nonlinear watermarking algorithm has some potential merits.

  9. Text mining for the biocuration workflow

    OpenAIRE

    Hirschman, L.; Burns, G. A. P. C.; Krallinger, M.; Arighi, C.; Cohen, K. B.; Valencia, A.; Wu, C H; Chatr-aryamontri, A; Dowell, K. G.; Huala, E; Lourenco, A.; Nash, R; Veuthey, A.-L.; Wiegers, T.; Winter, A. G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations too...

  10. Financial Statement Fraud Detection using Text Mining

    OpenAIRE

    Rajan Gupta; Nasib Singh Gill

    2013-01-01

    Data mining techniques have been used extensively by the research community for detecting financial statement fraud. Most of the research in this direction has used numbers (quantitative information), i.e. financial ratios present in the financial statements, for detecting fraud. There is very little or no research on the analysis of text such as auditors' comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement frau...

  11. Chapter 16: Text Mining for Translational Bioinformatics

    OpenAIRE

    Bretonnel Cohen, K; Hunter, Lawrence E.

    2013-01-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. P...

  12. Automatic Arabic Hand Written Text Recognition System

    Directory of Open Access Journals (Sweden)

    I. A. Jannoud

    2007-01-01

    Full Text Available Despite the considerable development of pattern recognition applications in the last decade of the twentieth century and in this century, text recognition remains one of the most important problems in pattern recognition. To the best of our knowledge, little work has been done in the area of Arabic text recognition compared with that for Latin, Chinese and Japanese text. The main difficulty encountered when dealing with Arabic text is the cursive nature of Arabic writing in both printed and handwritten forms. An Automatic Arabic Hand-Written Text Recognition (AHTR) system is proposed. An efficient segmentation stage is required in order to divide a cursive word or sub-word into its constituent characters. After a word has been extracted from the scanned image, it is thinned and its baseline is calculated by analysis of the horizontal density histogram. The pattern is then followed along the baseline and the segmentation points are detected. Thus, after the segmentation stage, the cursive word is represented by a sequence of isolated characters, and the recognition problem reduces to that of classifying each character. A set of features is extracted from each individual character, and a minimum distance classifier is used. Some approaches for processing the characters and post-processing are added to enhance the results. Recognized characters are appended directly to a word file in editable form.
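    The classification stage described above reduces to a nearest-centroid rule; a generic sketch is given below, assuming fixed-length feature vectors have already been extracted from the segmented characters (the segmentation and feature extraction themselves are not reproduced).

```python
# Minimum-distance (nearest-centroid) classifier: each class is represented by the
# mean of its training feature vectors; a test vector gets the label of the
# closest centroid.
import numpy as np

class MinimumDistanceClassifier:
    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.centroids_ = {c: X[y == c].mean(axis=0) for c in set(y)}
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        return [min(self.centroids_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]

X = [[0, 0], [0, 1], [5, 5], [6, 5]]           # toy feature vectors
y = ["alif", "alif", "ba", "ba"]               # toy character labels
clf = MinimumDistanceClassifier().fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.0]]))
```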

  13. Chapter 16: text mining for translational bioinformatics.

    Science.gov (United States)

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  14. A Survey of Unstructured Text Summarization Techniques

    Directory of Open Access Journals (Sweden)

    Sherif Elfayoumy

    2014-05-01

    Full Text Available Due to the explosive amounts of text data being created and organizations increased desire to leverage their data corpora, especially with the availability of Big Data platforms, there is not usually enough time to read and understand each document and make decisions based on document contents. Hence, there is a great demand for summarizing text documents to provide a representative substitute for the original documents. By improving summarizing techniques, precision of document retrieval through search queries against summarized documents is expected to improve in comparison to querying against the full spectrum of original documents. Several generic text summarization algorithms have been developed, each with its own advantages and disadvantages. For example, some algorithms are particularly good for summarizing short documents but not for long ones. Others perform well in identifying and summarizing single-topic documents but their precision degrades sharply with multi-topic documents. In this article we present a survey of the literature in text summarization. We also surveyed some of the most common evaluation methods for the quality of automated text summarization techniques. Last, we identified some of the challenging problems that are still open, in particular the need for a universal approach that yields good results for mixed types of documents.

  15. Monolingual accounting dictionaries for EFL text production

    Directory of Open Access Journals (Sweden)

    Sandro Nielsen

    2006-10-01

    Full Text Available Monolingual accounting dictionaries are important for producing financial reporting texts in English in an international setting, because of the lack of specialised bilingual dictionaries. As the intended user groups have different factual and linguistic competences, they require specific types of information. By identifying and analysing the users' factual and linguistic competences, user needs, use-situations and the stages involved in producing accounting texts in English as a foreign language, lexicographers will have a sound basis for designing the optimal English accounting dictionary for EFL text production. The monolingual accounting dictionary needs to include information about UK, US and international accounting terms, their grammatical properties, their potential for being combined with other words in collocations, phrases and sentences in order to meet user requirements. Data items that deal with these aspects are necessary for the international user group as they produce subject-field specific and register-specific texts in a foreign language, and the data items are relevant for the various stages in text production: draft writing, copyediting, stylistic editing and proofreading.

  16. Text Entry by Gazing and Smiling

    Directory of Open Access Journals (Sweden)

    Outi Tuisku

    2013-01-01

    Full Text Available Face Interface is a wearable prototype that combines the use of voluntary gaze direction and facial activations, for pointing and selecting objects on a computer screen, respectively. The aim was to investigate the functionality of the prototype for entering text. First, three on-screen keyboard layout designs were developed and tested (n=10) to find a layout that would be more suitable for text entry with the prototype than the traditional QWERTY layout. The task was to enter one word ten times with each of the layouts by pointing at letters with gaze and selecting them by smiling. Subjective ratings showed that a layout with large keys on the edge and small keys near the center of the keyboard was rated as the most enjoyable, clearest, and most functional. Second, using this layout, the aim of the second experiment (n=12) was to compare entering text with Face Interface to entering text with a mouse. The results showed that the text entry rate for Face Interface was 20 characters per minute (cpm) and 27 cpm for the mouse. For Face Interface, the keystrokes per character (KSPC) value was 1.1 and the minimum string distance (MSD) error rate was 0.12. These values compare especially well with other similar techniques.
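    The text-entry measures reported above are standard and easy to restate in code: KSPC is keystrokes divided by characters produced, and the MSD error rate normalizes the Levenshtein distance between the presented and transcribed phrases. The sketch below is a generic implementation of those definitions, not the study's analysis scripts.

```python
# Standard text-entry metrics: keystrokes per character (KSPC) and the
# minimum string distance (MSD) error rate based on Levenshtein distance.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def kspc(keystrokes: int, transcribed: str) -> float:
    return keystrokes / max(1, len(transcribed))

def msd_error_rate(presented: str, transcribed: str) -> float:
    return levenshtein(presented, transcribed) / max(len(presented), len(transcribed), 1)

print(kspc(keystrokes=9, transcribed="hello"), msd_error_rate("hello", "helo"))
```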

  17. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    OpenAIRE

    Chatzigogos, Charisis; Pecker, Alain; Salençon, J.

    2008-01-01

    Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure is briefly presented, aiming at a better understanding of the stu...

  18. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    OpenAIRE

    Chatzigogos, Charisis; Pecker, Alain; Salençon, J.

    2007-01-01

    Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure is briefly presented, aiming at a bette...

  19. Sustainable packaging. Packaging for a circular economy; Duurzaam verpakken. Verpakken voor de circulaire economie

    Energy Technology Data Exchange (ETDEWEB)

    Haffmans, S. [Partners for Innovation, Amsterdam (Netherlands); Standhardt, G. [Nederlands Verpaskkingscentrum NVC, Gouda (Netherlands); Hamer, A. [Agentschap NL, Utrecht (Netherlands)

    2013-10-15

    What is Sustainable Packaging? And what is the most sustainable packaging for a product? The publication is intended for anyone who wants to take into account the environment in the design of a product and packaging. It offers concrete suggestions and inspiring examples to bring sustainable packaging into practice [Dutch] Wat is Duurzaam Verpakken? En wat is de duurzaamste verpakking voor mijn product? De publicatie is bestemd voor iedereen die rekening wil houden met het milieu bij het ontwerp van een product-verpakkingscombinatie. Ze biedt concrete aanknopingspunten en inspirerende voorbeelden om hier praktisch mee aan de slag te gaan.

  20. Rupture sismique des fondations par perte de capacité portante: Le cas des semelles circulaires

    CERN Document Server

    Chatzigogos, Charisis; Salençon, J

    2008-01-01

    Within the context of earthquake-resistant design of shallow foundations, the present study is concerned with the determination of the seismic bearing capacity of a circular footing resting on the surface of a heterogeneous purely cohesive semi-infinite soil layer. In the first part of the paper, a database containing case histories of civil engineering structures that sustained a foundation seismic bearing capacity failure is briefly presented, aiming at a better understanding of the studied phenomenon and offering a number of case studies useful for validation of theoretical computations. In the second part of the paper, the aforementioned problem is addressed using the kinematic approach of the Yield Design theory, thus establishing optimal upper bounds for the ultimate seismic loads supported by the soil-footing system. The results lead to the establishment of some very simple guidelines that extend the existing formulae for the seismic bearing capacity contained in the European norms (proposed for st...

  1. Native Language Processing using Exegy Text Miner

    Energy Technology Data Exchange (ETDEWEB)

    Compton, J

    2007-10-18

    Lawrence Livermore National Laboratory's New Architectures Testbed recently evaluated Exegy's Text Miner appliance to assess its applicability to high-performance, automated native language analysis. The evaluation was performed with support from the Computing Applications and Research Department in close collaboration with Global Security programs and institutional activities in native language analysis. The Exegy Text Miner is a special-purpose device for detecting and flagging user-supplied patterns of characters, whether in streaming text or in collections of documents, at very high rates. Patterns may consist of simple lists of words or complex expressions with sub-patterns linked by logical operators. These searches are accomplished through a combination of specialized hardware (i.e., one or more field-programmable gate arrays in addition to general-purpose processors) and proprietary software that exploits these individual components in an optimal manner (through parallelism and pipelining). For this application the Text Miner has performed accurately and reproducibly at high speeds approaching those documented by Exegy in its technical specifications. The Exegy Text Miner is primarily intended for the single-byte ASCII characters used in English, but at a technical level its capabilities are language-neutral and can be applied to multi-byte character sets such as those found in Arabic and Chinese. The system is used for searching databases or tracking streaming text with respect to one or more lexicons. In a real operational environment it is likely that data would need to be processed separately for each lexicon or search technique. However, the searches would be so fast that multiple passes should not be considered a limitation a priori. Indeed, it is conceivable that large databases could be searched as often as necessary if new queries were deemed worthwhile. This project is concerned with evaluating the Exegy Text Miner installed in the
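    As a purely illustrative software analogue of the pattern flagging described above (the appliance itself does this in FPGA hardware at far higher rates), the sketch below applies user-supplied word lists combined with a simple include/exclude rule to a stream of text lines; none of it reflects Exegy's actual interface.

```python
# Toy streaming lexicon matcher: flag lines that contain any "must_have" word
# and none of the "must_not_have" words.
import re
import sys

def compile_lexicon(words):
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.I)

def stream_matches(lines, must_have, must_not_have=()):
    hit = compile_lexicon(must_have)
    miss = compile_lexicon(must_not_have) if must_not_have else None
    for n, line in enumerate(lines, 1):
        if hit.search(line) and not (miss and miss.search(line)):
            yield n, line.rstrip()

if __name__ == "__main__":
    for n, line in stream_matches(sys.stdin, ["reactor", "centrifuge"], ["fiction"]):
        print(f"{n}: {line}")
```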

  2. ERRORS AND DIFFICULTIES IN TRANSLATING LEGAL TEXTS

    Directory of Open Access Journals (Sweden)

    Camelia, CHIRILA

    2014-11-01

    Full Text Available Nowadays the accurate translation of legal texts has become highly important as the mistranslation of a passage in a contract, for example, could lead to lawsuits and loss of money. Consequently, the translation of legal texts to other languages faces many difficulties and only professional translators specialised in legal translation should deal with the translation of legal documents and scholarly writings. The purpose of this paper is to analyze translation from three perspectives: translation quality, errors and difficulties encountered in translating legal texts and consequences of such errors in professional translation. First of all, the paper points out the importance of performing a good and correct translation, which is one of the most important elements to be considered when discussing translation. Furthermore, the paper presents an overview of the errors and difficulties in translating texts and of the consequences of errors in professional translation, with applications to the field of law. The paper is also an approach to the differences between languages (English and Romanian) that can hinder comprehension for those who have embarked upon the difficult task of translation. The research method that I have used to achieve the objectives of the paper was the content analysis of various Romanian and foreign authors' works.

  3. A NOVEL MULTIDICTIONARY BASED TEXT COMPRESSION

    Directory of Open Access Journals (Sweden)

    Y. Venkataramani

    2012-01-01

    Full Text Available The amount of digital content grows at an ever faster speed, and so does the demand to communicate it. On the other hand, the amount of storage and bandwidth increases at a slower rate. Thus powerful and efficient compression methods are required. The repetition of words and phrases makes the reordered text much more compressible than the original text. On the whole, the system is fast and achieves close to the best result on the test files. In this study a novel fast dictionary-based text compression technique, MBRH (Multidictionary with Burrows-Wheeler transform, Run-length coding and Huffman coding), is proposed for the purpose of obtaining improved performance on various document sizes. The MBRH algorithm comprises two stages: the first stage is concerned with the conversion of the input text into a dictionary-based compression; the second stage deals mainly with the reduction of redundancy in the multidictionary-based compression by using BWT, RLE and Huffman coding. The bib test file, with an input size of 111,261 bytes, achieves a compression ratio of 0.192 and a bit rate of 1.538 at high speed using the MBRH algorithm. The algorithm attains a good compression ratio, a reduced bit rate and increased execution speed.
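    To make the reordering idea concrete, the sketch below applies a naive Burrows-Wheeler transform followed by run-length encoding; the dictionary substitution and Huffman stages of MBRH are omitted, so this illustrates the pipeline rather than reproducing the authors' implementation.

```python
# Naive BWT (O(n^2 log n), fine for a demonstration) followed by run-length encoding.
def bwt(s: str, sentinel: str = "\0") -> str:
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def rle(s: str) -> list:
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append((s[i], j - i))   # (character, run length)
        i = j
    return out

text = "banana bandana banana"
print(rle(bwt(text)))   # repeated contexts cluster, so runs get longer
```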

  4. The Impact of Texting on Comprehension

    Directory of Open Access Journals (Sweden)

    Jamal K. M. Ali

    2015-07-01

    Full Text Available This paper presents a study of the effects of texting on English language comprehension. The authors believe that the English used in texting causes a lack of comprehension for English speakers, learners, and texters. Wei, Xian-hai and Jiang (2008:3) declare: "In Netspeak, there are some newly-created vocabularies, which people cannot comprehend them either from their partial pronunciation or from their figures." Crystal (2007:23) claims: "variation causes problems of comprehension and acceptability. If you speak or write differently from the way I do, we may fail to understand each other." In this paper, the authors administered a questionnaire at Aligarh Muslim University to ninety respondents from five different faculties and four different levels. To measure respondents' comprehension of English texting, the authors gave the respondents abbreviations used by texters and asked them to write the full forms of the abbreviations. The authors found that many abbreviations were not understood, which suggests that most of the respondents did not understand and did not use these abbreviations. Keywords: abbreviation, comprehension, texting, texters, variation

  5. Introduction, Critical Textology and Textual Criticism

    Directory of Open Access Journals (Sweden)

    فرزاد قائمی

    2013-06-01

    Full Text Available Asadi's Shahnameh is a great epic consisting of twenty-four thousand distichs and is attributed to Asadi or another poet of the same nickname. This work was created in the same line of development as Ferdowsi's Shahnameh. The main theme is the old campaign of Soleymān to Iran to confront Rostam and Keykhosrow and to repeat the pattern of Rostam's battles with his children in a state of anonymity. The text structure is episodic, with numerous central characters. The narratives are for the most part derived from oral literature. Textual evidence demonstrates that the poet is Shiite. The narrative content, chronogram, as well as the literary and linguistic style of one of the manuscripts, reveal that the text was written in the ninth century (probably 809 A.H.). The article first introduces the text and the origin of its narratives in oral literature; it then proceeds with the study of the narrative structure of the epic using three available manuscripts dating back to the thirteenth and fourteenth centuries (A.H.). Textology and textual criticism have been employed as the research methodology. The literary and linguistic features of the text have also been examined at three levels: lexical, syntactic and rhetorical.

  6. Handwriting segmentation of unconstrained Oriya text

    Indian Academy of Sciences (India)

    N Tripathy; U Pal

    2006-12-01

    Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten text recognition process. In this paper we propose a water reservoir concept-based scheme for segmentation of unconstrained Oriya handwritten text into individual characters. Here, at first, the text image is segmented into lines, and the lines are then segmented into individual words. For line segmentation, the document is divided into vertical stripes. Analysing the heights of the water reservoirs obtained from different components of the document, the width of a stripe is calculated. Stripe-wise horizontal histograms are then computed and the relationship of the peak–valley points of the histograms is used for line segmentation. Based on vertical projection profiles and structural features of Oriya characters, text lines are segmented into words. For character segmentation, at first, the isolated and connected (touching) characters in a word are detected. Using structural, topological and water reservoir concept-based features, characters of the word that touch are then segmented. From experiments we have observed that the proposed “touching character” segmentation module has 96.7% accuracy for two-character touching strings.
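    The line-segmentation step above is a projection-profile idea; a bare-bones version is sketched below on a grayscale page image, with cuts at rows containing no ink. The stripe-wise histograms and water-reservoir analysis used in the paper for skewed lines and touching characters are not reproduced.

```python
# Projection-profile line segmentation: binarize, count ink per row, and cut at
# empty rows.
import numpy as np

def segment_lines(gray_image, ink_threshold=128, min_height=1):
    binary = (np.asarray(gray_image) < ink_threshold).astype(int)  # 1 = ink pixel
    profile = binary.sum(axis=1)                                   # ink count per row
    has_ink = profile > 0
    lines, start = [], None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y                       # a text line begins
        elif not ink and start is not None:
            if y - start >= min_height:     # ignore runs thinner than min_height rows
                lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(has_ink)))
    return lines  # (top_row, bottom_row) pairs, one per detected text line

page = np.full((10, 8), 255)
page[1:3, :] = 0   # first synthetic text line
page[6:9, :] = 0   # second synthetic text line
print(segment_lines(page))
```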

  7. Putting Text Complexity in Context: Refocusing on Comprehension of Complex Text

    Science.gov (United States)

    Valencia, Sheila W.; Wixson, Karen K.; Pearson, P. David

    2014-01-01

    The Common Core State Standards for English Language Arts have prompted enormous attention to issues of text complexity. The purpose of this article is to put text complexity in perspective by moving from a primary focus on the text itself to a focus on the comprehension of complex text. We argue that a focus on comprehension is at the heart of…

  8. Exploring the Effect of Background Knowledge and Text Cohesion on Learning from Texts in Computer Science

    Science.gov (United States)

    Gasparinatou, Alexandra; Grigoriadou, Maria

    2013-01-01

    In this study, we examine the effect of background knowledge and local cohesion on learning from texts. The study is based on construction-integration model. Participants were 176 undergraduate students who read a Computer Science text. Half of the participants read a text of maximum local cohesion and the other a text of minimum local cohesion.…

  9. OMG! Texting in Class = U Fail :( Empirical Evidence That Text Messaging During Class Disrupts Comprehension

    Science.gov (United States)

    Gingerich, Amanda C.; Lineweaver, Tara T.

    2014-01-01

    In two experiments, we examined the effects of text messaging during lecture on comprehension of lecture material. Students (in Experiment 1) and randomly assigned participants (in Experiment 2) in a text message condition texted a prescribed conversation while listening to a brief lecture. Students and participants in the no-text condition…

  10. Runaway electrons in TEXT-U

    International Nuclear Information System (INIS)

    Runaway electrons have long been studied in tokamak plasmas. The previous results regarding runaway electrons and the detection of hard x-rays are reviewed. The hard x-ray energy on TEXT-U is measured and the scaling of energy with electron density, ne, is noted. This scaling suggests a runaway source term that scales roughly as ne/1. The results indicate that runaways are created throughout the discharges. An upper bound for Xe due to magnetic fluctuations was found to be .0343 m2/s. This is an order of magnitude too low to explain the thermal transport in TEXT, implying that electrostatic fluctuations are important in thermal transport in TEXT

  11. Context Based Word Sense Extraction in Text

    Directory of Open Access Journals (Sweden)

    Ranjeetsingh S.Suryawanshi

    2011-11-01

    Full Text Available In the era of modern e-document technology, everyone uses computerized documents for their purposes. Due to the huge number of text documents available in the form of pdf, doc, txt, html, and xml, users may be confused about the sense in which words are used in these documents, since the same word can express different senses. Word sense has always been an important problem in information retrieval and extraction, as well as in text mining, because machines do not have the human ability to sense a word's meaning in a particular context. Users want to determine which sense of a word is used in a given context. Word sense is usage-based, and part of it can be created automatically from an electronic dictionary. This paper describes word sense as expressed by WordNet synsets, arranged according to their relevance, with context expressed by means of word association.
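    A small example of picking a WordNet synset from context, using NLTK's simplified Lesk algorithm as a stand-in for the word-association ranking sketched above (an assumption about the setup, not the paper's system). It assumes the NLTK wordnet and punkt data have been downloaded.

```python
# Context-based sense selection over WordNet synsets with NLTK's simplified Lesk.
# Requires: pip install nltk && python -m nltk.downloader wordnet punkt
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("He deposited the cheque at the bank before noon.")
sense = lesk(context, "bank", pos=wn.NOUN)   # pick the synset best matching the context
print(sense, "-", sense.definition() if sense else "no sense found")
```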

  12. Text Classification Using Sentential Frequent Itemsets

    Institute of Scientific and Technical Information of China (English)

    Shi-Zhu Liu; He-Ping Hu

    2007-01-01

    Text classification techniques mostly rely on single-term analysis of the document data set, while more concepts, especially specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including frequently co-occurring words in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach using sentential frequent itemsets, a concept that comes from association rule mining, for text classification. It views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora are carried out, which validate the practicability of the proposed system.
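    The core feature-extraction idea, treating each sentence as a transaction of terms and keeping word combinations that occur in enough sentences, can be sketched as below; the variable precision rough set weighting used in the paper is not reproduced.

```python
# Mine sentential frequent itemsets: word combinations that co-occur within a
# sentence in at least `min_support` sentences.
import itertools
import re
from collections import Counter

def sentential_frequent_itemsets(text, size=2, min_support=2):
    sentences = [set(re.findall(r"[a-zA-Z]+", s.lower()))
                 for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter()
    for terms in sentences:
        counts.update(itertools.combinations(sorted(terms), size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

text = ("Interest rates rise. The bank raised interest rates again. "
        "Rates and inflation worry the bank.")
print(sentential_frequent_itemsets(text))
```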

  13. WYLBUR reference manual. [For interactive text editing

    Energy Technology Data Exchange (ETDEWEB)

    Krupp, R.F.; Messina, P.C.; Peavler, J.M.; Schustack, S.; Starai, T.

    1977-04-01

    WYLBUR is a system for manipulating various kinds of text, such as computer programs, manuscripts, letters, forms, articles, or reports. Its on-line interactive text-editing capabilities allow the user to create, change, and correct text, and to search and display it. WYLBUR also has facilities for job submission and retrieval from remote terminals that make it possible for a user to inquire about the status of any job in the system, cancel jobs that are executing or awaiting execution, reroute output, raise job priority, or get information on the backlog of batch jobs. WYLBUR also has excellent recovery capabilities and a fast response time. This manual describes the WYLBUR version currently used at ANL. It is intended primarily as a reference manual; thus, examples of WYLBUR commands are kept to a minimum. (RWR)

  14. Kombination av text och bild i undervisningen

    OpenAIRE

    Lindbom, Yvonne

    2008-01-01

    My thesis describes how one can work across subjects with image and text in combination. Through examples, I want to show the importance of weaving image and text together in teaching. My choice of topic is based on the fact that my pupils lacked an understanding of visual language. Words and texts therefore entered the teaching as a natural element. The purpose of the work is to investigate how text and image in combination can give a deeper understanding of visual language. A cross-subject way of working ...

  15. Text mining patents for biomedical knowledge.

    Science.gov (United States)

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.

  16. Enhanced Integrated Scoring for Cleaning Dirty Texts

    CERN Document Server

    Wong, Wilson; Bennamoun, Mohammed

    2008-01-01

    An increasing number of approaches for ontology engineering from text are gearing towards the use of online sources such as company intranet and the World Wide Web. Despite such rise, not much work can be found in aspects of preprocessing and cleaning dirty texts from online sources. This paper presents an enhancement of an Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). ISSAC is implemented as part of a text preprocessing phase in an ontology engineering system. New evaluations performed on the enhanced ISSAC using 700 chat records reveal an improved accuracy of 98% as compared to 96.5% and 71% based on the use of only basic ISSAC and of Aspell, respectively.

  17. Text mining patents for biomedical knowledge.

    Science.gov (United States)

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. PMID:27179985

  18. Tagging and Morphological Disambiguation of Turkish Text

    CERN Document Server

    Oflazer, K; Oflazer, Kemal; Kuruoz, Ilker

    1994-01-01

    Automatic text tagging is an important component in higher level analysis of text corpora, and its output can be used in many natural language processing applications. In languages like Turkish or Finnish, with agglutinative morphology, morphological disambiguation is a very crucial process in tagging, as the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology that is based on a lexicon of about 24,000 root words. This is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics and a limited amount of statistical information. The tagger also has functionality for statistics compilation and fine tuning of the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Preliminary results indicate that the tagger can tag about 98-99% of the...

  19. Combinatorial Classification for Chunking Arabic Text

    Directory of Open Access Journals (Sweden)

    Feriel Ben Fraj

    2012-10-01

    Full Text Available Text parsing has always benefited from special attention since the first applications of natural language processing (NLP). The problem gets worse for the Arabic language because of its specific features, which make it quite different and even more ambiguous than other natural languages when processed. In this paper, we discuss a new approach for chunking Arabic texts based on a combinatorial classification process. It is a modular chunker that identifies the chunk heads using a combinatorial binary classification before recognizing their types based on the parts-of-speech of the chunk heads already identified. For the experimentation, we use more than 2300 words as training data. The evaluation of the chunker consists of two steps and gives results that we consider very satisfactory (an average accuracy of 89.60% for the classification step and 80.46% for the full chunking process).

  20. Combinatorial Classification for Chunking Arabic Texts

    Directory of Open Access Journals (Sweden)

    Fériel Ben Fraj

    2012-09-01

    Full Text Available Text parsing has always benefited from special attention since the first applications of natural language processing (NLP). The problem gets worse for the Arabic language because of its specific features, which make it quite different and even more ambiguous than other natural languages when processed. In this paper, we discuss a new approach for chunking Arabic texts based on a combinatorial classification process. It is a modular chunker that identifies the chunk heads using a combinatorial binary classification before recognizing their types based on the parts-of-speech of the chunk heads already identified. For the experimentation, we use more than 2300 words as training data. The evaluation of the chunker consists of two steps and gives results that we consider very satisfactory (an average accuracy of 89.60% for the classification step and 80.46% for the full chunking process).

  1. Urdu Text Classification using Majority Voting

    Directory of Open Access Journals (Sweden)

    Muhammad Usman

    2016-08-01

    Full Text Available Text classification is a tool to assign predefined categories to text documents using supervised machine learning algorithms. It has various practical applications such as spam detection, sentiment detection, and detection of a natural language. Based on this idea, we applied five well-known classification techniques to an Urdu language corpus and assigned a class to each document using majority voting. The corpus contains 21769 news documents in seven categories (Business, Entertainment, Culture, Health, Sports, and Weird). The algorithms were not able to work directly on the data, so we applied preprocessing techniques such as tokenization, stop-word removal and a rule-based stemmer. After preprocessing, 93400 features were extracted from the data to apply machine learning algorithms. We achieved up to 94% precision and recall using majority voting.
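    A generic majority-voting setup of the kind described above can be assembled with scikit-learn as follows; the sketch uses three classifiers and English toy documents in place of the study's five classifiers and Urdu-specific preprocessing, so it only illustrates the voting mechanism.

```python
# Hard (majority) voting over several classifiers on top of TF-IDF features.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

model = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("lr", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier())],
        voting="hard",  # majority vote over the predicted labels
    ),
)

docs = ["team wins the final match", "stock prices fall sharply",
        "new film released this week", "market rallies after earnings"]
labels = ["sports", "business", "entertainment", "business"]
model.fit(docs, labels)
print(model.predict(["the match ended in a draw"]))
```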

  2. Preprocessing and Morphological Analysis in Text Mining

    Directory of Open Access Journals (Sweden)

    Krishna Kumar Mohbey; Sachin Tiwari

    2011-12-01

    Full Text Available This paper is based on the preprocessing activities which are performed by software or language translators before applying mining algorithms to huge data. Text mining is an important area of data mining, and it plays a vital role in extracting useful information from huge databases or data warehouses. Before applying the text mining or information extraction process, preprocessing is a must, because the given data or dataset may contain noisy, incomplete, inconsistent, dirty and unformatted data. In this paper we try to collect the necessary requirements for preprocessing. Once the preprocessing task is complete, useful knowledge can easily be extracted using a mining strategy. This paper also provides information about the analysis of data, such as tokenization and stemming, and about semantic analysis, such as phrase recognition and parsing. It also collects the procedures for preprocessing data, i.e., it describes how stemming, tokenization and parsing are applied.
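    The tokenization, stop-word removal and stemming steps discussed above can be illustrated with NLTK as follows (phrase recognition and parsing are left out); the snippet assumes the punkt and stopwords data are installed.

```python
# Basic text preprocessing: tokenize, keep alphabetic tokens, drop stop words, stem.
# Requires: python -m nltk.downloader punkt stopwords
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

def preprocess(text):
    stop = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(preprocess("The miners were extracting useful information from noisy documents."))
```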

  3. Text Mining the History of Medicine.

    Directory of Open Access Journals (Sweden)

    Paul Thompson

    Full Text Available Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research

  4. Modified Approach to Transform Arc From Text to Linear Form Text : A Preprocessing Stage for OCR

    Directory of Open Access Journals (Sweden)

    Vijayashree C S

    2014-08-01

    Full Text Available Arc-form-text is an artistic text which is quite common in several documents such as certificates, advertisements and history documents. OCRs fail to read such arc-form-text, and it is necessary to transform it to linear-form-text at the preprocessing stage. In this paper, we present a modification to an existing transformation model for better readability by OCRs. The method takes the segmented arc-form-text as input. Initially, two concentric ellipses are approximated to enclose the arc-form-text, and the modified transformation model then transforms the text from arc form to linear form. The proposed method is implemented on several upper semi-circular arc-form-text inputs, and the readability of the transformed text is analyzed with an OCR.

  5. Extracting Conceptual Feature Structures from Text

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Jensen, Per Anker;

    2011-01-01

    This paper describes an approach to indexing texts by their conceptual content using ontologies along with lexico-syntactic information and semantic role assignment provided by lexical resources. The conceptual content of meaningful chunks of text is transformed into conceptual feature structures and mapped into concepts in a generative ontology. Synonymous but linguistically quite distinct expressions are mapped to the same concept in the ontology. This allows us to perform a content-based search which will retrieve relevant documents independently of the linguistic form of the query as well...

  6. Text Data Mining: Theory and Methods

    Directory of Open Access Journals (Sweden)

    Jeffrey L. Solka

    2008-01-01

    Full Text Available This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies that are employed within this discipline area while at the same time making the reader aware of some of the interesting challenges that remain to be solved within the area. Finally, the article serves as a very rudimentary tutorial on some of the techniques while also providing the reader with a list of references for additional study.

  7. Multilingual Topic Models for Unaligned Text

    CERN Document Server

    Boyd-Graber, Jordan

    2012-01-01

    We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora.

  8. Quantum mechanics a comprehensive text for chemistry

    CERN Document Server

    Arora, Kishor

    2010-01-01

    This book contains 14 chapters. The text covers the inadequacy of classical mechanics and the basic and fundamental concepts of quantum mechanics, including translational, vibrational, rotational and electronic energies, an introduction to the concepts of angular momenta, approximate methods and their applications, concepts related to electron spin, symmetry concepts in quantum mechanics, and ultimately the theories of chemical bonding and the use of software in quantum mechanics. The text of the book is presented in a lucid manner with ample examples and illustrations wherever

  9. Events and Trends in Text Streams

    Energy Technology Data Exchange (ETDEWEB)

    Engel, David W.; Whitney, Paul D.; Cramer, Nicholas O.

    2010-03-04

    "Text streams--collections of documents or messages that are generated and observed over time--are ubiquitous. Our research and development are targeted at developing algorithms to find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to predefined temporally ordered document collections but is also suitable for application to near-real-time textual data streams."

  10. Distinguishing Word Senses in Untagged Text

    CERN Document Server

    Pedersen, T; Pedersen, Ted; Bruce, Rebecca

    1997-01-01

    This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high dimensional feature set.
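
    None of the three algorithms is spelled out in the record, so purely as an illustration of the general idea, the sketch below clusters occurrences of an ambiguous word by the words surrounding them, using Ward's minimum-variance method (one of the three methods named) as implemented in scikit-learn. The toy contexts, bag-of-words features and choice of two clusters are assumptions, not the paper's feature sets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering

# toy contexts of the ambiguous word "bank" (illustrative, not the paper's data)
contexts = [
    "deposited the check at the bank before noon",
    "the bank raised interest rates on savings accounts",
    "fished from the muddy bank of the river",
    "the river bank was flooded after the storm",
]
X = CountVectorizer(stop_words="english").fit_transform(contexts).toarray()
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print(labels)   # occurrences grouped into two induced "senses"
```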

  11. A Sequential Algorithm for Training Text Classifiers

    CERN Document Server

    Lewis, D D; Lewis, David D.; Gale, William A.

    1994-01-01

    The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
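
    A minimal sketch of an uncertainty-sampling loop of the kind described, not the authors' original implementation: it uses a modern scikit-learn logistic regression in place of their statistical classifier, assumes a binary task, and treats the oracle (the human annotator) as a callable placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_labeled, y_labeled, X_pool, oracle, n_queries=10):
    """Binary active-learning loop: query the pool document the model is least sure about."""
    for _ in range(n_queries):
        clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
        proba = clf.predict_proba(X_pool)[:, 1]
        i = int(np.argmin(np.abs(proba - 0.5)))              # closest to the decision boundary
        X_labeled = np.vstack([X_labeled, X_pool[i]])
        y_labeled = np.append(y_labeled, oracle(X_pool[i]))  # human-provided label
        X_pool = np.delete(X_pool, i, axis=0)
    return clf
```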

  12. Text-Filled Stacked Area Graphs

    DEFF Research Database (Denmark)

    Kraus, Martin

    2011-01-01

    Text can add a significant amount of detail and value to an information visualization. In particular, it can integrate more of the data that a visualization is based on, and it can also integrate information that is personally relevant to readers of a visualization. This may influence readers to consider a visualization a detailed enrichment of their personal experience instead of an abstract representation of anonymous numbers. However, the integration of textual detail into a visualization is often very challenging. This work discusses one particular approach to this problem, namely text...

  13. Texts, Transmissions, Receptions. Modern Approaches to Narratives

    NARCIS (Netherlands)

    Lardinois, A.P.M.H.; Levie, S.A.; Hoeken, H.; Lüthy, C.H.

    2015-01-01

    The papers collected in this volume study the function and meaning of narrative texts from a variety of perspectives. The word “text” is used here in the broadest sense of the term: it denotes literary books, but also oral tales, speeches, newspaper articles and comics. One of the purposes of this v

  14. Texting your way to healthier eating?

    DEFF Research Database (Denmark)

    Pedersen, Susanne; Grønhøj, Alice; Thøgersen, John

    2016-01-01

    This study investigates the effects of a feedback intervention employing text messaging during 11 weeks on adolescents’ behavior, self-efficacy and outcome expectations regarding fruit and vegetable intake. A pre- and post-survey was completed by 1488 adolescents school-wise randomly allocated to ...

  15. Understanding Curriculum as a Racial Text.

    Science.gov (United States)

    Pinar, William F.

    1991-01-01

    Discusses curriculum as a racial text, focusing on European Americans as a major part of the racial dilemma. The Eurocentric curriculum denies nonwhite students role models and denies white students self-understanding. African Americans' presence informs every element of U.S. life, and the absence of African-American knowledge in curriculum…

  16. Fieldwork, Heritage and Engaging Landscape Texts

    Science.gov (United States)

    Mains, Susan P.

    2014-01-01

    This paper outlines and analyses efforts to critically engage with "heritage" through the development and responses to a series of undergraduate residential fieldwork trips held in the North Coast of Jamaica. The ways in which we read heritage through varied "texts"--specifically, material landscapes, guided heritage tours,…

  17. The Challenges of Qualitatively Coding Ancient Texts

    Science.gov (United States)

    Slingerland, Edward; Chudek, Maciej

    2012-01-01

    We respond to several important and valid concerns about our study ("The Prevalence of Folk Dualism in Early China," "Cognitive Science" 35: 997-1007) by Klein and Klein, defending our interpretation of our data. We also argue that, despite the undeniable challenges involved in qualitatively coding texts from ancient cultures, the standard tools…

  18. Elementary Functions, Student's Text, Unit 21.

    Science.gov (United States)

    Allen, Frank B.; And Others

    Unit 21 in the SMSG secondary school mathematics series is a student text covering the following topics in elementary functions: functions, polynomial functions, tangents to graphs of polynomial functions, exponential and logarithmic functions, and circular functions. Appendices discuss set notation, mathematical induction, significance of…

  19. Assessing Assessment Texts: Where Is Planning?

    Science.gov (United States)

    Fives, Helenrose; Barnes, Nicole; Dacey, Charity; Gillis, Anna

    2016-01-01

    We conducted a content analysis of 27 assessment textbooks to determine how assessment planning was framed in texts for preservice teachers. We identified eight assessment planning themes: alignment, assessment purpose and types, reliability and validity, writing goals and objectives, planning specific assessments, unpacking, overall assessment…

  20. Leveled Reading and Engagement with Complex Texts

    Science.gov (United States)

    Hastings, Kathryn

    2016-01-01

    The benefits of engaging with age-appropriate reading materials in classroom settings are numerous. For example, students' comprehension is developed as they acquire new vocabulary and concepts. The Common Core requires all students have daily opportunities to engage with "complex text" regardless of students' decoding levels. However,…

  1. Improving text recall with multiple summaries

    NARCIS (Netherlands)

    Meij, van der Hans; Meij, van der Jan

    2012-01-01

    Background. QuikScan (QS) is an innovative design that aims to improve accessibility, comprehensibility, and subsequent recall of expository text by means of frequent within-document summaries that are formatted as numbered list items. The numbers in the QS summaries correspond to numbers placed in

  2. Exploring Academic Voice in Multimodal Quantitative Texts

    Directory of Open Access Journals (Sweden)

    Robert Prince

    2014-10-01

    Full Text Available Research on students’ academic literacies practices has tended to focus on the written mode in order to understand the academic conventions necessary to access Higher Education. However, the representation of quantitative information can be a challenge to many students. Quantitative information can be represented through a range of modes (such as writing, visuals and numbers) and different information graphics (such as tables, charts, graphs). This paper focuses on the semiotic aspects of graphic representation in academic work, using student and published data from the Health Science, and an information graphic from the social domain as a counterpoint to explore aspects about agency and choice in academic voice in multimodal texts. It explores voice in terms of three aspects which work across modes, namely authorial engagement, citation and modality. The work of different modes and their inter-relations in quantitative texts is established, as is the use of sources in citation. We also look at the ways in which credibility and validity are established through modality. This exploration reveals that there is a complex interplay of modes in the construction of academic voice, which are largely tacit. This has implications for the way we think about and teach writing and text-making in quantitative disciplines in Higher Education.

  3. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
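
    The generative model itself is not reproduced here, but two of the regularities it targets are easy to measure on any plain-text string; the sketch below computes the Zipf rank-frequency profile and the Heaps tokens-versus-types growth curve (burstiness and topicality are not covered).

```python
from collections import Counter
import re

def zipf_and_heaps(text: str):
    """Return (rank-ordered frequencies, list of (tokens seen, distinct types))."""
    words = re.findall(r"[a-z']+", text.lower())
    zipf = sorted(Counter(words).values(), reverse=True)   # frequency by rank
    heaps, vocab = [], set()
    for n, w in enumerate(words, 1):
        vocab.add(w)
        heaps.append((n, len(vocab)))                      # vocabulary growth curve
    return zipf, heaps
```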

  4. Text Memorisation in Chinese Foreign Language Education

    Science.gov (United States)

    Yu, Xia

    2012-01-01

    In China, a widespread learning practice for foreign languages are reading, reciting and memorising texts. This book investigates this practice against a background of Confucian heritage learning and western attitudes towards memorising, particularly audio-lingual approaches to language teaching and later largely negative attitudes. The author…

  5. CONAN : Text Mining in the Biomedical Domain

    NARCIS (Netherlands)

    Malik, R.

    2006-01-01

    This thesis is about Text Mining. Extracting important information from literature. In the last years, the number of biomedical articles and journals is growing exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was cons

  6. Interword spacing in Chinese text layout.

    Science.gov (United States)

    Hsu, S H; Huang, K C

    2000-10-01

    Three experiments using Chinese text were conducted to investigate word spacing and its effect on reading performance. In Exp. 1, a sonogram detector was used to analyze interword and intercharacter (within a word) time intervals from text read aloud by professional TV broadcasters versus college graduates. The results showed interword intervals were significantly longer than intercharacter intervals, indicating that interword spacing has psychological reality in speech. Exp. 2 examined the effect on reading performance due to separating the characters that compose a word. Separating the characters of a word did not decrease reading accuracy but did result in significantly longer reading times. Exp. 3 explored the effect of word spacing in Chinese sentences on reading performance. Analysis showed that word spacing did not affect reading accuracy, but half character and whole-character spacing significantly reduced reading time. The results of the present study suggest that word spacing in Chinese text layout enhances reading performance. Word spacing may help the reader to segment more quickly a string of characters into words and reduce the likelihood of misinterpretation. Also, ambiguity of sentence structure severely degraded reading accuracy. The implications of the results for word spacing design in Chinese text are discussed. PMID:11065294

  7. INNER DIALOGICITY OF MEDICAL SCIENTIFIC TEXTS

    Directory of Open Access Journals (Sweden)

    Efremova Nataliya Vladimirovna

    2015-06-01

    Full Text Available The author studies inner dialogicity as an integral property of a scientist's thinking activity, a way of a scientific idea development, one of the cognitive and discursive mechanisms of new knowledge formation, its crystallization and dementalisation in a text, as a way of search for truth. Such approach to dialogicity in the study of a scientific text makes it possible to analyze the cogitative processes proceeding in human consciousness and cognitive activity, allows to fully understand the stated scientific concept, to define pragmatic strategies of the author, to plunge into his reflexive world. On the material of medical scientific texts of N.M. Amosov and F. G. Uglov, famous scientists in the field of cardio surgery, it is established that traces of internal dialogicity manifestation in the textual space of scientists actualize the origin of new knowledge, the change of author's semantic positions, his ability to reflect, compare, analyze his own thoughts and actions, to estimate oneself and the features of thinking process which are realized in logic of a statement of the scientific concept, an explanation of concepts, terms at judgment of the points of view of contemporaries and predecessors, adherents and scientist's opponents, and also orientation to the addressee's presupposition, activization of his cogitative activity. Linguistic, discursive, verbal analysis singles out the impact on the addressee, his mental activity.

  8. "The Politics of Location": Text as Opposition.

    Science.gov (United States)

    Moreno, Renee

    Eduardo Galeano's "Memory of Fire: Genesis" raises a number of questions concerning the "politics of location," a term that may be defined as the intersections, tensions, and complications that people of color bring to space and what space means in terms of hierarchies and power, racial and gender stratifications. Text can also be a fluid,…

  9. Writing Treatment for Aphasia: A Texting Approach

    Science.gov (United States)

    Beeson, Pelagie M.; Higginson, Kristina; Rising, Kindle

    2013-01-01

    Purpose: Treatment studies have documented the therapeutic and functional value of lexical writing treatment for individuals with severe aphasia. The purpose of this study was to determine whether such retraining could be accomplished using the typing feature of a cellular telephone, with the ultimate goal of using text messaging for…

  10. Linguistic expertise of the advertising text

    Directory of Open Access Journals (Sweden)

    Milaeva O. V.

    2011-03-01

    Full Text Available This article is devoted to the analysis of such an indicator of the development of Russian science as bibliometrics (the number of publications in the world publication stream). The main period analyzed is 1999-2008. Statistical data on the main directions of research are taken from the Thomson Reuters analytical report of January 2010.

  11. Texts, Languages & Information Technology in Egyptology. Introduction

    OpenAIRE

    Polis, Stéphane

    2013-01-01

    A short introduction to the volume "Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010".

  12. BaffleText: a Human Interactive Proof

    Science.gov (United States)

    Chew, Monica; Baird, Henry S.

    2003-01-01

    Internet services designed for human use are being abused by programs. We present a defense against such attacks in the form of a CAPTCHA (Completely Automatic Public Turing test to tell Computers and Humans Apart) that exploits the difference in ability between humans and machines in reading images of text. CAPTCHAs are a special case of 'human interactive proofs,' a broad class of security protocols that allow people to identify themselves over networks as members of given groups. We point out vulnerabilities of reading-based CAPTCHAs to dictionary and computer-vision attacks. We also draw on the literature on the psychophysics of human reading, which suggests fresh defenses available to CAPTCHAs. Motivated by these considerations, we propose BaffleText, a CAPTCHA which uses non-English pronounceable words to defend against dictionary attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks. Experiments on human subjects confirm the human legibility and user acceptance of BaffleText images. We have found an image-complexity measure that correlates well with user acceptance and assists in engineering the generation of challenges to fit the ability gap. Recent computer-vision attacks, run independently by Mori and Malik, suggest that BaffleText is stronger than two existing CAPTCHAs.

  13. Texts and Literacies of the Shi Jinrui

    Science.gov (United States)

    Carrington, Victoria

    2004-01-01

    In post-industrial societies saturated with the multimodal texts of consumer culture--film, computer games, interactive toys, SMS, email, the internet, television, DVDs--young people are developing literacy skills and knowledge in and for a world significantly changed from that of their parents and educators. Given this context, this paper seeks…

  14. Examining Response Confidence in Multiple Text Tasks

    Science.gov (United States)

    List, Alexandra; Alexander, Patricia A.

    2015-01-01

    Students' confidence in their responses to a multiple text-processing task and their justifications for those confidence ratings were investigated. Specifically, 215 undergraduates responded to two academic questions, differing by type (i.e., discrete and open-ended) and by domain (i.e., developmental psychology and astrophysics), using a digital…

  15. Validation Study of Waray Text Readability Instrument

    Science.gov (United States)

    Oyzon, Voltaire Q.; Corrales, Juven B.; Estardo, Wilfredo M., Jr.

    2015-01-01

    In 2012 the Leyte Normal University developed computer software--modelled after the Spache Readability Formula (1953) for English--to help rank texts that can be used by teachers or research groups in selecting appropriate reading materials to support the DepEd's MTB-MLE program in Region VIII, in the Philippines. However,…
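
    As a hedged sketch of what a Spache-style instrument computes (not the Waray formula itself): a grade estimate from average sentence length and the percentage of words outside a familiar-word list. The coefficients below are those usually quoted for Spache's original 1953 English formula, and the word list is a tiny placeholder.

```python
import re

FAMILIAR_WORDS = {"the", "and", "a", "to", "of", "in", "is", "it"}   # placeholder list

def spache_like_grade(text: str) -> float:
    """Grade estimate from average sentence length and % of unfamiliar words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    asl = len(words) / max(len(sentences), 1)                        # average sentence length
    pdw = 100 * sum(w not in FAMILIAR_WORDS for w in words) / max(len(words), 1)
    return 0.141 * asl + 0.086 * pdw + 0.839                         # Spache-style weights
```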

  16. Text Mining applied to Molecular Biology

    NARCIS (Netherlands)

    R. Jelier (Rob)

    2008-01-01

    This thesis describes the development of text-mining algorithms for molecular biology, in particular for DNA microarray data analysis. Concept profiles were introduced, which characterize the context in which a gene is mentioned in literature, to retrieve functional associations

  17. Polarity Analysis of Texts using Discourse Structure

    NARCIS (Netherlands)

    Heerschop, Bas; Goosen, Frank; Hogenboom, Alexander; Frasincar, Flavius; Kaymak, Uzay; Jong, de Franciska

    2011-01-01

    Sentiment analysis has applications in many areas and the exploration of its potential has only just begun. We propose Pathos, a framework which performs document sentiment analysis (partly) based on a document’s discourse structure. We hypothesize that by splitting a text into important and less im

  18. REPRESENTATION OF FABRICATED KNOWLEDGE IN SCIENTIFIC TEXTS

    OpenAIRE

    Menshakova, N.N.

    2015-01-01

    In the article the phenomenon of fabricated knowledge is considered. The author views its nature, its use by researchers for verbalisation of new objective knowledge and for persuasion. The author names the main ways of representation of fabricated knowledge in scientific text.

  19. Evaluating Text-to-Speech Synthesizers

    Science.gov (United States)

    Cardoso, Walcir; Smith, George; Fuentes, Cesar Garcia

    2015-01-01

    Text-To-Speech (TTS) synthesizers have piqued the interest of researchers for their potential to enhance the L2 acquisition of writing (Kirstein, 2006), vocabulary and reading (Proctor, Dalton, & Grisham, 2007) and pronunciation (Cardoso, Collins, & White, 2012; Soler-Urzua, 2011). Despite their proven effectiveness, there is a need for…

  20. Full Text Journal Subscriptions: An Evolutionary Process.

    Science.gov (United States)

    Luther, Judy

    1997-01-01

    Provides an overview of companies offering Web accessible subscriptions to full text electronic versions of scientific, technical, and medical journals (Academic Press, Blackwell, EBSCO, Elsevier, Highwire Press, Information Quest, Institute of Physics, Johns Hopkins University Press, OCLC, OVID, Springer, and SWETS). Also lists guidelines for…

  1. Automatic Induction of Rule Based Text Categorization

    Directory of Open Access Journals (Sweden)

    D.Maghesh Kumar

    2010-12-01

    Full Text Available The automated categorization of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describes a novel method for the automatic induction of rule-based text classifiers. This method supports a hypothesis language of the form "if T1, … or Tn occurs in document d, and none of Tn+1,... Tn+m occurs in d, then classify d under category c," where each Ti is a conjunction of terms. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. Issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation were discussed in detail.
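
    The quoted hypothesis language maps naturally onto a small data structure. The sketch below only shows how such induced rules might be represented and applied, using single terms rather than the paper's conjunctions of terms Ti; the induction step is not shown and the example rules are invented.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    positive: set      # "if T1, ... or Tn occurs in d"
    negative: set      # "and none of Tn+1, ... Tn+m occurs in d"
    category: str      # "then classify d under category c"

def classify(document: str, rules: list) -> set:
    tokens = set(document.lower().split())
    return {r.category for r in rules
            if tokens & r.positive and not tokens & r.negative}

# invented example rules, single terms instead of the paper's conjunctions
rules = [Rule({"goal", "striker"}, {"election"}, "sport"),
         Rule({"election", "senate"}, set(), "politics")]
print(classify("The striker scored the winning goal", rules))   # {'sport'}
```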

  2. Investigating Text Input Methods for Mobile Phones

    Directory of Open Access Journals (Sweden)

    Barry O’Riordan

    2005-01-01

    Full Text Available Human Computer Interaction is a primary factor in the success or failure of any device but if an objective view is taken of the current mobile phone market you would be forgiven for thinking usability was secondary to aesthetics. Many phone manufacturers modify the design of phones to be different than the competition and to target fashion trends, usually at the expense of usability and performance. There is a lack of awareness among many buyers of the usability of the device they are purchasing and the disposability of modern technology is an effect rather than a cause of this. Designing new text entry methods for mobile devices can be expensive and labour-intensive. The assessment and comparison of a new text entry method with current methods is a necessary part of the design process. The best way to do this is through an empirical evaluation. The aim of the study was to establish which mobile phone text input method best suits the requirements of a select group of target users. This study used a diverse range of users to compare devices that are in everyday use by most of the adult population. The proliferation of the devices is as yet unmatched by the study of their application and the consideration of their user friendliness.

  3. COMPENDEX/TEXT-PAC: RETROSPECTIVE SEARCH.

    Science.gov (United States)

    Standera, Oldrich

    The Text-Pac System is capable of generating indexes and bulletins to provide a current information service without the selectivity feature. Indexes of the accumulated data base may also be used as a basis for manual retrospective searching. The manual search involves searching computer-prepared indexes from a machine readable data base produced…

  4. A Scheme for Text Analysis Using Fortran.

    Science.gov (United States)

    Koether, Mary E.; Coke, Esther U.

    Using string-manipulation algorithms, FORTRAN computer programs were designed for analysis of written material. The programs measure length of a text and its complexity in terms of the average length of words and sentences, map the occurrences of keywords or phrases, calculate word frequency distribution and certain indicators of style. Trials of…

  5. The Cultural Content of Business Spanish Texts.

    Science.gov (United States)

    Grosse, Christine Uber; Uber, David

    A study examined eight business Spanish textbooks for cultural content by looking at commonly appearing cultural topics and themes, presentation of cultural information, activities and techniques used to promote cultural understanding, and incorporation of authentic materials. The texts were evenly divided among beginning, intermediate, and…

  6. Text Mining the History of Medicine.

    Science.gov (United States)

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while

  8. Making School Development Credible. Text, Context, Irony

    Directory of Open Access Journals (Sweden)

    Mats Börjesson

    2012-01-01

    Full Text Available The article argues for the importance of an open, reflexive-methodological approach when switching between studying text, context and researcher activity. Close linguistic analysis can benefit from being linked with the researcher’s contextualisation of his empirical material as well as with more distanced readings. The more specific starting point for this article is that school development, like other similar terms such as school improvement and the like, makes use of linguistic building blocks with which whole narratives about today’s and tomorrow’s schools can be constructed. The subject of the study is a short text issued by the Swedish Schools Inspectorate (Skolinspektionen). Government language changes according to the authorities’ role in society and their own definitions of their functions, and an important aspect here is the legitimacy of the authorities’ texts. By means of various kinds of close linguistic analysis, the above-mentioned text is studied with regard to choice of categories, hierarchies of modalisation and the rhetorical effects of different types of formulations in a broader political-social landscape. The article concludes with a reflective discussion on the relationship between government language and irony as a stylistic device – a device that is based on the results of the close empirical analysis.[i]



    [i] The article is part of the project ”School  Development as Narrative”, funded by the Swedish Research Council. The author would like to thank the two reviewers for very valuable comments.

  9. A TWO STAGE METHOD FOR BENGALI TEXT EXTRACTION FROM STILL IMAGES CONTAINING TEXT

    Directory of Open Access Journals (Sweden)

    Ankita Sikdar

    2012-07-01

    Full Text Available Bengali text data present in multimedia images having multiple content forms, such as still images and text, contain information that when extracted finds a lot of applications. The images can be of different types, where objects and text may be completely separated or overlapped or embedded in each other. The Bengali text can be of different shapes and sizes. Extraction of text from these types of images becomes challenging because the textual portion has to be correctly separated from the rest of the background. The input image passes through two stages. The first step tries to locate the different components in the image using entropy filtering and the second stage distinguishes the components representing text from the non-textual components based on several features of Bengali text. The text thus obtained from the image can then be used in software such as Bengali OCR for character recognition.
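
    Only the first stage (locating candidate components with entropy filtering) lends itself to a generic sketch; the Bengali-specific features used in the second stage are not shown. The scikit-image-based code below is an assumption about how such a stage could look, with an arbitrary window size and threshold, and expects an 8-bit grayscale image.

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk
from skimage.measure import label, regionprops

def candidate_text_regions(gray_uint8: np.ndarray, win: int = 5, thresh: float = 4.0):
    """Return bounding boxes of high-entropy (busy, possibly textual) components."""
    ent = entropy(gray_uint8, disk(win))       # local entropy in a circular window
    mask = ent > thresh                        # high entropy ~ candidate text/texture
    return [region.bbox for region in regionprops(label(mask))]
```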

  10. Text Character Extraction Implementation from Captured Handwritten Image to Text Conversion using Template Matching Technique

    Directory of Open Access Journals (Sweden)

    Barate Seema

    2016-01-01

    Full Text Available Images contain various types of useful information that should be extracted whenever required. Various algorithms and methods have been proposed to extract text from a given image, so that users can access the text in any image. Variations in text may occur because of differences in size, style, orientation and alignment of text, while low image contrast and composite backgrounds complicate text extraction. An application that extracts and recognizes such text accurately in real time can be applied to many important tasks such as document analysis, vehicle license plate extraction and text-based image indexing, and many such applications have become realities in recent years. To address the above problems we develop an application that converts the image into text using algorithms such as bounding box, HSV model, blob analysis, template matching and template generation.
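
    Of the algorithms listed (bounding box, HSV model, blob analysis, template matching, template generation), only the template-matching step is sketched below, using OpenCV's normalised cross-correlation. The template dictionary, grayscale input and score threshold are assumptions, not the paper's configuration.

```python
import cv2
import numpy as np

def match_character(page_gray: np.ndarray, templates: dict, threshold: float = 0.7):
    """Return (label, score, top-left location) of the best-matching template, or None."""
    best = None
    for label, templ in templates.items():
        result = cv2.matchTemplate(page_gray, templ, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(result)
        if score >= threshold and (best is None or score > best[1]):
            best = (label, score, loc)
    return best
```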

  11. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures.

    Directory of Open Access Journals (Sweden)

    Xu-Cheng Yin

    Full Text Available Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.

  12. Can An Evolutionary Process Create English Text?

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.

    2008-10-29

    Critics of the conventional theory of biological evolution have asserted that while natural processes might result in some limited diversity, nothing fundamentally new can arise from 'random' evolution. In response, biologists such as Richard Dawkins have demonstrated that a computer program can generate a specific short phrase via evolution-like iterations starting with random gibberish. While such demonstrations are intriguing, they are flawed in that they have a fixed, pre-specified future target, whereas in real biological evolution there is no fixed future target, but only a complicated 'fitness landscape'. In this study, a significantly more sophisticated evolutionary scheme is employed to produce text segments reminiscent of a Charles Dickens novel. The aggregate size of these segments is larger than the computer program and the input Dickens text, even when comparing compressed data (as a measure of information content).
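
    For orientation, the sketch below is the simple fixed-target, Dawkins-style demonstration that the abstract uses as its point of departure, not the paper's more sophisticated fitness-landscape scheme that produces Dickens-like text: starting from random gibberish, keep the mutant closest to a fixed target phrase.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"

def evolve(target=TARGET, pop=100, rate=0.05, seed=0):
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)   # random gibberish
    generations = 0
    while parent != target:
        def mutate(s):
            return "".join(c if rng.random() > rate else rng.choice(ALPHABET) for c in s)
        children = [mutate(parent) for _ in range(pop)] + [parent]
        parent = max(children, key=lambda s: sum(a == b for a, b in zip(s, target)))
        generations += 1
    return generations

print(evolve())   # generations needed to hit the fixed target phrase
```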

  13. Online Visual Analytics of Text Streams.

    Science.gov (United States)

    Liu, Shixia; Yin, Jialun; Wang, Xiting; Cui, Weiwei; Cao, Kelei; Pei, Jian

    2016-11-01

    We present an online visual analytics approach to helping users explore and understand hierarchical topic evolution in high-volume text streams. The key idea behind this approach is to identify representative topics in incoming documents and align them with the existing representative topics that they immediately follow (in time). To this end, we learn a set of streaming tree cuts from topic trees based on user-selected focus nodes. A dynamic Bayesian network model has been developed to derive the tree cuts in the incoming topic trees to balance the fitness of each tree cut and the smoothness between adjacent tree cuts. By connecting the corresponding topics at different times, we are able to provide an overview of the evolving hierarchical topics. A sedimentation-based visualization has been designed to enable the interactive analysis of streaming text data from global patterns to local details. We evaluated our method on real-world datasets and the results are generally favorable.

  14. Ordinary differential equations a graduate text

    CERN Document Server

    Bhamra, K S

    2015-01-01

    ORDINARY DIFFERENTIAL EQUATIONS: A Graduate Text presents a systematic and comprehensive introduction to ODEs for graduate and postgraduate students. The systematically organized text treats differential inequalities, Gronwall's inequality, Nagumo's theorems, Osgood's criteria and applications of first-order differential equations in greater depth. The book discusses qualitative and quantitative aspects of Sturm-Liouville problems, Green's functions, integral equations and the Laplace transform, and is supported by a number of worked-out examples in each lesson to make the concepts clear. Considerable stress is laid on stability theory, especially Lyapunov and Poincare stability theory. Numerous figures in various lessons (in particular, lessons dealing with stability theory) have been added to clarify the key concepts in DE theory. Nonlinear oscillation in conservative systems and Hamiltonian systems highlights the basic nature of the systems considered. The perturbation techniques lesson deals in fairly d...

  15. Cell Phoning and Texting While Driving

    Directory of Open Access Journals (Sweden)

    Judy Honoria Rosaire Telemaque

    2015-07-01

    Full Text Available A qualitative phenomenological study was conducted on the consequences of cell phone use while operating a vehicle. We discussed why talking and texting on cell phones are so popular through the analysis of our interviews with police officers, driving instructors, and parents of teens and young adults. The participants came from central, northeastern, northwestern, and southeastern Connecticut. All had exposure with respect to the effects of cell phone usage problem. The study reached a point of theoretical saturation or redundancy by which the analysis no longer resulted in new themes. We concluded that the discoveries revealed the necessity for education, expansion of technology, and additional driver education preparation, which may provide a path for leadership to help solve the problem.

  16. There is a Text in 'The Balloon'

    DEFF Research Database (Denmark)

    Elias, Camelia

    2009-01-01

    From the Introduction: Camelia Elias' "There is a Text in 'The Balloon': Donald Barthelme's Allegorical Flights" provides its reader with a much-needed and useful distinction between fantasy and the fantastic: "whereas fantasy in critical discourse can be aligned with allegory, in which a supernatu..." ... York in Donald Barthelme's short story "The Balloon" from 1968 is discussed in the light of the chapter's epistemological understanding of fantasy.

  17. Text to Speech Conversion with Phonematic Concatenation

    Directory of Open Access Journals (Sweden)

    Tapas Kumar Patra

    2012-09-01

    Full Text Available This paper presents a method to design a text-to-speech conversion module in Matlab using simple matrix operations. First, some similar-sounding words are recorded with a microphone using a recording program in the Matlab window, and the recorded sounds are saved in .wav format in the directory. The recorded sounds are then sampled, and the sampled values are separated into their constituent phonetic units. The separated syllables are then concatenated to reconstruct the desired words. Using various Matlab commands, i.e. wavread, subplot etc., the waves are sampled and extracted to obtain the desired result. This method is simple to implement and requires much less memory.

  18. Bimodal Emotion Recognition from Speech and Text

    Directory of Open Access Journals (Sweden)

    Weilin Ye

    2014-01-01

    Full Text Available This paper presents an approach to emotion recognition from speech signals and textual content. In the analysis of speech signals, thirty-seven acoustic features are extracted from the speech input. Two different classifiers, Support Vector Machines (SVMs) and a BP neural network, are adopted to classify the emotional states. In text analysis, we use the two-step classification method to recognize the emotional states. The final emotional state is determined based on the emotion outputs from the acoustic and textual analyses. In this paper we have two parallel classifiers for acoustic information and two serial classifiers for textual information, and a final decision is made by combining these classifiers in decision-level fusion. Experimental results show that the emotion recognition accuracy of the integrated system is better than that of either of the two individual approaches.

  19. Automated assessment of medical training evaluation text.

    Science.gov (United States)

    Zhang, Rui; Pakhomov, Serguei; Gladding, Sophia; Aylward, Michael; Borman-Shoap, Emily; Melton, Genevieve B

    2012-01-01

    Medical post-graduate residency training and medical student training increasingly utilize electronic systems to evaluate trainee performance based on defined training competencies with quantitative and qualitative data, the latter of which typically consists of text comments. Medical education is concomitantly becoming a growing area of clinical research. While electronic systems have proliferated in number, little work has been done to help manage and analyze qualitative data from these evaluations. We explored the use of text-mining techniques to assist medical education researchers in sentiment analysis and topic analysis of residency evaluations with a sample of 812 evaluation statements. While comments were predominantly positive, sentiment analysis improved the ability to discriminate statements with 93% accuracy. Similar to other domains, Latent Dirichlet Analysis and Information Gain revealed groups of core subjects and appear to be useful for identifying topics from this data.

  20. Extraction of information from unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H.; DeLand, S.M.; Crowder, S.V.

    1995-11-01

    Extracting information from unstructured text has become an emphasis in recent years due to the large amount of text now electronically available. This status report describes the findings and work done by the end of the first year of a two-year LDRD. Requirements of the approach included that it model the information in a domain independent way. This means that it would differ from current systems by not relying on previously built domain knowledge and that it would do more than keyword identification. Three areas that are discussed and expected to contribute to a solution include (1) identifying key entities through document level profiling and preprocessing, (2) identifying relationships between entities through sentence level syntax, and (3) combining the first two with semantic knowledge about the terms.

  1. Choices of texts for literary education

    DEFF Research Database (Denmark)

    Skyggebjerg, Anna Karlskov

    readers with literary interests, competences, possibilities, needs, etc. Generally speaking, the criteria for the choice of texts for teaching literature in Danish schools have been dominated by considerations for the subject and Literature in itself. The predominant view of literature comes from literature studies at universities, where criteria concerning language and form are often more valued than criteria concerning character and content. This tendency to celebrate the formal aspects and the literariness of literature is recognized in governmental documents, teaching materials, and in the registration of texts for examinations. Genres such as poetry and short stories, periods such as avant-garde and modernism, and acknowledged and well-known authorships are often included, whereas representations of popular fiction and such genres as fantasy, sci-fi, and biography are rare. Often, pupils...

  2. An unpublished text of Jovellanos about mineralogy

    Directory of Open Access Journals (Sweden)

    Jorge ORDAZ GARGALLO

    2012-02-01

    Full Text Available An unpublished manuscript of Gaspar Melchor de Jovellanos about the history of mineralogy, written during his captivity in Bellver Castle (Palma de Mallorca), is presented and analyzed. In this writing the importance of chemical knowledge as a source of other branches of science, and its applications in different fields of agriculture, mining and industry, is considered. The author made a historical synthesis reviewing the men of science who contributed to a great extent to the advance of chemistry and mineralogy. The text clearly supports the new contributions of Lavoisier and other supporters of experimentation as a scientific method, which agrees with Jovellanos' ideas about the development of the «useful» sciences for the progress of the countries.

  3. Reading an ESL Writer’s Text

    Directory of Open Access Journals (Sweden)

    Paul Kei Matsuda

    2011-03-01

    Full Text Available This paper focuses on reading as a central act of communication in the tutorial session. Writing center tutors without extensive experience reading writing by second language writers may have difficulty getting past the many differences in surface-level features, organization, and rhetorical moves. After exploring some of the sources of these differences in writing, the authors present strategies that writing tutors can use to work effectively with second language writers.

  4. Services for annotation of biomedical text

    OpenAIRE

    Hakenberg, Jörg

    2008-01-01

    Motivation: Text mining in the biomedical domain in recent years has focused on the development of tools for recognizing named entities and extracting relations. Such research resulted from the need for such tools as basic components for more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction now have reached a stage where they perform comparably to human annotators (considering inter--annotator agreement, measured in many studies to be aro...

  5. Automatic Induction of Rule Based Text Categorization

    OpenAIRE

    D.Maghesh Kumar

    2010-01-01

    The automated categorization of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describ...

  6. Machine Learning in Automated Text Categorization

    OpenAIRE

    Sebastiani, Fabrizio

    2001-01-01

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categori...

  7. READING ALOUD STRATEGIES IN READING ENGLISH TEXTS

    OpenAIRE

    Iyen Nurlaelawati; Shofa Dzulqodah

    2014-01-01

    Abstract: Reading aloud by a young language learner shows unique patterns as the evidence of his/her language data processing. This study, thus, explored the strategies applied by an Indonesian young language learner to read English written texts aloud to identify errors that actually bring certain benefits in her language learning process such as making intelligent guesses when she encountered unfamiliar words. It adopted qualitative case study design involving a seven-year old girl as the s...

  8. Logistic regression a self-learning text

    CERN Document Server

    Kleinbaum, David G

    1994-01-01

    This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained, and designed to be used both in class or as a tool for self-study. It arises from the author's many years of experience teaching this material and the notes on which it is based have been extensively used throughout the world.

  9. Stochastic text models for music categorization

    OpenAIRE

    Pérez Sancho, Carlos; Rizo Valero, David; Iñesta Quereda, José Manuel

    2008-01-01

    Music genre meta-data is of paramount importance for the organization of music repositories. People use genre in a natural way when entering a music store or looking into music collections. Automatic genre classification has become a popular topic in music information retrieval research. This work brings to symbolic music recognition some technologies, like the stochastic language models, already successfully applied to text categorization. In this work we model chord progressions and melodie...

  10. Stemming of Slovenian library science texts

    Directory of Open Access Journals (Sweden)

    Polona Vilar

    2002-01-01

    Full Text Available The theme of the article is the preparation of a stemming algorithm for Slovenian library science texts. The procedure consisted of three phases: learning, testing and evaluation.The preparation of the optimal stemmer for Slovenian texts from the field of library science is presented, its testing and comparison with two other stemmers for the Slovenian language: the Popovič stemmer and the Generic stemmer. A corpus of 790.000 words from the field of library science was used for learning. Lists of stems, word endings and stop-words were built. In the testing phase, the component parts of the algorithm were tested on an additional corpus of 167.000 words. In the evaluation phase, a comparison of the three stemmers processing the same word corpus was made. The results of each stemmer were compared with an intellectually prepared control result of the stemming of the corpus. It consisted of groups of semantically connected words with no errors. Understemming was especially monitored – the number of stems for semantically connected words, produced by an algorithm. The results were statistically processed with the Kruskal-Wallis test. The Optimal stemmer produced the best results.It matched best with the reference results and also gave the smallest number of stems for one semantic meaning. The Popovič stemmer followed closely. The Generic stemmer proved to be the least accurate. The procedures described in the thesis can represent a platform for the development of the tools for automatic indexing and retrieval for library science texts in Slovenian language.
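
    As a toy counterpart to the learning phase described (which built stem, ending and stop-word lists from a 790,000-word corpus), the sketch below applies longest-suffix stripping with a hand-written placeholder ending list and a minimum stem length; it is not the Optimal, Popovič or Generic stemmer evaluated in the article.

```python
# placeholder ending list, longest suffixes tried first
SUFFIXES = sorted(["ami", "ima", "ega", "emu", "ih", "om", "ja", "je",
                   "a", "e", "i", "o", "u"], key=len, reverse=True)

def stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed ending, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print(stem("knjiga"), stem("knjige"), stem("knjigami"))   # knjig knjig knjig
```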

  11. Generating text from functional brain images

    Directory of Open Access Journals (Sweden)

    Francisco ePereira

    2011-08-01

    Full Text Available Recent work has shown that it is possible to take brain images acquired during viewing of a scene and reconstruct an approximation of the scene from those images. Here we show that it is also possible to generate text about the mental content reflected in brain images. We began with images collected as participants read names of concrete items (e.g., "Apartment") while also seeing line drawings of the item named. We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. In order to validate this mapping, without accessing information about the items viewed for left-out individual brain images, we were able to generate from each one a collection of semantically pertinent words (e.g., "door," "window" for "Apartment"). Furthermore, we show that the ability to generate such words allows us to perform a classification task and thus validate our method quantitatively.

  12. PEDANT: Parallel Texts in Göteborg

    Directory of Open Access Journals (Sweden)

    Daniel Ridings

    2012-09-01

    Full Text Available The article presents the status of the PEDANT project with parallel corpora at the Language Bank at Göteborg University. The solutions for access to the corpus data are presented. Access is provided by way of the internet and standard applications and SGML-aware programming tools. The SGML format for encoding translation pairs is also outlined. The methods allow working with everything from plain text to texts densely encoded with linguistic information.

     

    In hierdie artikel word 'n beskrywing gegee van die stand van die PEDANT-projek met parallelle korpora by die Taalbank by die Universiteit van Göteborg. Oplossings vir die verkryging van toegang tot die korpusdata word aangedui. Toegang word verskaf deur middel van die Internet en standaardtoepassings en SGML-sensitiewe programmeringshulpmiddels. Die SGML-formaat vir die enkodering van vertaalpare word gesamentlik geskets. Hierdie metodes laat toe dat gewerk kan word met enigiets vanaf suiwer teks tot tekste wat taalkundig dig geëtiketteer is.

     

  13. TEXT SIGNAGE RECOGNITION IN ANDROID MOBILE DEVICES

    Directory of Open Access Journals (Sweden)

    Oi-Mean Foong

    2013-01-01

    Full Text Available This study presents a Text Signage Recognition (TSR) model in Android mobile devices for Visually Impaired People (VIP). Independent navigation is always a challenge for VIPs in unfamiliar indoor surroundings. Assistive technology such as Android smart devices has great potential to assist VIPs in indoor navigation using the built-in speech synthesizer. In contrast to previous TSR research, which was deployed on a standalone personal computer system using Otsu's algorithm, we have developed an affordable Text Signage Recognition system for Android mobile devices using the Tesseract OCR engine. The proposed TSR model used input images from the International Conference on Document Analysis and Recognition (ICDAR) 2003 dataset for system training and testing. The TSR model was tested by four volunteers who were blind-folded. The system performance of the TSR model was assessed using different metrics (i.e., Precision, Recall, F-Score and recognition formulas) to determine its accuracy. Experimental results show that the proposed TSR model achieves a satisfactory recognition rate.
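    The Android implementation is not reproduced here; as a rough desktop stand-in, the sketch below assumes the pytesseract wrapper (and a local Tesseract install) and scores the recognised words against a ground-truth word list with the same precision/recall/F-score metrics. The image path and ground truth are hypothetical.

      from PIL import Image                     # assumes Pillow is installed
      import pytesseract                        # assumes pytesseract + a Tesseract binary

      def recognise(path):
          return pytesseract.image_to_string(Image.open(path)).split()

      def precision_recall_f1(recognised, ground_truth):
          rec, truth = set(recognised), set(ground_truth)
          tp = len(rec & truth)
          p = tp / len(rec) if rec else 0.0
          r = tp / len(truth) if truth else 0.0
          f1 = 2 * p * r / (p + r) if (p + r) else 0.0
          return p, r, f1

      # Hypothetical test image and ground truth.
      words = recognise("signage.png")
      print(precision_recall_f1(words, ["EXIT", "LEVEL", "2"]))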

  14. Statistical Language Model for Chinese Text Proofreading

    Institute of Scientific and Technical Information of China (English)

    张仰森; 曹元大

    2003-01-01

    Statistical language modeling techniques are investigated so as to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located before and after the target word wi, and of the relationship between non-neighboring words wi and wj in the linguistic environment (LE). First, the word association degree between wi and wj is defined by using a distance-weighted factor, where wj is l words apart from wi in the LE; then the Bayes formula is used to calculate the LE related degree of word wi; and lastly, the LE related degree is taken as the criterion to predict the reasonability of word wi appearing in its context. Comparing the proposed model with the traditional n-gram in a Chinese text automatic error detection system, the experimental results show that the error detection recall rate and precision rate of the system have been improved.
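    The record does not give the exact formulas; purely as a sketch of the distance-weighting idea, the snippet below accumulates word-pair association scores that decay with distance and averages them into a crude context-relatedness score. The weighting, the window size and the toy corpus are illustrative assumptions, not the paper's definitions.

      from collections import defaultdict

      def association_degrees(sentences, window=4):
          # Accumulate pair scores that decay with the distance l between words.
          assoc = defaultdict(float)
          for sent in sentences:
              for i, wi in enumerate(sent):
                  for l in range(1, window + 1):
                      if i + l < len(sent):
                          assoc[(wi, sent[i + l])] += 1.0 / l
          return assoc

      def relatedness(target, context, assoc):
          # Crude context score: average association between target and context words.
          scores = [assoc.get((c, target), 0.0) + assoc.get((target, c), 0.0)
                    for c in context]
          return sum(scores) / len(scores) if scores else 0.0

      corpus = [["statistical", "language", "model", "for", "proofreading"],
                ["a", "statistical", "model", "detects", "errors"]]
      assoc = association_degrees(corpus)
      print(relatedness("model", ["statistical", "language"], assoc))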

  15. Learning from text: The effect of adjunct questions and alignment on text comprehension

    NARCIS (Netherlands)

    Reijners, Pauline; Kester, Liesbeth; Wetzels, Sandra; Kirschner, Paul A.

    2012-01-01

    Reijners, P. B. G., Kester, L., Wetzels, S. A. J., & Kirschner, P. A. (2012, November). Learning from text: The effect of adjunct questions and alignment on text comprehension. Poster presented at the ICO International Fall School, Girona, Spain.

  16. How Much Handwritten Text Is Needed for Text-Independent Writer Verification and Identification

    NARCIS (Netherlands)

    Brink, Axel; Bulacu, Marius; Schomaker, Lambert

    2008-01-01

    The performance of off-line text-independent writer verification and identification increases when the documents contain more text. This relation was examined by repeatedly conducting writer verification and identification performance tests while gradually increasing the amount of text on the pages.

  17. Text Belief Consistency Effects in the Comprehension of Multiple Texts with Conflicting Information

    Science.gov (United States)

    Maier, Johanna; Richter, Tobias

    2013-01-01

    When reading multiple texts about controversial scientific issues, learners must construct a coherent mental representation of the issue based on conflicting information that can be more or less belief-consistent. The present experiment investigated the effects of text-belief consistency on the situation model and memory for text. Students read…

  18. Learning from Conflicting Texts: The Role of Intertextual Conflict Resolution in Between-Text Integration

    Science.gov (United States)

    Kobayashi, Keiichi

    2015-01-01

    The present study examined the effect of intertextual conflict resolution on learning from conflicting texts. In two experiments, participants read sets of two texts under the condition of being encouraged either to resolve a conflict between the texts' arguments (the resolution condition) or to comprehend the arguments (the comprehension…

  19. Towards Text Simplification for Poor Readers with Intellectual Disability: When Do Connectives Enhance Text Cohesion?

    Science.gov (United States)

    Fajardo, Inmaculada; Tavares, Gema; Avila, Vicenta; Ferrer, Antonio

    2013-01-01

    Cohesive elements of texts such as connectives (e.g., "but," "in contrast") are expected to facilitate inferential comprehension in poor readers. Two experiments tested this prediction in poor readers with intellectual disability (ID) by: (a) comparing literal and inferential text comprehension of texts with and without connectives and/or high…

  20. On the application of text input metrics to handwritten text input

    OpenAIRE

    Read, Janet C.

    2006-01-01

    This paper describes the current metrics used in text input research, considering those used for discrete text input as well as those used for spoken input. It examines how these metrics might be used for handwritten text input and provides some thoughts about different metrics that might allow for a more fine grained evaluation of recognition improvement or input accuracy.

  1. Layout-aware text extraction from full-text PDF of scientific articles

    Directory of Open Access Journals (Sweden)

    Ramakrishnan Cartic

    2012-05-01

    Full Text Available Abstract Background The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF
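    LA-PDFText itself is not shown here; in the same spirit as stages (1) and (2), the sketch below uses pdfminer.six (an assumed dependency) to pull positioned text blocks from a PDF and tag a few of them with trivial keyword rules. The section cues and the file name are illustrative.

      from pdfminer.high_level import extract_pages     # assumes pdfminer.six
      from pdfminer.layout import LTTextContainer

      SECTION_CUES = ("abstract", "introduction", "methods", "results", "references")

      def labelled_blocks(pdf_path):
          # Stage 1: collect positioned text blocks; stage 2: trivial rule-based labels.
          blocks = []
          for page_layout in extract_pages(pdf_path):
              for element in page_layout:
                  if isinstance(element, LTTextContainer):
                      text = element.get_text().strip()
                      label = next((c for c in SECTION_CUES
                                    if text.lower().startswith(c)), "body")
                      blocks.append((element.bbox, label, text))
          return blocks

      for bbox, label, text in labelled_blocks("article.pdf")[:5]:   # hypothetical file
          print(label, bbox, text[:60])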

  2. THE PSYCHOLOGICAL NATURE OF TEXT COMPREHENSION IN TERMS OF TEXT LEARNING PROCESSES

    Directory of Open Access Journals (Sweden)

    Ferhat ENSAR

    2013-06-01

    Full Text Available Texts are important tools for learning, and efforts to make them more understandable reflect a practical need to support learning from text. The improvement of informative texts through corrective teaching materials is a topic frequently explored by contemporary researchers, which suggests that a more advanced understanding of text structure in the learning process is needed if educational materials are to be prepared on firmer scientific ground. This study therefore outlines textual organization and a general theory of learning from texts, and then discusses language processing in working memory and related phenomena of learning from texts, as well as individual differences, covering text development, text comprehension, and inferences from texts. One motivation is the view, found in modern memory theories, that working memory is responsible not only for recalling stored information but also for holding the results of partial processes, such as the successive steps of language comprehension. Another is the set of generalizations these theories offer about how the physical representation and pattern of a text interact during processing. The literature also documents the different procedures used to develop informative texts and the differences among them, including a learner's world view and processing style, the measurement of text comprehension, and the complex relations among these factors. Given the range of factors that affect how much a learner recalls and understands from a text, this study aims to discuss these assumptions.

  3. Computational text analysis and reading comprehension exam complexity towards automatic text classification

    CERN Document Server

    Liontou, Trisevgeni

    2014-01-01

    This book delineates a range of linguistic features that characterise the reading texts used at the B2 (Independent User) and C1 (Proficient User) levels of the Greek State Certificate of English Language Proficiency exams in order to help define text difficulty per level of competence. In addition, it examines whether specific reader variables influence test takers' perceptions of reading comprehension difficulty. The end product is a Text Classification Profile per level of competence and a formula for automatically estimating text difficulty and assigning levels to texts consistently and re

  4. Text4Health: a qualitative evaluation of parental readiness for text message immunization reminders.

    Science.gov (United States)

    Kharbanda, Elyse Olshen; Stockwell, Melissa S; Fox, Harrison W; Rickert, Vaughn I

    2009-12-01

    We conducted focus groups and individual interviews in a diverse population of parents to qualitatively explore preferences and readiness for text message immunization reminders. We used content analysis to review and independently code transcripts. Text message reminders were well-accepted by parents; many thought they would be more effective than standard phone or mail reminders. Parents preferred text message reminders to be brief and personalized. Most parents were able to retrieve sample text messages but many had difficulty with interactive texting. PMID:19833982

  5. Extracting laboratory test information from biomedical text

    Directory of Open Access Journals (Sweden)

    Yanna Shen Kang

    2013-01-01

    Full Text Available Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.

  6. Dialogical surface text features in abstracts

    OpenAIRE

    Ingrid García-Østbye

    2008-01-01

    A sample driven description of Research Article-Comment-Reply (RA-C-R) abstracts in terms of abstract sentence length, reference, possessive structures, modal verbs and word range was carried out to find out whether their surface text features showed some trace of a dialogical construction of knowledge within the psychology discourse community. The study served an exploratory purpose. A Boolean search was conducted in the PsycLIT database yielding a sample of 149 PsycLIT RA-C-R abstracts (13,...

  7. Advances in text analytics for drug discovery.

    Science.gov (United States)

    Roberts, Phoebe M; Hayes, William S

    2005-05-01

    The automated extraction of biological and chemical information has improved over the past year, with advances in access to content, entity extraction of genes, chemicals, kinetic data and relationships, and algorithms for generating and testing hypotheses. As the systems for reading and understanding scientific literature grow more powerful, so must the infrastructure in which to assemble information. Advances in infrastructure systems are discussed in this review. Research efforts have flourished as a result of text analytics competitions that attract participants from various disciplines, from computer science to bioinformatics.

  8. Text to reference ratios in scientific journals

    OpenAIRE

    Little, Anne E.; Roma M. Harris; Nicholls, Paul T.

    1990-01-01

    In 1987, Peter Jumars, the editor of Limnology and Oceanography, reported that the ratio of printed pages of text to number of references had decreased during the period 1980 to 1987. In other words, authors were using an increasing number of references - an observation which was of some concern because Limnology and Oceanography publishes only a fixed number of pages per year. In the present study, an attempt was made to determine whether journals from other scientific discipl...

  9. Text Data Mining: Theory and Methods

    OpenAIRE

    Solka, Jeffrey L.

    2008-01-01

    This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies that are employed within this discipline area while at the same time making the reader aware of some of the interesting challenges that remain to be solved within the area. Finally, the article serves as a very rudimentary tutorial on some of the techniques while also providing the reader wi...

  10. Methods for Mining and Summarizing Text Conversations

    CERN Document Server

    Carenini, Giuseppe; Murray, Gabriel

    2011-01-01

    Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods

  11. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  12. On the role of autocorrelations in texts

    CERN Document Server

    Lande, D V

    2007-01-01

    The task of finding a criterion that allows one to distinguish a text from an arbitrary set of words is rather relevant in itself, for instance, in the context of developing means for internet-content indexing or for separating signals and noise in communication channels. The Zipf law is currently considered to be the most reliable criterion of this kind [3]. At any rate, conventional stochastic word sets do not meet this law. The present paper deals with one possible criterion based on the determination of the degree of data compression.
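    As a rough, unofficial illustration of a compression-based criterion (not the authors' exact measure), the sketch below compares the zlib compression ratio of a text with that of the same words in shuffled order; coherent word order tends to compress somewhat better. The file name is a placeholder.

      import random
      import zlib

      def compression_ratio(s):
          raw = s.encode("utf-8")
          return len(zlib.compress(raw, 9)) / len(raw)

      text = open("sample.txt", encoding="utf-8").read()   # any plain-text file
      words = text.split()
      random.shuffle(words)

      print("original:", round(compression_ratio(text), 3))
      print("shuffled:", round(compression_ratio(" ".join(words)), 3))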

  13. On the role of autocorrelations in texts

    OpenAIRE

    Lande, D. V.; Snarskii, A. A.

    2007-01-01

    The task of finding a criterion that allows one to distinguish a text from an arbitrary set of words is rather relevant in itself, for instance, in the context of developing means for internet-content indexing or for separating signals and noise in communication channels. The Zipf law is currently considered to be the most reliable criterion of this kind [3]. At any rate, conventional stochastic word sets do not meet this law. The present paper deals with one possible criterion based on the determi...

  14. Relative clauses in French children's narrative texts.

    Science.gov (United States)

    Jisa, H; Kern, S

    1998-10-01

    This study investigates the use of relative clauses in French children's narrative monologues. Narrative texts were collected from French-speaking monolinguals in four age groups (five, seven, ten years and adults). Twenty subjects from each group were asked to tell a story based on a picture book consisting of twenty-four images without text (Frog, Where are you?). Relative constructions were coded following the categories defined by Dasinger & Toupin (1994) into two main functional classes: general discourse and narrative functions. The results show that the use of relative clauses in general discourse functions precedes their use in more specific narrative functions. An analysis of textual connectivity (Berman & Slobin, 1994) in one episode reveals that children and adults differ in their choice of preferred structures. The results also show that children use fewer transitive predicates in relative clauses than do adults. Transitive verbs are essential for advancing the narrative plot (Hopper & Thompson, 1980). While subject relative clauses are acquired early and used frequently, the development of their multifunctional use in diverse narrative functions extends well beyond childhood.

  15. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.

  16. Reading Instruments: Objects, Texts and Museums

    Science.gov (United States)

    Anderson, Katharine; Frappier, Mélanie; Neswald, Elizabeth; Trim, Henry

    2013-05-01

    Science educators, historians of science and their students often share a curiosity about historical instruments as a tangible link between past and present practices in the sciences. We less often integrate instruments into our research and pedagogy, considering artefact study as the domain of museum specialists. We argue here that scholars and teachers new to material culture can readily use artefacts to reveal rich and complex networks of narratives. We illustrate this point by describing our own lay encounter with an artefact turned over for our analysis during a week-long workshop at the Canada Science and Technology Museum. The text explains how elements as disparate as the military appearance of the instrument, the crest stamped on its body, the manipulation of its telescopes, or a luggage tag revealed the object's scientific and political significance in different national contexts. In this way, the presence of the instrument in the classroom vividly conveyed the nature of geophysics as a field practice and an international science, and illuminated relationships between pure and applied science for early twentieth century geologists. We conclude that artefact study can be an unexpectedly powerful and accessible tool in the study of science, making visible the connections between past and present, laboratory and field, texts and instruments.

  17. Pathology of Commentary in Persian Literary Texts

    Directory of Open Access Journals (Sweden)

    احمد رضی

    2011-10-01

    Full Text Available Today commentary work has a significant role and place among readers of Persian literary texts and those interested in them. The growing importance of commentary works in helping readers understand literary texts, and their popularity, notably in recent decades, has led commentators of very different levels of knowledge and ability to write commentaries and feed this disorganized market. This study sets out to investigate the commentary works published in the past decades and to analyze their weak points. To do so, over 250 works written and published between 1300 AP (circa 1921 AD) and 1387 AP (circa 2008 AD) were examined, and an attempt has been made to classify, describe, and analyze their most important problems and weak points; at the end, the most important characteristics of the best commentaries and commentators are explained. The most important problems and weak points of commentary works can be summarized in seven broad categories: (1) content shortcomings; (2) inappropriate approach; (3) incongruence between the structure of the commentary work, the type of the work and the commentator's objective; (4) lack of attention towards the readership; (5) carelessness and incompetence of the commentator; (6) complex statement and insensible language; (7) inadequacy of introductions. Key words: research methodology, commentary works, pathology, literary works

  18. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy. PMID:20047295

  19. Handwritten Text Image Authentication using Back Propagation

    CERN Document Server

    Chakravarthy, A S N; Avadhani, P S

    2011-01-01

    Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claim it to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Literary forgery can involve imitating the style of a famous author. If an original manuscript, typewritten text, or recording is available, then the medium itself (or its packaging - anything from a box to e-mail headers) can help prove or disprove the authenticity of the document. The use of digital images of handwritten historical documents has become more popular in recent years. Volunteers around the world now read thousands of these images as part of their indexing process. Handwritten text images of old documents are sometimes difficult to read or noisy du...

  20. El manual como texto Schoolbook as text

    Directory of Open Access Journals (Sweden)

    Agustín Escolano Benito

    2012-12-01

    Full Text Available Este trabajo aborda la cuestión de la identidad del libro escolar como un género textual específico en el contexto de la manualística clásica y moderna, contextualizando los análisis en el marco de la cultura de la escuela tradicional y en la era de la revolución digital y bajo una perspectiva historiográfica y teórica. También plantea el nacimiento y primeros desarrollos de la manualística como campo intelectual y académico y sus contribuciones a la definición de la identidad del libro escolar. This paper discusses the question of identifying a coursebook as a specific text genre in the context of the classical and modern manualistics, situating the analysis within the traditional school culture and the digital revolution era, under a historical and theoretical perspective. It also covers the birth and initial development of manualistics as an intelectual and academic field and its contributions to the definition of the schoolbook identity.

  1. L1 and L2 Spoken Word Processing: Evidence from Divided Attention Paradigm.

    Science.gov (United States)

    Shafiee Nahrkhalaji, Saeedeh; Lotfi, Ahmad Reza; Koosha, Mansour

    2016-10-01

    The present study aims to reveal some facts concerning first language (L1) and second language (L2) spoken-word processing in unbalanced proficient bilinguals using behavioral measures. The intention here is to examine the effects of auditory repetition word priming and semantic priming in the first and second languages of these bilinguals. The other goal is to explore the effects of attention manipulation on implicit retrieval of perceptual and conceptual properties of spoken L1 and L2 words. In so doing, the participants performed auditory word priming and semantic priming as memory tests in their L1 and L2. In half of the trials of each experiment, they carried out the memory test while simultaneously performing a secondary task in the visual modality. The results revealed that effects of auditory word priming and semantic priming were present when participants processed L1 and L2 words in the full attention condition. Attention manipulation could reduce priming magnitude in both experiments in L2. Moreover, L2 word retrieval increases the reaction times and reduces accuracy on the simultaneous secondary task to protect its own accuracy and speed. PMID:26643309

  2. Graphical support for comprehending science texts: The contributions of diagram design and text directives

    Science.gov (United States)

    McTigue, Erin M.

    The present study examined the combined effect of diagram design and text directives on the comprehension of explanatory science texts for middle school readers. Three types of diagram designs were compared. Each design contained the same graphical representation of a cycle but differed in the labels. The labels indicated either (a) the parts of the cycle, (b) the steps of the cycle, or (c) both the parts and steps. Additionally, there were two conditions of text, with and without embedded directives. The directives guided the reader to the diagram to help readers integrate the two sources of information. Finally, each of the 189 sixth grade participants read two texts---a life-science text and a physical-science text. Results indicated that for the life-science text both the parts diagrams and the steps diagrams facilitated the readers' comprehension, but that the parts & steps diagram did not. Overall, the directives assisted readers in the life-science text when they were viewing the complex diagrams: the steps diagram and the parts & steps diagrams, but not the parts diagram. Directives also helped girls who were reading at the below- and on-grade level, but not the girls reading above-grade level. Neither the diagrams nor directives facilitated comprehension of the physical science text. There was a gender difference favoring boys on the physical science text but no gender difference on the life-science text.

  3. MANAGING THE TRANSLATION OF ECONOMIC TEXTS

    Directory of Open Access Journals (Sweden)

    Pop Anamaria Mirabela

    2012-12-01

    Full Text Available Theoretically, translation may pass as science; practically, it seems closer to art. Translation is a challenging activity, requiring a set of abilities and posing a few difficulties that appear during the translation process. This paper investigates the extent to which sub-technical vocabulary can constitute a problem for Romanian students of economics reading in English, by looking at the translations produced as independent or pair work during English classes and analyzing the various errors that appeared. The exigencies of efficient business communication have increased in the past few decades because of rising international trade, increased migration, globalization, the recognition of linguistic minorities, and the expansion of the mass media and technology. All these led us to approach the topic of translation, a job that requires skills, research into how the meaning of a text transfers into the target language, training, experience and a good sense of languages. The paper defines the theoretical issues and terminology: translation, types of translation and economic texts, and then focuses on the practical work carried out with second-year students throughout the academic year. Considering that only 28% of the entire European population can read English, and even fewer people in South America and Asia can, it is obvious that effective communication of business matters relies on an accurate understanding of terminology. Economics is a field of knowledge in accelerated scientific and technological development. As there is a permanent and ever increasing need to quickly update their knowledge, economists read and learn directly in the original language of the publication and stick to it in daily usage, including conferences, scientific events and articles written in Romanian. Besides researching properly the markets, finding distribution channels, and dealing with legal

  4. Algorithmic Detection of Computer Generated Text

    CERN Document Server

    Lavoie, Allen

    2010-01-01

    Computer generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such documents. After identifying and evaluating several quantifiable features of academic papers, we apply methods from machine learning to build a binary classifier. In tests with two hundred papers, the resulting classifier correctly labeled papers either as human written or as computer generated with no false classifications of computer generated papers as human and a 2% false classification rate for human papers as computer generated. We believe generalizations of these features are applicable to similar classification problems. While most current text-based spam detection techniques focus on the keyword-based classification of email messages, a new generation of unsolicited computer-generated advertisements masquerade as legitimate postings in online groups, message boards and social news sites. Our results show that taking the formatti...
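    The paper's actual feature set is not reproduced in this record; the sketch below merely illustrates the general approach of fitting a binary classifier to a few quantifiable document features. The features, toy documents and labels are invented.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def features(doc):
          # A few crude, quantifiable features (illustrative only).
          words = doc.split()
          avg_word_len = sum(map(len, words)) / max(len(words), 1)
          type_token_ratio = len(set(words)) / max(len(words), 1)
          bracket_refs = doc.count("[")
          return [avg_word_len, type_token_ratio, bracket_refs]

      # Invented toy data: 1 = computer generated, 0 = human written.
      docs = ["the the of of model model [1] [1]",
              "we evaluate a classifier on a corpus of real papers [1]",
              "random random words words words words",
              "this study reports the results of a controlled experiment"]
      labels = [1, 0, 1, 0]

      clf = LogisticRegression().fit(np.array([features(d) for d in docs]), labels)
      print(clf.predict(np.array([features("model model the the of of")])))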

  5. Metaphor identification in large texts corpora.

    Directory of Open Access Journals (Sweden)

    Yair Neuman

    Full Text Available Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms' performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus.

  6. Exploiting Surrounding Text for Retrieving Web Images

    Directory of Open Access Journals (Sweden)

    S. A. Noah

    2008-01-01

    Full Text Available Web documents contain useful textual information that can be exploited for describing images. Research has focused on representing images by means of their content (low-level descriptions such as color, shape and texture); little research has been directed to exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking the images embedded in web documents. A heuristic approach for locating and weighting the text surrounding web images, together with a modified tf.idf weighting scheme, is proposed. Precision-recall evaluations were conducted for ten queries, and promising results were achieved. The proposed approach showed a slightly better precision measure than a popular search engine, with average relative precision measures of 0.63 and 0.55 respectively.
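    The paper's heuristic weights are not given in this record; the sketch below only illustrates the underlying idea of indexing images by tf.idf vectors of their surrounding text and ranking them against a query (scikit-learn assumed; the image names and captions are invented).

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      surrounding_text = {                      # hypothetical image -> nearby HTML text
          "cat.jpg": "a tabby cat sleeping on a warm windowsill",
          "car.jpg": "red sports car parked outside the garage",
          "dog.jpg": "golden retriever dog playing in the park"}

      vectorizer = TfidfVectorizer()
      index = vectorizer.fit_transform(surrounding_text.values())

      def rank_images(query):
          # Cosine similarity between the query and each image's surrounding text.
          scores = cosine_similarity(vectorizer.transform([query]), index)[0]
          return sorted(zip(surrounding_text, scores), key=lambda p: -p[1])

      print(rank_images("sleeping cat"))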

  7. “Girls Text Really Weird”: Gender, Texting and Identity Among Teens

    DEFF Research Database (Denmark)

    Ling, Richard; Baron, Naomi; Lenhart, Amanda;

    2014-01-01

    This article examines the strategies used by teenagers for interacting with members of the opposite sex when texting. This article uses material from a series of nine focus groups from 2009 in four US cities. It reports on the strategies they use and the problems they encounter as they negotiate this portion of their lives. Texting is a direct, person-to-person venue where they can develop their gendered identity and also investigate romantic interaction. In this activity, both genders show the ability to make fine-grained interpretations of texts, often interpreting the meaning of punctuation and other paralinguistic devices. In addition, they use texts to characterize the opposite sex. Teen boys' texts are seen as short and perhaps brisk when viewed by girls. Boys see teen girls' texts as being overly long, prying and containing unneeded elements. The discussion of these practices shows how...

  8. Eye movements during the recollection of text information reflect content rather than the text itself

    DEFF Research Database (Denmark)

    Traub, Franziska; Johansson, Roger; Holmqvist, Kenneth

    Several studies have reported that spontaneous eye movements occur when visuospatial information is recalled from memory. Such gazes closely reflect the content and spatial relations from the original scene layout (e.g., Johansson et al., 2012). However, when someone has originally read a scene description, the memory of the physical layout of the text itself might compete with the memory of the spatial arrangement of the described scene. The present study was designed to address this fundamental issue by having participants read scene descriptions that were manipulated to be either congruent or incongruent with the spatial layout of the text itself. 28 participants read and recalled three texts: (1) a scene description congruent with the spatial layout of the text; (2) a scene description incongruent with the spatial layout of the text; and (3) a control text without any spatial scene content...

  9. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    Directory of Open Access Journals (Sweden)

    Vishwadeepak Singh Baghela

    2012-05-01

    Full Text Available A handful of text data mining approaches are available to extract much potential information and many associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The 'mined' information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mining deals with structured data (for example relational databases), whereas text presents special characteristics and is unstructured. Unstructured data is quite different from databases, where mining techniques are usually applied and structured data is managed. Text mining can work with unstructured or semi-structured data sets. A brief review of some recent research related to mining associations from text documents is presented in this paper.
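    As a minimal, self-contained illustration of term association rules mined from documents treated as sets of words (not any specific approach reviewed in the paper), the sketch below computes pairwise support and confidence over a toy collection; the thresholds are arbitrary.

      from collections import Counter
      from itertools import combinations

      docs = ["data mining text rules", "text mining association rules",
              "image processing filters", "text rules extraction"]
      baskets = [set(d.split()) for d in docs]

      def rules(baskets, min_support=0.5, min_conf=0.6):
          n = len(baskets)
          item_count = Counter(i for b in baskets for i in b)
          pair_count = Counter(p for b in baskets for p in combinations(sorted(b), 2))
          out = []
          for (a, b), c in pair_count.items():
              if c / n >= min_support:                    # frequent pair
                  for x, y in ((a, b), (b, a)):
                      conf = c / item_count[x]            # confidence of x -> y
                      if conf >= min_conf:
                          out.append((x, y, round(c / n, 2), round(conf, 2)))
          return out

      print(rules(baskets))   # (antecedent, consequent, support, confidence)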

  10. Dialogical surface text features in abstracts

    Directory of Open Access Journals (Sweden)

    Ingrid García-Østbye

    2008-04-01

    Full Text Available A sample driven description of Research Article-Comment-Reply (RA-C-R) abstracts in terms of abstract sentence length, reference, possessive structures, modal verbs and word range was carried out to find out whether their surface text features showed some trace of a dialogical construction of knowledge within the psychology discourse community. The study served an exploratory purpose. A Boolean search was conducted in the PsycLIT database yielding a sample of 149 PsycLIT RA-C-R abstracts (13,978 words). Relative frequency percent distributions were calculated for all variables, including reported speech verbs. Specific comparisons with a Medline corpus were conducted and variations were accounted for in terms of scientific discourse characteristics, field, database policies, and dialogical nature; that is, in the framework provided by the strands of research of quantitative applied linguistics, social concerns in genre analysis and the model monopoly theory developed in the implementation in sociology of the systems theory. The results suggest: (i) a word range affected by both psychology as a discipline and the dialogical content on which PsycLIT RA-C-R abstracts report; (ii) a complementarity of reference and possessive structures characterised by features of scientific discourse, feedback genres and dialogical dimensions; (iii) the presence of both deontic and epistemic modality in the modal verbs of our sample; (iv) also that abstract length, sentence length and number of sentences per paragraph in our sample may not vary greatly in general terms from those of the social sciences.

  11. Counting OCR errors in typeset text

    Science.gov (United States)

    Sandberg, Jonathan S.

    1995-03-01

    Frequently object recognition accuracy is a key component in the performance analysis of pattern matching systems. In the past three years, the results of numerous excellent and rigorous studies of OCR system typeset-character accuracy (henceforth OCR accuracy) have been published, encouraging performance comparisons between a variety of OCR products and technologies. These published figures are important; OCR vendor advertisements in the popular trade magazines lead readers to believe that published OCR accuracy figures effect market share in the lucrative OCR market. Curiously, a detailed review of many of these OCR error occurrence counting results reveals that they are not reproducible as published and they are not strictly comparable due to larger variances in the counts than would be expected by the sampling variance. Naturally, since OCR accuracy is based on a ratio of the number of OCR errors over the size of the text searched for errors, imprecise OCR error accounting leads to similar imprecision in OCR accuracy. Some published papers use informal, non-automatic, or intuitively correct OCR error accounting. Still other published results present OCR error accounting methods based on string matching algorithms such as dynamic programming using Levenshtein (edit) distance but omit critical implementation details (such as the existence of suspect markers in the OCR generated output or the weights used in the dynamic programming minimization procedure). The problem with not specifically revealing the accounting method is that the number of errors found by different methods are significantly different. This paper identifies the basic accounting methods used to measure OCR errors in typeset text and offers an evaluation and comparison of the various accounting methods.
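    As one concrete instance of the dynamic-programming accounting the paper discusses (with unit weights chosen purely for illustration and no handling of suspect markers), the sketch below counts character-level OCR errors as the Levenshtein distance between a reference string and the OCR output.

      def levenshtein(ref, ocr):
          # Standard dynamic-programming edit distance with unit costs.
          prev = list(range(len(ocr) + 1))
          for i, r in enumerate(ref, 1):
              cur = [i]
              for j, o in enumerate(ocr, 1):
                  cur.append(min(prev[j] + 1,              # deletion
                                 cur[j - 1] + 1,           # insertion
                                 prev[j - 1] + (r != o)))  # substitution (0 if match)
              prev = cur
          return prev[-1]

      ref = "Curiously, a detailed review"
      ocr = "CuriousIy, a detai1ed rev1ew"
      errors = levenshtein(ref, ocr)
      print(errors, "errors;", round(1 - errors / len(ref), 3), "character accuracy")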

  12. How indexicals function in texts: Discourse, text, and one neo-Gricean account of indexical reference

    OpenAIRE

    Cornish, Francis

    2008-01-01

    My goal in this article is to compare the behavior of a variety of non clause-bound types of indexical expression in English across three texts from different genres, spoken as well as written. A key distinction is the one claimed to exist between the dimensions of text and discourse, and the comparison of the indexical types demonstrates its relevance. In a given text, certain lexically-specific types of indexical bearing an anaphoric interpretation may perform part...

  13. The Link between Text Difficulty, Reading Speed and Exploration of Printed Text during Shared Book Reading

    Science.gov (United States)

    Roy-Charland, Annie; Perron, Melanie; Turgeon, Krystle-Lee; Hoffman, Nichola; Chamberland, Justin A.

    2016-01-01

    In the current study the reading speed of the narration and the difficulty of the text was manipulated and links were explored with children's attention to the printed text in shared book reading. Thirty-nine children (24 grade 1 and 15 grade 2) were presented easy and difficult books at slow (syllable by syllable) or fast (adult reading speed)…

  14. "Romeo and Juliet" in the Minneapolis Public Schools: Accurate Text or Bowdlerized Text?

    Science.gov (United States)

    Reed, Margaret A.

    In 1984, parents of a Minneapolis, Minnesota, ninth grader came before the school district's "Students' Right to Learn Committee" to object to what they described as a bowdlerized version of "Romeo and Juliet" in the Scott, Foresman text, and the publisher's failure to acknowledge in the text that the play was abridged. The committee concurred…

  15. Generation Text: The Influence of Audience, Environment, and Social Impression on Text Message Construction

    Science.gov (United States)

    Camuti, Alice Kerlin

    2011-01-01

    The purpose of this interpretivist qualitative study is to discover what factors influence first-year college students as they construct their text messages. Using grounded theory methodology, 11 first-year college students at a university in the Southeast were interviewed one-on-one and through text messaging in order to gain insight into…

  16. What can measures of text comprehension tell us about creative text production?

    NARCIS (Netherlands)

    Bos, Lisanne T.; de Koning, Bjorn; van Wesel, F.; Boonstra, Marije; van der Schoot, Menno

    2015-01-01

    Evidence is accumulating that the level of text comprehension is dependent on the situatedness and sensory richness of a child's mental representation formed during reading. This study investigated whether these factors involved in text comprehension also serve a functional role in writing a narrati

  17. Mining Causality for Explanation Knowledge from Text

    Institute of Scientific and Technical Information of China (English)

    Chaveevan Pechsiri; Asanee Kawtrakul

    2007-01-01

    Mining causality is essential to provide a diagnosis. This research aims at extracting the causality existing within multiple sentences or EDUs (Elementary Discourse Units). The research emphasizes the use of causality verbs because they make explicit in a certain way the consequent events of a cause, e.g., "Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry.". A verb can also be the causal-verb link between cause and effect within EDU(s), e.g., "Aphids suck the sap from rice leaves causing leaves to be shrunk" ("causing" is equivalent to a causal-verb link in Thai). The research confronts two main problems: identifying the interesting causality events from documents and identifying their boundaries. Then, we propose mining on verbs by using two different machine learning techniques, the Naive Bayes classifier and the Support Vector Machine. The resulting mining rules are used for the identification and extraction of the causality of multiple EDUs from text. Our multiple-EDU extraction shows 0.88 precision with 0.75 recall from the Naive Bayes classifier and 0.89 precision with 0.76 recall from the Support Vector Machine.

  18. Named entity recognition in Slovene text

    Directory of Open Access Journals (Sweden)

    Tadej Štajner

    2013-12-01

    Full Text Available This paper presents an approach and an implementation of a named entity extractor for Slovene language, based on a machine learning approach. It is designed as a supervised algorithm based on Conditional Random Fields and is trained on the ssj500k annotated corpus of Slovene. The corpus, which is available under a Creative Commons CC-BY-NC-SA licence, is annotated with morphosyntactic tags, as well as named entities for people, locations, organisations, and miscellaneous names. The paper discusses the influence of morphosyntactic tags, lexicons and conjunctions of features of neighbouring words. An important contribution of this investigation is that morphosyntactic tags benefit named entity extraction. Using all the best-performing features the recognizer reaches a precision of 74% and a recall of 72%, having stronger performance on personal and geographical named entities, followed by organizations, but performs poorly on the miscellaneous entities, since this class is very diverse and consequently difficult to predict. A major contribution of the paper is also showing the benefits of splitting the class of miscellaneous entities into organizations and other entities, which in turn improves performance even on personal and organizational names. The software, developed in this research is freely available under the Apache 2.0 licence at http://ailab.ijs.si/~tadej/slner.zip, while development versions are available at https://github.com/tadejs/slner.
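    The exact feature templates and the ssj500k data are not reproduced here; the sketch below only mirrors the general recipe (token features such as word form, morphosyntactic tag and neighbouring words fed to a CRF), assuming the sklearn-crfsuite package and using an invented toy sentence.

      import sklearn_crfsuite                   # assumed dependency

      def token_features(sent, i):
          word, msd = sent[i]
          feats = {"word.lower": word.lower(),
                   "word.istitle": word.istitle(),
                   "msd": msd}
          if i > 0:
              feats["-1:word.lower"] = sent[i - 1][0].lower()
          if i < len(sent) - 1:
              feats["+1:word.lower"] = sent[i + 1][0].lower()
          return feats

      # Invented toy data, not the ssj500k corpus.
      train_sents = [[("Janez", "Npmsn"), ("dela", "Vmpr3s"), ("v", "Sl"), ("Ljubljani", "Npfsl")]]
      train_labels = [["B-PER", "O", "O", "B-LOC"]]

      X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
      crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
      crf.fit(X, train_labels)
      print(crf.predict(X))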

  19. Text legibility and the letter superiority effect.

    Science.gov (United States)

    Sheedy, James E; Subbaram, Manoj V; Zimmerman, Aaron B; Hayes, John R

    2005-01-01

    Effects of font design and electronic display parameters upon text legibility were determined using a threshold size method. Participants' visual acuity (inverse of the minimum detection size, representing the threshold legibility for each condition) was measured using upper- and lowercase letters and lowercase words in combinations of 6 fonts, 3 font-smoothing modes, 4 font sizes, 10 pixel heights, and 4 stroke widths. Individual lowercase letters were 10% to 20% more legible than lowercase words (i.e., lowercase words must be 10%-20% larger to have the same threshold legibility). This letter superiority effect suggests that individual letters play a large role and word shape plays a smaller role, if any, in word identification at threshold. Pixel height, font, stroke width, and font smoothing had significant main effects on threshold legibility. Optimal legibility was attained at 9 pixels (10 points). Verdana and Arial were the most legible fonts; Times New Roman and Franklin were least legible. Subpixel rendering (ClearType) improved threshold legibility for some fonts and, in combination with Verdana, was the most legible condition. Increased stroke width (bold) improved threshold legibility but only at the thinnest width tested. Potential applications of this research include optimization of font design for legibility and readability. PMID:16553067

  20. Hybrid Method for Tagging Arabic Text

    Directory of Open Access Journals (Sweden)

    Yamina Tlili-Guiassa

    2006-01-01

    Full Text Available Many natural language expressions are ambiguous and need to draw on other sources of information to be interpreted. The interpretation of the word ﺗﻌﺎون as a noun or a verb depends on the presence of contextual cues. This study proposes a hybrid method combining rules and machine learning for tagging Arabic words. The method is based firstly on rules (that consider the post-position, the ending of a word and patterns); anomalies are then corrected by adopting a memory-based learning method (MBL). Memory-based learning is an efficient method for integrating various sources of information and for handling exceptional data in natural language processing tasks. Secondly, the exceptional cases of the rules are checked and more information is made available to the learner for treating them. To evaluate the proposed method, a number of experiments have been run in order to assess the contribution of the various sources of information to learning.

  1. Semantic text mining support for lignocellulose research

    Directory of Open Access Journals (Sweden)

    Meurs Marie-Jean

    2012-04-01

    Full Text Available Abstract Background Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuels production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly-expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties. Results Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources. Conclusions Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines integrated in a Web-based interface to assist them in two main tasks: mining the literature for relevant knowledge, and at the same time providing rich and semantically linked information.

  2. Semantic-based image retrieval by text mining on environmental texts

    Science.gov (United States)

    Yang, Hsin-Chang; Lee, Chung-Hong

    2003-01-01

    In this paper we propose a novel method to bridge the 'semantic gap' between a user's information need and the image content. The semantic gap describes the major deficiency of content-based image retrieval (CBIR) systems, which use visual features extracted from images to describe the images. We overcome this deficiency by extracting the semantics of an image from the environmental texts around it. Since an image generally co-exists with accompanying texts in various formats, we may rely on such environmental texts to discover the semantics of the image. A text mining approach based on self-organizing maps is used to extract the semantics of an image from its environmental texts. We performed experiments on a small set of images and obtained promising results.

  3. Text from corners: a novel approach to detect text and caption in videos.

    Science.gov (United States)

    Zhao, Xu; Lin, Kai-Hsiang; Fu, Yun; Hu, Yuxiao; Liu, Yuncai; Huang, Thomas S

    2011-03-01

    Detecting text and caption from videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner-based approach to detect text and caption from videos. This approach is inspired by the observation that corner points appear densely and in an orderly way in characters, especially in text and caption. We use several discriminative features to describe the text regions formed by the corner points. These features are used in a flexible manner and can thus be adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, based upon the text features, we further develop a novel algorithm to detect moving captions in videos. In the algorithm, the motion features, extracted by optical flow, are combined with text features to detect the moving caption patterns. The decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of our proposed approaches and the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, by achieving superior performance with the top ranking. PMID:20729170
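    The paper's discriminative features and decision tree are not detailed in this record; as a loose illustration of the corner-density intuition only, the sketch below (assuming OpenCV) detects corners in a frame and flags grid cells whose corner count exceeds a threshold. The frame path, grid size and threshold are placeholders.

      import cv2                                # assumes opencv-python
      import numpy as np

      def corner_dense_cells(frame_path, grid=(8, 8), min_corners=15):
          gray = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
          corners = cv2.goodFeaturesToTrack(gray, maxCorners=2000,
                                            qualityLevel=0.01, minDistance=3)
          if corners is None:
              return []
          h, w = gray.shape
          hist = np.zeros(grid, dtype=int)
          for x, y in corners.reshape(-1, 2):
              hist[int(y * grid[0] / h), int(x * grid[1] / w)] += 1
          # Cells with many corners are candidate text/caption regions.
          return [(r, c) for r in range(grid[0]) for c in range(grid[1])
                  if hist[r, c] >= min_corners]

      print(corner_dense_cells("frame.png"))    # hypothetical video frame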

  4. Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text

    Directory of Open Access Journals (Sweden)

    Altman Russ B

    2009-02-01

    Full Text Available Abstract Background Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities – particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. Results Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. Conclusion Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.
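
    As a rough illustration of the kind of concept tagging and sentence-level relation spotting Pharmspresso performs, the sketch below tags a sentence against tiny hand-made lexicons and reports gene-drug co-occurrences. The lexicons, patterns and example sentence are hypothetical placeholders; the real tool builds on Textpresso's curated ontologies and full-text parsing.

      import re
      from itertools import product

      # Tiny, hypothetical lexicons; the real tool uses curated ontologies.
      LEXICON = {
          "gene": [r"CYP2C9", r"VKORC1", r"TPMT"],
          "drug": [r"warfarin", r"azathioprine"],
          "polymorphism": [r"rs\d+"],
      }

      def tag_sentence(sentence):
          # return {concept type: matched strings} for one sentence
          hits = {}
          for ctype, patterns in LEXICON.items():
              found = []
              for pat in patterns:
                  found += re.findall(pat, sentence, flags=re.IGNORECASE)
              if found:
                  hits[ctype] = found
          return hits

      def gene_drug_pairs(text):
          # sentence-level co-occurrence as a crude candidate relation
          pairs = []
          for sent in re.split(r"(?<=[.!?])\s+", text):
              hits = tag_sentence(sent)
              pairs += list(product(hits.get("gene", []), hits.get("drug", [])))
          return pairs

      print(gene_drug_pairs("Variants in CYP2C9 and VKORC1 (e.g. rs9923231) "
                            "alter the required warfarin dose."))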

  5. AUTOMATED TEXT CLUSTERING OF NEWSPAPER AND SCIENTIFIC TEXTS IN BRAZILIAN PORTUGUESE: ANALYSIS AND COMPARISON OF METHODS

    Directory of Open Access Journals (Sweden)

    Alexandre Ribeiro Afonso

    2014-10-01

    Full Text Available This article reports the findings of an empirical study about Automated Text Clustering applied to scientific articles and newspaper texts in Brazilian Portuguese; the objective was to find the most effective computational method able to cluster the input texts into their original groups. The study covered four experiments, and each experiment had four procedures: 1. Corpus Selections (a set of texts is selected for clustering), 2. Word Class Selections (Nouns, Verbs and Adjectives are chosen from each text by using specific algorithms), 3. Filtering Algorithms (a set of terms is selected from the results of the previous stage, a semantic weight is also inserted for each term and an index is generated for each text), 4. Clustering Algorithms (the clustering algorithms Simple K-Means, sIB and EM are applied to the indexes). After those procedures, statistical results on clustering correctness and clustering time were collected. The sIB clustering algorithm is the best choice for both the scientific and the newspaper corpus, under the condition that the sIB clustering algorithm is given the number of clusters as input before running (for the newspaper corpus, 68.9% correctness in 1 minute and for the scientific corpus, 77.8% correctness in 1 minute). The EM clustering algorithm additionally guesses the number of clusters without user intervention, but its best case is less than 53% correctness. Considering the experiments carried out, the results of human text classification and automated clustering are distant; it was also observed that the clustering correctness results vary according to the number of input texts and their topics.
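
    The clustering step described above can be approximated with scikit-learn, which provides Simple K-Means and EM (as a Gaussian mixture); sIB has no standard implementation here, and the noun/verb/adjective selection and semantic weighting steps are omitted. A minimal sketch on a hypothetical four-text corpus:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.cluster import KMeans
      from sklearn.mixture import GaussianMixture
      from sklearn.metrics import adjusted_rand_score

      texts = ["stock markets fell sharply today",          # hypothetical mini-corpus
               "the central bank raised interest rates",
               "the team won the championship final",
               "the striker scored twice in the match"]
      true_labels = [0, 0, 1, 1]

      X = TfidfVectorizer().fit_transform(texts)             # the "index" of each text

      km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
      em = GaussianMixture(n_components=2, random_state=0).fit(X.toarray())

      print("K-Means vs. truth:", adjusted_rand_score(true_labels, km.labels_))
      print("EM      vs. truth:", adjusted_rand_score(true_labels,
                                                      em.predict(X.toarray())))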

  6. Using LSA and text segmentation to improve automatic Chinese dialogue text summarization

    Institute of Scientific and Technical Information of China (English)

    LIU Chuan-han; WANG Yong-cheng; ZHENG Fei; LIU De-rong

    2007-01-01

    Automatic Chinese text summarization for dialogue style is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document and all question paragraphs are identified; an automatic text segmentation approach analogous to TextTiling is then exploited to improve the precision of correlating question paragraphs and answer paragraphs; finally, some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results showed that our approach is highly efficient and significantly improves the coherence of the summary while not compromising informativeness.
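
    The LSA component of such a pipeline can be sketched as follows: build a TF-IDF sentence-term matrix, project it into a low-rank latent space with a truncated SVD, and rank sentences by their weight in that space. This is only an illustration of the general technique; the question-answer pairing and TextTiling-like segmentation of the paper are not reproduced, and the dialogue below is a made-up placeholder.

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD

      def lsa_extract(sentences, n_topics=2, n_keep=2):
          X = TfidfVectorizer().fit_transform(sentences)     # sentence-term matrix
          svd = TruncatedSVD(n_components=min(n_topics, X.shape[1] - 1))
          Z = svd.fit_transform(X)                           # sentences in latent space
          salience = np.linalg.norm(Z, axis=1)               # weight in latent space
          keep = sorted(np.argsort(salience)[::-1][:n_keep]) # keep original order
          return [sentences[i] for i in keep]

      dialogue = ["Q: How do I reset my password?",          # made-up dialogue
                  "A: Open the settings page and choose security.",
                  "A: Then click reset and check your email.",
                  "Q: Thanks, that worked."]
      print(lsa_extract(dialogue))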

  7. Text4baby: Development and Implementation of a National Text Messaging Health Information Service

    Science.gov (United States)

    Whittaker, Robyn; Meehan, Judy; Jordan, Elizabeth; Stange, Paul; Cash, Amanda; Meyer, Paul; Baitty, Julie; Johnson, Pamela; Ratzan, Scott; Rhee, Kyu

    2012-01-01

    Text4baby is the first free national health text messaging service in the United States that aims to provide timely information to pregnant women and new mothers to help them improve their health and the health of their babies. Here we describe the development of the text messages and the large public–private partnership that led to the national launch of the service in 2010. Promotion at the local, state, and national levels produced rapid uptake across the United States. More than 320 000 people enrolled with text4baby between February 2010 and March 2012. Further evaluations of the effectiveness of the service are ongoing; however, important lessons can be learned from its development and uptake. PMID:23078509

  8. Metacognition and learning from text: Constructing a metacognitive questionnaire for text studying

    NARCIS (Netherlands)

    G. Schellings; B. van Hout-Wolters

    2008-01-01

    Teaching metacognitive strategies in learning from text is an important educational objective. So, metacognitive assessment methods are necessary within school settings. Although the advantages of metacognitive questionnaires are numerous, the convergent validity (correlation) with the thinking alou

  9. Computer-Aided Generation of Result Text for Clinical Laboratory Texts

    OpenAIRE

    Kuzmak, Peter M.; Miller, R. E.

    1983-01-01

    Efficient processing of non-numeric textual data is a frequent requirement with medical computer applications such as clinical laboratory result reporting. In such instances, it is often desirable that the computer control the generation of the text to ensure that the intended meaning is conveyed. This paper describes a technique for interactively selecting predefined text segments to form complex textual reports for laboratory tests. The approach, which uses algorithms based on augmented tra...

  10. Making sense of text : skills that support text comprehension and its development.

    OpenAIRE

    Cain, Kate

    2009-01-01

    Skilled reading involves two main components: word reading and text comprehension. In this article, I focus on three skills that have been shown to support the latter: integration and inference, comprehension monitoring, and knowledge and use of story structure. Research has shown that children with unexpectedly poor reading comprehension have difficulties with each of these text processing skills and that each skill contributes to development in reading comprehension during middle childhood....

  11. PROSAIC TEXTS OF ABBÂS VESIM AND INVESTIGATING OF THE POEMS IN THESE TEXTS

    Directory of Open Access Journals (Sweden)

    İbrahim HALİL TUĞLUK

    2015-12-01

    Full Text Available Classical Turkish Literature was shaped to a large extent by the influence of Persian literature; it developed its own language and constitutes an important period of Turkish literature. The basic means of expression in this literature is poetry. Prose has always remained in the shadow of poetry and been treated as secondary; in fact, prose texts are usually devoted to didactic subjects such as history, geography, religious sciences, astronomy, medicine and biography. Prose style also differs according to purpose and content. An important feature of the prose form is its effort to approach poetry. Harmony of language, created especially by rhythm, brought prose texts even closer to poetic expression. Studies of the poetic passages embedded in prose texts are quite important for establishing the statistics of poetic texts in Classical Turkish prose literature, for capturing the combinations of meaning and harmony that poetry establishes within prose, and for describing the classical cultural background of the Ottomans. Ottoman history includes many scholars and craftsmen who came to prominence through their literary and scholarly identities. Abbâs Vesȋm, who lived in the 18th century, is among them, and his works therefore deserve to be examined from many angles. In this context, studying the poetic sections of the works on medicine and astrology that Abbâs Vesȋm wrote outside literature is important for documenting poetic expression in prose texts and for determining the stylistic features of such texts. This study aims to examine the prose works of Abbâs Vesȋm, an 18th-century poet, to establish the copies and the transcription of the poetic sections in these works, and to analyze these works in terms of form and content.

  12. Text Mining Approaches To Extract Interesting Association Rules from Text Documents

    OpenAIRE

    Vishwadeepak Singh Baghela; S. P. Tripathi

    2012-01-01

    A handful of text data mining approaches are available to extract potential information and associations from large amounts of text data. The term data mining is used for methods that analyze data with the objective of finding rules and patterns describing the characteristic properties of the data. The mined information is typically represented as a model of the semantic structure of the dataset, where the model may be used on new data for prediction or classification. In general, data mi...

  13. An enhanced text categorization method based on improved text frequency approach and mutual information algorithm

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Text categorization plays an important role in data mining. Feature selection is the most important process of text categorization. Focused on feature selection, we present an improved text frequency method for filtering of low frequency features to deal with the data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf.idf method for characteristic weights evaluation. The proposed method is applied to the benchmark test set Reuters-21578 Top10 to examine its effectiveness. Numerical results show that the precision, the recall and the value of F1 of the proposed method are all superior to those of existing conventional methods.
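
    The baseline pipeline this paper refines - frequency-based filtering, mutual-information feature selection, and tf.idf weighting - can be sketched with scikit-learn. The corpus, labels and parameter values below are placeholders, and the paper's specific improvements to each step are not reproduced.

      from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
      from sklearn.feature_selection import SelectKBest, mutual_info_classif

      docs = ["wheat prices rose on export news",            # placeholder corpus
              "corn and wheat futures traded higher",
              "the senate passed the budget bill",
              "lawmakers debated the new tax bill"]
      labels = [0, 0, 1, 1]                                  # 0 = grain, 1 = politics

      counts = CountVectorizer(min_df=1).fit_transform(docs) # min_df drops rare terms
      selected = SelectKBest(mutual_info_classif, k=5).fit_transform(counts, labels)
      weighted = TfidfTransformer().fit_transform(selected)  # tf.idf weighting

      print(weighted.shape)                                  # (n_documents, k features)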

  14. Automatic Summarization of Opinionated Texts Résumé automatique de textes d'opinion

    Directory of Open Access Journals (Sweden)

    Thierry Poibeau

    2011-04-01

    Full Text Available In this paper, we present a summarization system that is specifically designed to process blog posts, where factual information is mixed with opinions on the discussed facts. Our approach combines redundancy analysis with new information tracking and is enriched by a module that computes the polarity of textual fragments in order to summarize blog posts more efficiently. The system is evaluated against English data, especially through the participation in TAC (Text Analysis Conference), an international evaluation framework for automatic summarization, in which our system obtained interesting results.

  15. The Original Text and Translated Text in Derrida's Deconstruction Theory of Translation

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Since the 1960s translation has made great progress on the way to becoming a systematic and scientific discipline. The theory of deconstruction, originating in France, has had a great impact on traditional translation and has become increasingly influential in recent years. Through a discussion of deconstruction and its view of translation, this thesis clarifies people's skeptical attitudes towards deconstruction and explains the radical changes it has brought to the translation field, especially in explaining the relationship between the original text and the translated text in Derrida's deconstruction theory. At the end of this thesis, the application and limitations of deconstruction are discussed.

  16. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

    CERN Document Server

    Miner, Gary; Hill, Thomas; Nisbet, Robert; Delen, Dursun

    2012-01-01

    The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase d

  17. Enhancing Summarization Skills Using Twin Texts: Instruction in Narrative and Expository Text Structures

    Science.gov (United States)

    Furtado, Leena; Johnson, Lisa

    2010-01-01

    This action-research case study endeavors to enhance the summarization skills of first grade students who are reading at or above the third grade level during the first trimester of the academic school year. Students read "twin text" sources, meaning, fiction and nonfiction literary selections focusing on a common theme to help identify and…

  18. Psychologie des discours et didactique des textes (Psychology of Discourse and the Teaching of Texts).

    Science.gov (United States)

    Bronckart, Jean-Paul, Ed.

    1995-01-01

    This collection of articles on the nature of discourse and writing instruction include: "Une demarche de psychologie de discours; quelques aspects introductifs" ("An Application of Discourse Psychology; Introductory Thoughts") (Jean-Paul Bronckart); "Les procedes de prise en charge enonciative dans trois genres de texts expositifs" ("The Processes…

  19. Text(ing) in Context: The Future of Workplace Communication in the United States

    Science.gov (United States)

    Kiddie, Thomas J.

    2014-01-01

    Following Rogers's theory of the diffusion of innovations, the author questions whether youth entering the workforce will act as change agents to evolve primary business communication channels from email to text-messaging. Expanding on research performed in 2009, the author investigates three communication scenarios: scheduling meetings,…

  20. Processing the Text of the Holy Quran: a Text Mining Study

    Directory of Open Access Journals (Sweden)

    Mohammad Alhawarat

    2015-02-01

    Full Text Available The Holy Quran is the reference book for more than 1.6 billion Muslims all around the world. Extracting information and knowledge from the Holy Quran is of high benefit for both people specialized in Islamic studies and non-specialized people. This paper initiates a series of research studies that aim to serve the Holy Quran and provide helpful and accurate information and knowledge to all human beings. Also, the planned research studies aim to lay out a framework that will be used by researchers in the field of Arabic natural language processing by providing a ”Golden Dataset” along with useful techniques and information that will advance this field further. The aim of this paper is to find an approach for analyzing Arabic text and then providing statistical information which might be helpful for people in this research area. In this paper the Holy Quran text is preprocessed and then different text mining operations are applied to it to reveal simple facts about the terms of the Holy Quran. The results show a variety of characteristics of the Holy Quran such as its most important words, its wordcloud and chapters with high term frequencies. All these results are based on term frequencies that are calculated using both the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) methods.
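
    The TF and TF-IDF statistics mentioned above reduce to simple counting. The sketch below computes per-chapter term frequencies and a basic tf-idf score over a toy corpus of placeholder strings (not the actual text of the Quran), which is enough to surface each chapter's highest-weighted terms.

      import math
      from collections import Counter

      chapters = ["praise mercy mercy guidance",             # placeholder chapters
                  "mercy patience charity",
                  "guidance prayer charity charity"]
      tokenized = [c.split() for c in chapters]

      tf = [Counter(tokens) for tokens in tokenized]         # term frequency per chapter
      df = Counter(t for tokens in tokenized for t in set(tokens))  # document frequency
      N = len(chapters)

      def tfidf(term, i):
          return tf[i][term] * math.log(N / df[term])

      for i, counts in enumerate(tf):
          top = max(counts, key=lambda t: tfidf(t, i))
          print(f"chapter {i}: highest tf-idf term = {top!r}")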

  1. Public Text/Private Text: Making Visible the Voices That Shape Our Social Conscience.

    Science.gov (United States)

    Urion, Marilyn Vogler

    1995-01-01

    Discusses Julia Kristeva's notion of text--the tension between the semiotic and the symbolic--and how the tension can be made visible through typeface variation and other shaping techniques possible with word-processing software. Shares ways the author encourages students in first-year English classes to explore possibilities for incorporating…

  2. Connected text reading and differences in text reading fluency in adult readers

    NARCIS (Netherlands)

    Wallot, S.; Hollis, G.; Rooij, M. de

    2013-01-01

    The process of connected text reading has received very little attention in contemporary cognitive psychology. This lack of attention is in parts due to a research tradition that emphasizes the role of basic lexical constituents, which can be studied in isolated words or sentences. However, this lac

  3. Does Compare-Contrast Text Structure Help Students with Autism Spectrum Disorder Comprehend Science Text?

    Science.gov (United States)

    Carnahan, Christina R.; Williamson, Pamela S.

    2013-01-01

    Using a single-subject reversal design, this study evaluated the use of a compare-contrast strategy on the ability of students with autism spectrum disorder to comprehend science text. Three middle school students with high-functioning autism and their teacher participated in this study. A content analysis comparing the number of meaning units in…

  4. Comprehending expository texts: the dynamic neurobiological correlates of building a coherent text representation.

    Science.gov (United States)

    Swett, Katherine; Miller, Amanda C; Burns, Scott; Hoeft, Fumiko; Davis, Nicole; Petrill, Stephen A; Cutting, Laurie E

    2013-01-01

    Little is known about the neural correlates of expository text comprehension. In this study, we sought to identify neural networks underlying expository text comprehension, how those networks change over the course of comprehension, and whether information central to the overall meaning of the text is functionally distinct from peripheral information. Seventeen adult subjects read expository passages while being scanned using functional magnetic resonance imaging (fMRI). By convolving phrase onsets with the hemodynamic response function (HRF), we were able to identify regions that increase and decrease in activation over the course of passage comprehension. We found that expository text comprehension relies on the co-activation of the semantic control network and regions in the posterior midline previously associated with mental model updating and integration [posterior cingulate cortex (PCC) and precuneus (PCU)]. When compared to single word comprehension, left PCC and left Angular Gyrus (AG) were activated only for discourse-level comprehension. Over the course of comprehension, reliance on the same regions in the semantic control network increased, while a parietal region associated with attention [intraparietal sulcus (IPS)] decreased. These results parallel previous findings in narrative comprehension that the initial stages of mental model building require greater visuospatial attention processes, while maintenance of the model increasingly relies on semantic integration regions. Additionally, we used an event-related analysis to examine phrases central to the text's overall meaning vs. peripheral phrases. It was found that central ideas are functionally distinct from peripheral ideas, showing greater activation in the PCC and PCU, while over the course of passage comprehension, central and peripheral ideas increasingly recruit different parts of the semantic control network. The finding that central information elicits greater response in mental model

  5. Mobile characters, mobile texts: homelessness and intertextuality in contemporary texts for young people

    Directory of Open Access Journals (Sweden)

    Mavis Reimer

    2013-06-01

    Full Text Available Since the 1990s, narratives about homelessness for and about young people have proliferated around the world. A cluster of thematic elements shared by many of these narratives of the age of globalization points to the deep anxiety that is being expressed about a social, economic, and cultural system under stress or struggling to find a new formation. More surprisingly, many of the narratives also use canonical cultural texts extensively as intertexts. This article considers three novels from three different national traditions to address the work of intertextuality in narratives about homelessness: Skellig by UK author David Almond, which was published in 1998; Chronicler of the Winds by Swedish author Henning Mankell, which was first published in 1988 in Swedish as Comédia Infantil and published in an English translation in 2006; and Stained Glass by Canadian author Michael Bedard, which was published in 2002. Using Julia Kristeva's definition of intertextuality as the “transposition of one (or several) sign systems into another,” I propose that all intertexts can be thought of as metaphoric texts, in the precise sense that they carry one text into another. In the narratives under discussion in this article, the idea of homelessness is in perpetual motion between texts and intertexts, ground and figure, the literal and the symbolic. What the child characters and the readers who take up the position offered to implied readers are asked to do, I argue, is to put on a way of seeing that does not settle, a way of being that strains forward toward the new.

  6. Comprehending expository texts: The dynamic neurobiological correlates of building a coherent text representation

    Directory of Open Access Journals (Sweden)

    Amanda eMiller

    2013-12-01

    Full Text Available Little is known about the neural correlates of expository text comprehension. In this study, we sought to identify neural networks underlying expository text comprehension, how those networks change over the course of comprehension, and whether information central to the overall meaning of the text is functionally distinct from peripheral information. Seventeen adult subjects read expository passages while being scanned using functional magnetic resonance imaging (fMRI). By convolving phrase onsets with the hemodynamic response function (HRF), we were able to identify regions that increase and decrease in activation over the course of passage comprehension. We found that expository text comprehension relies on the co-activation of the semantic control network and regions in the posterior midline previously associated with mental model updating and integration (posterior cingulate cortex (PCC) and precuneus (PCU)). When compared to single word comprehension, left PCC and left Angular Gyrus (AG) were activated only for discourse-level comprehension. Over the course of comprehension, reliance on the same regions in the semantic control network and posterior midline increased, while a parietal region associated with attention (intraparietal sulcus (IPS)) decreased. These results parallel previous findings in narrative comprehension that the initial stages of mental model building require greater visuospatial attention processes, while maintenance of the model increasingly relies on semantic integration regions. Additionally, we used an event-related analysis to examine phrases central to the text’s overall meaning versus peripheral phrases. It was found that central ideas are functionally distinct from peripheral ideas (showing greater activation in the PCC and PCU), and over time recruit different parts of the semantic control network than peripheral ideas do. These findings support previous behavioral models on the cognitive importance of distinguishing

  7. On the reduction of generalized polylogarithms to $\text{Li}_n$ and $\text{Li}_{2,2}$ and on the evaluation thereof

    CERN Document Server

    Frellesvig, Hjalte; Wever, Christopher

    2016-01-01

    We give expressions for all generalized polylogarithms up to weight four in terms of the functions log, $\text{Li}_n$, and $\text{Li}_{2,2}$, valid for arbitrary complex variables. Furthermore we provide algorithms for manipulation and numerical evaluation of $\text{Li}_n$ and $\text{Li}_{2,2}$, and add codes in Mathematica and C++ implementing the results. With these results we calculate a number of previously unknown integrals, which we add in App. C.
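
    For readers who want to check values numerically, $\text{Li}_n(z)$ can be evaluated from its defining series $\sum_{k\ge 1} z^k/k^n$ for $|z|<1$ and compared against mpmath's built-in polylog. The sketch below does only that; the paper's reduction identities and the $\text{Li}_{2,2}$ function are not implemented.

      import mpmath

      def li_series(n, z, terms=200):
          # defining series, convergent for |z| < 1
          return sum(z**k / mpmath.mpf(k)**n for k in range(1, terms + 1))

      z = mpmath.mpf("0.5")
      for n in (2, 3, 4):
          print(n, li_series(n, z), mpmath.polylog(n, z))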

  8. Text2Video: text-driven facial animation using MPEG-4

    Science.gov (United States)

    Rurainsky, J.; Eisert, P.

    2005-07-01

    We present a complete system for the automatic creation of talking head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and synthetic voice. A user selected 3D character will perform lip movements synchronized to the speech data. The 3D models created from a single image vary from realistic people to cartoon characters. A voice selection for different languages and gender as well as a pitch shift component enables a personalization of the animation. The animation can be shown on different displays and devices ranging from 3GPP players on mobile phones to real-time 3D render engines. Therefore, our system can be used in mobile communication for the conversion of regular SMS messages to MMS animations.

  9. Text-based Research of Early Warning Platform from Food Complaint Texts

    OpenAIRE

    Yueyi Zhang; Taiyi Chen; Jing Hu; Xinghua Fang

    2015-01-01

    This study proposes an ontology-guided early warning method based on food complaint texts, establishes a scientific and reasonable early warning system, and builds and improves a food security early warning platform. In doing so, the study plays a supplementary role in the work of food safety regulators. Building on traditional early warning systems, this study constructs a food safety complaints warning platform model, builds the food domain ontology and expands food...

  10. Aspects in developing of a text analizer for processing unstructured text data

    OpenAIRE

    Petic, Mircea; Osoian, Ecaterina

    2015-01-01

    The article presents our approach to the elaboration of a system for processing unstructured text data in order to create structured output in the form of computational linguistics resources, using a lexicon of markers. First, a description of the research on the proposed topic, as well as its relation to research at the national and international level, is presented, followed by a description of a functionality useful to this particular research - a PoS tagger for Romanian. A special section is de...

  11. A new graph based text segmentation using Wikipedia for automatic text summarization

    Directory of Open Access Journals (Sweden)

    Mohsen Pourvali

    2012-01-01

    Full Text Available The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization is to produce a summary delivering the majority of information content from a set of documents about an explicit or implicit main topic. In this paper we use the knowledge base of Wikipedia together with the words of the input text to create independent graphs. We then determine the importance of these graphs, identify the sentences whose topics carry high importance, and finally extract the sentences with high importance. The experimental results on open benchmark datasets from DUC01 and DUC02 show that our proposed approach can improve the performance compared to state-of-the-art summarization approaches.
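
    A minimal stand-in for the graph-based extraction step is a TextRank-style ranking: sentences become nodes, tf-idf cosine similarity gives edge weights, and PageRank scores importance. This sketch does not use the Wikipedia-derived concept graphs of the paper, and the four-sentence document is a placeholder.

      import networkx as nx
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      def summarize(sentences, n_keep=2):
          sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
          graph = nx.from_numpy_array(sim)                   # weighted sentence graph
          scores = nx.pagerank(graph, weight="weight")
          top = sorted(scores, key=scores.get, reverse=True)[:n_keep]
          return [sentences[i] for i in sorted(top)]         # keep original order

      doc = ["The dam project was approved after years of debate.",   # toy document
             "Construction of the dam will start next spring.",
             "Local cafes reported higher sales during the festival.",
             "Engineers say the dam will supply power to two regions."]
      print(summarize(doc))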

  12. Text processing for technical reports (direct computer-assisted origination, editing, and output of text)

    Energy Technology Data Exchange (ETDEWEB)

    De Volpi, A.; Fenrick, M. R.; Stanford, G. S.; Fink, C. L.; Rhodes, E. A.

    1980-10-01

    Documentation often is a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value to professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts - especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Secretarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.

  13. Advanced text authorship detection methods and their application to biblical texts

    Science.gov (United States)

    Putniņš, Tālis; Signoriello, Domenic J.; Jain, Samant; Berryman, Matthew J.; Abbott, Derek

    2005-12-01

    Authorship attribution has a range of applications in a growing number of fields such as forensic evidence, plagiarism detection, email filtering, and web information management. In this study, three attribution techniques are extended, tested on a corpus of English texts, and applied to a book in the New Testament of disputed authorship. The word recurrence interval based method compares standard deviations of the number of words between successive occurrences of a keyword both graphically and with chi-squared tests. The trigram Markov method compares the probabilities of the occurrence of words conditional on the preceding two words to determine the similarity between texts. The third method extracts stylometric measures such as the frequency of occurrence of function words and from these constructs text classification models using multiple discriminant analysis. The effectiveness of these techniques is compared. The accuracy of the results obtained by some of these extended methods is higher than many of the current state of the art approaches. Statistical evidence is presented about the authorship of the selected book from the New Testament.
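
    The trigram Markov method lends itself to a compact sketch: estimate P(w | w1, w2) with add-one smoothing from each candidate author's known writings, then attribute the disputed text to the model giving the higher log-likelihood. The tiny corpora below are placeholders, not the biblical texts analysed in the study, and the word-recurrence-interval and stylometric methods are not shown.

      import math
      from collections import Counter, defaultdict

      def train(tokens):
          tri, bi = defaultdict(Counter), Counter()
          for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
              tri[(a, b)][c] += 1
              bi[(a, b)] += 1
          return tri, bi, set(tokens)

      def log_likelihood(tokens, model):
          tri, bi, vocab = model
          V, ll = len(vocab), 0.0
          for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
              ll += math.log((tri[(a, b)][c] + 1) / (bi[(a, b)] + V))  # add-one smoothing
          return ll

      author_a = "in the beginning was the word and the word was with".split()
      author_b = "grace and peace to you from god our father and the lord".split()
      disputed = "the word was with god".split()

      scores = {name: log_likelihood(disputed, train(corpus))
                for name, corpus in [("A", author_a), ("B", author_b)]}
      print(max(scores, key=scores.get), scores)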

  14. Relating interesting quantitative time series patterns with text events and text features

    Science.gov (United States)

    Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.

    2013-12-01

    In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent and highly frequent quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features like the degree of sentence nesting, noun phrase complexity, the vocabulary richness, etc. are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features which significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies, showing the effectiveness of our combined quantitative and textual analysis work flow. The workflow can also be generalized to other

  15. The Relationship between the Original Text and the Translated Text in Derrida’s Deconstruction Theory

    Institute of Scientific and Technical Information of China (English)

    LI Xueping

    2015-01-01

    Deconstructionism is a post-modernist trend of thought that sprang up in France in the mid-1960s, and Derrida is its representative figure. This paper first briefly introduces Derrida's deconstructive thought and his deconstructive translation theory. It then analyzes the reason why Derrida introduced his deconstructive thought into translation theory. Thirdly, it mainly analyzes the relationship between the original text and the translated text in Derrida's deconstruction theory. % Deconstructionism arose in France in the 1960s; its representative figure is Jacques Derrida. This paper first briefly introduces Derrida's deconstructionist thought and his deconstructionist translation theory, then analyzes why Derrida introduced deconstructionist thought into translation theory, and finally focuses on the relationship between the original text and the translated text in Derrida's deconstruction theory.

  16. GURMUKHI TEXT EXTRACTION FROM IMAGE USING SUPPORT VECTOR MACHINE (SVM)

    Directory of Open Access Journals (Sweden)

    SUKHWINDER KAUR

    2011-04-01

    Full Text Available Extensive research has been done on image classification for different purposes such as face recognition, identification of different objects and identification/extraction of text from images having some background. Text identification is an active research area whereby a system tries to identify the text area in a given image. The identified text area is then passed to an OCR system for further recognition of the text. This work is about classifying image areas into two classes, text and non-text, using an SVM (support vector machine). We identified the features and trained a model based on the feature vector, which is then used to classify text and non-text areas in an image. The system reports 70.5% accuracy for caption text images, 70.43% for document text images and 50.40% for scene text images.
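
    The classification step can be sketched with scikit-learn's SVM on a small per-block feature vector. The two features used here (edge density and intensity variance) and all numeric values are hypothetical stand-ins for the paper's actual Gurmukhi-oriented features.

      from sklearn.svm import SVC
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      # each row: [edge density, intensity variance]; label 1 = text block, 0 = non-text
      X = [[0.42, 1800.0], [0.39, 2100.0], [0.05, 300.0],
           [0.08, 450.0], [0.47, 1950.0], [0.03, 150.0]]
      y = [1, 1, 0, 0, 1, 0]

      clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
      clf.fit(X, y)
      print(clf.predict([[0.40, 1700.0], [0.06, 200.0]]))    # expected: [1 0]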

  17. The Informational Text Structure Survey (ITS[superscript 2]): An Exploration of Primary Grade Teachers' Sensitivity to Text Structure in Young Children's Informational Texts

    Science.gov (United States)

    Reutzel, D. Ray; Jones, Cindy D.; Clark, Sarah K.; Kumar, Tamara

    2016-01-01

    There has been no research reported about if or how well primary grade teachers can identify information text structures in children's authentic informational texts. The ability to do so accurately and reliably is a prerequisite for teachers to be able to teach students how to recognize and use text structures to assist them in comprehending…

  18. Acoustic Evaluation as a Variety of Text Metonymy

    Directory of Open Access Journals (Sweden)

    Ella V. Nesterik

    2013-01-01

    Full Text Available The article deals with sensorial evaluation, namely, acoustic evaluation as a text-forming category, studied in terms of text linguistics and text stylistics. Acoustic evaluation is considered as a variety of text metonymy, a sort of stylistic device expressing characters’ emotional state and time perception metonymically

  19. Texting, Textese and Literacy Abilities: A Naturalistic Study

    Science.gov (United States)

    Drouin, Michelle; Driver, Brent

    2014-01-01

    In this study, we examined texting behaviours, text message characteristics (textese) of actual sent text messages and the relationships between texting, textese and literacy abilities in a sample of 183 American undergraduates. As compared to previous naturalistic and experimental studies with English-speaking adults, both texting frequency and…

  20. T-Scan: a new tool for analyzing Dutch text

    NARCIS (Netherlands)

    Pander Maat, H.L.W.; Kraf, R.L.; van den Bosch, Antal; van Gompel, Maarten; Kleijn, S.; Sanders, T.J.M.; van der Sloot, Ko

    2014-01-01

    T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives it

  1. LITURGICAL TEXT IN ANTON CHEKHOV'S NOVELLA "THE DUEL"

    Directory of Open Access Journals (Sweden)

    Syzranov S. V.

    2008-11-01

    Full Text Available The article examines the principle of interaction between the sacred speech, embodied in liturgical texts, and the literary text, typical for Anton Chekhov's works, by the example of his novella "The Duel".

  2. On the Functions of Lexical Collocation in English Texts

    Institute of Scientific and Technical Information of China (English)

    XIAO Fuliang

    2016-01-01

    Lexical collocation, as a cohesive device of an English text, helps to create a cohesive and coherent text. Therefore, to better comprehend English texts, the different patterns and functions of lexical collocation should be examined in detail.

  3. Closely Reading Informational Texts in the Primary Grades

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2014-01-01

    In this article we discuss the differences between close reading in the primary grades and upper elementary grades. We focus on text selection, initial reading, repeated reading, annotation, text-based discussions, and responding to texts.

  4. Text Analytics: the convergence of Big Data and Artificial Intelligence

    OpenAIRE

    Antonio Moreno; Teófilo Redondo

    2016-01-01

    The analysis of the text content in emails, blogs, tweets, forums and other forms of textual communication constitutes what we call text analytics. Text analytics is applicable to most industries: it can help analyze millions of emails; you can analyze customers’ comments and questions in forums; you can perform sentiment analysis using text analytics by measuring positive or negative perceptions of a company, brand, or product. Text Analytics has also been called text mining, and is a subcat...

  5. Techniques, Applications and Challenging Issue in Text Mining

    Directory of Open Access Journals (Sweden)

    Shaidah Jusoh

    2012-11-01

    Full Text Available Text mining is a very exciting research area as it tries to discover knowledge from unstructured texts. These texts can be found on desktops, intranets and the internet. The aim of this paper is to give an overview of text mining in the context of its techniques, application domains and the most challenging issue. The focus is given to fundamental methods of text mining, which include natural language processing and information extraction. This paper also gives a short review of domains which have employed text mining. The challenging issue in text mining, which is caused by the complexity of natural language, is also addressed in this paper.

  6. More Than Words can Tell - Using Multimodal Texts to Support Reading Comprehension of Literary Texts in English

    OpenAIRE

    Leismann, Silke

    2015-01-01

    This thesis explores the possibilities of multimodality in supporting text comprehension of literary texts in language learning of the L2. While multimodal texts offer multiple ways of meaning making that sometimes go beyond the written text, I have focussed on multimodal expressions that mirror the context of a given text. I conducted an empirical study with 114 students (grade 9; 13-14 years) in two schools in Trondheim, Norway. The material I used consisted of three literary texts (e...

  7. Drawing on Text Features for Reading Comprehension and Composing

    Science.gov (United States)

    Risko, Victoria J.; Walker-Dalhouse, Doris

    2011-01-01

    Students read multiple-genre texts such as graphic novels, poetry, brochures, digitized texts with videos, and informational and narrative texts. Features such as overlapping illustrations and implied cause-and-effect relationships can affect students' comprehension. Teaching with these texts and drawing attention to organizational features hold…

  8. What oral text reading fluency can reveal about reading comprehension

    NARCIS (Netherlands)

    Veenendaal, N.J.; Groen, M.A.; Verhoeven, L.T.W.

    2015-01-01

    Text reading fluency – the ability to read quickly, accurately and with a natural intonation – has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor t

  9. Classroom Writing Tasks and Students' Analytic Text-Based Writing

    Science.gov (United States)

    Matsumura, Lindsay Clare; Correnti, Richard; Wang, Elaine

    2015-01-01

    The Common Core State Standards emphasize students writing analytically in response to texts. Questions remain about the nature of instruction that develops students' text-based writing skills. In the present study, we examined the role that writing task quality plays in students' mastery of analytic text-based writing. Text-based writing tasks…

  10. Comprehension and Learning from Refutation and Expository Texts

    Science.gov (United States)

    Diakidoy, Irene-Anna N.; Mouskounti, Thalia; Ioannides, Christos

    2011-01-01

    The study compared the effects of a refutation text on comprehension and learning outcomes to those of a standard expository text. Undergraduate students with varying amounts of accurate and inaccurate prior knowledge read and recalled a refutation or an expository text about energy. Comprehension measures included the amount of text information…

  11. Acoustic Evaluation as a Variety of Text Metonymy

    OpenAIRE

    Ella V. Nesterik; Anna D. Matrossova

    2013-01-01

    The article deals with sensorial evaluation, namely, acoustic evaluation as a text-forming category, studied in terms of text linguistics and text stylistics. Acoustic evaluation is considered as a variety of text metonymy, a sort of stylistic device expressing characters’ emotional state and time perception metonymically

  12. Introducing Text Analytics as a Graduate Business School Course

    Science.gov (United States)

    Edgington, Theresa M.

    2011-01-01

    Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily from data mining. While text mining has been covered extensively in various computer…

  13. Text Processing and Formatting: Composure, Composition and Eros.

    Science.gov (United States)

    Blair, John C., Jr.

    1984-01-01

    Review of computer software offering text editing/processing capabilities highlights work habits, elements of computer style and composition, buffers, the CRT, line- and screen-oriented text editors, video attributes, "swapping,""cache" memory, "disk emulators," text editing versus text processing, and UNIX operating system. Specific programs…

  14. IM Set to Talk with You with Text!

    Science.gov (United States)

    Descy, Don E.

    2007-01-01

    In this article, the author discusses text messaging and instant messaging (IM). In a nutshell, text messaging is another name for Short Message Service (SMS). SMS is a service available on most digital mobile phones that permits the sending of short messages (also known as SMSes, text messages, messages, or more colloquially texts or even txts)…

  15. Toward a Model of Text Comprehension and Production.

    Science.gov (United States)

    Kintsch, Walter; Van Dijk, Teun A.

    1978-01-01

    Described is the system of mental operations occurring in text comprehension and in recall and summarization. A processing model is outlined: 1) the meaning elements of a text become organized into a coherent whole, 2) the full meaning of the text is condensed into its gist, and 3) new texts are generated from the comprehension processes.…

  16. TEXT MINING – PREREQUISITE FOR KNOWLEDGE MANAGEMENT SYSTEMS

    OpenAIRE

    Dragoş Marcel VESPAN

    2009-01-01

    Text mining is an interdisciplinary field with the main purpose of retrieving new knowledge from large collections of text documents. This paper presents the main techniques used for knowledge extraction through text mining and their main areas of applicability and emphasizes the importance of text mining in knowledge management systems.

  17. Teaching Literature in an Age of Text Complexity

    Science.gov (United States)

    Alsup, Janet

    2013-01-01

    The recently released Common Core State Standards increase classroom emphasis on informational texts in high school and recommend a three-part measurement for text complexity when selecting texts for classroom use. In this commentary I argue that fictional narratives can not only meet these stated criteria for complex texts and result in critical…

  18. Fiction vs Informational Texts: Which Will Kindergartners Choose?

    Science.gov (United States)

    Correia, Marlene Ponte

    2011-01-01

    Informational texts include books as well as text in other formats such as magazines, newspapers, and online articles. The primary purpose of informational text is to provide information about the natural and social world. Literacy research cites many reasons why nonfiction/informational texts should be included in primary classrooms. The…

  19. What Oral Text Reading Fluency Can Reveal about Reading Comprehension

    Science.gov (United States)

    Veenendaal, Nathalie J.; Groen, Margriet A.; Verhoeven, Ludo

    2015-01-01

    Text reading fluency--the ability to read quickly, accurately and with a natural intonation--has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor to reading comprehension outcomes in addition to…

  20. Syntactic Complexity as an Aspect of Text Complexity

    Science.gov (United States)

    Frantz, Roger S.; Starr, Laura E.; Bailey, Alison L.

    2015-01-01

    Students' ability to read complex texts is emphasized in the Common Core State Standards (CCSS) for English Language Arts and Literacy. The standards propose a three-part model for measuring text complexity. Although the model presents a robust means for determining text complexity based on a variety of features inherent to a text as well as…

  1. A New Method to Extract Text from Natural Scenes

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This paper presents a new method for text detection, location and binarization from natural scenes. Several morphological steps are used to detect the general position of the text, including English, Chinese and Japanese characters. Next, bounding boxes are processed by a new "Expand, Break and Merge" (EBM) method to get the precise text areas. Finally, text is binarized by a hybrid method based on Otsu and Niblack. This new approach can extract different kinds of text from complicated natural scenes. It is insensitive to noise, distortion, and text orientation. It also has good performance on extracting texts in various sizes.
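
    A minimal OpenCV sketch of the morphological candidate-detection stage is given below: a morphological gradient highlights strokes, Otsu thresholding binarizes them, and a horizontal closing groups characters into candidate boxes. The "Expand, Break and Merge" refinement and the hybrid Otsu/Niblack binarization of the paper are not reproduced, and the kernel sizes and area threshold are arbitrary.

      import cv2

      def text_candidates(gray):
          kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
          grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)   # stroke edges
          _, bw = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
          join = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
          merged = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, join)        # join characters
          contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)
          return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]

      # usage with a hypothetical image file:
      # boxes = text_candidates(cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE))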

  2. Detection of text in images using SUSAN edge detector

    Institute of Scientific and Technical Information of China (English)

    MAO Wen-ge; ZHANG Tian-wen; WANG Li

    2005-01-01

    Text embedded in images is one of many important cues for indexing and retrieval of images and videos. In the paper, we present a novel method of detecting text aligned either horizontally or vertically, in which a pyramid structure is used to represent an image and the features of the text are extracted using SUSAN edge detector. Text regions at each level of the pyramid are identified according to the autocorrelation analysis. New techniques are introduced to split the text regions into basic ones and merge them into text lines. By evaluating the method on a set of images, we obtain a very good performance of text detection.
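
    Since SUSAN is not available in stock OpenCV, the sketch below substitutes the Canny detector and replaces the autocorrelation analysis with a plain edge-density threshold applied per block at each level of an image pyramid; it is an illustration of the overall scheme rather than the authors' method, and all parameters are arbitrary.

      import cv2
      import numpy as np

      def edge_dense_blocks(gray, levels=3, block=16, min_density=0.15):
          hits = []
          for level in range(levels):                        # image pyramid
              edges = cv2.Canny(gray, 100, 200)              # stand-in for SUSAN
              h, w = edges.shape
              for y in range(0, h - block, block):
                  for x in range(0, w - block, block):
                      patch = edges[y:y + block, x:x + block]
                      if np.count_nonzero(patch) / patch.size >= min_density:
                          s = 2 ** level                     # map back to full resolution
                          hits.append((x * s, y * s, block * s))
              gray = cv2.pyrDown(gray)
          return hits

      # usage with a hypothetical image file:
      # print(edge_dense_blocks(cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)))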

  3. A contrastive analysis of French and English social statistics texts

    OpenAIRE

    Creed, Mairead

    1995-01-01

    This thesis adopts the theoretical framework of contrastive textology (CT) developed by Hartmann (1980) for the analysis of the language of French and English expository texts from the domain of social statistics. CT results from a combination of two linguistic orientations: text linguistics and contrastive stylistics (CS). Hartmann uses the term parallel texts to describe (a) translated texts and (b) non-translated texts in two languages which were produced in circumstances so similar as ...

  4. Techniques, Applications and Challenging Issue in Text Mining

    OpenAIRE

    Shaidah Jusoh; Hejab M. Alfawareh

    2012-01-01

    Text mining is a very exciting research area as it tries to discover knowledge from unstructured texts. These texts can be found on desktops, intranets and the internet. The aim of this paper is to give an overview of text mining in the context of its techniques, application domains and the most challenging issue. The focus is given to fundamental methods of text mining, which include natural language processing and information extraction. This paper also gives a short review of domains whi...

  5. Research on Text Mining Based on Domain Ontology

    OpenAIRE

    Li-hua, Jiang; Neng-fu, Xie; Hong-bin, Zhang

    2013-01-01

    This paper improves on traditional text mining technology, which cannot understand text semantics. The author discusses text mining methods based on ontology and puts forward a text mining model based on domain ontology. The ontology structure is built first and the “concept-concept” similarity matrix is introduced; then a concept vector space model based on domain ontology is used in place of the traditional vector space model to represent documents, in order to realize text m...

  6. The Chernobyl plant shutdown; L'arret de la centrale de Tchernobyl

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-12-01

    The Chernobylsk-1 reactor, operational in September 1977, was shut down in November 1996; the Chernobylsk-2 reactor, started in November 1978, has been out of order since 1991 following a fire. The Chernobylsk-3 reactor started up in 1981. Over the last three years, several maintenance operations have taken it offline. In June 2000, the Ukrainian authorities decided to shut it down definitively on the 15th of December 2000. This file handles the subject. It is divided into four chapters: the first one gives the general context of the plant shutdown, the second chapter studies the projects supporting the definitive shutdown of the nuclear plant, the third chapter treats the question of the sarcophagus, and the fourth and final chapter studies the consequences of the accident and the contaminated territories. (N.C.)

  7. A Survey On Various Approaches Of Text Extraction In Images

    Directory of Open Access Journals (Sweden)

    C.P. Sumathi

    2012-09-01

    Full Text Available Text Extraction plays a major role in finding vital and valuable information. Text extraction involves detection, localization, tracking, binarization, extraction, enhancement and recognition of the text from the given image. These text characters are difficult to detect and recognize due to their variation in size, font, style, orientation, alignment, contrast, and complex colored or textured backgrounds. Due to the rapid growth of available multimedia documents and the growing requirement for information identification, indexing and retrieval, much research has been done on text extraction in images. Several techniques have been developed for extracting the text from an image. The proposed methods were based on morphological operators, wavelet transform, artificial neural network, skeletonization operation, edge detection algorithm, histogram technique etc. All these techniques have their benefits and restrictions. This article discusses various schemes proposed earlier for extracting the text from an image. This paper also provides a performance comparison of several existing methods proposed by researchers for extracting the text from an image.

  8. Why Are Some Texts Good and Others Not? Relationship between Text Quality and Management of the Writing Processes

    Science.gov (United States)

    Beauvais, Caroline; Olive, Thierry; Passerault, Jean-Michel

    2011-01-01

    Two experiments examined whether text quality is related to online management of the writing processes. Experiment 1 focused on the relationship between online management and text quality in narrative and argumentative texts. Experiment 2 investigated how this relationship might be affected by a goal emphasizing text quality. In both experiments,…

  9. The Application of the Cooperative Principle in Text Messages

    Institute of Scientific and Technical Information of China (English)

    李军霞

    2015-01-01

    The language of text messages speeds up the transmission of information, shows the richness of languages, and contains all kinds of implications. Much research on text messages has been published, but the analysis of the language of text messages within the domain of Grice's cooperative principle remains open to investigation. This paper explores the language of text messages based on Grice's Cooperative Principle (CP) and its maxims, aiming to understand how the theory influences text message communication and creates humorous effects. It is of practical significance to study text messages as a kind of language phenomenon.

  10. The Application of the Cooperative Principle in Text Messages

    Institute of Scientific and Technical Information of China (English)

    李军霞

    2015-01-01

    The language of text messages speeds up the transmission of information, shows the richness of languages, and contains all kinds of implications. Much research on text messages has been published, but the analysis of the language of text messages within the domain of Grice's cooperative principle remains open to investigation. This paper explores the language of text messages based on Grice's Cooperative Principle (CP) and its maxims, aiming to understand how the theory influences text message communication and creates humorous effects. It is of practical significance to study text messages as a kind of language phenomenon.

  11. Modeling, Learning, and Processing of Text Technological Data Structures

    CERN Document Server

    Kühnberger, Kai-Uwe; Lobin, Henning; Lüngen, Harald; Storrer, Angelika; Witt, Andreas

    2012-01-01

    Researchers in many disciplines have been concerned with modeling textual data in order to account for texts as the primary information unit of written communication. The book “Modelling, Learning and Processing of Text-Technological Data Structures” deals with this challenging information unit. It focuses on theoretical foundations of representing natural language texts as well as on concrete operations of automatic text processing. Following this integrated approach, the present volume includes contributions to a wide range of topics in the context of processing of textual data. This relates to the learning of ontologies from natural language texts, the annotation and automatic parsing of texts as well as the detection and tracking of topics in texts and hypertexts. In this way, the book brings together a wide range of approaches to procedural aspects of text technology as an emerging scientific discipline.

  12. Pseudo-Label Generation for Multi-Label Text Classification

    Data.gov (United States)

    National Aeronautics and Space Administration — With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data,...

  13. The classical dramatic text and its value in contemporary theatre

    Directory of Open Access Journals (Sweden)

    Nina Žavbi Milojević

    2013-06-01

    Full Text Available This paper deals with the classical dramatic text and its staging in contemporary theatre. Specifically, it aims to show that classical texts can address topical issues. This is illustrated by the example of several stagings of Ivan Cankar's Hlapci, one of the most influential dramatic texts in Slovene literature. The history of this dramatic text is presented from its first publication and reception to the different stagings in various Slovene professional theatres. The focus is on how the situation in Slovene society is reflected in each examined staging. The drama Hlapci was first staged almost one hundred years ago, when stagings followed the dramatic text closely; after 1980, however, stagings became more independent of the text and allowed more artistic freedom. The paper argues that classical dramatic texts are very appropriate for staging in contemporary theatre, especially with an innovative director's approach.

  14. Six Auxiliary Texts to AACR2: A Review Article.

    Science.gov (United States)

    Hirshon, Arnold; Branson, Barbara

    1981-01-01

    Examines six monographs intended to support the use of "Anglo-American Cataloging Rules, Second Edition (AACR2)": three general texts, two on cataloging nonbook materials, and one concerned with serials cataloging. Five additional texts are cited. (FM)

  15. Hierarchical Three-level Ontology for Text Processing

    OpenAIRE

    Gladun, Victor; Velychko, Vitalii; Svyatogor, Leonid

    2008-01-01

    The principal feature of an ontology developed for text processing is a broader representation of knowledge about the external world, achieved by introducing a three-level hierarchy. This makes it possible to improve the semantic interpretation of natural language texts.

  16. The text plan concept: contributions to the writing planning process

    Directory of Open Access Journals (Sweden)

    Ana Lúcia Tinoco Cabral

    2013-12-01

    Full Text Available Students - at different levels, ranging from the early grades up to PhD - face problems with both text comprehension and text production. This paper focuses on the text plan concept in the DTA (Discourse Text Analysis) approach, i.e., a principle of organization that allows students to put their production intention into practice and to arrange text information while producing, and that is responsible for the text's compositional structure (Adam, 2008). The study analyzes the relation between the text plan and the writing planning process, in which the former provides theoretical support for the latter. To develop this research, the study covers some issues related to the reading skill, analyzes an argumentative text in terms of its text plan, and presents some reflections on the writing process, focusing on the relation between the text plan and the writing planning process.

  17. A New Fragile Watermarking Scheme for Text Documents Authentication

    Institute of Scientific and Technical Information of China (English)

    XIANG Huazheng; SUN Xingming; TANG Chengliang

    2006-01-01

    Because text documents can be modified in different ways, such as by deleting or inserting characters, algorithms for image authentication cannot be used directly to authenticate text documents. A text watermarking scheme for text document authentication is proposed in this paper. By extracting the features of character cascades together with the user's secret key, the scheme combines the features of the text with the user information into a watermark, which is embedded into the transformed text itself. Receivers can verify the integrity and authenticity of the text through blind detection. Further analysis demonstrates that the scheme can also localize tampering, classify the type of modification, and recover part of a modified text document. These conclusions are supported by both our experimental results and our analysis.
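
    A minimal Python sketch of the underlying idea (keyed, character-level features whose verification fails when characters are deleted or inserted) is given below. It is not the authors' embedding scheme: it only computes and checks block-wise HMAC tags, so tampering is localized at block granularity, and the block size and the use of raw character blocks as features are assumptions.

        # Sketch of keyed, block-wise feature hashing for tamper detection and localization.
        # Not the paper's embedding scheme; block size and the feature choice are assumptions.
        import hashlib
        import hmac

        def block_tags(text: str, key: bytes, block_size: int = 32):
            """Return one HMAC tag per fixed-size block of characters."""
            tags = []
            for i in range(0, len(text), block_size):
                block = text[i:i + block_size].encode("utf-8")
                tags.append(hmac.new(key, block, hashlib.sha256).hexdigest())
            return tags

        def locate_tampering(text: str, key: bytes, reference_tags, block_size: int = 32):
            """Compare current tags against reference tags; return indices of altered blocks.
            A change in the number of blocks itself signals insertion or deletion."""
            current = block_tags(text, key, block_size)
            return [i for i, (a, b) in enumerate(zip(current, reference_tags)) if a != b]

        key = b"user-secret-key"
        original = "A text watermarking scheme for text document authentication is proposed."
        tags = block_tags(original, key)

        tampered = original.replace("authentication", "verification")
        print(locate_tampering(tampered, key, tags))   # indices of the blocks that changed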

  18. Towards Multi Label Text Classification through Label Propagation

    Directory of Open Access Journals (Sweden)

    Shweta C. Dharmadhikari

    2012-06-01

    Full Text Available Classifying text data has been an active area of research for a long time. A text document is a multifaceted object and often inherently ambiguous by nature. Multi-label learning deals with such ambiguous objects. Classification of such ambiguous text objects makes the classifier's task of assigning relevant classes to an input document difficult. Traditional single-label and multi-class text classification paradigms cannot efficiently classify such multifaceted text corpora. In this paper we propose a novel label propagation approach based on semi-supervised learning for multi-label text classification. The proposed approach models the relationship between class labels and also represents input text documents effectively. We use semi-supervised learning to make effective use of both labeled and unlabeled data for classification. The approach promises better classification accuracy and handling of complexity, and is evaluated on standard datasets such as Enron, Slashdot and Bibtex.
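
    Under assumptions, the semi-supervised label-propagation idea can be sketched with scikit-learn's LabelSpreading applied one label at a time (one-vs-rest). This is not the authors' exact model; the tiny corpus and label matrix below are invented for illustration.

        # One-vs-rest label-propagation sketch for multi-label text data using
        # scikit-learn's LabelSpreading; the corpus and labels are invented.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.semi_supervised import LabelSpreading

        docs = [
            "stock markets fell sharply today",        # labeled: finance
            "the team won the championship game",      # labeled: sports
            "share prices and the league title race",  # unlabeled
            "quarterly earnings beat expectations",    # unlabeled
        ]
        # One column per label; -1 marks an unlabeled document.
        Y = np.array([[ 1,  0],
                      [ 0,  1],
                      [-1, -1],
                      [-1, -1]])

        X = TfidfVectorizer().fit_transform(docs).toarray()

        predictions = np.zeros_like(Y)
        for j in range(Y.shape[1]):                    # propagate each label independently
            model = LabelSpreading(kernel="rbf", gamma=1.0)
            model.fit(X, Y[:, j])
            predictions[:, j] = model.transduction_
        print(predictions)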

  19. Message Encryption Using Deceptive Text and Randomized Hashing

    Directory of Open Access Journals (Sweden)

    VAMSIKRISHNA YENIKAPATI,

    2011-02-01

    Full Text Available In this paper a new approach for message encryption using the concept of deceptive text is proposed. In this scheme we do not send the encrypted plain text to the receiver; instead, we send a meaningful deceptive text and an encrypted special index file. The original message is embedded in the meaningful deceptive text, and the positions of the characters of the plain text within the deceptive text are stored in the index file. The receiver decrypts the index file and recovers the original message from the received deceptive text. Authentication is achieved by verifying the hash value of the plaintext, created with a message digest algorithm, at the receiver side. In order to prevent collision attacks on hashing algorithms that are intended for use with standard digital signature algorithms, we provide an extra layer of security using a randomized hashing method.
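
    A hedged sketch of the index-file idea follows: the secret message is represented only by the positions of its characters inside an innocuous cover text, and a salted (randomized) hash of the plaintext serves as the authentication value. Encrypting the index file itself, as the scheme requires, is omitted here, and the example texts are invented.

        # Sketch of the deceptive-text idea: store positions of the secret's characters
        # in a cover text, plus a salted hash for authentication. Encryption of the
        # index file (required by the full scheme) would be layered on top of this.
        import hashlib
        import os

        def build_index(secret: str, deceptive: str):
            """Map each character of the secret to a position in the deceptive text."""
            positions, cursor = [], {}
            for ch in secret:
                pos = deceptive.find(ch, cursor.get(ch, 0))
                if pos == -1:                     # wrap around and reuse earlier positions
                    pos = deceptive.find(ch)
                if pos == -1:
                    raise ValueError(f"character {ch!r} not present in deceptive text")
                positions.append(pos)
                cursor[ch] = pos + 1
            return positions

        def recover(positions, deceptive: str) -> str:
            return "".join(deceptive[p] for p in positions)

        secret = "meet at noon"
        deceptive = "the committee meets to act on its environmental monitoring plan soon"

        index = build_index(secret, deceptive)
        salt = os.urandom(16)                                   # randomized hashing
        digest = hashlib.sha256(salt + secret.encode()).hexdigest()

        recovered = recover(index, deceptive)
        assert hashlib.sha256(salt + recovered.encode()).hexdigest() == digest
        print(recovered)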

  20. Introduction to Text Mining with R for Information Professionals

    Directory of Open Access Journals (Sweden)

    Monica Maceli

    2016-07-01

    Full Text Available The 'tm: Text Mining Package' in the open source statistical software R has made text analysis techniques easily accessible to both novice and expert practitioners, providing useful ways of analyzing and understanding large, unstructured datasets. Such an approach can yield many benefits to information professionals, particularly those involved in text-heavy research projects. This article discusses the functionality and possibilities of text mining, as well as the basic setup necessary for novice R users to employ the RStudio integrated development environment (IDE). Common use cases, such as analyzing a corpus of text documents or spreadsheet text data, will be covered, as well as the text mining tools for calculating term frequency, term correlations, clustering, creating wordclouds, and plotting.
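
    For readers working in Python rather than R, the workflow the article walks through (building a document-term matrix, finding frequent terms, term correlations and clusters) has a rough analogue in scikit-learn. The sketch below is only a loose counterpart to the tm/RStudio steps described, not code from the article, and the toy corpus is invented.

        # Rough Python analogue of the tm workflow described in the article
        # (term frequency, term correlations, clustering); toy corpus invented.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import CountVectorizer

        corpus = [
            "text mining makes large text collections searchable",
            "information professionals analyze unstructured text",
            "clustering groups similar documents together",
            "term frequency summarizes a corpus of documents",
        ]

        vectorizer = CountVectorizer(stop_words="english")
        dtm = vectorizer.fit_transform(corpus)              # document-term matrix
        terms = list(vectorizer.get_feature_names_out())

        # Corpus-wide term frequencies (roughly tm's findFreqTerms).
        freqs = np.asarray(dtm.sum(axis=0)).ravel()
        print(sorted(zip(terms, freqs), key=lambda t: -t[1])[:5])

        # Term-term correlations (roughly tm's findAssocs).
        corr = np.corrcoef(dtm.toarray().T)
        print("corr(text, mining) =", round(corr[terms.index("text"), terms.index("mining")], 2))

        # K-means clustering of the documents (k chosen arbitrarily here).
        print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dtm))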

  1. Review: Current writing: Text and reception in Southern Africa

    Directory of Open Access Journals (Sweden)

    A. L. Combrink

    1990-05-01

    Full Text Available Current writing: Text and reception in Southern Africa. (Published by the University of Natal under the joint editorship of Margaret Lenta, Michael Chapman, Margaret Daymond and Johan U. Jacobs. Volume 1, 1989 - editor: Margaret Lenta.)

  2. Complex network analysis of literary and scientific texts

    CERN Document Server

    Grabska-Gradzinska, Iwona; Kwapien, Jaroslaw; Drozdz, Stanislaw

    2012-01-01

    We present results from our quantitative study of statistical and network properties of literary and scientific texts written in two languages: English and Polish. We show that Polish texts are described by the Zipf law with the scaling exponent smaller than the one for the English language. We also show that the scientific texts are typically characterized by the rank-frequency plots with relatively short range of power-law behavior as compared to the literary texts. We then transform the texts into their word-adjacency network representations and find another difference between the languages. For the majority of the literary texts in both languages, the corresponding networks revealed the scale-free structure, while this was not always the case for the scientific texts. However, all the network representations of texts were hierarchical. We do not observe any qualitative and quantitative difference between the languages. However, if we look at other network statistics like the clustering coefficient and the...
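
    The word-adjacency construction and the network statistics the study examines (degree distribution, clustering coefficient), as well as the rank-frequency data behind a Zipf plot, can be reproduced in miniature. The Python sketch below uses networkx on a toy text and is not the authors' code.

        # Word-adjacency network for a toy text, with degree distribution,
        # clustering coefficient, and rank-frequency (Zipf) data.
        import re
        from collections import Counter

        import networkx as nx

        text = ("the quick brown fox jumps over the lazy dog and the quick dog "
                "barks at the brown fox while the lazy fox sleeps")
        tokens = re.findall(r"[a-z]+", text.lower())

        G = nx.Graph()
        for w1, w2 in zip(tokens, tokens[1:]):      # link words adjacent in the text
            if w1 != w2:
                G.add_edge(w1, w2)

        degrees = [d for _, d in G.degree()]
        print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
        print("degree distribution:", sorted(Counter(degrees).items()))
        print("average clustering coefficient:", round(nx.average_clustering(G), 3))

        # Rank-frequency pairs for a log-log Zipf plot of the same tokens.
        for rank, (word, freq) in enumerate(Counter(tokens).most_common(5), start=1):
            print(rank, word, freq)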

  3. Generating Weather Forecast Texts with Case Based Reasoning

    OpenAIRE

    Adeyanju, Ibrahim

    2015-01-01

    Several techniques have been used to generate weather forecast texts. In this paper, case based reasoning (CBR) is proposed for weather forecast text generation because similar weather conditions occur over time and should have similar forecast texts. CBR-METEO, a system for generating weather forecast texts was developed using a generic framework (jCOLIBRI) which provides modules for the standard components of the CBR architecture. The advantage in a CBR approach is that systems can be built...

  4. Texting while driving as impulsive choice: A behavioral economic analysis

    OpenAIRE

    Hayashi, Yusuke; Russo, Christopher T.; Wirth, Oliver

    2015-01-01

    The goal of the present study was to examine the utility of a behavioral economic analysis to investigate the role of delay discounting in texting while driving. A sample of 147 college students completed a survey to assess how frequently they send and read text messages while driving. Based on this information, students were assigned to one of two groups: 19 students who frequently text while driving and 19 matched-control students who infrequently text while driving but were similar in gend...

  5. Text comparison using word vector representations and dimensionality reduction

    OpenAIRE

    Heuer, Hendrik

    2016-01-01

    This paper describes a technique to compare large text sources using word vector representations (word2vec) and dimensionality reduction (t-SNE) and how it can be implemented using Python. The technique provides a bird's-eye view of text sources, e.g. text summaries and their source material, and enables users to explore text sources like a geographical map. Word vector representations capture many linguistic properties such as gender, tense, plurality and even semantic concepts like "capital...
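
    A minimal Python sketch of the described pipeline, assuming gensim 4.x for word2vec and scikit-learn for t-SNE, is given below. The toy corpus is far too small to produce meaningful embeddings and the hyperparameters are arbitrary; with real text sources the same two steps yield the map-like overview the paper describes.

        # word2vec + t-SNE sketch, assuming gensim 4.x and scikit-learn.
        # The corpus is a toy; hyperparameters are arbitrary placeholders.
        import numpy as np
        from gensim.models import Word2Vec
        from sklearn.manifold import TSNE

        sentences = [
            "the king rules the kingdom".split(),
            "the queen rules the kingdom".split(),
            "the summary condenses the source text".split(),
            "the source text is longer than the summary".split(),
        ]

        model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200, seed=1)

        words = model.wv.index_to_key                  # vocabulary, most frequent first
        vectors = np.array([model.wv[w] for w in words])

        # t-SNE projects the word vectors to 2-D; perplexity must stay below len(words).
        coords = TSNE(n_components=2, perplexity=5, init="random",
                      random_state=0).fit_transform(vectors)

        for word, (x, y) in zip(words, coords):
            print(f"{word:12s} {x:8.2f} {y:8.2f}")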

  6. TYPES OF REPETITIONS IN TEXTS MANSI CHILDREN’S FOLKLORE

    OpenAIRE

    Kumaeva Maria Vladimirovna

    2012-01-01

    The article discusses the various types of repetition in the texts of Mansi children's folklore, in such genres as lullabies, tales and riddles. It characterizes the individual types of repetition and the functions they perform in the texts. The numerous repetitions that occur in folklore texts have compositional and stylistic value, and their stylistic roles are examined. The aim of this work is the identification and classification of the types of repetition in the texts of children's folklore - lull...

  7. Generating an Ordered Data Set from an OCR Text File

    Directory of Open Access Journals (Sweden)

    Jon Crump

    2014-11-01

    Full Text Available This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and some of the individual procedures, can be adapted to organize any scanned text, even if it doesn’t look like this one.
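
    The general pattern of the tutorial (scan the OCR lines, match structural markers with regular expressions, and accumulate entries into an ordered Python dictionary) looks roughly like the skeleton below. The record layout and field names here are invented; a real scanned text needs its own patterns.

        # Skeleton for parsing raw OCR output into an ordered dictionary.
        # The year-heading/entry layout and the field names are invented.
        import re
        from collections import OrderedDict

        raw_ocr = """\
        1861.
        SMITH, John, farmer, Elm St.
        JONES, Mary, teacher, Oak Ave.
        1862.
        BROWN, Ann, printer, Main St.
        """

        year_pat = re.compile(r"^(\d{4})\.$")
        entry_pat = re.compile(r"^([A-Z]+),\s*(\w+),\s*([^,]+),\s*(.+)$")

        records = OrderedDict()
        current_year = None
        for raw_line in raw_ocr.splitlines():
            line = raw_line.strip()
            if year_pat.match(line):
                current_year = line.rstrip(".")
                records[current_year] = []
                continue
            m = entry_pat.match(line)
            if m and current_year:
                surname, forename, occupation, address = m.groups()
                records[current_year].append({
                    "surname": surname.title(),
                    "forename": forename,
                    "occupation": occupation.strip(),
                    "address": address.strip().rstrip("."),
                })

        print(records["1861"][0])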

  8. Interpretation of Overly Open Advertising Texts in Print Media

    OpenAIRE

    Chen, Yuchi

    2011-01-01

    The advertising text in print media is one of the important ways of communicating marketing messages to consumers. In the postmodern era, the form of advertising texts has been influenced by the reading habits of readers. Overly open advertising texts, which are anti-formed, irrationally appealing, image-centred and ambiguous, are popular with marketers for transmitting product and brand information. However, whether the overly open advertising text is interpreted by consumers, to w...

  9. Parody advertising texts in V. Pelevin's novel GENERATION "P"

    OpenAIRE

    Maslova, Julija

    2006-01-01

    The master's thesis analyses advertising texts in V. Pelevin's novel Generation “P” from the point of view of parody of really existing advertising, and from the aspects of intertextuality and linguistic analysis. The linguistic analysis involves identifying the linguistic means used in the base text (real advertising) and the prototype text (parodic advertising), and comparing and analysing these means. Linguistic realization of the category of intertextuality in advertising texts (quotes, allusion...

  10. Classifying racist texts using a support vector machine

    OpenAIRE

    Greevy, Edel; Alan F. SMEATON

    2004-01-01

    In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system to automatically classify racist texts. Detecting racism is difficult because the presence of indicator words is insufficient to indicate racist texts, unlike some other text classification tasks. Support Vector Machines (SVM) are used to automatically categorise web pages based on whether or not they are racist. Different interpretations of what constitutes a term are taken, a...
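
    A generic version of such an SVM text-categorization pipeline can be sketched with scikit-learn. The snippet below uses TF-IDF unigrams and bigrams with a linear SVM; the tiny labeled set is invented and stands in for the poster's web-page corpus and term interpretations.

        # Generic TF-IDF + linear SVM text-categorization pipeline; the labeled
        # examples are invented stand-ins, not the poster's data or features.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        train_texts = [
            "hateful slurs targeting an ethnic group",
            "a welcoming community page about local history",
            "calls for exclusion of people based on origin",
            "recipes and gardening tips for spring",
        ]
        train_labels = [1, 0, 1, 0]     # 1 = flagged as racist content, 0 = not

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
        clf.fit(train_texts, train_labels)

        print(clf.predict(["community gardening history page",
                           "page demanding exclusion of an ethnic group"]))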

  11. Text mining meets workflow: linking U-Compare with Taverna

    OpenAIRE

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare t...

  12. Predicting Abnormal Returns From News Using Text Classification

    OpenAIRE

    Luss, Ronny; d'Aspremont, Alexandre

    2008-01-01

    We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text featu...
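
    Setting aside the multiple kernel learning and the cutting-plane solver, the basic setup (text features combined with past returns to classify the size of the next move) can be sketched as follows. The news snippets, returns and labels are invented, and a single linear SVM stands in for the paper's method.

        # Rough stand-in for the setup: combine news-text features with lagged
        # returns and classify whether the next move is "large". Data invented;
        # a single linear SVM replaces the paper's multiple kernel learning.
        import numpy as np
        from scipy import sparse
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC

        news = [
            "company beats earnings expectations and raises guidance",
            "regulator opens investigation into accounting practices",
            "quiet trading session with no major announcements",
            "merger talks confirmed with larger competitor",
        ]
        past_returns = np.array([[0.01], [-0.02], [0.00], [0.03]])   # lagged return feature
        large_move = np.array([1, 1, 0, 1])        # is |next return| above some threshold?

        X_text = TfidfVectorizer().fit_transform(news)
        X = sparse.hstack([X_text, sparse.csr_matrix(past_returns)]).tocsr()

        model = LinearSVC().fit(X, large_move)
        print(model.predict(X))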

  13. Estimation of Morphological Tables Using Text Analysis Results

    Directory of Open Access Journals (Sweden)

    Illia Savchenko

    2016-08-01

    Full Text Available This paper proposes methods for obtaining the input data required by the modified morphological analysis method from textual sources using text analysis tools. Several methods are described that are suitable for calculating initial estimates of alternatives and cross-consistency matrix values by processing text fragments with rule-based categorization and sentiment analysis tools. A practical implementation of this tool set for assessing statements in news regarding Ukraine is considered.

  14. AN EFFICIENT TEXT CLASSIFICATION USING KNN AND NAIVE BAYESIAN

    OpenAIRE

    J.Sreemathy; P. S. Balamurugan

    2012-01-01

    The main objective is to propose a text classification approach based on feature selection and preprocessing, thereby reducing the dimensionality of the feature vector and increasing the classification accuracy. Text classification is the process of assigning a document to one or more target categories based on its contents. In the proposed method, machine learning methods for text classification are used to apply some text preprocessing methods to different datasets, and then to extract feature vecto...
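
    A minimal comparison of the two classifiers on bag-of-words features can be sketched with scikit-learn; the toy documents and labels are invented, and the preprocessing is reduced to simple count vectorization.

        # Minimal KNN vs. multinomial Naive Bayes comparison on a toy text corpus.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.neighbors import KNeighborsClassifier

        docs = [
            "the striker scored a late winning goal",
            "parliament passed the new budget bill",
            "the goalkeeper saved a penalty in extra time",
            "the senate debated the tax legislation",
        ]
        labels = ["sport", "politics", "sport", "politics"]
        test = ["a penalty decided the match", "the bill faces a final vote"]

        vec = CountVectorizer().fit(docs)
        X_train, X_test = vec.transform(docs), vec.transform(test)

        for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=3)):
            clf.fit(X_train, labels)
            print(type(clf).__name__, clf.predict(X_test))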

  15. MINING TEXTS TO UNDERSTAND CUSTOMERS' IMAGE OF BRANDS

    Directory of Open Access Journals (Sweden)

    Hyung Jun Ahn

    2013-06-01

    Full Text Available Text mining is becoming increasingly important in understanding customers and markets these days. This paper presents a method of mining texts about customer sentiments using a network analysis technique. A data set collected about two global mobile device manufacturers was used for testing the method. The analysis results show that the method can be effectively used to extract key sentiments in the customers' texts.

  16. Effects of H3O+, OH-, O2-, NOx- and NOx for Escherichia coli inactivation in atmospheric pressure DC corona discharges

    Science.gov (United States)

    Sekimoto, Kanako; Gonda, Rena; Takayama, Mitsuo

    2015-08-01

    The effects of ionic and neutral species such as H3O+, OH-, O2-, NOx- (x = 2, 3) and NOx on Escherichia coli (E. coli) inactivation in the gas and liquid phases were investigated using atmospheric pressure DC corona discharges with point-to-plane electrodes. The above chemical species, as well as OH and O3, were selectively irradiated onto E. coli suspensions on agar plates using a needle angle of 45° with respect to the plates, airflow, and a grid plate. Irradiation with the positive ion H3O+ did not inactivate E. coli, while the negative ions OH-/O2- resulted in bactericidal inactivation in both the gas and liquid phases. In contrast, the negative ions NOx- and neutral species NOx in the gas phase had quite strong bactericidal effects on E. coli compared to those species in the liquid phase. These results suggest that liquid-phase HNO3, formed primarily via the reaction of gas-phase NOx- and NOx with H2O in agar, has only a weak inactivation effect on E. coli. Furthermore, using naphthylethylenediamine spectrophotometry, the threshold amount of gas-phase NOx- and NOx for E. coli inactivation was determined to be ≈1.3 × 10^-9 mol mm^-1.

  17. Districts Gear up for Shift to Informational Texts

    Science.gov (United States)

    Gewertz, Catherine

    2012-01-01

    The Common Core State Standards' emphasis on informational text arose in part from research suggesting that employers and college instructors found students weak at comprehending technical manuals, scientific and historical journals, and other texts pivotal to work in those arenas. The common core's vision of informational text includes literary…

  18. Comprehension Strategy Instruction: Teaching Narrative Text Structure Awareness

    Science.gov (United States)

    Dymock, Susan

    2007-01-01

    Research shows that students who have a good understanding of narrative text structure have fewer problems comprehending stories. Research also suggests that many students require explicit instruction in how to comprehend this text type. While some children are able to figure out the more elaborate structure of narrative text on their own (i.e.,…

  19. Demo: Using RapidMiner for Text Mining

    OpenAIRE

    Shterev, Yordan

    2013-01-01

    In this demo the basic text mining technologies available in RapidMiner are reviewed. RapidMiner's basic characteristics and its text mining operators are described, and a text mining example using the Naive Bayes algorithm and process modeling is presented.

  20. Guiding Readers to New Understandings through Electronic Text.

    Science.gov (United States)

    Patterson, Nancy, Ed.; Pipkin, Gloria, Ed.

    2001-01-01

    Argues that computer technology can help to engage struggling readers in meaningful transactions with text. Lists and describes seven web sites that will captivate reluctant readers. Notes three web sites that send students on "WebQuests" to transact with text in order to build knowledge. Discusses other ways to engage students in text via…

  1. Gender differences in psychosocial predictors of texting while driving.

    Science.gov (United States)

    Struckman-Johnson, Cindy; Gaster, Samuel; Struckman-Johnson, Dave; Johnson, Melissa; May-Shinagle, Gabby

    2015-01-01

    A sample of 158 male and 357 female college students at a midwestern university participated in an on-line study of psychosocial motives for texting while driving. Men and women did not differ in self-reported ratings of how often they texted while driving. However, more women sent texts of less than a sentence while more men sent texts of 1-5 sentences. More women than men said they would quit texting while driving due to police warnings, receiving information about texting dangers, being shown graphic pictures of texting accidents, and being in a car accident. A hierarchical regression for men's data revealed that lower levels of feeling distracted by texting while driving (20% of the variance), higher levels of cell phone dependence (11.5% of the variance), risky behavioral tendencies (6.5% of the variance) and impulsivity (2.3% of the variance) were significantly associated with more texting while driving (total model variance=42%). A separate regression for women revealed that higher levels of cell phone dependence (10.4% of the variance), risky behavioral tendencies (9.9% of the variance), texting distractibility (6.2% of the variance), crash risk estimates (2.2% of the variance) and driving confidence (1.3% of the variance) were significantly associated with more texting while driving (total model variance=31%). Friendship potential and need for intimacy were not related to men's or women's texting while driving. Implications of the results for gender-specific prevention strategies are discussed. PMID:25463963

  2. PISA - A procedure for analyzing the structure of explanatory texts.

    NARCIS (Netherlands)

    Sanders, T.J.M.; van Wijk, C.

    1996-01-01

    Linguistic analyses of text corpora have contributed to the understanding of natural language processing in both reading and writing. However, the impact of text analysis in psycho-linguistic research has been limited, mainly because the analyses hardly ever concern text structure. Existing models f

  3. On the Concept of Zero Meaning of Text

    Directory of Open Access Journals (Sweden)

    Nikitina E.S.

    2015-08-01

    Full Text Available In the semiotic tradition, a text is considered a sign with its own content. This content is shaped by three meanings within the three spaces of the sign: semantic, syntactic and pragmatic. It is crucial that text is heterogeneous from the point of view of meaning organization. Three spaces, or three spheres of experience, integrated within the text - existential, rational and communicative - focus upon themselves the narrative, typological and paralogical meanings of the text. These meanings constitute the true 'pattern' of the text. The world of the text is one created, arranged and thought over in great detail. The first layer is the level of the plot of existence, and since a text is, by definition, an intertextual determinacy, in communication this level of meaning acts as the initial, or zero, meaning in the process of understanding the text. However, understanding content only begins at this point, continuing through the typological level and, further on, through interpretational practices, finally reaching the paralogical subtleties of understanding. Text is a reality oriented at being understood, and it is the very structure of the meaning of the text that shapes the technologies of understanding. One can and must be taught to understand. The paper addresses the concept of zero meaning as the initial, existential layer of meanings that forms the subjectness of the text.

  4. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages. PMID:17431300
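
    The fusion step can be illustrated, under a conditional-independence assumption, by combining the two per-modality posteriors with Bayes' rule. The probabilities in the sketch below are placeholders, not outputs of the paper's classifiers, and the uniform prior is an assumption.

        # Bayes-rule fusion of two per-modality classifier outputs under a
        # conditional-independence assumption; inputs are placeholder values.
        def fuse(p_text: float, p_image: float, prior: float = 0.5) -> float:
            """Combine P(class|text) and P(class|image) into one posterior."""
            score_pos = (p_text * p_image) / prior          # proportional to P(t|C)P(i|C)P(C)
            score_neg = ((1 - p_text) * (1 - p_image)) / (1 - prior)
            return score_pos / (score_pos + score_neg)

        print(round(fuse(0.70, 0.80), 3))   # both modalities suspicious -> posterior rises
        print(round(fuse(0.70, 0.10), 3))   # image classifier disagrees -> posterior drops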

  5. Selecting Texts and Tasks for Content Area Reading and Learning

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    For students to learn science, social studies, and technical subjects, their teachers have to engage them in meaningful lessons. As part of those lessons, students read informational texts. The selection of those texts is critical. Teachers can select texts worthy of attention and then align instruction and the post-reading tasks such that…

  6. Constructing a Plan for Text-Based Discussion

    Science.gov (United States)

    DeFrance, Nancy L.; Fahrenbruck, Mary L.

    2016-01-01

    Secondary students are often challenged to make sense of conceptually complex content area texts. Text-based discussion has the potential to facilitate comprehension by engaging students in collaboratively constructing their understanding of important ideas and relationships among ideas while interacting with a demanding text. However, teachers…

  7. Text Complexity: The Importance of Building the Right Staircase

    Science.gov (United States)

    Papola-Ellis, Aimee L.

    2014-01-01

    As more districts begin implementing the Common Core State Standards, text complexity is receiving a lot of discussion. It is important for educators to understand the numerous factors involved with text complexity and to have a wide range of strategies to support students with challenging text. This paper shares data from three elementary…

  8. Metacognitive Strategies Help Students to Comprehend All Text

    Science.gov (United States)

    Eilers, Linda H.; Pinkley, Christine

    2006-01-01

    Reading comprehension instruction in many classrooms focuses on teacher-generated questions which actually measure comprehension of specific text rather than developing metacognitive strategies for comprehending all text. Explicit instruction in the metacognitive strategies of making text connections, predicting, and sequencing, was evaluated for…

  9. Text Readability and Intuitive Simplification: A Comparison of Readability Formulas

    Science.gov (United States)

    Crossley, Scott A.; Allen, David B.; McNamara, Danielle S.

    2011-01-01

    Texts are routinely simplified for language learners with authors relying on a variety of approaches and materials to assist them in making the texts more comprehensible. Readability measures are one such tool that authors can use when evaluating text comprehensibility. This study compares the Coh-Metrix Second Language (L2) Reading Index, a…

  10. Intergeneric Derivation: on the Genealogy of an LSP text

    DEFF Research Database (Denmark)

    Askehave, Inger; Kastberg, Peter

    2001-01-01

    is derived from another text or to establish what aspects of the text have been derived, one must gain control over external variables that are not easily controllable. In our approach, we suggest a method that - while controlling external variables - is designed to isolate a suitable text corpus. Contrary...

  11. Eye movement measures for studying global text processing

    NARCIS (Netherlands)

    Hyönä, J.; Lorch, R.F.; Rinck, M.

    2003-01-01

    In this chapter, we demonstrate the usefulness of the eye tracking method in studying global text processing. By "global text processing," we refer to processes responsible for the integration of information from sentences that are not adjacent in the text. Potential eye movement measures indexing g

  12. 48 CFR 1952.102-2 - Incorporation in full text.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 6 2010-10-01 2010-10-01 true Incorporation in full text... Clauses 1952.102-2 Incorporation in full text. All IAAR provisions and clauses shall be incorporated in solicitations and/or contracts in full text....

  13. 48 CFR 2852.102-270 - Incorporation in full text.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 6 2010-10-01 2010-10-01 true Incorporation in full text... 2852.102-270 Incorporation in full text. JAR provisions or clauses shall be incorporated in solicitations and contracts in full text....

  14. Guidelines for Effective Usage of Text Highlighting Techniques.

    Science.gov (United States)

    Strobelt, Hendrik; Oelke, Daniela; Kwon, Bum Chul; Schreck, Tobias; Pfister, Hanspeter

    2016-01-01

    Semi-automatic text analysis involves manual inspection of text. Often, different text annotations (like part-of-speech or named entities) are indicated by using distinctive text highlighting techniques. In typesetting there exist well-known formatting conventions, such as bold typeface, italics, or background coloring, that are useful for highlighting certain parts of a given text. Also, many advanced techniques for visualization and highlighting of text exist; yet, standard typesetting is common, and the effects of standard typesetting on the perception of text are not fully understood. As such, we surveyed and tested the effectiveness of common text highlighting techniques, both individually and in combination, to discover how to maximize pop-out effects while minimizing visual interference between techniques. To validate our findings, we conducted a series of crowdsourced experiments to determine: i) a ranking of nine commonly-used text highlighting techniques; ii) the degree of visual interference between pairs of text highlighting techniques; iii) the effectiveness of techniques for visual conjunctive search. Our results show that increasing font size works best as a single highlighting technique, and that there are significant visual interferences between some pairs of highlighting techniques. We discuss the pros and cons of different combinations as a design guideline to choose text highlighting techniques for text viewers.

  15. Online Adaptation for Mobile Device Text Input Personalization

    Science.gov (United States)

    Baldwin, Tyler

    2012-01-01

    As mobile devices have become more common, the need for efficient methods of mobile device text entry has grown. With this growth comes new challenges, as the constraints imposed by the size, processing power, and design of mobile devices impairs traditional text entry mechanisms in ways not seen in previous text entry tasks. To combat this,…

  16. La figure de Pénélope ou l’immobilité dans le contexte des migrations circulaires

    OpenAIRE

    Boyer, Florence

    2015-01-01

    Photograph no. 1: Women threshing millet in the absence of their husbands. Source: Florence Boyer. Examining these two facets, mobility and immobility, means observing different spatial and temporal scales. Just as "generalized mobility", an expression increasingly present in the scientific literature, has no real existence, "generalized immobility" is very difficult to observe. Thus, in contexts of international migration, ...

  17. THE IMPACT OF TEXT DRIVING ON DRIVING SAFETY

    Directory of Open Access Journals (Sweden)

    Sanaz Motamedi

    2016-09-01

    Full Text Available In an increasingly mobile era, the wide availability of technology for texting and the prevalence of hands-free forms have introduced a new safety concern for drivers. To assess this concern, a questionnaire was first deployed online to gain an understanding of drivers’ text-driving experiences as well as their demographic information. The results from 232 people revealed that the majority of drivers are aware of the risks associated with texting while driving. However, more than one-fourth of them still frequently send or read text messages while driving. In addition to the questionnaire, through the use of a virtual-reality driving simulator, this study examined drivers’ driving performance while they were engaged in some form of texting while driving under different challenging traffic conditions. In a blocked factorial experiment, drivers would either read a text message or respond to it, with two levels of text complexity, while using either a hand-held or a hands-free texting method. Their driving performance was assessed based on the number of driving violations observed in each scenario. Conclusions regarding the impacts of different forms of texting, text complexity, and response mode on drivers’ driving performance were drawn.

  18. Text Analytics: the convergence of Big Data and Artificial Intelligence

    Directory of Open Access Journals (Sweden)

    Antonio Moreno

    2016-03-01

    Full Text Available The analysis of the text content in emails, blogs, tweets, forums and other forms of textual communication constitutes what we call text analytics. Text analytics is applicable to most industries: it can help analyze millions of emails; you can analyze customers’ comments and questions in forums; you can perform sentiment analysis using text analytics by measuring positive or negative perceptions of a company, brand, or product. Text Analytics has also been called text mining, and is a subcategory of the Natural Language Processing (NLP) field, which is one of the founding branches of Artificial Intelligence, dating back to the 1950s, when an interest in understanding text originally developed. Currently Text Analytics is often considered the next step in Big Data analysis. Text Analytics has a number of subdivisions: Information Extraction, Named Entity Recognition, Semantic Web annotated domain representation, and many more. Several techniques are currently used and some of them have gained a lot of attention, such as Machine Learning, for example in semi-supervised enhancement of systems, but they also present a number of limitations which mean they are not always the only or the best choice. We conclude with current and near-future applications of Text Analytics.

  19. Structure strategy interventions: Increasing reading comprehension of expository text

    Directory of Open Access Journals (Sweden)

    Bonnie J. F. MEYER

    2011-11-01

    Full Text Available In this review of the literature we examine empirical studies designed to teach the structure strategy to increase reading comprehension of expository texts. First, we review the research that has served as a foundation for many of the studies examining the effects of text structure instruction. Text structures generally can be grouped into six categories: comparison, problem-and-solution, causation, sequence, collection, and description. Next, we provide a historical look at research on structure strategy interventions. Strategy interventions employ modeling, practice, and feedback to teach students how to use text structure strategically and eventually automatically. Finally, we review recent text structure interventions for elementary school students. We present similarities and differences among these studies and applications for instruction. Our review of intervention research suggests that direct instruction, modeling, scaffolding, elaborated feedback, and adaptation of instruction to student performance are keys in teaching students to strategically use knowledge about text structure.

  20. Text analysis with R for students of literature

    CERN Document Server

    Jockers, Matthew L

    2014-01-01

    Text Analysis with R for Students of Literature is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological tool kit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that we simply cannot gather using traditional qualitative methods of close reading and human synthesis. Text Analysis with R for Students of Literature provides a practical introduction to computational text analysis using the open source programming language R. R is extremely popular throughout the sciences and because of its accessibility, R is now used increasingly in other research areas. Readers begin working with text right away and each chapter works through a new technique or process such that readers gain a broad exposure to core R procedures and a basic understanding of the possibilities of computational text analysis at both the micro and macro scale. Each c...